mvpa2.clfs.stats.match_distribution

mvpa2.clfs.stats.match_distribution(data, nsamples=None, loc=None, scale=None, args=None, test='kstest', distributions=None, **kwargs)

Determine best matching distribution.

Can be used for ‘smelling’ the data, as well as for choosing a parametric distribution for data obtained from non-parametric testing (e.g. MCNullDist).

Work in progress: use with caution, the API might change.

Parameters:

data : np.ndarray

Array of the data for which to deduce the distribution. It has to be large enough to support a reliable conclusion.

nsamples : int or None

If None, all samples in data are used to estimate the parametric distribution. Otherwise, only the specified number of samples, selected at random from data, is used.

loc : float or None

Location parameter (loc) of the distribution, if known.

scale : float or None

Scale parameter (scale) of the distribution, if known.

test : str

What kind of testing to do. Choices:

‘p-roc’

Detection power for a given ROC. Needs two parameters, passed as keyword arguments: p=0.05 and tail='both'.

‘kstest’

‘Full-body’ distribution comparison. The best match is the distribution with the smallest reported distance after its parameters have been estimated. The parameter p=0.05 sets the threshold for rejecting the null hypothesis that the distributions are the same. WARNING: older versions of scipy (e.g. 0.5.2 in etch) have an incorrect kstest implementation and do not function properly.

distributions : None or list of str or tuple(str, dict)

Distributions to check. If None, all distributions known in scipy.stats are tested. A distribution given as a tuple must contain the name of the distribution and a dictionary of additional parameters (name, loc, scale, args). The entry ‘scipy’ adds all distributions known in scipy.stats.

**kwargs :

Additional arguments needed by the particular test (see above).
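The selection rule described for ‘kstest’ (estimate the parameters of each candidate, then keep the candidate with the smallest distance) can be illustrated with a minimal standard-library sketch. This is not the PyMVPA implementation: the names ks_distance, rank_candidates and FITTERS are hypothetical, the three candidates are hand-written rather than taken from scipy.stats, the parameter estimators are simple moment or median based ones, and the p threshold is left out so that only the ranking step is shown.

```python
import math
import random
from statistics import NormalDist, fmean, median, stdev


def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_normal(sample):
    # Sample mean and standard deviation as parameter estimates.
    return NormalDist(fmean(sample), stdev(sample)).cdf


def fit_uniform(sample):
    # Support estimated from the observed range.
    lo, hi = min(sample), max(sample)
    return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))


def fit_laplace(sample):
    # Location = median, scale = mean absolute deviation from the median.
    loc = median(sample)
    scale = fmean(abs(x - loc) for x in sample)

    def cdf(x):
        z = (x - loc) / scale
        return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

    return cdf


# Hypothetical candidate set; the real function draws on scipy.stats.
FITTERS = {"norm": fit_normal, "uniform": fit_uniform, "laplace": fit_laplace}


def rank_candidates(sample, fitters=FITTERS):
    """Return (distance, name) pairs, best match first."""
    return sorted((ks_distance(sample, fit(sample)), name)
                  for name, fit in fitters.items())


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
for distance, name in rank_candidates(sample):
    print(f"{name:8s} D = {distance:.3f}")
```

With too few samples the distances of similar candidates overlap, which is why data has to be sufficiently large for the ranking to be trusted.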

Examples

>>> import numpy as np
>>> from mvpa2.clfs.stats import match_distribution
>>> data = np.random.normal(size=(1000,1))
>>> matches = match_distribution(
...   data,
...   distributions=['rdist',
...                  ('rdist', {'name':'rdist_fixed',
...                             'loc': 0.0,
...                             'args': (10,)})],
...   nsamples=30, test='p-roc', p=0.05)
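
The example above uses test='p-roc'. One plausible reading of "detection power", given here as an assumption and not as the PyMVPA implementation, is to compare which samples would be flagged at level p by the empirical (non-parametric) distribution with which would be flagged by the fitted parametric one. The sketch below uses only the standard library and covers the two-tailed case; detection_mismatch and two_sided_p are hypothetical names.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, fmean, stdev


def two_sided_p(cdf_value):
    """Two-tailed p-value for a point whose CDF value is `cdf_value`."""
    return 2.0 * min(cdf_value, 1.0 - cdf_value)


def detection_mismatch(sample, cdf, p=0.05):
    """Fraction of disagreements between empirical and parametric detection.

    The count of samples flagged by exactly one of the two distributions
    is divided by the number of samples flagged empirically.
    """
    xs = sorted(sample)
    n = len(xs)
    # Empirical CDF at each sample, shifted by half a step so that both
    # tails are treated alike.
    empirical = [(bisect_right(xs, x) - 0.5) / n for x in sample]
    hits_emp = [two_sided_p(e) <= p for e in empirical]
    hits_par = [two_sided_p(cdf(x)) <= p for x in sample]
    n_emp = sum(hits_emp)
    if n_emp == 0:
        raise ValueError("no samples detected empirically; "
                         "use more data or a larger p")
    mismatches = sum(a != b for a, b in zip(hits_emp, hits_par))
    return mismatches / n_emp


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
normal_cdf = NormalDist(fmean(sample), stdev(sample)).cdf
lo, hi = min(sample), max(sample)
uniform_cdf = lambda x: (x - lo) / (hi - lo)

print("norm   ", detection_mismatch(sample, normal_cdf))
print("uniform", detection_mismatch(sample, uniform_cdf))
```

Under this reading a lower value means the parametric distribution reproduces the tail decisions of the data more closely, which is the property that matters when the fitted distribution replaces a Monte Carlo null distribution.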