mvpa2.algorithms.group_clusterthr.GroupClusterThreshold

class mvpa2.algorithms.group_clusterthr.GroupClusterThreshold(**kwargs)

Statistical evaluation of group-level average accuracy maps.
This algorithm can be used to perform cluster-thresholding of searchlight-based group analyses. It implements a two-stage procedure that uses the results of within-subject permutation analyses, estimates a per-feature cluster-forming threshold (via bootstrap), and uses the thresholded bootstrap samples to estimate the distribution of cluster sizes in group-average accuracy maps under the NULL hypothesis, as described in [R3].
Note: this class implements a modified version of that algorithm. The present implementation differs in at least four aspects from the description in that paper.

- Cluster p-values refer to the probability of observing a particular cluster size or a larger one (original paper: probability of observing a larger cluster only). Consequently, probabilities reported by this implementation will have a tendency to be higher in comparison.
- Clusters found in the original (unpermuted) accuracy map are always included in the NULL distribution estimate of cluster sizes. This provides an explicit lower bound for probabilities, as there will always be at least one observed cluster for every cluster size found in the original accuracy map. Consequently, it is impossible to get a probability of zero for clusters of any size (see [R4] for more information).
- Bootstrap accuracy maps that contain no clusters are counted in a dedicated size-zero bin in the NULL distribution of cluster sizes. This change yields reliable cluster probabilities even for very low featurewise threshold probabilities, where some portion of the bootstrap accuracy maps do not contain any clusters.
- The method for FWE correction used by the original authors is not provided. Instead, a range of alternatives implemented by the statsmodels package are available.
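The first three modifications all affect how a cluster's probability is computed from the NULL distribution of cluster sizes. A minimal pure-Python sketch with made-up cluster sizes (this is an illustration of the counting rules, not the library's implementation):

```python
# Pooled NULL distribution of cluster sizes from thresholded bootstrap
# maps (toy numbers).  Two bootstrap maps contained no supra-threshold
# cluster at all; they contribute the dedicated size-zero entries.
null_sizes = [0, 0, 2, 3, 3, 4, 5, 7]

# Clusters found in the original (unpermuted) accuracy map are added to
# the NULL sample as well, so no cluster can ever receive a probability
# of zero.
observed_sizes = [6, 3]
null_sizes = null_sizes + observed_sizes

def cluster_prob(size, null):
    # p-value = probability of observing a cluster of this size OR
    # larger (>=), not strictly larger as in the original paper.
    return sum(1 for s in null if s >= size) / len(null)

print([cluster_prob(s, null_sizes) for s in observed_sizes])  # [0.2, 0.7]
```

Because the observed sizes themselves are part of the NULL sample, each observed cluster matches at least itself, giving the explicit lower bound of 1/len(null).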
Moreover, this implementation minimizes memory demands and allows for computing large numbers of bootstrap samples without a significant increase in memory usage (a CPU-time trade-off).
Instances of this class must be trained before they can be used to threshold accuracy maps. The training dataset must match the following criteria:
- For every subject in the group, it must contain multiple accuracy maps that are the result of a within-subject classification analysis based on permuted class labels. Each map must correspond to one fixed permutation for all features in the map, as described in [R3]. The original authors recommend 100 accuracy maps per subject for a typical searchlight analysis.
- It must contain a sample attribute indicating which sample is associated with which subject, because bootstrapping average accuracy maps is implemented by randomly drawing one map from each subject. The name of the attribute can be configured via the chunk_attr parameter.
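Conceptually, each bootstrap sample is a group-average map built by drawing one permutation map per subject (i.e. per chunk) and averaging feature-wise. A toy sketch with made-up data (names and shapes are illustrative; this is not the PyMVPA API):

```python
import random

random.seed(1)

# Hypothetical training data: 3 subjects x 4 permutation accuracy maps
# x 5 features.  The per-subject grouping plays the role of the sample
# attribute named by chunk_attr (default: 'chunks').
perm_maps = {
    'subj1': [[random.random() for _ in range(5)] for _ in range(4)],
    'subj2': [[random.random() for _ in range(5)] for _ in range(4)],
    'subj3': [[random.random() for _ in range(5)] for _ in range(4)],
}

def bootstrap_average(maps_by_subject):
    # Draw exactly one permutation map from each subject ...
    drawn = [random.choice(maps) for maps in maps_by_subject.values()]
    # ... and average them feature-wise into one group-average map.
    n = len(drawn)
    return [sum(vals) / n for vals in zip(*drawn)]

avg_map = bootstrap_average(perm_maps)
print(len(avg_map))  # 5 -- one value per feature
```

Repeating this n_bootstrap times yields the sample of group-average maps from which both the featurewise thresholds and the NULL cluster-size distribution are estimated.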
After training, an instance can be called with a dataset to perform thresholding and statistical evaluation. Unless a single-sample dataset is passed, all samples in the input dataset will be averaged prior to thresholding.
Returns: Dataset
This is a shallow copy of the input dataset (after a potential averaging), hence it contains the same data and attributes. In addition, it includes the following attributes:

fa.featurewise_thresh
Vector with featurewise cluster-forming thresholds.

fa.clusters_featurewise_thresh
Vector with labels for clusters after thresholding the input data with the desired featurewise probability. Each unique non-zero element corresponds to an individual supra-threshold cluster. Cluster values are sorted by cluster size (number of features); the largest cluster is always labeled with 1.

fa.clusters_fwe_thresh
Vector with labels for supra-threshold clusters after correction for multiple comparisons. The attribute is derived from fa.clusters_featurewise_thresh by removing all clusters that do not pass the threshold when controlling for the family-wise error rate.

a.clusterstats
Record array with information on all detected clusters. The array is sorted according to cluster size, starting with the largest cluster in terms of number of features. The array contains the fields size (number of features comprising the cluster), mean, median, min, max, std (respective descriptive statistics for each cluster), and prob_raw (probability of observing a cluster of this size or larger under the NULL hypothesis). If correction for multiple comparisons is enabled, an additional field prob_corrected (probability after correction) is added.

a.clusterlocations
Record array with information on the location of all detected clusters. The array is sorted according to cluster size (same order as a.clusterstats). The array contains the fields max (feature coordinate of the maximum score within the cluster) and center_of_mass (coordinate of the center of mass, weighted by the feature values within the cluster).
Notes

Available conditional attributes:

calling_time+: None
raw_results: None
trained_dataset: None
trained_nsamples+: None
trained_targets+: None
training_time+: None

(Conditional attributes enabled by default are suffixed with +.)

References
[R3] Johannes Stelzer, Yi Chen and Robert Turner (2013). Statistical inference and multiple testing correction in classification-based multi-voxel pattern analysis (MVPA): Random permutations and cluster size control. NeuroImage, 65, 69-82.

[R4] Smyth, G. K., & Phipson, B. (2010). Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn. Statistical Applications in Genetics and Molecular Biology, 9, 1-12.

Methods
Initialize an instance of GroupClusterThreshold.

Parameters:

n_bootstrap : int, optional
Number of bootstrap samples to be generated from the training dataset. For each sample, an average map will be computed from a set of randomly drawn samples (one from each chunk). Bootstrap samples will be used to estimate a featurewise NULL distribution of accuracy values for initial thresholding, and to estimate the NULL distribution of cluster sizes under the NULL hypothesis. A larger number of bootstrap samples reduces the lower bound of probabilities, which may be beneficial for multiple comparison correction. Constraints: value must be convertible to type ‘int’, and value must be in range [1, inf]. [Default: 100000]
feature_thresh_prob : float, optional
Featurewise probability threshold. The value corresponding to this probability in the NULL distribution of accuracies will be used as the threshold for cluster forming. Given that the NULL distribution is estimated per feature, the actual threshold value will vary across features, yielding a threshold vector. The number of bootstrap samples needs to be adequate for the desired probability; otherwise a ValueError is raised. Constraints: value must be convertible to type ‘float’, and value must be in range [0.0, 1.0]. [Default: 0.001]

chunk_attr :
Name of the attribute indicating the individual chunks from which a single sample each is drawn for averaging into a bootstrap sample. [Default: ‘chunks’]
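The interplay between n_bootstrap and feature_thresh_prob can be illustrated with a toy sketch: each feature's threshold is read off the tail of its bootstrap NULL distribution, and with n bootstrap samples no probability below 1/n can be resolved (the helper name and error check here are illustrative, not the library's code):

```python
def featurewise_threshold(boot_values, thresh_prob):
    """Value at the (1 - thresh_prob) quantile of one feature's
    bootstrap NULL distribution (simple order-statistic sketch)."""
    n = len(boot_values)
    # With n bootstrap samples the smallest resolvable probability
    # is 1/n; requesting less cannot be honored.
    if thresh_prob < 1.0 / n:
        raise ValueError('too few bootstrap samples for this probability')
    ordered = sorted(boot_values)
    idx = int(n * (1.0 - thresh_prob))
    return ordered[min(idx, n - 1)]

# Toy NULL distribution of accuracies for a single feature: 0.00 .. 0.99
boot = [i / 100.0 for i in range(100)]
print(featurewise_threshold(boot, 0.05))  # 0.95
```

With feature_thresh_prob=0.001 (the default), at least 1000 bootstrap samples per feature would be required; the default n_bootstrap of 100000 leaves ample headroom.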
fwe_rate : float, optional
Familywise error rate for multiple comparison correction of cluster size probabilities. Constraints: value must be convertible to type ‘float’, and value must be in range [0.0, 1.0]. [Default: 0.05]
multicomp_correction : {bonferroni, sidak, holm-sidak, holm, simes-hochberg, hommel, fdr_bh, fdr_by, None}, optional
Strategy for multiple comparison correction of cluster probabilities. All methods supported by statsmodels’ multitest module are available. In addition, None can be specified to disable correction. Constraints: value must be one of (‘bonferroni’, ‘sidak’, ‘holm-sidak’, ‘holm’, ‘simes-hochberg’, ‘hommel’, ‘fdr_bh’, ‘fdr_by’, None). [Default: ‘fdr_bh’]

n_blocks : int, optional
Number of segments used to compute the featurewise NULL distributions. This parameter determines the peak memory demand. In the case of a single segment, a matrix of size (n_bootstrap x n_features) will be allocated. Increasing the number of segments reduces the peak memory demand roughly by that factor. Constraints: value must be convertible to type ‘int’, and value must be in range [1, inf]. [Default: 1]
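For intuition about the multicomp_correction strategies listed above, here is a minimal pure-Python sketch of two of them, Bonferroni and Benjamini-Hochberg (‘fdr_bh’), applied to cluster probabilities; the class itself delegates this to statsmodels rather than using code like this:

```python
def bonferroni(pvals, alpha=0.05):
    # Reject only where the p-value survives multiplication by the
    # number of tests (clusters).
    m = len(pvals)
    return [p * m <= alpha for p in pvals]

def fdr_bh(pvals, alpha=0.05):
    # Benjamini-Hochberg step-up: find the largest rank k with
    # p_(k) <= k/m * alpha; all clusters ranked at or below k survive.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            passed[i] = True
    return passed

# Toy cluster p-values, largest cluster first (as in a.clusterstats)
pvals = [0.001, 0.012, 0.025, 0.041, 0.2]
print(bonferroni(pvals))  # [True, False, False, False, False]
print(fdr_bh(pvals))      # [True, True, True, False, False]
```

FDR control (the default) is less conservative than family-wise Bonferroni correction, as the example shows: three clusters survive instead of one.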
n_proc : int, optional
Number of parallel processes to use for computation. Requires the joblib external module. Constraints: value must be convertible to type ‘int’, and value must be in range [1, inf]. [Default: 1]

enable_ca : None or list of str
Names of the conditional attributes which should be enabled in addition to the default ones
disable_ca : None or list of str
Names of the conditional attributes which should be disabled
auto_train : bool
Flag whether the learner will automatically train itself on the input dataset when called untrained.
force_train : bool
Flag whether the learner will enforce training on the input dataset upon every call.
space : str, optional
Name of the ‘processing space’. The actual meaning of this argument heavily depends on the subclass implementation. In general, this is a trigger that tells the node to compute and store information about the input data that is “interesting” in the context of the corresponding processing in the output dataset.
pass_attr : str, list of str or tuple, optional
Additional attributes to pass on to an output dataset. Attributes can be taken from all three attribute collections of an input dataset (sa, fa, a -- see Dataset.get_attr()), or from the collection of conditional attributes (ca) of a node instance. Corresponding collection name prefixes should be used to identify attributes, e.g. ‘ca.null_prob’ for the conditional attribute ‘null_prob’, or ‘fa.stats’ for the feature attribute ‘stats’. In addition to a plain attribute identifier, it is possible to use a tuple to trigger more complex operations. The first tuple element is the attribute identifier, as described before. The second element is the name of the target attribute collection (sa, fa, or a). The third element is the axis number of a multidimensional array that shall be swapped with the current first axis. The fourth element is a new name that shall be used for the attribute in the output dataset. Example: (‘ca.null_prob’, ‘fa’, 1, ‘pvalues’) will take the conditional attribute ‘null_prob’ and store it as a feature attribute ‘pvalues’, while swapping the first and second axes. Simplified instructions can be given by leaving out consecutive tuple elements starting from the end.

postproc : Node instance, optional
Node to perform post-processing of results. This node is applied in __call__() to perform a final processing step on the result dataset to be returned. If None, nothing is done.

descr : str
Description of the instance