This content refers to an unreleased development version of PyMVPA
To provide the most recent news and documentation www.pymvpa.org reflects the development 0.6 series of PyMVPA. If you are interested in the documentation of the previous stable 0.4 series of PyMVPA, please visit v04.pymvpa.org.

Classification of SVD-mapped DatasetsΒΆ

Demonstrate the usage of a dataset mapper performing data projection onto singular value components within a cross-validation – for any classifier.

from mvpa2.suite import *

if __debug__:
    debug.active += ["REPM"]

#
# load PyMVPA example dataset
#
attr = SampleAttributes(os.path.join(pymvpa_dataroot,
                        'attributes_literal.txt'))
dataset = fmri_dataset(os.path.join(pymvpa_dataroot, 'bold.nii.gz'),
                       targets=attr.targets, chunks=attr.chunks,
                       mask=os.path.join(pymvpa_dataroot, 'mask.nii.gz'))

#
# preprocessing
#

# do chunkswise linear detrending on dataset
poly_detrend(dataset, polyord=1, chunks_attr='chunks')

# only use 'rest', 'cats' and 'scissors' samples from dataset
dataset = dataset[np.array([ l in ['rest', 'cat', 'scissors']
                    for l in dataset.targets], dtype='bool')]

# zscore dataset relative to baseline ('rest') mean
zscore(dataset, chunks_attr='chunks', param_est=('targets', ['rest']), dtype='float32')

# remove baseline samples from dataset for final analysis
dataset = dataset[dataset.sa.targets != 'rest']

# Specify the class of a base classifier to be used
Clf = LinearCSVMC
# And create the instance of SVDMapper to be reused
svdmapper = SVDMapper()

Lets create a generator of a ChainMapper which would first perform SVD and then subselect the desired range of components.

get_SVD_sliced = lambda x: ChainMapper([svdmapper,
                                        StaticFeatureSelection(x)])

Now we can define a list of some classifiers: a simple one and several classifiers with built-in SVD transformation and selection of corresponding SVD subspaces

clfs = [('All orig.\nfeatures (%i)' % dataset.nfeatures, Clf()),
        ('All Comps\n(%i)' % (dataset.nsamples \
                 - (dataset.nsamples / len(dataset.UC)),),
                        MappedClassifier(Clf(), svdmapper)),
        ('First 5\nComp.', MappedClassifier(Clf(),
                        get_SVD_sliced(slice(0, 5)))),
        ('First 30\nComp.', MappedClassifier(Clf(),
                        get_SVD_sliced(slice(0, 30)))),
        ('Comp.\n6-30', MappedClassifier(Clf(),
                        get_SVD_sliced(slice(5, 30))))]


# run and visualize in barplot
results = []
labels = []

for desc, clf in clfs:
    print desc.replace('\n', ' ')
    cv = CrossValidation(clf, NFoldPartitioner())
    res = cv(dataset)
    # there is only one 'feature' i.e. the error in the returned
    # dataset
    results.append(res.samples[:,0])
    labels.append(desc)

plot_bars(results, labels=labels,
         title='Linear C-SVM classification (cats vs. scissors)',
         ylabel='Mean classification error (N-1 cross-validation, 12-fold)',
         distance=0.5)

Output of the example analysis:

Generalization performance on the selected PCs.

See also

The full source code of this example is included in the PyMVPA source distribution (doc/examples/svdclf.py).

Previous topic

Sensitivity Measure

Next topic

Monte-Carlo testing of Classifier-based Analyses

NeuroDebian

NITRC-listed