Simple Data-Exploration
This example shows a few ways to explore a dataset (i.e. to ‘smell’ the data).
from mvpa2.suite import *
# load example fMRI dataset
ds = load_example_fmri_dataset()
# only use the first 5 chunks to save some CPU cycles
ds = ds[ds.chunks < 5]
It is always useful to have a quick look at the summary of the dataset and verify that statistics (mean, standard deviation) are in the expected range, that there is balance among targets/chunks, and that order is balanced (where appropriate).
print(ds.summary())
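The balance check mentioned above can also be done by hand. The following is a minimal sketch in plain Python (it does not need PyMVPA) that counts samples per target/chunk combination. The `targets` and `chunks` lists are made-up stand-ins for `ds.targets` and `ds.chunks`, and `count_per_cell` and `is_balanced` are hypothetical helpers, not part of PyMVPA.

```python
from collections import Counter

# Made-up sample attributes standing in for ds.targets and ds.chunks
targets = ['rest', 'face', 'rest', 'house', 'rest', 'face', 'rest', 'house']
chunks = [0, 0, 0, 0, 1, 1, 1, 1]


def count_per_cell(targets, chunks):
    """Count how many samples fall into each (target, chunk) combination."""
    return Counter(zip(targets, chunks))


def is_balanced(counts, targets, chunks):
    """True if every target has the same number of samples in every chunk."""
    for target in set(targets):
        per_chunk = set(counts[(target, chunk)] for chunk in set(chunks))
        if len(per_chunk) != 1:
            return False
    return True


counts = count_per_cell(targets, chunks)
for key in sorted(counts):
    print('%s in chunk %d: %d samples' % (key[0], key[1], counts[key]))
print('balanced across chunks: %s' % is_balanced(counts, targets, chunks))
```

A target that is missing from a chunk counts as zero samples there, so it is reported as unbalanced rather than silently ignored.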
Now we can take a look at the distribution of the feature values in all sample categories and chunks.
pl.figure(figsize=(14, 14)) # larger figure
hist(ds, xgroup_attr='chunks', ygroup_attr='targets', noticks=None,
bins=20, normed=True)
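To make explicit what a normalized histogram per group reports, here is a minimal sketch in plain Python. It is an illustration only, not how `hist` is implemented: `density_histogram` is a hypothetical helper, the values are made up, and the sketch assumes that all values lie inside the given range.

```python
def density_histogram(values, bins, lo, hi):
    """Return bin heights scaled so that the bar areas sum to one.

    Assumes lo <= v <= hi for every value v.
    """
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in values:
        # clamp so that v == hi lands in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = float(len(values))
    return [c / (total * width) for c in counts]


# Made-up feature values, grouped by a made-up chunk label
values_by_chunk = {0: [0.0, 1.0, 1.5, 3.0], 1: [2.0, 2.5, 3.5, 4.0]}
for chunk in sorted(values_by_chunk):
    heights = density_histogram(values_by_chunk[chunk], bins=2, lo=0.0, hi=4.0)
    print('chunk %d: %s' % (chunk, heights))
```

Using the same range and bin count for every group keeps the panels directly comparable, which is what makes a shift between chunks or targets visible.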
# next only works with floating point data
ds.samples = ds.samples.astype('float')
# look at sample similarity
# Note the decreasing similarity with increasing temporal distance
# of the samples
pl.figure(figsize=(14, 6))
pl.subplot(121)
plot_samples_distance(ds, sortbyattr='chunks')
pl.title('Sample distances (sorted by chunks)')
# similar distance plot, but now with samples sorted by their
# respective targets, i.e. samples with the same target are plotted
# in adjacent columns/rows.
# Note that the first and largest group corresponds to the
# 'rest' condition in the dataset
pl.subplot(122)
plot_samples_distance(ds, sortbyattr='targets')
pl.title('Sample distances (sorted by targets)')
# detrend the data per chunk, then z-score features individually per chunk
print('Detrending data')
poly_detrend(ds, polyord=2, chunks_attr='chunks')
print('Z-Scoring data')
zscore(ds)
pl.figure(figsize=(14, 6))
pl.subplot(121)
plot_samples_distance(ds, sortbyattr='chunks')
pl.title('Distances: z-scored, detrended (sorted by chunks)')
pl.subplot(122)
plot_samples_distance(ds, sortbyattr='targets')
pl.title('Distances: z-scored, detrended (sorted by targets)')
# XXX add some more, maybe show effect of preprocessing
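The effect of per-chunk z-scoring on sample distances can be illustrated without PyMVPA. The following is a minimal sketch in plain Python under stated assumptions: the data are made up, `zscore_per_chunk` and `euclidean_distances` are hypothetical helpers, the population standard deviation is used, Euclidean distance is chosen for illustration, and the detrending step is left out.

```python
import math

# Made-up data: 4 samples x 2 features, two chunks with different offsets
samples = [[1.0, 10.0], [3.0, 14.0], [101.0, 20.0], [103.0, 24.0]]
chunks = [0, 0, 1, 1]


def zscore_per_chunk(samples, chunks):
    """Z-score every feature separately within each chunk."""
    result = [list(row) for row in samples]
    nfeatures = len(samples[0])
    for chunk in set(chunks):
        rows = [i for i, c in enumerate(chunks) if c == chunk]
        for f in range(nfeatures):
            values = [samples[i][f] for i in rows]
            mean = sum(values) / len(values)
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            for i in rows:
                # leave constant features at zero rather than dividing by zero
                result[i][f] = (samples[i][f] - mean) / std if std > 0 else 0.0
    return result


def euclidean_distances(samples):
    """Full matrix of pairwise Euclidean distances between samples."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in samples] for x in samples]


before = euclidean_distances(samples)
after = euclidean_distances(zscore_per_chunk(samples, chunks))
print('distance between sample 0 and 2 before: %.2f' % before[0][2])
print('distance between sample 0 and 2 after:  %.2f' % after[0][2])
```

In this toy data, samples 0 and 2 differ only by a chunk-wide offset and scale, so their distance is large before z-scoring and vanishes afterwards, while the distance between samples within a chunk remains.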
Outputs of the example script.
[Figure: Data prior to preprocessing]
[Figure: Data after minimal preprocessing (z-scoring and detrending)]
See also
The full source code of this example is included in the PyMVPA source distribution (doc/examples/smellit.py).