mvpa2.datasets.sources.skl_data.skl_low_rank_matrix

mvpa2.datasets.sources.skl_data.skl_low_rank_matrix(n_samples=100, n_features=100, effective_rank=10, tail_strength=0.5, random_state=None)

Generate a mostly low rank matrix with bell-shaped singular values

Most of the variance can be explained by a bell-shaped curve of width effective_rank: the low rank part of the singular values profile is:

(1 - tail_strength) * exp(-1.0 * (i / effective_rank) ** 2)

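The remaining singular values form a fat tail, decreasing as:

tail_strength * exp(-0.1 * i / effective_rank)

Each singular value is the sum of these two terms, for i = 0, ..., min(n_samples, n_features) - 1.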
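The two formulas can be evaluated directly. The following is a minimal pure-Python sketch, assuming (as in scikit-learn's make_low_rank_matrix) that each singular value is the bell-shaped term plus the tail term, indexed by i = 0, ..., min(n_samples, n_features) - 1. The helper name `singular_profile` is hypothetical and not part of PyMVPA.

```python
import math


def singular_profile(n, effective_rank=10, tail_strength=0.5):
    """Hypothetical helper: the documented profile for i = 0 .. n - 1.

    Assumes each singular value is the low rank (bell-shaped) term plus the
    fat tail term, as in scikit-learn's make_low_rank_matrix.
    """
    r = float(effective_rank)  # avoid integer division on Python 2
    return [
        (1 - tail_strength) * math.exp(-1.0 * (i / r) ** 2)
        + tail_strength * math.exp(-0.1 * i / r)
        for i in range(n)
    ]


# n = min(n_samples, n_features); here the default 100 x 100 shape
s = singular_profile(100)
print(round(s[0], 6))   # prints 1.0: both exponentials equal 1 at i = 0
print(round(s[10], 4))  # prints 0.6364: 0.5 * exp(-1) + 0.5 * exp(-0.1)
```

At i = 0 both exponentials equal 1, so under this assumption the largest singular value is 1 for every tail_strength; tail_strength only controls how that leading value is split between the bell-shaped term and the tail.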

The low rank part of the profile can be considered the structured signal part of the data while the tail can be considered the noisy part of the data that cannot be summarized by a low number of linear components (singular vectors).
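The split between structured signal and noisy tail can be quantified by the share of the squared singular values (the squared Frobenius norm of X) carried by the leading k components. The pure-Python sketch below recomputes the documented profile, again assuming, as in scikit-learn, that each singular value is the sum of the two terms. The helpers `profile`, `energy_fraction`, and `components_needed` are hypothetical, and the 90% threshold is an arbitrary illustrative choice.

```python
import math


def profile(n=100, effective_rank=10, tail_strength=0.5):
    """Singular values for i = 0 .. n - 1: bell-shaped part plus fat tail."""
    r = float(effective_rank)
    return [
        (1 - tail_strength) * math.exp(-1.0 * (i / r) ** 2)
        + tail_strength * math.exp(-0.1 * i / r)
        for i in range(n)
    ]


def energy_fraction(s, k):
    """Hypothetical helper: share of sum(s_i ** 2) carried by the first k values."""
    squares = [v * v for v in s]
    return sum(squares[:k]) / sum(squares)


def components_needed(s, threshold=0.9):
    """Hypothetical helper: smallest k with energy_fraction(s, k) >= threshold."""
    for k in range(1, len(s) + 1):
        if energy_fraction(s, k) >= threshold:
            return k
    return len(s)


# Larger effective_rank: more components for the same share (no tail here).
for rank in (5, 10, 20):
    print(rank, components_needed(profile(effective_rank=rank, tail_strength=0.0)))

# Heavier tail: less of the total carried by the first 2 * effective_rank values.
for strength in (0.0, 0.5, 0.9):
    print(strength, energy_fraction(profile(tail_strength=strength), 20))
```

With tail_strength=0.0 the count grows roughly in proportion to effective_rank, matching its description as the approximate number of singular vectors needed; increasing tail_strength moves a substantial share of the total out of the leading components and into the tail.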

This kind of singular value profile is often seen in practice, for instance in:
  • gray level pictures of faces
  • TF-IDF vectors of text documents crawled from the web

Parameters:

n_samples : int, optional (default=100)

The number of samples.

n_features : int, optional (default=100)

The number of features.

effective_rank : int, optional (default=10)

The approximate number of singular vectors required to explain most of the data by linear combinations.

tail_strength : float between 0.0 and 1.0, optional (default=0.5)

The relative importance of the fat noisy tail of the singular values profile.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random.
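The three random_state cases can be illustrated with a small normalization helper. This is a sketch that mirrors the rules stated above; scikit-learn implements equivalent logic in sklearn.utils.check_random_state. The name `as_random_state` is hypothetical, and the None case uses NumPy's private attribute np.random.mtrand._rand (the global RandomState behind the np.random.* functions), for illustration only.

```python
import numbers

import numpy as np


def as_random_state(random_state):
    """Hypothetical helper mirroring the documented random_state rules."""
    if random_state is None:
        # Global RandomState used by np.random.* (private NumPy attribute)
        return np.random.mtrand._rand
    if isinstance(random_state, numbers.Integral):
        # An int is used as the seed of a new generator
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is
        return random_state
    raise ValueError("%r cannot be used to seed a RandomState" % (random_state,))


# Same int seed: identical draws from two independent generators.
print(as_random_state(42).rand(3))
print(as_random_state(42).rand(3))
```

An int therefore makes repeated calls reproducible, whereas passing the same RandomState instance to successive calls advances its state, so each call draws different random numbers.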

Returns:

X : array of shape [n_samples, n_features]

The matrix.
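The construction behind the returned matrix can be sketched in NumPy, assuming it follows scikit-learn's make_low_rank_matrix: draw two random matrices U and V with orthonormal columns (via a reduced QR decomposition of Gaussian matrices) and form X = U diag(s) V^T, where s is the documented profile with n = min(n_samples, n_features). The name `low_rank_matrix_sketch` is hypothetical; this is not the PyMVPA wrapper, which returns a dataset rather than a plain array.

```python
import numpy as np


def low_rank_matrix_sketch(n_samples=100, n_features=100, effective_rank=10,
                           tail_strength=0.5, random_state=None):
    """Hypothetical re-implementation for illustration: X = U diag(s) V^T."""
    # Accepts a RandomState, an int seed, or None; note that None here seeds a
    # fresh generator from OS entropy rather than using np.random's global one.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    n = min(n_samples, n_features)
    # Random matrices with orthonormal columns from the reduced QR decomposition
    u, _ = np.linalg.qr(rng.standard_normal(size=(n_samples, n)))
    v, _ = np.linalg.qr(rng.standard_normal(size=(n_features, n)))
    # Documented profile: bell-shaped low rank part plus fat tail
    i = np.arange(n, dtype=np.float64)
    low_rank = (1 - tail_strength) * np.exp(-1.0 * (i / effective_rank) ** 2)
    tail = tail_strength * np.exp(-0.1 * i / effective_rank)
    s = low_rank + tail
    # Scale the columns of u by s, then combine with the rows of v.T
    return (u * s) @ v.T


X = low_rank_matrix_sketch(n_samples=60, n_features=40, random_state=0)
print(X.shape)  # prints (60, 40)
print(np.linalg.svd(X, compute_uv=False)[:3])
```

Because U and V have orthonormal columns, the singular values of X equal s up to rounding, so np.linalg.svd(X, compute_uv=False) recovers the documented profile.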

Notes

This function has been auto-generated by wrapping make_low_rank_matrix() from the sklearn package, and its documentation has been kept verbatim. As a result, the return value described above is inaccurate: the data are returned as a PyMVPA dataset rather than as a plain array.