mvpa2.datasets.sources.skl_data.skl_multilabel_classification

mvpa2.datasets.sources.skl_data.skl_multilabel_classification(n_samples=100, n_features=20, n_classes=5, n_labels=2, length=50, allow_unlabeled=True, sparse=False, return_indicator=False, return_distributions=False, random_state=None)

Generate a random multilabel classification problem.

For each sample, the generative process is:
  • pick the number of labels: n ~ Poisson(n_labels)
  • n times, choose a class c: c ~ Multinomial(theta)
  • pick the document length: k ~ Poisson(length)
  • k times, choose a word: w ~ Multinomial(theta_c)

In the above process, rejection sampling is used to make sure that n never exceeds n_classes (and is never zero when allow_unlabeled is False), and that the document length is never zero. Likewise, classes which have already been chosen for the sample are rejected.
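The generative process can be sketched for a single sample in plain Python. This is an illustrative sketch, not the sklearn implementation: the names sample_poisson, sample_document, theta and theta_c are hypothetical, the sketch covers only the case where every sample gets at least one label, and the way words are drawn when a sample has several labels (each word first picks one of the sample's labels uniformly) is an assumption, since the description above does not specify it.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_document(n_features, n_classes, n_labels, length, theta, theta_c, rng):
    """Generate one (feature counts, label list) pair.

    theta   : n_classes class probabilities (hypothetical name)
    theta_c : theta_c[c] holds n_features word probabilities for class c
    """
    # Step 1: number of labels, rejecting 0 and anything above n_classes.
    n = 0
    while n == 0 or n > n_classes:
        n = sample_poisson(n_labels, rng)

    # Step 2: draw classes one at a time, rejecting classes already chosen.
    labels = []
    while len(labels) < n:
        c = rng.choices(range(n_classes), weights=theta)[0]
        if c not in labels:
            labels.append(c)

    # Step 3: document length, rejecting 0.
    k = 0
    while k == 0:
        k = sample_poisson(length, rng)

    # Step 4: draw k words. ASSUMPTION: each word picks one of the sample's
    # labels uniformly, then a word from that class's distribution.
    x = [0] * n_features
    for _ in range(k):
        c = rng.choice(labels)
        w = rng.choices(range(n_features), weights=theta_c[c])[0]
        x[w] += 1
    return x, sorted(labels)


# Usage: uniform class prior and random per-class word distributions.
rng = random.Random(0)
n_features, n_classes = 20, 5
theta = [1.0 / n_classes] * n_classes
theta_c = []
for _ in range(n_classes):
    raw = [rng.random() for _ in range(n_features)]
    total = sum(raw)
    theta_c.append([v / total for v in raw])

x, labels = sample_document(n_features, n_classes, 2, 50, theta, theta_c, rng)
```

Here x is one row of the feature matrix (word counts that sum to the document length) and labels is that sample's label set.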

Parameters:

n_samples : int, optional (default=100)

The number of samples.

n_features : int, optional (default=20)

The total number of features.

n_classes : int, optional (default=5)

The number of classes of the classification problem.

n_labels : int, optional (default=2)

The average number of labels per instance. More precisely, the number of labels per sample is drawn from a Poisson distribution with n_labels as its expected value, but samples are bounded (using rejection sampling) by n_classes, and must be nonzero if allow_unlabeled is False.

length : int, optional (default=50)

The sum of the features of each sample (the number of words, if the samples are documents) is drawn from a Poisson distribution with this expected value.

allow_unlabeled : bool, optional (default=True)

If True, some instances might not belong to any class.

sparse : bool, optional (default=False)

If True, return a sparse feature matrix.

return_indicator : bool, optional (default=False)

If True, return Y in the binary indicator format, else return a tuple of lists of labels.
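To illustrate how the two label formats relate, the following minimal sketch converts a tuple of label lists into a binary indicator layout with one row per sample and one column per class. The helper name to_indicator and the example labels are hypothetical; the sketch uses nested lists rather than an array to stay dependency-free.

```python
def to_indicator(label_lists, n_classes):
    """Convert per-sample label lists into rows of 0/1 class indicators."""
    rows = []
    for labels in label_lists:
        row = [0] * n_classes
        for c in labels:
            row[c] = 1  # mark class c as present for this sample
        rows.append(row)
    return rows


# Three samples; the second one is unlabeled (possible when allow_unlabeled=True).
Y_lists = ([0, 2], [], [1, 3, 4])
Y_ind = to_indicator(Y_lists, n_classes=5)
# Y_ind == [[1, 0, 1, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 1, 1]]
```

An unlabeled sample appears as an empty list in the first format and as an all-zero row in the second.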

return_distributions : bool, optional (default=False)

If True, return the prior class probability and conditional probabilities of features given classes, from which the data was drawn.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator. If a RandomState instance, random_state is the random number generator itself. If None, the random number generator is the RandomState instance used by np.random.
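The three cases can be sketched with NumPy as follows. The helper name resolve_random_state is hypothetical and this is not the wrapped library's code; it only illustrates the dispatch described above. It assumes that np.random.mtrand._rand (a private NumPy attribute) is the global RandomState instance behind the np.random module-level functions.

```python
import numpy as np


def resolve_random_state(random_state=None):
    """Turn an int seed, a RandomState instance, or None into a RandomState."""
    if random_state is None:
        # ASSUMPTION: the global generator shared by np.random.* functions.
        return np.random.mtrand._rand
    if isinstance(random_state, (int, np.integer)):
        # A fresh generator seeded with the given integer.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # Already a generator: use it as-is.
        return random_state
    raise ValueError("cannot use %r to seed a RandomState" % (random_state,))


# The same integer seed yields the same draws.
a = resolve_random_state(42).poisson(2, size=5)
b = resolve_random_state(42).poisson(2, size=5)
```

Passing an integer therefore makes the generated data reproducible across calls, while None ties the result to the global NumPy random state.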

Returns:

X : array or sparse CSR matrix of shape [n_samples, n_features]

The generated samples.

Y : tuple of lists or array of shape [n_samples, n_classes]

The label sets.

p_c : array, shape [n_classes]

The probability of each class being drawn. Only returned if return_distributions=True.

p_w_c : array, shape [n_features, n_classes]

The probability of each feature being drawn given each class. Only returned if return_distributions=True.

Notes:

This function has been auto-generated by wrapping make_multilabel_classification() from the sklearn package. The documentation of the wrapped function has been kept verbatim. Consequently, the actual return value differs from the one described under Returns: the data is returned as a PyMVPA dataset.