7.6.5. mclearn.preprocessing.balanced_train_test_split

mclearn.preprocessing.balanced_train_test_split(data, features, target, train_size, test_size, random_state=None)

Split the data into a balanced training set and test set of some given size.

For a dataset with an unequal number of samples in each class, one useful procedure is to split the data into a training set and a test set in such a way that the classes are balanced, that is, each class contributes the same number of samples to a given set.

Parameters:

data (DataFrame, shape = [n_samples, n_features]) – Where each row is a sample point and each column is a feature.
features (array, shape = [n_features]) – The names of the columns in data that are used as feature vectors.
target (str) – The name of the column in data that is used as the target vector.
train_size (int) – Number of sample points from each class in the training set.
test_size (int) – Number of sample points from each class in the test set.
random_state (int, optional (default=None)) – Random seed.

Returns:

X_train (array) – The feature vectors (stored as columns) in the training set.
X_test (array) – The feature vectors (stored as columns) in the test set.
y_train (array) – The target vector in the training set.
y_test (array) – The target vector in the test set.
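
The balanced split described above can be illustrated with a minimal sketch. This is not mclearn's implementation: the function name balanced_split_sketch, the demo DataFrame, and its column names are hypothetical, and the sketch assumes the returned feature arrays hold one row per sample. It draws train_size and test_size rows from each class without overlap, using only pandas and NumPy.

```python
import numpy as np
import pandas as pd


def balanced_split_sketch(data, features, target, train_size, test_size,
                          random_state=None):
    """Hypothetical stand-in for a balanced split (not mclearn's code)."""
    rng = np.random.RandomState(random_state)
    labels = data[target].to_numpy()
    train_pos, test_pos = [], []

    for cls in np.unique(labels):
        # Row positions belonging to this class.
        positions = np.flatnonzero(labels == cls)
        if len(positions) < train_size + test_size:
            raise ValueError(
                "class %r has only %d samples" % (cls, len(positions)))
        # Shuffle, then take two disjoint slices of the requested sizes.
        shuffled = rng.permutation(positions)
        train_pos.append(shuffled[:train_size])
        test_pos.append(shuffled[train_size:train_size + test_size])

    train_pos = np.concatenate(train_pos)
    test_pos = np.concatenate(test_pos)

    # Assumption: one row per sample in the returned feature arrays.
    X = data[list(features)].to_numpy()
    return X[train_pos], X[test_pos], labels[train_pos], labels[test_pos]


# Hypothetical imbalanced data: 8 samples of one class, 4 of another.
demo = pd.DataFrame({
    "u": np.arange(12, dtype=float),
    "g": np.arange(12, dtype=float) * 2,
    "label": ["star"] * 8 + ["galaxy"] * 4,
})

X_train, X_test, y_train, y_test = balanced_split_sketch(
    demo, ["u", "g"], "label", train_size=2, test_size=1, random_state=0)

print(X_train.shape, X_test.shape)  # (4, 2) (2, 2)
```

With two classes, train_size=2 and test_size=1 give four training rows and two test rows, and each class is equally represented in both sets regardless of how imbalanced the original data is.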
