7.6.5. mclearn.preprocessing.balanced_train_test_split
mclearn.preprocessing.balanced_train_test_split(data, features, target, train_size, test_size, random_state=None)
Split the data into a balanced training set and test set, each with a given number of sample points per class.
For a dataset with an unequal number of samples in each class, it is often useful to split the data into a training set and a test set in which every class is equally represented.
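The balancing idea can be illustrated at the level of row indices. The following is a minimal sketch, not mclearn's implementation: the helper name `balanced_indices` is hypothetical, and the use of NumPy's `default_rng` is an assumption, so a given seed will not reproduce mclearn's splits. For each class it shuffles that class's positions, takes the first `train_size` for training and the next `test_size` for testing, so the two sets never overlap and every class contributes the same number of points.

```python
import numpy as np


def balanced_indices(labels, train_size, test_size, random_state=None):
    """Hypothetical helper: disjoint, class-balanced train/test row indices."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(random_state)  # assumption: NumPy Generator
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        # positions of every sample in class c, in random order
        idx = rng.permutation(np.flatnonzero(labels == c))
        if len(idx) < train_size + test_size:
            # assumption: refuse to split rather than return unbalanced sets
            raise ValueError(f"class {c!r} has only {len(idx)} samples")
        train_idx.extend(idx[:train_size])
        test_idx.extend(idx[train_size:train_size + test_size])
    return np.array(train_idx), np.array(test_idx)


# 50, 30 and 5 samples in three classes: heavily imbalanced
labels = np.array(["a"] * 50 + ["b"] * 30 + ["c"] * 5)
train, test = balanced_indices(labels, train_size=3, test_size=2, random_state=0)
print(np.unique(labels[train], return_counts=True))  # each class appears 3 times
```

The smallest class bounds the split: here class `c` has only 5 samples, so `train_size + test_size` cannot exceed 5.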
Parameters: - data (DataFrame, shape = [n_samples, n_features]) – The dataset, in which each row is a sample point and each column is a feature.
- features (array, shape = [n_features]) – The names of the columns in data that are used as feature vectors.
- target (str) – The name of the column in data that is used as the target vector.
- train_size (int) – Number of sample points from each class in the training set.
- test_size (int) – Number of sample points from each class in the test set.
- random_state (int, optional (default=None)) – Random seed.
Returns: - X_train (array) – The feature vectors (stored as columns) in the training set.
- X_test (array) – The feature vectors (stored as columns) in the test set.
- y_train (array) – The target vector in the training set.
- y_test (array) – The target vector in the test set.
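The parameters and return values above can be exercised with a pandas-based sketch that has the same argument list. This is a hypothetical re-sketch, not the library's source: the name `balanced_train_test_split_sketch`, the error on undersized classes, and the row-per-sample layout of the returned arrays are assumptions (the original function's return layout may differ). It shuffles all rows once with `DataFrame.sample`, numbers the rows within each class with `groupby(...).cumcount()`, and uses those ranks to assign the first `train_size` rows of each class to training and the next `test_size` rows to testing.

```python
import numpy as np
import pandas as pd


def balanced_train_test_split_sketch(data, features, target, train_size,
                                     test_size, random_state=None):
    """Hypothetical pandas sketch of a class-balanced split; not mclearn's code."""
    needed = train_size + test_size
    if (data[target].value_counts() < needed).any():
        # assumption: refuse to split rather than return unbalanced sets
        raise ValueError(f"every class needs at least {needed} samples")
    # shuffle all rows once, then number the rows of each class 0, 1, 2, ...
    shuffled = data.sample(frac=1, random_state=random_state)
    rank = shuffled.groupby(target).cumcount()
    # ranks below train_size go to training, the next test_size to testing,
    # so the two sets are disjoint and each class appears equally often
    train = shuffled[rank < train_size]
    test = shuffled[(rank >= train_size) & (rank < needed)]
    # assumption: one row per sample point, one column per feature
    X_train = train[features].to_numpy()
    X_test = test[features].to_numpy()
    y_train = train[target].to_numpy()
    y_test = test[target].to_numpy()
    return X_train, X_test, y_train, y_test


# three classes with 10, 6 and 20 samples; f1 doubles as a row id
df = pd.DataFrame({
    "f1": np.arange(36),
    "f2": np.arange(36) * 2.0,
    "label": ["a"] * 10 + ["b"] * 6 + ["c"] * 20,
})
X_train, X_test, y_train, y_test = balanced_train_test_split_sketch(
    df, ["f1", "f2"], "label", train_size=3, test_size=2, random_state=1)
print(X_train.shape, X_test.shape)  # (9, 2) (6, 2)
```

Because `train_size` and `test_size` count points per class, the training set holds `n_classes * train_size` rows and the test set `n_classes * test_size` rows.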