A Dataset is a simple abstraction around a data and a target matrix.
A Dataset's data and target are available via attributes of the same name:
>>> data = np.array([[3, 2, 1], [2, 1, 0]] * 4)
>>> target = np.array([3, 2] * 4)
>>> dataset = Dataset(data, target)
>>> dataset.data is data
True
>>> dataset.target is target
True
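The `is` checks above mean the Dataset keeps references to the arrays you pass in rather than copying them. The following is a minimal sketch of a container with that property. `MiniDataset` is a hypothetical name, not the library's class, and the row-count check is an assumed addition.

```python
import numpy as np


class MiniDataset:
    """Hypothetical stand-in for Dataset: keeps references, never copies."""

    def __init__(self, data, target):
        # np.asarray returns the very same object when given an ndarray,
        # so identity with the caller's arrays is preserved.
        self.data = np.asarray(data)
        self.target = np.asarray(target)
        # Assumed sanity check: one target value per row of data.
        if len(self.data) != len(self.target):
            raise ValueError("data and target must have the same number of rows")


data = np.array([[3, 2, 1], [2, 1, 0]] * 4)
target = np.array([3, 2] * 4)
dataset = MiniDataset(data, target)
print(dataset.data is data, dataset.target is target)  # True True
```

Keeping references avoids doubling memory for large matrices. The trade-off is that in-place changes to `data` or `target` outside the object are visible through it.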
The split_indices attribute gives a cross-validation generator that yields pairs of train and test indices:
>>> for train_index, test_index in dataset.split_indices:
...     X_train, X_test = data[train_index], data[test_index]
...     y_train, y_test = target[train_index], target[test_index]
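To make the shape of such a generator concrete, here is a sketch of a plain, unshuffled k-fold splitter that yields `(train_index, test_index)` pairs. `kfold_split_indices` and its `n_folds` parameter are hypothetical. The real `split_indices` may shuffle or stratify the samples, so treat this only as an illustration of the protocol.

```python
import numpy as np


def kfold_split_indices(n_samples, n_folds=4):
    """Hypothetical sketch: yield (train_index, test_index) pairs, k-fold style."""
    indices = np.arange(n_samples)
    # np.array_split tolerates n_samples that do not divide evenly by n_folds.
    for test_index in np.array_split(indices, n_folds):
        # Every index not in the current test fold is used for training.
        train_index = np.setdiff1d(indices, test_index)
        yield train_index, test_index


data = np.array([[3, 2, 1], [2, 1, 0]] * 4)
target = np.array([3, 2] * 4)

for train_index, test_index in kfold_split_indices(len(data), n_folds=4):
    X_train, X_test = data[train_index], data[test_index]
    y_train, y_test = target[train_index], target[test_index]
    print(X_train.shape, X_test.shape)  # (6, 3) (2, 3) on every fold
```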
For example, sklearn.grid_search.GridSearchCV accepts a cross-validation generator like the one split_indices returns through its cv parameter. In scikit-learn 0.18 and later, this class lives in sklearn.model_selection.
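As a sketch of that hand-off, the example below passes an explicit list of `(train_index, test_index)` pairs as `cv` to `sklearn.model_selection.GridSearchCV`. The two hand-made folds stand in for `dataset.split_indices`, and the choice of `KNeighborsClassifier` and its parameter grid is purely illustrative. This assumes a scikit-learn version with the `model_selection` module.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

data = np.array([[3, 2, 1], [2, 1, 0]] * 4)
target = np.array([3, 2] * 4)

# Two hand-made folds standing in for dataset.split_indices: the cv parameter
# accepts any iterable of (train_index, test_index) pairs.
indices = np.arange(len(data))
splits = [(indices[4:], indices[:4]), (indices[:4], indices[4:])]

# Illustrative estimator and grid; refit=True (the default) retrains on all data.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3]}, cv=splits)
search.fit(data, target)
print(search.best_params_, search.best_score_)
```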
If all you want is a train/test split of your data, you can simply call Dataset.train_test_split():
>>> X_train, X_test, y_train, y_test = dataset.train_test_split()
>>> X_train.shape, X_test.shape, y_train.shape, y_test.shape
((6, 3), (2, 3), (6,), (2,))
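The 6/2 split above corresponds to holding out a quarter of the eight samples. The following is a minimal sketch of a shuffled hold-out split that reproduces those shapes. `simple_train_test_split`, its `test_size=0.25` default, and the `seed` parameter are assumptions for illustration, not the library's implementation, which may also stratify by target.

```python
import numpy as np


def simple_train_test_split(data, target, test_size=0.25, seed=0):
    """Hypothetical sketch of a shuffled hold-out split (not the library's code)."""
    n_samples = len(data)
    n_test = int(np.ceil(test_size * n_samples))
    # Shuffle row indices reproducibly, then carve off the first n_test rows.
    order = np.random.default_rng(seed).permutation(n_samples)
    test_index, train_index = order[:n_test], order[n_test:]
    return data[train_index], data[test_index], target[train_index], target[test_index]


data = np.array([[3, 2, 1], [2, 1, 0]] * 4)
target = np.array([3, 2] * 4)
X_train, X_test, y_train, y_test = simple_train_test_split(data, target)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (6, 3) (2, 3) (6,) (2,)
```

Indexing `data` and `target` with the same index arrays keeps each row paired with its target value after shuffling.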