RandomForestRegressorModel train


train(self, frame, label_column, observation_columns, num_trees=1, impurity='variance', max_depth=4, max_bins=100, seed=-1073942687, categorical_features_info=None, feature_subset_category=None)

[ALPHA] Build a Random Forest Regressor model.

Parameters:

frame : Frame

A frame to train the model on

label_column : unicode

Column name containing the label for each observation

observation_columns : list

Column(s) containing the observations

num_trees : int32 (default=1)

Number of trees in the random forest. Default is 1.

impurity : unicode (default=variance)

Criterion used for information gain calculation. The supported value, which is also the default, is “variance”.

max_depth : int32 (default=4)

Maximum depth of the tree. Default is 4.

max_bins : int32 (default=100)

Maximum number of bins used for splitting features. Default is 100.

seed : int32 (default=-1073942687)

Random seed for bootstrapping and choosing feature subsets. Default is a randomly chosen seed.

categorical_features_info : dict (default=None)

Arity of categorical features. An entry (n -> k) indicates that feature ‘n’ is categorical with ‘k’ categories, indexed from 0: {0, 1, ..., k-1}.

feature_subset_category : unicode (default=None)

Number of features to consider for splits at each node. Supported values are “auto”, “all”, “sqrt”, “log2”, and “onethird”. If “auto” is set, the choice is based on num_trees: if num_trees == 1, it is set to “all”; if num_trees > 1, it is set to “onethird”.
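The two less obvious parameters above, categorical_features_info and feature_subset_category, can be illustrated with a small sketch that does not depend on the library. The helper names build_categorical_features_info and resolve_feature_subset are hypothetical and not part of the API, and the column names are made up. The sketch also assumes that feature ‘n’ refers to the position of a column in observation_columns, which is not stated explicitly here, and it only covers the five documented string values of feature_subset_category.

```python
def build_categorical_features_info(observation_columns, categories_by_column):
    """Hypothetical helper: map feature index -> arity (number of categories).

    Assumes feature 'n' is the position of the column in observation_columns.
    """
    info = {}
    for index, name in enumerate(observation_columns):
        if name in categories_by_column:
            # Arity is the number of distinct categories for this column.
            info[index] = len(set(categories_by_column[name]))
    return info


def resolve_feature_subset(feature_subset_category, num_trees):
    """Hypothetical helper: apply the documented "auto" rule."""
    supported = ("auto", "all", "sqrt", "log2", "onethird")
    if feature_subset_category not in supported:
        raise ValueError("unsupported value: %r" % (feature_subset_category,))
    if num_trees < 1:
        raise ValueError("num_trees must be at least 1")
    if feature_subset_category == "auto":
        # One tree considers all features; a forest considers one third.
        return "all" if num_trees == 1 else "onethird"
    return feature_subset_category


columns = ["age", "color", "size"]  # made-up observation columns
info = build_categorical_features_info(
    columns, {"color": ["red", "green", "blue"], "size": ["S", "L"]})
print(info)                                # {1: 3, 2: 2}
print(resolve_feature_subset("auto", 1))   # all
print(resolve_feature_subset("auto", 10))  # onethird
```

Here ‘color’ is feature 1 with three categories and ‘size’ is feature 2 with two, so the resulting dictionary could be passed as categorical_features_info if the indexing assumption holds.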

Returns:

: dict

dictionary

A dictionary describing the trained Random Forest Regressor model, with the following keys:

‘observation_columns’: the list of observation columns on which the model was trained
‘label_columns’: the column name containing the labels of the observations
‘num_trees’: the number of decision trees in the random forest
‘num_nodes’: the number of nodes in the random forest
‘categorical_features_info’: the map storing the arity of categorical features
‘impurity’: the criterion used for information gain calculation
‘max_depth’: the maximum depth of the tree
‘max_bins’: the maximum number of bins used for splitting features
‘seed’: the random seed used for bootstrapping and choosing feature subsets

Creates a Random Forest Regressor model using the observation columns and the label column.

Examples

See here for examples.
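As a rough illustration of how the arguments to train might be assembled, the sketch below collects the defaults from the signature documented above and merges in overrides. The helper build_train_arguments and the column names ‘price’, ‘rooms’, and ‘area’ are hypothetical; the commented call at the end assumes that a Frame named frame and a RandomForestRegressorModel instance named model already exist and that train accepts its parameters as keywords.

```python
# Defaults copied from the train() signature documented above.
DOCUMENTED_DEFAULTS = {
    "num_trees": 1,
    "impurity": "variance",
    "max_depth": 4,
    "max_bins": 100,
}

# Optional parameters whose default is None or randomly chosen.
OPTIONAL_NAMES = {"seed", "categorical_features_info", "feature_subset_category"}


def build_train_arguments(label_column, observation_columns, **overrides):
    """Hypothetical helper: assemble keyword arguments for train()."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS) - OPTIONAL_NAMES
    if unknown:
        raise TypeError("unexpected argument(s): %s" % ", ".join(sorted(unknown)))
    if not observation_columns:
        raise ValueError("observation_columns must name at least one column")
    arguments = dict(DOCUMENTED_DEFAULTS)
    arguments.update(overrides)
    arguments["label_column"] = label_column
    arguments["observation_columns"] = list(observation_columns)
    return arguments


arguments = build_train_arguments(
    "price", ["rooms", "area"], num_trees=5, feature_subset_category="auto")
print(arguments["num_trees"])  # 5
print(arguments["max_depth"])  # 4

# Assuming `model` and `frame` exist, the call would then look like:
#     result = model.train(frame, **arguments)
#     print(result["num_nodes"])
```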