RandomForestRegressorModel __init__


__init__(self, name=None)

Create a new instance of a Random Forest Regressor model.

Parameters:

name : unicode (default=None)

User-supplied name.

Returns:

: Model

A new instance of RandomForestRegressorModel.

Random Forest [R55] is a supervised ensemble learning algorithm used to perform regression. A Random Forest Regressor model is initialized, trained on columns of a frame, and used to predict the value of each observation in the frame. This model runs the MLlib implementation of Random Forest [R56]. During training, the decision trees are trained in parallel. During prediction, the random forest's predicted value for an observation is the average of the values predicted by all of its trees.
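
The averaging step at prediction time can be sketched in plain Python. This is a minimal illustration only, not the MLlib implementation: the functions tree_a and tree_b, their thresholds, and forest_predict are hypothetical stand-ins for trained decision trees and are not part of this API.

```python
# Hypothetical stand-ins for two trained regression trees. Each maps a
# feature tuple (Dim_1, Dim_2) to a numeric prediction. The thresholds
# are invented for illustration and were not learned from data.
def tree_a(features):
    return 1.0 if features[0] <= 30.0 else 0.0


def tree_b(features):
    return 1.0 if features[1] <= 3.0 else 0.0


def forest_predict(trees, features):
    """Return the mean of the individual tree predictions."""
    predictions = [tree(features) for tree in trees]
    return sum(predictions) / float(len(predictions))


# The two trees disagree on this observation, so the forest's
# prediction falls halfway between their outputs.
disputed = forest_predict([tree_a, tree_b], (20.0, 3.5))

# Both trees agree on this observation.
agreed = forest_predict([tree_a, tree_b], (46.1810010826, 3.1611961917))
```

With a single tree, as in the example further down (num_trees=1), the average is simply that tree's prediction.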

footnotes

[R55] https://en.wikipedia.org/wiki/Random_forest
[R56] https://spark.apache.org/docs/1.5.0/mllib-ensembles.html#random-forests

Examples

Consider the following model, trained and tested on the sample data set in the frame ‘frame’, which contains three columns.

>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.RandomForestRegressorModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'Class', ['Dim_1', 'Dim_2'], num_trees=1, impurity="variance", max_depth=4, max_bins=100)
[===Job Progress===]
>>> train_output
{u'impurity': u'variance', u'max_bins': 100, u'observation_columns': [u'Dim_1', u'Dim_2'], u'num_nodes': 3, u'max_depth': 4, u'seed': -1632404927, u'num_trees': 1, u'label_column': u'Class', u'feature_subset_category': u'all'}
>>> train_output['num_nodes']
3
>>> train_output['label_column']
u'Class'
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_value
========================================================
[0]      1  19.8446136104  2.2985856384                1.0
[1]      1  16.8973559126  2.6933495054                1.0
[2]      1   5.5548729596  2.7777687995                1.0
[3]      0  46.1810010826  3.1611961917                0.0
[4]      0  44.3117586448  3.3458963222                0.0
[5]      0  34.6334526911  3.6429838715                0.0
>>> model.publish()
[===Job Progress===]
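
The training output above reports num_nodes of 3, which is consistent with a single split producing two leaves. As a rough sketch of what impurity="variance" measures, the following plain Python scores one candidate split on the sample data. It is an illustration under stated assumptions, not MLlib's code: the helper names and the threshold of 30.0 on Dim_1 are hypothetical, and the split the trained model actually chose is not shown in the output.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / float(len(values))
    return sum((v - mean) ** 2 for v in values) / float(len(values))


def weighted_child_variance(rows, feature_index, threshold):
    """Size-weighted label variance of the two children of a split."""
    left = [label for label, feats in rows if feats[feature_index] <= threshold]
    right = [label for label, feats in rows if feats[feature_index] > threshold]
    total = float(len(rows))
    return (len(left) / total) * variance(left) + (len(right) / total) * variance(right)


# (Class, (Dim_1, Dim_2)) pairs copied from the sample frame.
rows = [
    (1, (19.8446136104, 2.2985856384)),
    (1, (16.8973559126, 2.6933495054)),
    (1, (5.5548729596, 2.7777687995)),
    (0, (46.1810010826, 3.1611961917)),
    (0, (44.3117586448, 3.3458963222)),
    (0, (34.6334526911, 3.6429838715)),
]

# Impurity of the labels before any split.
parent_impurity = variance([label for label, _ in rows])

# Impurity after splitting on Dim_1 at the hypothetical threshold 30.0.
split_impurity = weighted_child_variance(rows, 0, 30.0)
```

Here the candidate split separates the two label values completely, so the weighted child variance drops from 0.25 to 0.0 and neither child would need further splitting.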