RandomForestClassifierModel __init__

__init__(self, name=None)

Create a new instance of a Random Forest Classifier model.

Parameters:


name : unicode (default=None)

User supplied name.

Returns:

: Model

A new instance of RandomForestClassifierModel

Random Forest [R51] is a supervised ensemble learning algorithm that can be used for binary and multi-class classification. A Random Forest Classifier model is initialized, trained on columns of a frame, used to predict the labels of observations in a frame, and tested by comparing the predicted labels against the true labels. This model runs the MLlib implementation of Random Forest [R52]. During training, the decision trees are trained in parallel. During prediction, each tree’s prediction is counted as a vote for one class, and the predicted label is the class that receives the most votes. During testing, the labels of the observations are predicted and compared against the true labels using built-in binary and multi-class Classification Metrics.
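
The voting step described above can be illustrated with a minimal sketch in plain Python. This is not the MLlib implementation; the helper name majority_vote and the sample votes are hypothetical and exist only to show how per-tree predictions might be combined into a single label.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label that receives the most votes.

    tree_predictions is a list with one predicted label per tree
    (a hypothetical input format chosen for this illustration).
    """
    counts = Counter(tree_predictions)
    # most_common(1) returns a one-element list of (label, vote_count).
    label, _ = counts.most_common(1)[0]
    return label


# Five hypothetical trees vote on a single observation.
votes = [1, 0, 1, 1, 0]
predicted = majority_vote(votes)
print(predicted)
```

How ties are broken is an implementation detail of the underlying library and is not modeled here.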

footnotes

[R51]

https://en.wikipedia.org/wiki/Random_forest

[R52]

https://spark.apache.org/docs/1.5.0/mllib-ensembles.html#random-forests

Examples

Consider the following frame, ‘frame’, containing three columns. A model is trained and tested on this sample data set.

>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.RandomForestClassifierModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'Class', ['Dim_1', 'Dim_2'], num_classes=2, num_trees=1, impurity="entropy", max_depth=4, max_bins=100)
[===Job Progress===]
>>> train_output
{u'impurity': u'entropy', u'max_bins': 100, u'observation_columns': [u'Dim_1', u'Dim_2'], u'num_nodes': 3, u'max_depth': 4, u'seed': 157264076, u'num_trees': 1, u'label_column': u'Class', u'feature_subset_category': u'all', u'num_classes': 2}
>>> train_output['num_nodes']
3
>>> train_output['label_column']
u'Class'
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384                1
[1]      1  16.8973559126  2.6933495054                1
[2]      1   5.5548729596  2.7777687995                1
[3]      0  46.1810010826  3.1611961917                0
[4]      0  44.3117586448  3.3458963222                0
[5]      0  34.6334526911  3.6429838715                0
>>> test_metrics = model.test(frame, 'Class', ['Dim_1','Dim_2'])
[===Job Progress===]
>>> test_metrics
Precision: 1.0
Recall: 1.0
Accuracy: 1.0
FMeasure: 1.0
Confusion Matrix:
            Predicted_Pos  Predicted_Neg
Actual_Pos              3              0
Actual_Neg              0              3
>>> model.publish()
[===Job Progress===]
