RandomForestClassifierModel __init__
__init__(self, name=None)
Create a new instance of a Random Forest Classifier model.
Parameters: name : unicode (default=None)
User-supplied name.
Returns: Model
A new instance of RandomForestClassifierModel.
Random Forest [R51] is a supervised ensemble learning algorithm that can be used to perform binary and multi-class classification. The Random Forest Classifier model is initialized, trained on columns of a frame, used to predict the labels of observations in a frame, and used to test the predicted labels against the true labels. This model runs the MLlib implementation of Random Forest [R52]. During training, the decision trees are trained in parallel. During prediction, each tree’s prediction is counted as a vote for one class, and the label is predicted to be the class that receives the most votes. During testing, the labels of the observations are predicted and compared against the true labels using built-in binary and multi-class classification metrics.
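The voting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not MLlib's code: majority_vote is a hypothetical helper, and breaking ties in favour of the first class encountered is an assumption of this sketch rather than documented MLlib behaviour.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Hypothetical helper: return the class that receives the most votes.

    Each element of tree_predictions is one tree's predicted class.
    Ties go to the class seen first (an assumption of this sketch).
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    counts = Counter(tree_predictions)
    # most_common() keeps first-encountered order for equal counts (Python 3.7+).
    return counts.most_common(1)[0][0]


print(majority_vote([1, 0, 1]))
```

For example, three trees predicting 1, 0 and 1 yield the label 1.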
Footnotes
[R51] https://en.wikipedia.org/wiki/Random_forest
[R52] https://spark.apache.org/docs/1.5.0/mllib-ensembles.html#random-forests
Examples
Consider the following model, trained and tested on the sample data set in frame ‘frame’, which contains three columns:
>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.RandomForestClassifierModel()
[===Job Progress===]
>>> train_output = model.train(frame, 'Class', ['Dim_1', 'Dim_2'], num_classes=2, num_trees=1, impurity="entropy", max_depth=4, max_bins=100)
[===Job Progress===]
>>> train_output
{u'impurity': u'entropy', u'max_bins': 100, u'observation_columns': [u'Dim_1', u'Dim_2'], u'num_nodes': 3, u'max_depth': 4, u'seed': 157264076, u'num_trees': 1, u'label_column': u'Class', u'feature_subset_category': u'all', u'num_classes': 2}
>>> train_output['num_nodes']
3
>>> train_output['label_column']
u'Class'
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384                1
[1]      1  16.8973559126  2.6933495054                1
[2]      1   5.5548729596  2.7777687995                1
[3]      0  46.1810010826  3.1611961917                0
[4]      0  44.3117586448  3.3458963222                0
[5]      0  34.6334526911  3.6429838715                0
>>> test_metrics = model.test(frame, 'Class', ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> test_metrics
Precision: 1.0
Recall: 1.0
Accuracy: 1.0
FMeasure: 1.0
Confusion Matrix:
            Predicted_Pos  Predicted_Neg
Actual_Pos              3              0
Actual_Neg              0              3
>>> model.publish()
[===Job Progress===]
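The binary metrics reported by test can be reproduced by hand from the actual and predicted labels. The following is a minimal pure-Python sketch, not the library's implementation: binary_metrics is a hypothetical helper, it assumes label 1 is the positive class, and it computes the balanced F-measure (F1).

```python
from typing import Dict, Hashable, Sequence


def binary_metrics(
    actual: Sequence[Hashable],
    predicted: Sequence[Hashable],
    pos_label: Hashable = 1,  # assumption: 1 is the positive class
) -> Dict[str, float]:
    """Hypothetical helper: confusion-matrix counts plus precision,
    recall, accuracy and F1 for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == pos_label:
            if a == pos_label:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        elif a == pos_label:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(actual) if len(actual) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision, "recall": recall,
        "accuracy": accuracy, "f_measure": f_measure,
    }


# Class and predicted_class columns from the example frame.
metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])
print(metrics)
```

On the example frame every prediction is correct, so the sketch yields 3 true positives, 3 true negatives and a value of 1.0 for each metric, matching the output of test.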