Models NaiveBayesModel

class NaiveBayesModel
Entity NaiveBayesModel
Attributes
    last_read_date
        Read-only property - Last time this model’s data was accessed.
    name
        Set or get the name of the model object.
    status
        Read-only property - Current model life cycle status.
Methods
    __init__(self[, name, _info])
        Create a ‘new’ instance of a Naive Bayes model
    predict(self, frame[, observation_columns])
        [ALPHA] Predict labels for data points using trained Naive Bayes model.
    publish(self)
        [ALPHA] Creates a scoring engine tar file.
    test(self, frame, label_column[, observation_columns])
        [ALPHA] Predict test frame labels and return metrics.
    train(self, frame, label_column, observation_columns[, lambda_parameter])
        [ALPHA] Build a naive bayes model.

__init__(self, name=None)
Create a ‘new’ instance of a Naive Bayes model
Parameters:
    name : unicode (default=None)
        User supplied name.
Returns:
    Model
        A new instance of NaiveBayesModel
Naive Bayes [R35] is a probabilistic classifier that assumes strong independence between features. It computes the conditional probability distribution of each feature given the label, then applies Bayes’ theorem to compute the conditional probability distribution of a label given an observation, and uses that distribution for prediction. A Naive Bayes model is initialized, trained on columns of a frame, and then used to predict the value of the dependent variable from the independent observations of a frame. It can also be tested against the true labels of a frame to measure classification performance on test data. This model runs the MLlib implementation of Naive Bayes [R36].
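The computation described above can be sketched in plain Python. This is a minimal illustration of a multinomial-style Naive Bayes with additive smoothing, not the library's code: the helper names `train_multinomial_nb` and `predict_label` are hypothetical, and treating `lambda_parameter` as an additive smoothing constant applied to both priors and feature likelihoods is an assumption. The rows are copied from the sample frame in the Examples section.

```python
import math
from collections import defaultdict


def train_multinomial_nb(rows, labels, smoothing=1.0):
    """Estimate log priors and per-feature log likelihoods for each label.

    Hypothetical helper; `smoothing` plays the role assumed here for
    lambda_parameter (additive smoothing).
    """
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)
    for x, y in zip(rows, labels):
        class_counts[y] += 1
        for j, value in enumerate(x):
            feature_sums[y][j] += value

    n_rows, n_classes = len(rows), len(class_counts)
    log_priors, log_likelihoods = {}, {}
    for y, count in class_counts.items():
        # Smoothed estimate of P(label)
        log_priors[y] = math.log((count + smoothing) / (n_rows + n_classes * smoothing))
        # Smoothed estimate of P(feature | label), normalised over features
        total = sum(feature_sums[y]) + n_features * smoothing
        log_likelihoods[y] = [math.log((s + smoothing) / total) for s in feature_sums[y]]
    return log_priors, log_likelihoods


def predict_label(x, log_priors, log_likelihoods):
    """Pick the label with the highest (unnormalised) log posterior."""
    scores = {
        y: log_priors[y] + sum(v * w for v, w in zip(x, log_likelihoods[y]))
        for y in log_priors
    }
    return max(scores, key=scores.get)


# Rows (Dim_1, Dim_2) and labels (Class) from the sample frame.
rows = [
    (19.8446136104, 2.2985856384),
    (16.8973559126, 2.6933495054),
    (5.5548729596, 2.7777687995),
    (46.1810010826, 3.1611961917),
    (44.3117586448, 3.3458963222),
    (34.6334526911, 3.6429838715),
]
labels = [1, 1, 1, 0, 0, 0]

log_priors, log_likelihoods = train_multinomial_nb(rows, labels, smoothing=0.9)
predictions = [predict_label(x, log_priors, log_likelihoods) for x in rows]
print(predictions)
```

With a smoothing value of 0.9, this sketch labels the first row as class 0 and every other row as its true class, which agrees with the predicted_class column in the example output. That agreement suggests, but does not prove, that the library uses the same formulation.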
footnotes
[R35] https://en.wikipedia.org/wiki/Naive_Bayes_classifier
[R36] https://spark.apache.org/docs/1.5.0/mllib-naive-bayes.html

Examples
The following example trains and tests a model on the sample data set in frame ‘frame’, which contains three columns.
>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.NaiveBayesModel()
[===Job Progress===]
>>> model.train(frame, 'Class', ['Dim_1', 'Dim_2'], lambda_parameter=0.9)
[===Job Progress===]
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384              0.0
[1]      1  16.8973559126  2.6933495054              1.0
[2]      1   5.5548729596  2.7777687995              1.0
[3]      0  46.1810010826  3.1611961917              0.0
[4]      0  44.3117586448  3.3458963222              0.0
[5]      0  34.6334526911  3.6429838715              0.0
>>> test_metrics = model.test(frame, 'Class', ['Dim_1','Dim_2'])
[===Job Progress===]
>>> test_metrics
Precision: 1.0
Recall: 0.666666666667
Accuracy: 0.833333333333
FMeasure: 0.8
Confusion Matrix:
            Predicted_Pos  Predicted_Neg
Actual_Pos              2              1
Actual_Neg              0              3
>>> model.publish()
[===Job Progress===]
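The reported metrics follow from the confusion matrix by the standard binary-classification definitions. The sketch below recomputes them from the four cell counts. The function name `binary_metrics` is hypothetical and not part of the library, and the code assumes Python 3 division semantics.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard metrics from confusion-matrix counts.

    Hypothetical helper; assumes the denominators are non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Counts read from the confusion matrix in the example output:
# Actual_Pos row -> (Predicted_Pos=2, Predicted_Neg=1)
# Actual_Neg row -> (Predicted_Pos=0, Predicted_Neg=3)
metrics = binary_metrics(tp=2, fn=1, fp=0, tn=3)
for name, value in metrics.items():
    print(name, round(value, 6))
```

With these counts, precision is 2/2, recall is 2/3, accuracy is 5/6, and the F-measure is 0.8, matching the values returned by `model.test` in the example.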