NaiveBayesModel __init__
__init__(self, name=None)
Create a new instance of a Naive Bayes model.
Parameters: name : unicode (default=None)
    User-supplied name.
Returns: Model
    A new instance of NaiveBayesModel.
Naive Bayes [R37] is a probabilistic classifier that assumes strong independence between features. It computes the conditional probability distribution of each feature given the label, then applies Bayes’ theorem to compute the conditional probability distribution of a label given an observation, and uses that distribution for prediction. The Naive Bayes model is initialized, trained on columns of a frame, tested against the true labels of a frame to measure classification performance, and used to predict the value of the dependent variable given the independent observations of a frame. This model runs the MLlib implementation of Naive Bayes [R38].
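The computation described above can be illustrated with a minimal pure-Python sketch of a multinomial Naive Bayes classifier with additive smoothing. This is not the MLlib code that the model runs. The function names `train_naive_bayes` and `predict`, the `smoothing` argument (assumed here to play the role of `lambda_parameter`), and the toy count data are all hypothetical and chosen only for illustration.

```python
import math


def train_naive_bayes(rows, labels, smoothing=1.0):
    """Estimate log priors and per-feature log likelihoods for each class.

    Hypothetical helper: multinomial model with additive smoothing.
    """
    classes = sorted(set(labels))
    num_features = len(rows[0])
    log_prior = {}
    log_likelihood = {}
    for c in classes:
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        # Smoothed estimate of P(label = c).
        log_prior[c] = math.log(
            (len(class_rows) + smoothing)
            / (len(rows) + smoothing * len(classes))
        )
        # Smoothed estimate of P(feature j | label = c).
        totals = [sum(r[j] for r in class_rows) for j in range(num_features)]
        denominator = sum(totals) + smoothing * num_features
        log_likelihood[c] = [
            math.log((t + smoothing) / denominator) for t in totals
        ]
    return log_prior, log_likelihood


def predict(model, row):
    """Apply Bayes' theorem in log space and return the most probable label."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(x * w for x, w in zip(row, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy, made-up count data: two features, two classes.
rows = [[3, 0], [2, 1], [0, 4], [1, 3]]
labels = [1, 1, 0, 0]
model = train_naive_bayes(rows, labels, smoothing=1.0)
print(predict(model, [4, 0]))
print(predict(model, [0, 5]))
```

The independence assumption shows up in `predict`, where each feature contributes its own term to the class score without reference to the other features.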
footnotes
[R37] https://en.wikipedia.org/wiki/Naive_Bayes_classifier
[R38] https://spark.apache.org/docs/1.5.0/mllib-naive-bayes.html
Examples
Consider the following model, trained and tested on the sample data set in the frame ‘frame’.
The frame contains three columns.
>>> frame.inspect()
[#]  Class          Dim_1         Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.NaiveBayesModel()
[===Job Progress===]
>>> model.train(frame, 'Class', ['Dim_1', 'Dim_2'], lambda_parameter=0.9)
[===Job Progress===]
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class          Dim_1         Dim_2  predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384              0.0
[1]      1  16.8973559126  2.6933495054              1.0
[2]      1   5.5548729596  2.7777687995              1.0
[3]      0  46.1810010826  3.1611961917              0.0
[4]      0  44.3117586448  3.3458963222              0.0
[5]      0  34.6334526911  3.6429838715              0.0
>>> test_metrics = model.test(frame, 'Class', ['Dim_1','Dim_2'])
[===Job Progress===]
>>> test_metrics
Precision: 1.0
Recall: 0.666666666667
Accuracy: 0.833333333333
FMeasure: 0.8
Confusion Matrix:
            Predicted_Pos  Predicted_Neg
Actual_Pos              2              1
Actual_Neg              0              3
>>> model.publish()
[===Job Progress===]
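The reported precision, recall, accuracy and F-measure can be checked by hand from the confusion matrix, taking class 1 as the positive class. The sketch below uses the standard definitions of these metrics. The function `classification_metrics` is a hypothetical helper written for this check and is not part of the `ta` API.

```python
def classification_metrics(tp, fn, fp, tn):
    """Derive summary metrics from the four cells of a binary confusion matrix.

    Hypothetical helper for illustration only.
    """
    tp, fn, fp, tn = float(tp), float(fn), float(fp), float(tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Cells read from the confusion matrix in the example:
# Actual_Pos row is (2, 1), Actual_Neg row is (0, 3).
metrics = classification_metrics(tp=2, fn=1, fp=0, tn=3)
for name in ("precision", "recall", "accuracy", "f_measure"):
    print(name, metrics[name])
```

With these cells, precision is 2/2, recall is 2/3, accuracy is 5/6 and the F-measure is 0.8, which agrees with the values printed by `test_metrics` up to floating-point rounding.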