NaiveBayesModel __init__


__init__(self, name=None)

Create a new instance of a Naive Bayes model.

Parameters:

name : unicode (default=None)

User-supplied name for the model.

Returns:

: Model

A new instance of NaiveBayesModel

Naive Bayes [R37] is a probabilistic classifier that assumes strong independence between features. It computes the conditional probability distribution of each feature given the label, then applies Bayes’ theorem to compute the conditional probability distribution of the label given an observation, and uses that distribution for prediction. A Naive Bayes model is initialized, trained on columns of a frame, and tested against the true labels of a frame to measure classification performance. Once trained, it predicts the value of the dependent variable from the independent observations of a frame. This model runs the MLlib implementation of Naive Bayes [R38].

Footnotes

[R37] https://en.wikipedia.org/wiki/Naive_Bayes_classifier
[R38] https://spark.apache.org/docs/1.5.0/mllib-naive-bayes.html
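The computation described above can be illustrated with a small, self-contained sketch. It is not the MLlib code and does not use the NaiveBayesModel API. It assumes the multinomial variant of Naive Bayes with additive smoothing, and the function names train_multinomial_nb and predict_label are hypothetical, chosen only for this illustration.

```python
from __future__ import division

import math
from collections import defaultdict


def train_multinomial_nb(rows, labels, smoothing=1.0):
    """Hypothetical helper: fit log priors and log feature likelihoods.

    Assumes multinomial Naive Bayes with additive smoothing and
    non-negative feature values.
    """
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for j, value in enumerate(row):
            feature_sums[label][j] += value

    n_classes = len(class_counts)
    total = len(rows)
    model = {}
    for label, count in class_counts.items():
        # Smoothed log P(label)
        log_prior = math.log((count + smoothing) / (total + smoothing * n_classes))
        # Smoothed log P(feature | label), normalized over all features
        denom = sum(feature_sums[label]) + smoothing * n_features
        log_theta = [math.log((s + smoothing) / denom) for s in feature_sums[label]]
        model[label] = (log_prior, log_theta)
    return model


def predict_label(model, row):
    """Hypothetical helper: return the label with the highest log posterior."""
    def score(label):
        log_prior, log_theta = model[label]
        # Bayes' theorem in log space, dropping the label-independent evidence term
        return log_prior + sum(x * t for x, t in zip(row, log_theta))
    return max(model, key=score)


# Sample data copied from the frame shown in the Examples section
rows = [
    (19.8446136104, 2.2985856384),
    (16.8973559126, 2.6933495054),
    (5.5548729596, 2.7777687995),
    (46.1810010826, 3.1611961917),
    (44.3117586448, 3.3458963222),
    (34.6334526911, 3.6429838715),
]
labels = [1, 1, 1, 0, 0, 0]

model = train_multinomial_nb(rows, labels, smoothing=0.9)
predictions = [predict_label(model, row) for row in rows]
print(predictions)
```

On these six rows with smoothing=0.9, the sketch assigns the labels 0, 1, 1, 0, 0, 0. This agrees with the predicted_class column in the Examples section, although it does not establish that MLlib performs the same arithmetic internally.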

Examples

Consider the following frame, ‘frame’, which contains three columns. The model below is trained and tested on this sample data set.

>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.NaiveBayesModel()
[===Job Progress===]
>>> model.train(frame, 'Class', ['Dim_1', 'Dim_2'], lambda_parameter=0.9)
[===Job Progress===]
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384              0.0
[1]      1  16.8973559126  2.6933495054              1.0
[2]      1   5.5548729596  2.7777687995              1.0
[3]      0  46.1810010826  3.1611961917              0.0
[4]      0  44.3117586448  3.3458963222              0.0
[5]      0  34.6334526911  3.6429838715              0.0
>>> test_metrics = model.test(frame, 'Class', ['Dim_1','Dim_2'])
[===Job Progress===]
>>> test_metrics
Precision: 1.0
Recall: 0.666666666667
Accuracy: 0.833333333333
FMeasure: 0.8
Confusion Matrix:
            Predicted_Pos  Predicted_Neg
Actual_Pos              2              1
Actual_Neg              0              3
>>> model.publish()
[===Job Progress===]