Models NaiveBayesModel


class NaiveBayesModel

Entity NaiveBayesModel

Attributes

last_read_date Read-only property - Last time this model’s data was accessed.
name Set or get the name of the model object.
status Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info]) Create a ‘new’ instance of a Naive Bayes model
predict(self, frame[, observation_columns]) [ALPHA] Predict labels for data points using trained Naive Bayes model.
publish(self) [ALPHA] Creates a scoring engine tar file.
test(self, frame, label_column[, observation_columns]) [ALPHA] Predict test frame labels and return metrics.
train(self, frame, label_column, observation_columns[, lambda_parameter]) [ALPHA] Build a Naive Bayes model.
__init__(self, name=None)

Create a ‘new’ instance of a Naive Bayes model

Parameters:

name : unicode (default=None)

User supplied name.

Returns:

: Model

A new instance of NaiveBayesModel

Naive Bayes [R35] is a probabilistic classifier that assumes strong independence between features. It computes the conditional probability distribution of each feature given the label, then applies Bayes’ theorem to compute the conditional probability distribution of the label given an observation, and uses that distribution for prediction. A Naive Bayes model is initialized, trained on columns of a frame, and then used to predict the value of the dependent variable from the independent observations of a frame. It can also be tested against the true labels of a frame to measure classification performance. This model runs the MLlib implementation of Naive Bayes [R36].

Footnotes

[R35] https://en.wikipedia.org/wiki/Naive_Bayes_classifier
[R36] https://spark.apache.org/docs/1.5.0/mllib-naive-bayes.html
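
The computation described above can be illustrated with a minimal pure-Python sketch. It assumes the multinomial variant of Naive Bayes with additive smoothing, which is presumably what lambda_parameter controls in train. The function names train_naive_bayes and predict_label, and the toy data, are hypothetical and are not part of this API or of MLlib.

```python
import math
from collections import defaultdict


def train_naive_bayes(rows, labels, smoothing=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing.

    rows: list of non-negative feature vectors; labels: one label per row.
    Returns (log_priors, log_likelihoods), each keyed by label.
    """
    n_features = len(rows[0])
    label_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(rows, labels):
        label_counts[label] += 1
        for j, value in enumerate(row):
            feature_sums[label][j] += value

    n_rows = len(rows)
    n_labels = len(label_counts)
    log_priors = {}
    log_likelihoods = {}
    for label, count in label_counts.items():
        # Smoothed class prior P(label).
        log_priors[label] = math.log(
            (count + smoothing) / (n_rows + n_labels * smoothing))
        # Smoothed per-feature weight P(feature j | label).
        total = sum(feature_sums[label]) + smoothing * n_features
        log_likelihoods[label] = [
            math.log((s + smoothing) / total) for s in feature_sums[label]]
    return log_priors, log_likelihoods


def predict_label(model, row):
    """Return the label with the highest posterior score for one row."""
    log_priors, log_likelihoods = model

    def score(label):
        # Bayes' theorem in log space; the shared evidence term is dropped.
        return log_priors[label] + sum(
            v * w for v, w in zip(row, log_likelihoods[label]))

    return max(log_priors, key=score)


# Hypothetical toy data: label 1 is dominated by the first feature,
# label 0 by the second.
toy_rows = [[3, 0], [4, 1], [0, 3], [1, 4]]
toy_labels = [1, 1, 0, 0]
toy_model = train_naive_bayes(toy_rows, toy_labels, smoothing=1.0)
print(predict_label(toy_model, [5, 0]))  # prints 1
print(predict_label(toy_model, [0, 5]))  # prints 0
```

Working in log space avoids numerical underflow when many feature weights are multiplied together. The sketch is not expected to reproduce the exact predictions shown in the Examples section.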

Examples

Consider a model trained and tested on the sample data set in frame ‘frame’, which contains three columns.

>>> frame.inspect()
[#]  Class  Dim_1          Dim_2
=======================================
[0]      1  19.8446136104  2.2985856384
[1]      1  16.8973559126  2.6933495054
[2]      1   5.5548729596  2.7777687995
[3]      0  46.1810010826  3.1611961917
[4]      0  44.3117586448  3.3458963222
[5]      0  34.6334526911  3.6429838715
>>> model = ta.NaiveBayesModel()
[===Job Progress===]
>>> model.train(frame, 'Class', ['Dim_1', 'Dim_2'], lambda_parameter=0.9)
[===Job Progress===]
>>> predicted_frame = model.predict(frame, ['Dim_1', 'Dim_2'])
[===Job Progress===]
>>> predicted_frame.inspect()
[#]  Class  Dim_1          Dim_2         predicted_class
========================================================
[0]      1  19.8446136104  2.2985856384              0.0
[1]      1  16.8973559126  2.6933495054              1.0
[2]      1   5.5548729596  2.7777687995              1.0
[3]      0  46.1810010826  3.1611961917              0.0
[4]      0  44.3117586448  3.3458963222              0.0
[5]      0  34.6334526911  3.6429838715              0.0
>>> test_metrics = model.test(frame, 'Class', ['Dim_1','Dim_2'])
[===Job Progress===]
>>> test_metrics
Precision: 1.0
Recall: 0.666666666667
Accuracy: 0.833333333333
FMeasure: 0.8
Confusion Matrix:
            Predicted_Pos  Predicted_Neg
Actual_Pos              2              1
Actual_Neg              0              3
>>> model.publish()
[===Job Progress===]
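
The metrics reported by test in the session above follow from the confusion matrix. The sketch below recomputes them from the four counts shown there. The helper name binary_metrics is hypothetical and is not part of this API. It assumes a binary problem in which every denominator is non-zero.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard binary classification metrics from confusion counts."""
    tp, fn, fp, tn = float(tp), float(fn), float(fp), float(tn)
    precision = tp / (tp + fp)            # fraction of predicted positives that are correct
    recall = tp / (tp + fn)               # fraction of actual positives that are found
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Counts taken from the confusion matrix above:
# Actual_Pos row: 2 predicted positive, 1 predicted negative.
# Actual_Neg row: 0 predicted positive, 3 predicted negative.
metrics = binary_metrics(tp=2, fn=1, fp=0, tn=3)
for name in ("precision", "recall", "accuracy", "f_measure"):
    print(name, round(metrics[name], 6))
```

With these counts, precision is 2/2, recall is 2/3, accuracy is 5/6, and the F-measure is 0.8, which matches the values printed by test_metrics.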