Naive Bayes Classification

In-Database Naive Bayes modeling and prediction. The interface mirrors that of sklearn.naive_bayes.

Initiate model

Create NaiveBayes object

class ibmdbpy.learn.naive_bayes.NaiveBayes(modelname=None, disc=None, bins=None)

The Naive Bayes classification algorithm is a probabilistic classifier. It is based on probability models that incorporate strong independence assumptions: given the class, the attributes are treated as independent of one another. These assumptions often do not hold in reality, which is why they are considered naive.
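To make the independence assumption concrete, here is a minimal pure-Python sketch of a Naive Bayes classifier for nominal attributes. It illustrates the general technique only and is not the in-database IDAX implementation; the function names and the toy data are made up for this example.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts):
    """Return normalized P(class | row) under the independence assumption."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(row):
            # Naive step: multiply the per-attribute likelihoods P(value | class).
            score *= value_counts[(i, label)][value] / n_label
        scores[label] = score
    norm = sum(scores.values())
    return {label: (s / norm if norm else 0.0) for label, s in scores.items()}


# Hypothetical toy data: (outlook, temperature) -> class
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("sunny", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]

class_counts, value_counts = train(rows, labels)
probs = posterior(("sunny", "mild"), class_counts, value_counts)
print(probs)
```

Each attribute contributes its own factor to the score, regardless of how the attributes relate to one another. That is exactly the simplification the name refers to.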

The NaiveBayes class provides an interface for using the NAIVEBAYES and PREDICT_NAIVEBAYES IDAX methods of dashDB/DB2.

__init__(modelname=None, disc=None, bins=None)

Constructor for NaiveBayes model objects

Parameters:

modelname : str, optional

The name of the Naive Bayes model that will be built. If no name is specified, it will be generated automatically. If the parameter corresponds to an existing model in the database, it is replaced during the fitting step.

disc : str, optional, default: ew

Determine the automatic discretization of all continuous attributes. The following values are allowed: ef, em, ew, and ewn.

  • disc=ef
    Equal-frequency discretization. An unsupervised discretization algorithm that uses the equal frequency criterion for interval bound setting.
  • disc=em
    Minimal entropy discretization. A supervised discretization algorithm that uses the minimal entropy criterion, computed with respect to the class labels, for interval bound setting.
  • disc=ew (default)
    Equal-width discretization. An unsupervised discretization algorithm that uses the equal width criterion for interval bound setting.
  • disc=ewn
    Equal-width discretization with nice bucket limits. An unsupervised discretization algorithm that uses the equal width criterion for interval bound setting.
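The difference between the equal-width and equal-frequency criteria can be sketched in a few lines of plain Python. This is an illustration of the two general techniques, not of how the IDAX procedures compute their bounds; the helper names and the sample values are hypothetical.

```python
def equal_width_bounds(values, bins):
    """Interior cut points splitting [min, max] into `bins` equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * k for k in range(1, bins)]


def equal_frequency_bounds(values, bins):
    """Interior cut points chosen so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // bins] for k in range(1, bins)]


# A skewed sample: one large outlier stretches the value range.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

ew = equal_width_bounds(values, 3)
ef = equal_frequency_bounds(values, 3)
print("equal width:    ", ew)
print("equal frequency:", ef)
```

With skewed data, equal-width bounds leave most values in the first interval, while equal-frequency bounds follow the data density. This is one reason to consider a value other than the default for disc.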

bins : int, optional, default

Number of bins for numeric columns.

Returns:

The NaiveBayes object, ready to be used for fitting and prediction.

Notes

The inner parameters of the model can be printed and modified by using get_params and set_params. However, we recommend creating a new NaiveBayes model instead of modifying an existing one.

Examples

>>> from ibmdbpy import IdaDataBase, IdaDataFrame
>>> from ibmdbpy.learn.naive_bayes import NaiveBayes
>>> idadb = IdaDataBase("BLUDB-TEST")
>>> idadf = IdaDataFrame(idadb, "IRIS")
>>> bayes = NaiveBayes("NAIVEBAYES_TEST")
>>> bayes.fit(idadf, column_id="ID", target="species")
>>> bayes.predict(idadf, outtable="IRIS_PREDICTION", outtableProb="IRIS_PREDICTIONPROB")

Attributes

TODO  

Get parameters

NaiveBayes.get_params()

Return the parameters of the Naive Bayes model.

Set parameters

NaiveBayes.set_params(**params)

Modify the parameters of the Naive Bayes model.

Methods

Fit and predict

fit

NaiveBayes.fit(idadf, target, column_id=u'ID', incolumn=None, coldeftype=None, coldefrole=None, colpropertiestable=None, verbose=False)

Create a Naive Bayes model from an IdaDataFrame.

Parameters:

idadf : IdaDataFrame

The IdaDataFrame to be used as input.

target : str

The column of the input table that represents the class.

column_id : str, default: “ID”

The column of the input table that identifies the transaction ID.

incolumn : str, optional

The columns of the input table that have specific properties, separated by semicolons (;). Each column name is followed by one or more of the following properties:

  • By type nominal (‘:nom’) or by type continuous (‘:cont’). By default, numerical types are continuous, and all other types are nominal.
  • By role ‘:id’, ‘:target’, ‘:input’, or ‘:ignore’.

If this parameter is not specified, all columns of the input table have default properties.
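As an illustration of the incolumn format described above, the following sketch assembles such a string from a dictionary. The helper build_incolumn and the column names are hypothetical and not part of ibmdbpy; chaining several properties on one column as COL:prop1:prop2 is an assumption based on the "one or more" wording.

```python
def build_incolumn(properties):
    """Join {column: [property, ...]} into a 'COL:prop;COL:prop' string."""
    allowed = {"nom", "cont", "id", "target", "input", "ignore"}
    parts = []
    for column, props in properties.items():
        unknown = [p for p in props if p not in allowed]
        if unknown:
            raise ValueError(f"unsupported properties for {column}: {unknown}")
        # Each property is appended to the column name with a leading colon.
        parts.append(column + "".join(":" + p for p in props))
    # Columns are separated by semicolons.
    return ";".join(parts)


# Hypothetical column names, for illustration only.
incolumn = build_incolumn({
    "SPECIES": ["nom"],
    "SEPAL_LENGTH": ["cont"],
    "COMMENT": ["ignore"],
})
print(incolumn)
```

The resulting string could then be passed as the incolumn argument of fit. Columns that are not listed keep their default properties.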

coldeftype : str, optional

The default type of the input table columns. The following values are allowed: ‘nom’ and ‘cont’. If the parameter is not specified, numeric columns are continuous, and all other columns are nominal.

coldefrole : str, optional

The default role of the input table columns. The following values are allowed: ‘input’ and ‘ignore’. If the parameter is not specified, all columns are input columns.

colpropertiestable : str, optional

The table where the properties of the input table columns are stored. If this parameter is not specified, the column properties of the input table are detected automatically.

verbose : bool, default: False

Verbosity mode.

predict

NaiveBayes.predict(idadf, column_id=None, outtable=None, outtableProb=None, mestimation=False)

Use the Naive Bayes predict stored procedure to apply a Naive Bayes model to generate classification predictions for a data set.

Parameters:

idadf : IdaDataFrame

IdaDataFrame to be used as input.

column_id : str, optional

The column of the input table that identifies a unique instance ID. By default, the ID column that was specified when the model was built is used.

outtable : str, optional

The name of the output table where the predictions are stored. If this parameter is not specified, it is generated automatically. If the parameter corresponds to an existing table in the database, it will be replaced.

outtableProb : str, optional

The output table where the probabilities for each of the classes are stored. If this parameter is not specified, the table is not created. If the parameter corresponds to an existing table in the database, it will be replaced.

mestimation : bool, default: False

A flag that indicates the use of m-estimation for probabilities. This kind of estimation might be slower than other ones, but it might produce better results for small or unbalanced data sets.
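The general idea behind m-estimation can be sketched as follows. This shows the textbook m-estimate formula, (count + m * prior) / (total + m). The values of m and of the prior used by the stored procedure are not stated here, so the numbers below are purely illustrative.

```python
def relative_frequency(count, total):
    """Plain estimate: fraction of class members that show the value."""
    return count / total


def m_estimate(count, total, prior, m):
    """Smoothed estimate that blends the observed frequency with a prior.

    `m` acts like a number of virtual observations distributed
    according to `prior`.
    """
    return (count + m * prior) / (total + m)


# A value never observed together with a class of only 5 instances.
raw = relative_frequency(0, 5)
smoothed = m_estimate(0, 5, prior=0.5, m=2)
print(raw, smoothed)
```

Without smoothing, a single unseen value drives the whole product of probabilities to zero. The m-estimate keeps it small but positive, which matters most when a class has few training instances.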

Returns:

IdaDataFrame

IdaDataFrame containing the classification decision for each data point, referenced by its ID.

fit_predict

NaiveBayes.fit_predict(idadf, column_id=u'ID', incolumn=None, coldeftype=None, coldefrole=None, colprepertiesTable=None, outtable=None, outtableProb=None, mestimation=False, verbose=False)

Convenience function for fitting the model and using it to make predictions about the same dataset. See the fit and predict documentation for an explanation of the parameters.

Explore result

describe

NaiveBayes.describe()

Return a description of the Naive Bayes model.

get labels

NaiveBayes.labels_()

Return the labels of the classification, if available.

_retrieve_NaiveBayes_Model

NaiveBayes._retrieve_NaiveBayes_Model(modelname, verbose=False)

Retrieve information about the model to print the results. The Naive Bayes IDAX function stores its result in 2 tables:

  • <MODELNAME>_MODEL
  • <MODELNAME>_DISCRANGES

Parameters:

modelname : str

The name of the model that is retrieved.

verbose : bool, default: False

Verbosity mode.

Notes

Needs better formatting instead of printing the tables.