Naive Bayes Classification
In-database Naive Bayes modeling and prediction. Follows the interface of sklearn.naive_bayes.
Initiate model
Create NaiveBayes object
class ibmdbpy.learn.naive_bayes.NaiveBayes(modelname=None, disc=None, bins=None)
The Naive Bayes classification algorithm is a probabilistic classifier. It is based on probability models that incorporate strong independence assumptions: given the class, each attribute is treated as independent of all the others. These assumptions often do not hold in real data, which is why they are called naive.
The NaiveBayes class provides an interface for using the NAIVEBAYES and PREDICT_NAIVEBAYES IDAX methods of dashDB/DB2.
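To make the independence assumption concrete, here is a minimal pure-Python sketch of a Naive Bayes classifier for nominal attributes. It does not use ibmdbpy or the IDAX procedures and is not their implementation. The function names `train` and `predict` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict(row, class_counts, value_counts):
    """Return the best class and the unnormalized score of every class."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # Naive step: multiply per-attribute likelihoods P(value | class)
            # as if the attributes were independent given the class.
            score *= value_counts[(i, label)][value] / n
        scores[label] = score
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): two nominal attributes and a binary class.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]

model = train(rows, labels)
best, scores = predict(("rainy", "mild"), *model)
print(best, scores)
```

Note that the class "no" receives a score of exactly zero because "rainy" never occurs with it in the training data. Zero counts of this kind are the usual motivation for smoothed probability estimates.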
__init__(modelname=None, disc=None, bins=None)
Constructor for NaiveBayes model objects.
Parameters: modelname : str, optional
The name of the Naive Bayes model that will be built. If no name is specified, it will be generated automatically. If the parameter corresponds to an existing model in the database, it is replaced during the fitting step.
disc : str, optional, default: ew
Determines the automatic discretization of all continuous attributes. The following values are allowed: ef, em, ew, and ewn.
- disc=ef
- Equal-frequency discretization. An unsupervised discretization algorithm that uses the equal frequency criterion for interval bound setting.
- disc=em
- Minimal entropy discretization. A supervised discretization algorithm that uses the minimal entropy criterion, computed against the class column, for interval bound setting.
- disc=ew (default)
- Equal-width discretization. An unsupervised discretization algorithm that uses the equal width criterion for interval bound setting.
- disc=ewn
- Equal-width discretization with nice bucket limits. An unsupervised discretization algorithm that uses the equal width criterion for interval bound setting.
bins : int, optional, default
Number of bins for numeric columns.
Returns: The NaiveBayes object, ready to be used for fitting and prediction.
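The difference between the ew and ef settings of disc, and the role of bins, can be sketched in plain Python. This is only an illustration of the two criteria. The source does not state the exact rule the IDAX procedures use to place interval bounds, and the helper names below are hypothetical.

```python
def equal_width_bounds(values, bins):
    """Interior cut points splitting [min, max] into `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * k for k in range(1, bins)]


def equal_frequency_bounds(values, bins):
    """Interior cut points chosen so each interval holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // bins] for k in range(1, bins)]


# A skewed sample: one large outlier stretches the value range.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

ew = equal_width_bounds(data, 3)      # bounds follow the value range
ef = equal_frequency_bounds(data, 3)  # bounds follow the data density
print(ew, ef)
```

With skewed data, equal-width bounds leave most values in the first interval, whereas equal-frequency bounds spread them evenly. That trade-off is the practical reason to choose one setting over the other.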
Notes
Inner parameters of the model can be printed and modified by using get_params and set_params. However, we recommend creating a new NaiveBayes model instead of modifying an existing one.
Examples
>>> from ibmdbpy import IdaDataBase, IdaDataFrame
>>> from ibmdbpy.learn.naive_bayes import NaiveBayes
>>> idadb = IdaDataBase("BLUDB-TEST")
>>> idadf = IdaDataFrame(idadb, "IRIS")
>>> bayes = NaiveBayes("NAIVEBAYES_TEST")
>>> bayes.fit(idadf, column_id="ID", target="species")
>>> bayes.predict(idadf, outtable="IRIS_PREDICTION", outtableProb="IRIS_PREDICTIONPROB")
Attributes
TODO
Fit and predict
fit
NaiveBayes.fit(idadf, target, column_id=u'ID', incolumn=None, coldeftype=None, coldefrole=None, colpropertiestable=None, verbose=False)
Create a Naive Bayes model from an IdaDataFrame.
Parameters: idadf : IdaDataFrame
The IdaDataFrame to be used as input.
target : str
The column of the input table that represents the class
column_id : str, default: “ID”
The column of the input table that identifies a unique instance ID.
incolumn : str, optional
The columns of the input table that have specific properties, separated by a semicolon (;). Each column is followed by one or more of the following properties:
- By type nominal (‘:nom’) or by type continuous (‘:cont’). By default, numerical types are continuous, and all other types are nominal.
- By role ‘:id’, ‘:target’, ‘:input’, or ‘:ignore’.
If this parameter is not specified, all columns of the input table have default properties.
coldeftype : str, optional
The default type of the input table columns. The following values are allowed: ‘nom’ and ‘cont’. If the parameter is not specified, numeric columns are continuous, and all other columns are nominal.
coldefrole : str, optional
The default role of the input table columns. The following values are allowed: ‘input’ and ‘ignore’. If the parameter is not specified, all columns are input columns.
colpropertiestable : str, optional
The table where the properties of the columns of the input table are stored. If this parameter is not specified, the column properties of the input table are detected automatically.
verbose : bool, default: False
Verbosity mode.
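Building the incolumn string by hand is error-prone, so a small helper can assemble it from a dictionary. This is a sketch: `build_incolumn` is a hypothetical helper, not part of ibmdbpy, and the `COLUMN:property` layout is assumed from the parameter description above. The column names are made up.

```python
def build_incolumn(properties):
    """Join {column: [property, ...]} into 'COL:prop:prop;COL:prop' form."""
    # Properties listed in the incolumn description: two types and four roles.
    allowed = {"nom", "cont", "id", "target", "input", "ignore"}
    parts = []
    for column, props in properties.items():
        for prop in props:
            if prop not in allowed:
                raise ValueError("unknown column property: %r" % prop)
        # Each column name is followed by its properties, prefixed with ':'.
        parts.append(":".join([column] + list(props)))
    # Columns are separated by a semicolon.
    return ";".join(parts)


incolumn = build_incolumn({"SEPAL_LENGTH": ["cont"], "COLOR": ["nom", "ignore"]})
print(incolumn)
```

The resulting string could then be passed as the incolumn argument of fit. Validating the property names up front surfaces typos before the stored procedure is called.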
predict
NaiveBayes.predict(idadf, column_id=None, outtable=None, outtableProb=None, mestimation=False)
Use the Naive Bayes predict stored procedure to apply a Naive Bayes model to generate classification predictions for a data set.
Parameters: idadf : IdaDataFrame
IdaDataFrame to be used as input.
column_id : str, optional
The column of the input table that identifies a unique instance ID. By default, the same id column that is specified in the stored procedure to build the model.
outtable : str, optional
The name of the output table where the predictions are stored. If this parameter is not specified, it is generated automatically. If the parameter corresponds to an existing table in the database, it will be replaced.
outtableProb : str, optional
The output table where the probabilities for each of the classes are stored. If this parameter is not specified, the table is not created. If the parameter corresponds to an existing table in the database, it will be replaced.
mestimation : bool, default: False
A flag that indicates the use of m-estimation for probabilities. This kind of estimation might be slower than other ones, but it might produce better results for small or unbalanced data sets.
Returns: IdaDataFrame
IdaDataFrame containing the classification decision for each data point, referenced by its ID.
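The idea behind the mestimation flag can be illustrated with the textbook m-estimate of a conditional probability. This is a sketch under stated assumptions: the source does not give the formula, the prior, or the value of m used by PREDICT_NAIVEBAYES, so the function and its arguments below are hypothetical.

```python
def m_estimate(value_count, class_count, prior, m):
    """Textbook m-estimate of P(value | class).

    value_count: rows of the class that have the attribute value
    class_count: rows of the class
    prior:       prior guess for the probability, e.g. 1 / number of distinct values
    m:           weight of the prior, expressed as a number of virtual rows
    """
    return (value_count + m * prior) / (class_count + m)


# A value never seen with a class gets probability 0 from raw frequencies,
# which would zero out the whole product of likelihoods.
raw = 0 / 2

# With m = 2 virtual rows and a uniform prior over two values,
# the estimate is (0 + 2 * 0.5) / (2 + 2).
smoothed = m_estimate(0, 2, prior=0.5, m=2)
print(raw, smoothed)
```

Because the prior carries more weight when the class has few rows, this kind of estimate mainly changes results for small or unbalanced data sets, which matches the guidance given for the flag.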
fit_predict
NaiveBayes.fit_predict(idadf, column_id=u'ID', incolumn=None, coldeftype=None, coldefrole=None, colprepertiesTable=None, outtable=None, outtableProb=None, mestimation=False, verbose=False)
Convenience function for fitting the model and using it to make predictions about the same dataset. See the fit and predict documentation for an explanation of their parameters.
Explore result
_retrieve_NaiveBayes_Model
NaiveBayes._retrieve_NaiveBayes_Model(modelname, verbose=False)
Retrieve information about the model to print the results. The Naive Bayes IDAX function stores its result in two tables:
- <MODELNAME>_MODEL
- <MODELNAME>_DISCRANGES
Parameters: modelname : str
The name of the model that is retrieved.
verbose : bool, default: False
Verbosity mode.
Notes
Needs better formatting instead of printing the tables.