Association Rules Mining

In-Database Association Rules Mining

Initiate model

Create an AssociationRules object

class ibmdbpy.learn.association_rules.AssociationRules(modelname=None, minsupport=None, maxlen=5, maxheadlen=1, minconf=0.5)

Association rules mining can be used to discover interesting and useful relations between items in a large-scale transaction table. You can identify strong rules between related items by using different measures of relevance. Apriori and FP-Growth are well-known algorithms for association rules mining. For analytic stored procedures, the PrefixSpan algorithm is preferred due to its scalability.
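
The measures of relevance that appear later on this page as filter parameters are support, confidence, and lift. The following is a minimal plain-Python sketch of their standard definitions on a toy transaction table. It does not call the in-database procedure; the data and the helper names (support, confidence, lift) are assumptions made for illustration only.

```python
# Toy transaction table: each row is (transaction ID, item ID).
rows = [
    (1, "bread"), (1, "butter"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "bread"), (3, "milk"),
    (4, "milk"),
]

# Group the items by transaction.
transactions = {}
for tid, item in rows:
    transactions.setdefault(tid, set()).add(item)


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(body, head):
    """Support of body and head together, divided by the support of the body."""
    return support(set(body) | set(head)) / support(body)


def lift(body, head):
    """Confidence of the rule divided by the support of the head."""
    return confidence(body, head) / support(head)


# Rule "butter -> bread": every transaction with butter also contains bread.
rule_confidence = confidence({"butter"}, {"bread"})
rule_lift = lift({"butter"}, {"bread"})
```

A lift above 1 indicates that the body and the head occur together more often than would be expected if they were independent.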

The AssociationRules class provides an interface for using the ASSOCRULES and PREDICT_ASSOCRULES IDAX methods of dashDB/DB2.

__init__(modelname=None, minsupport=None, maxlen=5, maxheadlen=1, minconf=0.5)

Constructor for association rules model

Parameters:

modelname : str

The name of the Association Rules model that is built. If the parameter corresponds to an existing model in the database, it will be replaced during the fitting step.

minsupport : float or integer, optional

The minimum fraction (0.0 - 1.0) or the minimum number (above 1) of transactions that must contain a pattern for it to be considered frequent. Default: system-determined. Range: >0.0 and <1.0 for a minimum fraction, >1 for a minimum number of transactions.

maxlen : int, optional, >=2, default: 5

The maximum length of a pattern or a rule, that is, the maximum number of items per pattern or rule.

maxheadlen : int, optional, >= 1 and <maxlen, default: 1

The maximum length of a rule head, that is, the maximum number of items that might belong to the item set on the right side of a rule. Increasing this value might significantly increase the number of detected rules.

minconf : float, optional, >=0.0 and <= 1, default: 0.5

The minimum confidence that a rule must achieve to be kept in the model of the pattern.
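
Two of the parameters above can be illustrated with simple arithmetic. The sketch below shows how the two minsupport ranges could be read as an absolute transaction count, and why a larger maxheadlen can increase the number of rules. Both helpers (min_transaction_count, max_rules_per_pattern) are hypothetical and are not part of ibmdbpy; rounding up a fractional threshold is an assumption, and the rule count is only a combinatorial upper bound per pattern, not what the stored procedure necessarily produces.

```python
import math


def min_transaction_count(minsupport, n_transactions):
    """Translate a minsupport value into an absolute transaction count."""
    if 0.0 < minsupport < 1.0:
        # Fraction of all transactions; round up because counts are integers.
        return math.ceil(minsupport * n_transactions)
    if minsupport > 1:
        # Already an absolute number of transactions.
        return math.ceil(minsupport)
    raise ValueError("minsupport must be >0.0 and <1.0, or >1")


def max_rules_per_pattern(pattern_len, maxheadlen):
    """Count the ways to split one pattern into a head of at most
    maxheadlen items and a non-empty body made of the remaining items."""
    largest_head = min(maxheadlen, pattern_len - 1)
    return sum(math.comb(pattern_len, k) for k in range(1, largest_head + 1))


# A pattern of 5 items (the default maxlen) with different head limits.
single_item_heads = max_rules_per_pattern(5, 1)
two_item_heads = max_rules_per_pattern(5, 2)
```

With these assumptions, allowing heads of up to two items triples the number of candidate splits for a five-item pattern compared with the default maxheadlen of 1.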

Returns:

The AssociationRules object, ready to be used for fitting and prediction

Notes

Inner parameters of the model can be printed and modified by using get_params and set_params. However, we recommend creating a new AssociationRules model rather than modifying an existing one.

Examples

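>>> from ibmdbpy import IdaDataBase, IdaDataFrame
>>> from ibmdbpy.learn.association_rules import AssociationRules
>>> # Connect to the database and open the transaction table
>>> idadb = IdaDataBase("BLUDB-TEST")
>>> idadf = IdaDataFrame(idadb, "GROCERIES")
>>> # Build the model from the transaction ID and item ID columns
>>> arules = AssociationRules("ASSOCRULES_TEST")
>>> arules.fit(idadf, transaction_id="TID", item_id="SID")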

Attributes

TODO  

Get parameters

AssociationRules.get_params()

Return the parameters of the Association Rules model.

Set parameters

AssociationRules.set_params(**params)

Modify the parameters of the Association Rules model.

Methods

Fit, prune and predict

fit

AssociationRules.fit(idadf, transaction_id, item_id, nametable=None, namecol=None, verbose=False)

Create an Association Rules model from an IdaDataFrame.

Parameters:

idadf : IdaDataFrame

The IdaDataFrame to be used as input.

transaction_id : str

The column of the input table that identifies the transaction ID.

item_id : str

The column of the input table that identifies an item of the transaction.

nametable : str, optional

The table that contains a mapping of the items in the input table and their names. The table must contain at least two columns, where:

  • The first column has the same name as the column of the input table that is specified in the item_id parameter.
  • The second column has the same name as the column that is defined in the namecol parameter.

namecol : str, optional

The column of the table specified in the nametable parameter that contains the item names. You cannot specify this parameter if the nametable parameter is not specified.
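
The relationship between the input table and the name table can be pictured with plain Python data. This is only a sketch of the layout described above: the column names TID and SID follow the example on this page, while the NAME column and all values are hypothetical.

```python
# Rows of the input table, with the transaction ID and item ID columns.
input_rows = [
    {"TID": 1, "SID": 101},
    {"TID": 1, "SID": 102},
    {"TID": 2, "SID": 101},
]

# Rows of a hypothetical name table: the first column carries the same name
# as the item column ("SID"), the second is the column passed as namecol.
name_rows = [
    {"SID": 101, "NAME": "bread"},
    {"SID": 102, "NAME": "butter"},
]

item_col, namecol = "SID", "NAME"

# Look up the readable name of every item in the input table.
names = {row[item_col]: row[namecol] for row in name_rows}
labelled = [(row["TID"], names[row[item_col]]) for row in input_rows]
```

Here labelled pairs each transaction ID with an item name instead of an item ID, which is the kind of mapping the name table makes possible.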

prune

AssociationRules.prune(itemsin=None, itemsout=None, minlen=1, maxlen=None, minsupport=0, maxsupport=1, minlift=None, maxlift=None, minconf=None, maxconf=None, reset=False)

Prune the rules and patterns of an association rules model. To remove rules and patterns that you are not interested in, use filters to exclude them. The excluded rules and patterns are then marked as not valid in the model and are no longer shown.

Parameters:

itemsin : str or list, optional

A list of item names that must be contained in the rules or patterns to be kept. The items are separated by semicolons. At least one of the listed items must be contained in a rule or pattern to be kept. For rules, the following conditions apply:

  • To indicate that the item must be contained in the head of the rule, the item name can be followed by :h or :head.
  • To indicate that the item must be contained in the body of the rule, the item name can be followed by :b or :body.

If this parameter is not specified, no constraint is applied.

itemsout : str or list, optional

A list of item names that must not be contained in the rules or patterns to be kept. The items are separated by semicolons. If this parameter is not specified, no constraint is applied.

minlen : int, optional, >=1, default: 1

The minimum number of items that are to be kept in the rules or patterns.

maxlen : int, optional, >=1, default: the longest pattern of the model

The maximum number of items that are to be kept in the rules or patterns.

minsupport : float, optional, >=0.0 and <=maxsupport, default: 0

The minimum support for the rules or patterns that are to be kept.

maxsupport : float, optional, >=minsupport and <=1.0, default: 1

The maximum support for the rules or patterns that are to be kept.

minlift : float, optional, >=0.0 and <=maxlift, default: None

The minimum lift of the rules or patterns that are to be kept.

maxlift : float, optional, >=minlift, default: the maximum lift of the patterns of the model

The maximum lift of the rules or patterns that are to be kept.

minconf : float, optional, >=0.0 and <= maxconf, default: None

The minimum confidence of the rules that are to be kept.

maxconf : float, optional, >=minconf and <= 1.0, default: None

The maximum confidence of the rules that are to be kept.

reset : bool, optional, default: False

If you specify reset=True, all rules and patterns are first reset to not pruned. Regardless of the value of reset, the rules and patterns that are not to be kept are then marked as pruned.
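
The interplay of the filters and the reset flag can be simulated locally. The sketch below models rules as dictionaries with a pruned flag and applies an itemsin string and a confidence range as described above. It is an illustration of the documented semantics only: the functions (parse_itemsin, matches_itemsin, prune_sketch) and the data are hypothetical, and the in-database implementation may differ.

```python
def parse_itemsin(itemsin):
    """Split 'name', 'name:h', 'name:head', 'name:b', 'name:body' entries."""
    positions = {"": "any", "h": "head", "head": "head", "b": "body", "body": "body"}
    pairs = []
    for entry in itemsin.split(";"):
        name, _, suffix = entry.partition(":")
        pairs.append((name, positions[suffix]))
    return pairs


def matches_itemsin(rule, itemsin):
    """True if at least one listed item appears in the requested position."""
    for name, position in parse_itemsin(itemsin):
        if position in ("any", "head") and name in rule["head"]:
            return True
        if position in ("any", "body") and name in rule["body"]:
            return True
    return False


def prune_sketch(rules, itemsin=None, minconf=0.0, maxconf=1.0, reset=False):
    """Mark rules that fail the filters as pruned; return the visible ones."""
    for rule in rules:
        if reset:
            rule["pruned"] = False  # start again from an unpruned model
        keep = minconf <= rule["confidence"] <= maxconf
        if itemsin is not None:
            keep = keep and matches_itemsin(rule, itemsin)
        if not keep:
            rule["pruned"] = True
    return [rule["name"] for rule in rules if not rule["pruned"]]


rules = [
    {"name": "bread -> butter", "body": {"bread"}, "head": {"butter"}, "confidence": 0.67, "pruned": False},
    {"name": "butter -> bread", "body": {"butter"}, "head": {"bread"}, "confidence": 1.00, "pruned": False},
    {"name": "milk -> bread", "body": {"milk"}, "head": {"bread"}, "confidence": 0.33, "pruned": False},
]

head_filtered = prune_sketch(rules, itemsin="bread:h")
then_confident = prune_sketch(rules, minconf=0.5)
after_reset = prune_sketch(rules, minconf=0.5, reset=True)
```

In this simulation, successive calls without reset accumulate: a rule pruned by the first call stays hidden in the second call even though it passes the confidence filter, and it reappears only when reset=True is used.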

predict

AssociationRules.predict(idadf, outtable=None, transaction_id=None, item_id=None, type=u'rules', limit=1, sort=None)

Apply the rules and patterns of an association rules model to other transactions. You can apply all rules or only specific rules according to specified criteria.

Parameters:

idadf : IdaDataFrame

IdaDataFrame to be used as input.

outtable : str, optional

The name of the output table in which the mapping between the input sequences and the associated rules or patterns is written. If the parameter corresponds to an existing table in the database, it is replaced.

transaction_id : str, optional

The column of the input table that identifies the transaction ID. By default, this is the same tid column that is specified in the stored procedure to build the model.

item_id : str, optional

The column of the input table that identifies an item of the transaction. By default, this is the same item column that is specified in the stored procedure to build the model.

type : str, optional, default: ‘rules’

The type of information that is written in the output table. The following values are possible: ‘rules’ and ‘patterns’.

limit : int, optional, >=1, default: 1

The maximum number of rules or patterns that is written in the output table for each input sequence.

sort : str or list, optional

A list of keywords that indicates the order in which the rules or patterns are written in the output table. The order of the list is descending. The items are separated by semicolons. The following values are possible: ‘support’, ‘confidence’, ‘lift’, and ‘length’. The ‘confidence’ value can only be specified if the type parameter is ‘rules’. If the type parameter is ‘rules’, the default is: support;confidence;length. If the type parameter is ‘patterns’, the default is: support;lift;length.
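
The sort and limit parameters can be pictured as a ranking step followed by a cut-off. The sketch below builds a semicolon-separated sort string, checks the keyword restrictions described above, and ranks a few candidate rules locally. The helpers (build_sort, top_matches) and the data are hypothetical, and treating the keywords as successive tie-breakers is an assumption about how the ordering is applied.

```python
ALLOWED_KEYS = ("support", "confidence", "lift", "length")


def build_sort(keys, output_type="rules"):
    """Join sort keywords with semicolons after checking them."""
    for key in keys:
        if key not in ALLOWED_KEYS:
            raise ValueError(f"unknown sort keyword: {key!r}")
        if key == "confidence" and output_type != "rules":
            raise ValueError("'confidence' requires the type 'rules'")
    return ";".join(keys)


def top_matches(matches, sort, limit=1):
    """Rank matches in descending order of the sort keys and keep `limit`."""
    keys = sort.split(";")
    ranked = sorted(matches, key=lambda m: tuple(m[k] for k in keys), reverse=True)
    return ranked[:limit]


# Candidate rules that match one input transaction.
matches = [
    {"rule": "bread -> butter", "support": 0.50, "confidence": 0.67, "length": 2},
    {"rule": "butter -> bread", "support": 0.50, "confidence": 1.00, "length": 2},
    {"rule": "bread, milk -> butter", "support": 0.25, "confidence": 0.90, "length": 3},
]

default_rules_sort = build_sort(["support", "confidence", "length"])
best = top_matches(matches, default_rules_sort, limit=1)
```

With limit=1, only the highest-ranked rule per input transaction is kept; the first two rules tie on support, so confidence decides between them.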

Notes

When “type” is set to “rules”, it looks like nothing is returned.

fit_predict

AssociationRules.fit_predict(idadf, transaction_id, item_id, nametable=None, namecol=None, outtable=None, type=u'rules', limit=1, sort=None, verbose=False)

Convenience function for fitting the model and using it to make predictions about the same dataset. See the fit and predict documentation for an explanation of their parameters.

Notes

If you use this function, you are not able to use the prune step between the fit and the predict step. However, you can still prune afterwards and reuse the predict function.

Explore result

describe

AssociationRules.describe()

Return a description of the Association Rules model.

_retrieve_AssociationRules_Model

AssociationRules._retrieve_AssociationRules_Model(modelname, verbose=False)

Retrieve information about the model to print the results. The Association Rules IDAX function stores its result in 4 tables:

  • <MODELNAME>_ASSOCPATTERNS
  • <MODELNAME>_ASSOCPATTERNS_STATISTICS
  • <MODELNAME>_ASSOCRULES
  • <MODELNAME>_ITEMS
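
The naming pattern of these four tables can be expressed as a small helper. This is a sketch only: model_tables is a hypothetical function, and whether the names need a schema prefix in a given database is not covered here.

```python
def model_tables(modelname):
    """Return the four result table names derived from a model name."""
    suffixes = (
        "ASSOCPATTERNS",
        "ASSOCPATTERNS_STATISTICS",
        "ASSOCRULES",
        "ITEMS",
    )
    return [f"{modelname}_{suffix}" for suffix in suffixes]


# Table names for the model used in the example on this page.
tables = model_tables("ASSOCRULES_TEST")
```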

Parameters:

modelname : str

The name of the model that is retrieved.

verbose : bool, default: False

Verbosity mode.

Notes

Needs better formatting instead of printing the tables