Association Rules Mining
In-Database Association Rules Mining
Initiate model
Create an AssociationRules object
class ibmdbpy.learn.association_rules.AssociationRules(modelname=None, minsupport=None, maxlen=5, maxheadlen=1, minconf=0.5)
Association rules mining can be used to discover interesting and useful relations between items in a large-scale transaction table. You can identify strong rules between related items by using different measures of relevance. Apriori and FP-Growth are well-known algorithms for association rules mining. For analytic stored procedures, the PrefixSpan algorithm is preferred due to its scalability.
The AssociationRules class provides an interface for using the ASSOCRULES and PREDICT_ASSOCRULES IDAX methods of dashDB/DB2.
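The measures of relevance mentioned above are usually support, confidence, and lift. The following is a minimal pure-Python sketch of how they are commonly defined, for illustration only. It does not use ibmdbpy or the database, and the sample transactions and the helper names (support, confidence, lift) are assumptions made for this example.

```python
# Toy transactions; each set holds the items of one transaction (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(body, head, transactions):
    """Support of body and head together, relative to the support of the body.

    Assumes the body occurs in at least one transaction.
    """
    both = set(body) | set(head)
    return support(both, transactions) / support(body, transactions)


def lift(body, head, transactions):
    """Confidence of the rule relative to the support of the head."""
    return confidence(body, head, transactions) / support(head, transactions)


# Rule: bread -> milk
print(support({"bread", "milk"}, transactions))                # 0.5
print(round(confidence({"bread"}, {"milk"}, transactions), 3))  # 0.667
print(round(lift({"bread"}, {"milk"}, transactions), 3))        # 0.889
```

A lift below 1, as in this toy data, suggests that the body and the head occur together less often than would be expected if they were independent.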
__init__(modelname=None, minsupport=None, maxlen=5, maxheadlen=1, minconf=0.5)
Constructor for the association rules model.
Parameters: modelname : str
The name of the Association Rules model that is built. If the parameter corresponds to an existing model in the database, it will be replaced during the fitting step.
minsupport : float or integer, optional
The minimum fraction (0.0 - 1.0) or the minimum number (above 1) of transactions that must contain a pattern for it to be considered frequent. Default: system-determined. Range: >0.0 and <1.0 for a minimum fraction; >1 for a minimum number of transactions.
maxlen : int, optional, >=2, default: 5
The maximum length of a pattern or a rule, that is, the maximum number of items per pattern or rule.
maxheadlen : int, optional, >= 1 and <maxlen, default: 1
The maximum length of a rule head, that is, the maximum number of items that might belong to the item set on the right side of a rule. Increasing this value might significantly increase the number of detected rules.
minconf : float, optional, >=0.0 and <= 1, default: 0.5
The minimum confidence that a rule must achieve to be kept in the model of the pattern.
Returns: The AssociationRules object, ready to be used for fitting and prediction
Notes
Inner parameters of the model can be printed and modified by using get_params and set_params. However, we recommend creating a new AssociationRules model instead of modifying an existing one.
Examples
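>>> from ibmdbpy import IdaDataBase, IdaDataFrame
>>> from ibmdbpy.learn.association_rules import AssociationRules
>>> idadb = IdaDataBase("BLUDB-TEST")
>>> idadf = IdaDataFrame(idadb, "GROCERIES")
>>> arules = AssociationRules("ASSOCRULES_TEST")
>>> arules.fit(idadf, transaction_id="TID", item_id="SID")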
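To illustrate why increasing maxheadlen might significantly increase the number of detected rules, the sketch below enumerates the ways in which a single frequent pattern can be split into a rule body and a rule head. It is a pure-Python illustration, not the in-database algorithm. The helper name candidate_rules is hypothetical, and the sketch assumes that the body of a rule must keep at least one item.

```python
from itertools import combinations


def candidate_rules(pattern, maxheadlen=1):
    """Yield (body, head) splits of a pattern with 1 <= len(head) <= maxheadlen."""
    items = sorted(pattern)
    # Assumption: the body keeps at least one item, so the head
    # can hold at most len(items) - 1 items.
    largest_head = min(maxheadlen, len(items) - 1)
    for head_len in range(1, largest_head + 1):
        for head in combinations(items, head_len):
            body = tuple(item for item in items if item not in head)
            yield body, head


pattern = {"bread", "butter", "jam", "milk"}
print(len(list(candidate_rules(pattern, maxheadlen=1))))  # 4
print(len(list(candidate_rules(pattern, maxheadlen=2))))  # 10
```

Each candidate would still have to reach minconf to be kept, so the counts above are upper bounds per pattern, not the number of rules in a fitted model.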
Attributes
TODO
Get parameters
Fit, prune and predict
fit
AssociationRules.fit(idadf, transaction_id, item_id, nametable=None, namecol=None, verbose=False)
Create an Association Rules model from an IdaDataFrame.
Parameters: idadf : IdaDataFrame
The IdaDataFrame to be used as input.
transaction_id : str
The column of the input table that identifies the transaction ID.
item_id : str
The column of the input table that identifies an item of the transaction.
nametable : str, optional
The table that contains a mapping of the items in the input table and their names. The table must contain at least two columns, where:
- The first column has the same name as the column of the input table that is specified in the item_id parameter.
- The second column has the same name as the name that is defined in the namecol parameter.
namecol : str, optional
The column that contains the item names in the table that is specified in the nametable parameter. You cannot specify this parameter if the nametable parameter is not specified.
prune
AssociationRules.prune(itemsin=None, itemsout=None, minlen=1, maxlen=None, minsupport=0, maxsupport=1, minlift=None, maxlift=None, minconf=None, maxconf=None, reset=False)
Prune the rules and patterns of an association rules model. You can use filters to exclude rules and patterns that you are not interested in. These rules and patterns are then marked as not valid in the model and are no longer shown.
Parameters: itemsin : str or list, optional
A list of item names that must be contained in the rules or patterns to be kept. The items are separated by semicolons. At least one of the listed items must be contained in a rule or pattern to be kept. For rules, the following conditions apply:
- To indicate that the item must be contained in the head of the rule, follow the item name with :h or :head.
- To indicate that the item must be contained in the body of the rule, follow the item name with :b or :body.
If this parameter is not specified, no constraint is applied.
itemsout : str or list, optional
A list of item names that must not be contained in the rules or patterns to be kept. The items are separated by semicolons. If this parameter is not specified, no constraint is applied.
minlen : int, optional, >=1, default: 1
The minimum number of items that are to be kept in the rules or patterns.
maxlen : int, optional, >=1, default: the longest pattern of the model
The maximum number of items that are to be kept in the rules or patterns.
minsupport : float, optional, >=0.0 and <=maxsupport, default: 0
The minimum support for the rules or patterns that are to be kept.
maxsupport : float, optional, >=minsupport and <=1.0, default: 1
The maximum support for the rules or patterns that are to be kept.
minlift : float, optional, >=0.0 and <=maxlift, default
The minimum lift of the rules or patterns that are to be kept.
maxlift : float, optional, >=minlift, default: the maximum lift of the patterns of the model
The maximum lift of the rules or patterns that are to be kept.
minconf : float, optional, >=0.0 and <= maxconf, default
The minimum confidence of the rules that are to be kept.
maxconf : float, optional, >=minconf and <= 1.0, default
The maximum confidence of the rules that are to be kept.
reset : bool, optional, default: False
If you specify reset=True, all rules and patterns are first reset to not pruned. Regardless of the value of reset, the rules and patterns that are not to be kept are then marked as pruned.
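The semicolon-separated syntax of the itemsin and itemsout parameters, including the :h and :b suffixes, can be assembled with a small helper. The sketch below is a hypothetical convenience function (build_items_filter is not part of ibmdbpy) and the item names are made up for the example.

```python
def build_items_filter(anywhere=(), head=(), body=()):
    """Join item names into a semicolon-separated filter string.

    Items in `head` get the ':h' suffix, items in `body` get the ':b'
    suffix, and items in `anywhere` are listed without a suffix.
    """
    names = list(anywhere) + list(head) + list(body)
    for name in names:
        # A semicolon inside a name would be read as a separator.
        if ";" in name:
            raise ValueError("item name must not contain ';': %r" % name)
    parts = list(anywhere)
    parts += [name + ":h" for name in head]
    parts += [name + ":b" for name in body]
    return ";".join(parts)


itemsin = build_items_filter(anywhere=["eggs"], head=["milk"], body=["bread"])
print(itemsin)  # eggs;milk:h;bread:b
```

Under these assumptions, the resulting string could then be passed to the pruning step, for example as arules.prune(itemsin=itemsin, minconf=0.7).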
predict
AssociationRules.predict(idadf, outtable=None, transaction_id=None, item_id=None, type=u'rules', limit=1, sort=None)
Apply the rules and patterns of an association rules model to other transactions. You can apply all rules or only specific rules according to specified criteria.
Parameters: idadf : IdaDataFrame
IdaDataFrame to be used as input.
outtable : str, optional
The name of the output table in which the mapping between the input sequences and the associated rules or patterns is written. If the parameter corresponds to an existing table in the database, it is replaced.
transaction_id : str, optional
The column of the input table that identifies the transaction ID. By default, this is the same tid column that is specified in the stored procedure to build the model.
item_id : str, optional
The column of the input table that identifies an item of the transaction. By default, this is the same item column that is specified in the stored procedure to build the model.
type : str, optional, default: ‘rules’
The type of information that is written in the output table. The following values are possible: ‘rules’ and ‘patterns’.
limit : int, optional, >=1, default: 1
The maximum number of rules or patterns that is written in the output table for each input sequence.
sort : str or list, optional
A list of keywords that indicates the order in which the rules or patterns are written in the output table. The order of the list is descending. The items are separated by semicolons. The following values are possible: ‘support’, ‘confidence’, ‘lift’, and ‘length’. The ‘confidence’ value can only be specified if the type parameter is ‘rules’. If the type parameter is ‘rules’, the default is: support;confidence;length. If the type parameter is ‘patterns’, the default is: support;lift;length.
Notes
When “type” is set to “rules”, it looks like nothing is returned.
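The constraints on the sort parameter described above (allowed keywords, the restriction on ‘confidence’, and the type-dependent defaults) can be captured in a small validation helper. This is a sketch only: build_sort and its output_type argument are hypothetical names introduced for the example and are not part of ibmdbpy.

```python
VALID_SORT_KEYS = ("support", "confidence", "lift", "length")

# Defaults as stated in the description of the sort parameter.
DEFAULT_SORT = {
    "rules": ("support", "confidence", "length"),
    "patterns": ("support", "lift", "length"),
}


def build_sort(keys=None, output_type="rules"):
    """Return a semicolon-separated sort string after validating the keywords."""
    if output_type not in DEFAULT_SORT:
        raise ValueError("output_type must be 'rules' or 'patterns'")
    if keys is None:
        keys = DEFAULT_SORT[output_type]
    for key in keys:
        if key not in VALID_SORT_KEYS:
            raise ValueError("unknown sort keyword: %r" % key)
        # 'confidence' is only meaningful when rules are written.
        if key == "confidence" and output_type != "rules":
            raise ValueError("'confidence' requires output_type='rules'")
    return ";".join(keys)


print(build_sort())                          # support;confidence;length
print(build_sort(output_type="patterns"))    # support;lift;length
print(build_sort(["lift", "support"]))       # lift;support
```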
fit_predict
AssociationRules.fit_predict(idadf, transaction_id, item_id, nametable=None, namecol=None, outtable=None, type=u'rules', limit=1, sort=None, verbose=False)
Convenience function for fitting the model and using it to make predictions about the same dataset. See the fit and predict documentation for an explanation of their parameters.
Notes
If you use this function, you are not able to use the prune step between the fit and the predict step. However, you can still prune afterwards and reuse the predict function.
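When a prune step is needed between fitting and predicting, the three methods can be called in sequence instead of using fit_predict. The sketch below shows one possible wrapper. The function fit_prune_predict is hypothetical, it only relies on the fit, prune, and predict signatures documented on this page, and it expects an already created AssociationRules object and IdaDataFrame.

```python
def fit_prune_predict(arules, idadf, transaction_id, item_id,
                      outtable=None, **prune_filters):
    """Fit the model, prune it, then apply the remaining rules to the same data.

    `arules` is expected to behave like an AssociationRules object and
    `prune_filters` are passed unchanged to its prune method.
    """
    arules.fit(idadf, transaction_id=transaction_id, item_id=item_id)
    arules.prune(**prune_filters)
    return arules.predict(idadf, outtable=outtable,
                          transaction_id=transaction_id, item_id=item_id)


# Assumed usage against a live database (not executed here):
# arules = AssociationRules("ASSOCRULES_TEST")
# result = fit_prune_predict(arules, idadf, "TID", "SID", minconf=0.7)
```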
Explore result
_retrieve_AssociationRules_Model
AssociationRules._retrieve_AssociationRules_Model(modelname, verbose=False)
Retrieve information about the model to print the results. The Association Rules IDAX function stores its result in 4 tables:
- <MODELNAME>_ASSOCPATTERNS
- <MODELNAME>_ASSOCPATTERNS_STATISTICS
- <MODELNAME>_ASSOCRULES
- <MODELNAME>_ITEMS
Parameters: modelname : str
The name of the model that is retrieved.
verbose : bool, default: False
Verbosity mode.
Notes
Needs better formatting instead of printing the tables
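The names of the four result tables listed above follow directly from the model name. The helper below is a hypothetical sketch (result_table_names is not part of ibmdbpy) that derives them, for example to inspect the tables individually. It assumes that the model name is used as is, without schema qualification or case conversion.

```python
RESULT_TABLE_SUFFIXES = (
    "_ASSOCPATTERNS",
    "_ASSOCPATTERNS_STATISTICS",
    "_ASSOCRULES",
    "_ITEMS",
)


def result_table_names(modelname):
    """Return the names of the result tables for the given model name."""
    return [modelname + suffix for suffix in RESULT_TABLE_SUFFIXES]


for name in result_table_names("ASSOCRULES_TEST"):
    print(name)
```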