ML.Frequent Pattern Mining

class ddf_library.functions.ml.fpm.AssociationRules(confidence=0.5, max_rules=-1)

Bases: ddf_library.bases.ddf_model.ModelDDF

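Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules using a measure of interestingness. Here, rules are derived from frequent item sets and their supports, and a rule is kept only if its confidence reaches the minimum confidence threshold.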
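As a concrete illustration of the strength measure involved, the sketch below computes the confidence of a rule X → Y from item set supports, using the standard definition confidence(X → Y) = support(X ∪ Y) / support(X). The helper `rule_confidence` and the plain-dict input are illustrative assumptions, not part of ddf_library.

```python
from typing import Dict, FrozenSet


def rule_confidence(support: Dict[FrozenSet[str], float],
                    antecedent: FrozenSet[str],
                    consequent: FrozenSet[str]) -> float:
    """Confidence of antecedent -> consequent: support(X | Y) / support(X).

    Illustrative helper, not a ddf_library API.
    """
    return support[antecedent | consequent] / support[antecedent]


# Toy relative supports (fraction of transactions containing each item set).
support = {
    frozenset({"bread"}): 0.6,
    frozenset({"bread", "milk"}): 0.3,
}
print(rule_confidence(support, frozenset({"bread"}), frozenset({"milk"})))  # 0.5
```

With the default `confidence=0.5`, this rule would sit exactly at the threshold.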

Example:
>>> rules = AssociationRules(confidence=0.10).fit_transform(item_set)

Set up all AssociationRules parameters.

Parameters:
  • confidence – Minimum confidence (default is 0.5);
  • max_rules – Maximum number of output rules; -1 returns all rules (default is -1).
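The two constructor parameters act as filters on the generated rules. The sketch below is a minimal illustration, assuming a rule is a (pre_rule, post_rule, confidence) tuple, that the confidence threshold is inclusive, and that the highest-confidence rules are the ones kept when `max_rules` caps the output (this page does not say which rules survive the cap). `filter_rules` is a hypothetical helper, not a ddf_library function.

```python
from typing import FrozenSet, List, Tuple

# (pre_rule, post_rule, confidence); tuple layout is an assumption.
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def filter_rules(rules: List[Rule], confidence: float = 0.5,
                 max_rules: int = -1) -> List[Rule]:
    """Keep rules whose confidence reaches the threshold, optionally capped.

    Hypothetical helper; assumes the cap keeps the highest-confidence rules.
    """
    kept = [rule for rule in rules if rule[2] >= confidence]
    kept.sort(key=lambda rule: rule[2], reverse=True)
    # -1 (the default) means "return all rules".
    return kept if max_rules == -1 else kept[:max_rules]


rules = [
    (frozenset({"a"}), frozenset({"b"}), 0.9),
    (frozenset({"b"}), frozenset({"a"}), 0.4),
    (frozenset({"c"}), frozenset({"a"}), 0.7),
]
print(filter_rules(rules, confidence=0.5, max_rules=1))
```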
check_fitted_model()
fit_transform(data, col_item='items', col_freq='support')

Fit the model and return the association rules generated from the frequent item sets.

Parameters:
  • data – DDF;
  • col_item – Column with the frequent item sets (default is ‘items’);
  • col_freq – Column with the support of each item set (default is ‘support’);
Returns:

DDF with ‘Pre-Rule’, ‘Post-Rule’ and ‘confidence’ columns.
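To show how frequent item sets and their supports become ‘Pre-Rule’, ‘Post-Rule’ and ‘confidence’ values, the sketch below uses the standard construction: every split of a frequent item set S into a non-empty Pre-Rule X and Post-Rule S \ X yields a candidate rule with confidence support(S) / support(X). It works on a plain Python dict rather than a DDF; `generate_rules` and the dict and tuple formats are assumptions, and the library's actual implementation may differ.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def generate_rules(freq: Dict[FrozenSet[str], float],
                   min_confidence: float = 0.5
                   ) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Return (pre_rule, post_rule, confidence) for each qualifying split.

    Hypothetical helper; `freq` must hold every frequent item set with its support.
    """
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty Pre-Rule and Post-Rule
        for k in range(1, len(itemset)):
            for pre in combinations(sorted(itemset), k):
                pre_set = frozenset(pre)
                post_set = itemset - pre_set
                # Subsets of a frequent item set are frequent, so pre_set is in freq.
                conf = supp / freq[pre_set]
                if conf >= min_confidence:
                    rules.append((pre_set, post_set, conf))
    return rules


freq = {
    frozenset({"a"}): 0.6,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 0.4,
}
for pre, post, conf in generate_rules(freq, min_confidence=0.5):
    print(sorted(pre), "->", sorted(post), round(conf, 3))
```

Each printed line corresponds to one output row: the Pre-Rule, the Post-Rule and the rule confidence.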

load_model(filepath)

Load a machine learning model from a binary file in storage.

Parameters: filepath – The absolute path name;
Returns: self
Example:
>>> ml_model = AssociationRules().load_model('hdfs://localhost:9000/model')
save_model(filepath, overwrite=True)

Save a machine learning model as a binary file in storage.

Parameters:
  • filepath – The output absolute path name;
  • overwrite – Overwrite if file already exists (default, True);
Returns:

self

Example:
>>> model = AssociationRules(confidence=0.10)
>>> rules = model.fit_transform(item_set)
>>> model.save_model('hdfs://localhost:9000/trained_model')
set_max_rules(count)
set_min_confidence(confidence)
class ddf_library.functions.ml.fpm.FPGrowth(min_support=0.5)

Bases: ddf_library.bases.ddf_model.ModelDDF

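FPGrowth implements the FP-growth algorithm described in the paper Han et al., Mining frequent patterns without candidate generation, where “FP” stands for frequent pattern. The reference below, Li et al., describes PFP, a parallel version of FP-growth. Given a data set of transactions, the first step of FP-growth is to calculate item frequencies and identify frequent items. The second step encodes the transactions in a prefix tree (the FP-tree) without explicitly generating candidate item sets, and the frequent item sets are then extracted from that tree.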

LI, Haoyuan et al. Pfp: parallel fp-growth for query recommendation. In: Proceedings of the 2008 ACM conference on Recommender systems. ACM, 2008. p. 107-114.
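The first FP-growth step described above can be sketched as follows. This is a single-machine illustration, not ddf_library's code; it assumes `min_support` is a fraction of the transactions, and `frequent_items` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def frequent_items(transactions: List[Sequence[str]],
                   min_support: float) -> List[Tuple[str, int]]:
    """First FP-growth step: count item frequencies and keep the frequent items.

    Hypothetical helper. An item is frequent when the fraction of transactions
    containing it is at least min_support. The result is sorted by descending
    count (ties broken by item name), the order in which items are later
    inserted into the FP-tree.
    """
    n = len(transactions)
    # set(t) so an item repeated inside one transaction is counted once.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c / n >= min_support]
    return sorted(frequent, key=lambda ic: (-ic[1], ic[0]))


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
print(frequent_items(transactions, min_support=0.6))  # [('a', 3), ('b', 3)]
```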

Example:
>>> fp = FPGrowth(min_support=0.10)
>>> item_set = fp.fit_transform(ddf1, input_col='col_0')

Set up all FPGrowth parameters.

Parameters: min_support – Minimum support value (default is 0.5).
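The support of an item set is commonly defined as the fraction of transactions that contain all of its items. The sketch below assumes `min_support` uses that relative definition, as the default of 0.5 and the example value of 0.10 suggest; this page does not state it explicitly. `support` is a hypothetical helper.

```python
from typing import AbstractSet, List, Sequence


def support(itemset: AbstractSet[str], transactions: List[Sequence[str]]) -> float:
    """Relative support: fraction of transactions containing every item of itemset.

    Hypothetical helper; assumes min_support is compared against this value.
    """
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
print(support({"a", "b"}, transactions))  # 0.5
print(support({"a", "b"}, transactions) >= 0.10)  # True: frequent at min_support=0.10
```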
check_fitted_model()
fit_transform(data, input_col)

Fit the model and transform the data.

Parameters:
  • data – DDF;
  • input_col – Name of the column with the transactions;
Returns:

DDF with the frequent item sets.
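To make the returned frequent item sets easier to reason about, the sketch below shows the core of sequential FP-growth on a single machine: build an FP-tree from the frequent items of each transaction, then recursively mine the conditional pattern base of each item. It is not the parallel PFP algorithm and not ddf_library's implementation; `fp_growth`, `_build_tree`, `_mine`, the dict output and the relative `min_support` are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Weighted = List[Tuple[Sequence[str], int]]  # (items, count) pairs


class _Node:
    """FP-tree node: an item, how many transactions pass through it, and links."""

    def __init__(self, item: Optional[str], parent: Optional["_Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "_Node"] = {}


def _build_tree(db: Weighted, n: int, min_support: float):
    """Insert each transaction's frequent items, most frequent first, into an FP-tree."""
    counts: Counter = Counter()
    for items, weight in db:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c / n >= min_support}
    root = _Node(None, None)
    header: Dict[str, List[_Node]] = defaultdict(list)  # item -> its tree nodes
    for items, weight in db:
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = _Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def _mine(db: Weighted, n: int, min_support: float,
          suffix: FrozenSet[str], out: Dict[FrozenSet[str], int]) -> None:
    """Record suffix + item for each frequent item, then recurse on its conditional base."""
    header, frequent = _build_tree(db, n, min_support)
    for item, count in frequent.items():
        itemset = suffix | {item}
        out[itemset] = count
        # Conditional pattern base: the prefix path above every node of `item`,
        # weighted by that node's count.
        base: Weighted = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            _mine(base, n, min_support, itemset, out)


def fp_growth(transactions: List[Sequence[str]],
              min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent item set mapped to its relative support."""
    n = len(transactions)
    out: Dict[FrozenSet[str], int] = {}
    _mine([(t, 1) for t in transactions], n, min_support, frozenset(), out)
    return {itemset: count / n for itemset, count in out.items()}


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
for itemset, supp in sorted(fp_growth(transactions, 0.5).items(),
                            key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

Re-sorting items by their local counts in each conditional tree is the usual FP-growth choice. Each frequent item set is produced exactly once, because it is reached through its last item in the current tree's order.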

load_model(filepath)

Load a machine learning model from a binary file in storage.

Parameters: filepath – The absolute path name;
Returns: self
Example:
>>> ml_model = FPGrowth().load_model('hdfs://localhost:9000/model')
save_model(filepath, overwrite=True)

Save a machine learning model as a binary file in storage.

Parameters:
  • filepath – The output absolute path name;
  • overwrite – Overwrite if file already exists (default, True);
Returns:

self

Example:
>>> fp = FPGrowth(min_support=0.10)
>>> item_set = fp.fit_transform(ddf1, input_col='col_0')
>>> fp.save_model('hdfs://localhost:9000/trained_model')