Models PowerIterationClusteringModel


class PowerIterationClusteringModel

Entity PowerIterationClusteringModel

Attributes

last_read_date Read-only property - Last time this model’s data was accessed.
name Set or get the name of the model object.
status Read-only property - Current model life cycle status.

Methods

__init__(self[, name, _info]) Create a ‘new’ instance of a PowerIterationClustering model.
predict(self, frame, source_column, destination_column, similarity_column) Predict the cluster to which each node belongs.
__init__(self, name=None)

Create a ‘new’ instance of a PowerIterationClustering model.

Parameters:

name : unicode (default=None)

User supplied name.

Returns:

: Model

A new instance of PowerIterationClusteringModel.

Power Iteration Clustering [R39] is a scalable and efficient algorithm for clustering the vertices of a graph, given pairwise similarities as edge properties. Once a Power Iteration Clustering model has been initialized, the cluster assignments of the vertices can be predicted by specifying the source column, destination column, and similarity column of the given frame.

Footnotes

[R39] http://www.cs.cmu.edu/~wcohen/postscript/icm12010-pic-final.pdf
[R40] https://spark.apache.org/docs/1.5.0/mllib-clustering.html#power-iteration-clustering-pic
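
The core idea described above can be illustrated with a minimal pure-Python sketch. This is not the implementation used by this model or by Spark MLlib; the function names `build_affinity` and `power_iteration_embedding`, the fixed iteration count, and the degree-based starting vector are assumptions made for illustration. The sketch builds a symmetric affinity structure from (source, destination, similarity) rows and repeatedly multiplies a vector by the row-normalized affinity matrix. The full algorithm would then cluster the resulting one-dimensional embedding (for example with k-means), a step that is omitted here.

```python
# Illustrative sketch only: NOT the trustedanalytics or Spark MLlib implementation.
# All function names and defaults below are hypothetical.

def build_affinity(edges):
    """Build a symmetric map {vertex: {neighbor: similarity}} from (src, dst, sim) rows."""
    affinity = {}
    for src, dst, sim in edges:
        affinity.setdefault(src, {})[dst] = sim
        affinity.setdefault(dst, {})[src] = sim
    return affinity


def power_iteration_embedding(edges, iterations=10):
    """Return a one-dimensional embedding {vertex: value} after a fixed number of iterations."""
    affinity = build_affinity(edges)
    degree = {v: sum(nbrs.values()) for v, nbrs in affinity.items()}
    total = sum(degree.values())
    # Assumed starting point: the degree vector, scaled to sum to 1.
    vec = {v: degree[v] / total for v in affinity}
    for _ in range(iterations):
        # One multiplication by the row-normalized affinity matrix W = D^-1 A.
        nxt = {
            v: sum(sim * vec[u] for u, sim in affinity[v].items()) / degree[v]
            for v in affinity
        }
        # Rescale by the L1 norm so the values do not shrink or grow unboundedly.
        norm = sum(abs(x) for x in nxt.values())
        vec = {v: x / norm for v, x in nxt.items()}
    return vec


# Edge list with the same shape as the example frame: (Source, Destination, Similarity).
edges = [
    (1, 2, 1.0), (1, 3, 0.3), (2, 3, 0.3), (3, 0, 0.03),
    (0, 5, 0.01), (5, 4, 0.3), (5, 6, 1.0), (4, 6, 0.3),
]
embedding = power_iteration_embedding(edges)
```

Stopping after a small number of iterations is deliberate in this sketch: run long enough, the vector converges to a constant, so the useful cluster signal lives in the intermediate values.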

Examples

Consider a frame named ‘frame’ containing three columns that denote the source vertex, the destination vertex, and the corresponding similarity. The model below is run on this sample data set.

>>> frame.inspect()
[#]  Source  Destination  Similarity
====================================
[0]       1            2         1.0
[1]       1            3         0.3
[2]       2            3         0.3
[3]       3            0        0.03
[4]       0            5        0.01
[5]       5            4         0.3
[6]       5            6         1.0
[7]       4            6         0.3
>>> model = ta.PowerIterationClusteringModel()
[===Job Progress===]
>>> predict_output = model.predict(frame, 'Source', 'Destination', 'Similarity', k=3)
[===Job Progress===]
>>> predict_output['predicted_frame'].inspect()
[#]  id  cluster
================
[0]   4        3
[1]   0        2
[2]   1        1
[3]   6        1
[4]   3        3
[5]   5        1
[6]   2        1
>>> predict_output['cluster_size']
{u'Cluster:1': 4, u'Cluster:2': 1, u'Cluster:3': 2}
>>> predict_output['number_of_clusters']
3
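
As a sanity check on the output above, the cluster sizes can be recomputed from the (id, cluster) rows. The following is a small standalone sketch, not part of the model's API; the helper name `cluster_sizes` is hypothetical, and the rows are copied by hand from the `predicted_frame` listing rather than read from a frame.

```python
from collections import Counter

# (id, cluster) rows copied from the predicted_frame output shown above.
rows = [(4, 3), (0, 2), (1, 1), (6, 1), (3, 3), (5, 1), (2, 1)]


def cluster_sizes(rows):
    """Count vertices per cluster, using the 'Cluster:<n>' key style seen in the output."""
    counts = Counter(cluster for _vertex_id, cluster in rows)
    return {"Cluster:%d" % cluster: size for cluster, size in sorted(counts.items())}


sizes = cluster_sizes(rows)
number_of_clusters = len(sizes)
```

The recomputed values agree with the `cluster_size` and `number_of_clusters` entries returned by `predict` in the example.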