PowerIterationClusteringModel __init__


__init__(self, name=None)

Create a new instance of a PowerIterationClustering model.

Parameters:

name : unicode (default=None)

User supplied name.

Returns:

: Model

A new instance of PowerIterationClusteringModel.

Power Iteration Clustering [R41] is a scalable and efficient algorithm for clustering the vertices of a graph, given pairwise similarities as edge properties. Once a Power Iteration Clustering model has been initialized, cluster assignments for the vertices are predicted by specifying the source column, destination column, and similarity column of the given frame.
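To make the idea concrete, here is a minimal pure-Python sketch of the embedding step described in the paper cited as [R41]: repeatedly multiply a vector by the row-normalized similarity matrix and rescale it. It is only an illustration under stated assumptions, not this library's implementation. The function name `pic_embedding`, the fixed iteration count, and the choice of the normalized degree vector as the starting point are all assumptions made for the example. The algorithm would then group the resulting one-dimensional values (for example with k-means) to obtain cluster labels; that step is omitted here.

```python
# Illustrative sketch only; pic_embedding is a hypothetical helper,
# not part of the PowerIterationClusteringModel API.

def pic_embedding(edges, num_iterations=10):
    """Return {vertex: value} after a truncated power iteration."""
    # Build a symmetric affinity map from (source, destination, similarity) rows.
    affinity = {}
    for src, dst, sim in edges:
        affinity.setdefault(src, {})[dst] = float(sim)
        affinity.setdefault(dst, {})[src] = float(sim)

    # Degree of a vertex = sum of the similarities on its edges.
    degree = {v: sum(nbrs.values()) for v, nbrs in affinity.items()}
    total = sum(degree.values())

    # Assumed starting point: the degree vector scaled to sum to 1.
    vec = {v: degree[v] / total for v in affinity}

    for _ in range(num_iterations):
        # One multiplication by W = D^-1 * A (row-normalized affinity matrix).
        nxt = {}
        for v, nbrs in affinity.items():
            nxt[v] = sum(sim * vec[u] for u, sim in nbrs.items()) / degree[v]
        # Rescale so the absolute values sum to 1.
        norm = sum(abs(x) for x in nxt.values())
        vec = {v: x / norm for v, x in nxt.items()}
    return vec


# Same edges as the sample frame in the Examples section.
edges = [
    (1, 2, 1.0), (1, 3, 0.3), (2, 3, 0.3), (3, 0, 0.03),
    (0, 5, 0.01), (5, 4, 0.3), (5, 6, 1.0), (4, 6, 0.3),
]
embedding = pic_embedding(edges)
```

Stopping after a small number of iterations is deliberate in this sketch: run to convergence, the vector becomes nearly constant and carries no cluster information, so the intermediate values are what a later clustering step would use.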

Footnotes

[R41] http://www.cs.cmu.edu/~wcohen/postscript/icm12010-pic-final.pdf
[R42] https://spark.apache.org/docs/1.5.0/mllib-clustering.html#power-iteration-clustering-pic

Examples

Consider the following frame, ‘frame’, whose three columns hold the source vertex, the destination vertex, and the similarity between them. A model is created and then used to predict cluster assignments for the vertices in this frame.

>>> frame.inspect()
[#]  Source  Destination  Similarity
====================================
[0]       1            2         1.0
[1]       1            3         0.3
[2]       2            3         0.3
[3]       3            0        0.03
[4]       0            5        0.01
[5]       5            4         0.3
[6]       5            6         1.0
[7]       4            6         0.3
>>> model = ta.PowerIterationClusteringModel()
[===Job Progress===]
>>> predict_output = model.predict(frame, 'Source', 'Destination', 'Similarity', k=3)
[===Job Progress===]
>>> predict_output['predicted_frame'].inspect()
[#]  id  cluster
================
[0]   4        3
[1]   0        2
[2]   1        1
[3]   6        1
[4]   3        3
[5]   5        1
[6]   2        1
>>> predict_output['cluster_size']
{u'Cluster:1': 4, u'Cluster:2': 1, u'Cluster:3': 2}
>>> predict_output['number_of_clusters']
3
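As a quick sanity check, the summary values can be recomputed from the rows of the predicted frame. The sketch below is plain Python and does not call the library; the `rows` list is copied by hand from the `inspect()` output above, and the variable names are chosen for illustration only.

```python
from collections import Counter

# (id, cluster) rows copied from the predicted frame shown above.
rows = [(4, 3), (0, 2), (1, 1), (6, 1), (3, 3), (5, 1), (2, 1)]

# Count how many vertices were assigned to each cluster label.
cluster_size = Counter(cluster for _, cluster in rows)

# The number of clusters is the number of distinct labels.
number_of_clusters = len(cluster_size)
```

The counts agree with the output above: four vertices in cluster 1, one in cluster 2, and two in cluster 3, for three clusters in total.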