PowerIterationClusteringModel __init__
__init__(self, name=None)
Create a new instance of a PowerIterationClustering model.
Parameters:
    name : unicode (default=None)
        User-supplied name.
Returns:
    Model
        A new instance of a PowerIterationClustering model.
Power Iteration Clustering [R41] is a scalable and efficient algorithm for clustering the vertices of a graph, given pairwise similarities as edge properties. Once a Power Iteration Clustering model has been initialized, the cluster assignment of each vertex can be predicted by specifying the source column, destination column, and similarity column of a given frame.
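The core step described in [R41] can be illustrated with a minimal pure-Python sketch. It is not this library's implementation; the function name power_iteration_embedding, the iterations parameter, and the choice to treat every edge as undirected are assumptions made for illustration. The sketch computes only the one-dimensional embedding produced by truncated power iteration on the row-normalized similarity matrix; grouping the embedding values into k clusters (for example with k-means) is left out.

```python
# Illustrative sketch only: names and defaults here are assumptions,
# not part of the PowerIterationClusteringModel API.

# (source, destination, similarity) triples, matching the sample frame
# used in the examples on this page.
edges = [
    (1, 2, 1.0), (1, 3, 0.3), (2, 3, 0.3), (3, 0, 0.03),
    (0, 5, 0.01), (5, 4, 0.3), (5, 6, 1.0), (4, 6, 0.3),
]


def power_iteration_embedding(edges, iterations=20):
    """Return {vertex: value} after a fixed number of power iterations."""
    # Build a symmetric affinity map (assumes similarities are undirected).
    affinity = {}
    for src, dst, sim in edges:
        affinity.setdefault(src, {})[dst] = sim
        affinity.setdefault(dst, {})[src] = sim

    # Degree of a vertex = sum of the similarities on its edges.
    degree = {v: sum(nbrs.values()) for v, nbrs in affinity.items()}
    total = sum(degree.values())

    # Start from the degree vector, scaled so its entries sum to 1.
    vec = {v: degree[v] / total for v in affinity}

    for _ in range(iterations):
        # Multiply by the row-normalized affinity matrix W = D^-1 A.
        nxt = {
            v: sum(sim * vec[u] for u, sim in affinity[v].items()) / degree[v]
            for v in affinity
        }
        # Rescale by the L1 norm so the values do not shrink or blow up.
        norm = sum(abs(x) for x in nxt.values())
        vec = {v: x / norm for v, x in nxt.items()}
    return vec


embedding = power_iteration_embedding(edges)
```

Stopping after a small, fixed number of iterations is deliberate: run to convergence, the vector becomes constant across a connected graph and carries no cluster information.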
footnotes
[R41] http://www.cs.cmu.edu/~wcohen/postscript/icm12010-pic-final.pdf
[R42] https://spark.apache.org/docs/1.5.0/mllib-clustering.html#power-iteration-clustering-pic
Examples
Consider the following frame, ‘frame’, containing three columns that denote the source vertex, the destination vertex, and the corresponding similarity.
>>> frame.inspect()
[#]  Source  Destination  Similarity
====================================
[0]       1            2         1.0
[1]       1            3         0.3
[2]       2            3         0.3
[3]       3            0        0.03
[4]       0            5        0.01
[5]       5            4         0.3
[6]       5            6         1.0
[7]       4            6         0.3
>>> model = ta.PowerIterationClusteringModel()
[===Job Progress===]
>>> predict_output = model.predict(frame, 'Source', 'Destination', 'Similarity', k=3)
[===Job Progress===]
>>> predict_output['predicted_frame'].inspect()
[#]  id  cluster
================
[0]   4        3
[1]   0        2
[2]   1        1
[3]   6        1
[4]   3        3
[5]   5        1
[6]   2        1
>>> predict_output['cluster_size']
{u'Cluster:1': 4, u'Cluster:2': 1, u'Cluster:3': 2}
>>> predict_output['number_of_clusters']
3
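The summary values appear to follow directly from the per-vertex assignments. The sketch below recomputes them locally from rows copied out of the predicted_frame output above. The helper summarize_clusters is hypothetical and not part of the library; it only shows how the counts relate to the assignments.

```python
from collections import Counter

# (vertex id, cluster) rows copied from the predicted_frame output above.
predicted_rows = [(4, 3), (0, 2), (1, 1), (6, 1), (3, 3), (5, 1), (2, 1)]


def summarize_clusters(rows):
    """Hypothetical helper: count vertices per cluster label."""
    counts = Counter(cluster for _, cluster in rows)
    # Use the same 'Cluster:<label>' key format as the output shown above.
    cluster_size = {"Cluster:%d" % label: n for label, n in sorted(counts.items())}
    return cluster_size, len(counts)


cluster_size, number_of_clusters = summarize_clusters(predicted_rows)
```

For the rows above, this reproduces the 'cluster_size' and 'number_of_clusters' values returned by predict.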