Models PowerIterationClusteringModel
class PowerIterationClusteringModel
Entity PowerIterationClusteringModel
Attributes
last_read_date    Read-only property - Last time this model’s data was accessed.
name              Set or get the name of the model object.
status            Read-only property - Current model life cycle status.
Methods
__init__(self[, name, _info])    Create a ‘new’ instance of a PowerIterationClustering model.
predict(self, frame, source_column, destination_column, similarity_column)    Predict the clusters to which the nodes belong.
__init__(self, name=None)
Create a ‘new’ instance of a PowerIterationClustering model.
Parameters: name : unicode (default=None)
User-supplied name.
Returns: Model
A new instance of a PowerIterationClustering model.
Power Iteration Clustering [R39] is a scalable and efficient algorithm for clustering the vertices of a graph, given pairwise similarities as edge properties. Once a Power Iteration Clustering model is initialized, the cluster assignments of the vertices can be predicted by specifying the source column, destination column, and similarity column of the given frame.
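To make the idea behind the algorithm concrete, the following is a minimal pure-Python sketch of power iteration clustering along the lines described in [R39]. It is not this model's implementation and does not call the library. The function names, the fixed iteration count, the toy graph, and the largest-gap split used in place of the k-means step are all assumptions made for illustration.

```python
def power_iteration_embedding(edges, num_iterations=10):
    """Return {vertex: value} after a fixed number of normalized power iterations.

    `edges` is a list of (source, destination, similarity) tuples.
    Assumption: edges are undirected, so each similarity is stored both ways.
    """
    affinity = {}
    for src, dst, sim in edges:
        affinity.setdefault(src, {})[dst] = sim
        affinity.setdefault(dst, {})[src] = sim

    # Degree of a vertex = sum of the similarities on its edges.
    degree = {v: sum(nbrs.values()) for v, nbrs in affinity.items()}
    total = float(sum(degree.values()))

    # Degree-based starting vector, scaled so its entries sum to 1.
    vec = {v: degree[v] / total for v in affinity}

    for _ in range(num_iterations):
        # One step of v <- W v, where W = D^-1 A is the row-normalized affinity matrix.
        nxt = {
            v: sum(sim * vec[u] for u, sim in affinity[v].items()) / degree[v]
            for v in affinity
        }
        # Rescale by the L1 norm so the values stay in a stable range.
        norm = sum(abs(x) for x in nxt.values())
        vec = {v: x / norm for v, x in nxt.items()}
    return vec


def split_by_largest_gaps(embedding, k):
    """Assign labels 1..k by cutting the sorted embedding at its k-1 largest gaps.

    This is a simplified stand-in for the k-means step, not k-means itself.
    """
    ordered = sorted(embedding, key=embedding.get)
    gap_positions = sorted(
        range(len(ordered) - 1),
        key=lambda i: embedding[ordered[i + 1]] - embedding[ordered[i]],
        reverse=True,
    )[: k - 1]
    cuts = set(gap_positions)

    labels, current = {}, 1
    for i, vertex in enumerate(ordered):
        labels[vertex] = current
        if i in cuts:
            current += 1
    return labels


# Hypothetical toy graph: two triangles with different edge weights,
# joined by one weak edge between vertices 2 and 3.
toy_edges = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
    (3, 4, 0.5), (3, 5, 0.5), (4, 5, 0.5),
    (2, 3, 0.01),
]
toy_labels = split_by_largest_gaps(power_iteration_embedding(toy_edges), k=2)
```

The iteration is stopped early on purpose: values inside a tightly connected group even out within a few steps, while values in weakly linked groups stay apart for much longer, and that separation is what the final clustering step picks up. On the toy graph, vertices 0 to 2 and vertices 3 to 5 end up with different labels.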
Footnotes
[R39] http://www.cs.cmu.edu/~wcohen/postscript/icm12010-pic-final.pdf
[R40] https://spark.apache.org/docs/1.5.0/mllib-clustering.html#power-iteration-clustering-pic
Examples
Consider the following frame, named ‘frame’, which contains three columns denoting the source vertex, destination vertex, and corresponding similarity of each edge. A model is then initialized and used to predict the cluster assignments for this frame.
>>> frame.inspect()
[#]  Source  Destination  Similarity
====================================
[0]       1            2         1.0
[1]       1            3         0.3
[2]       2            3         0.3
[3]       3            0        0.03
[4]       0            5        0.01
[5]       5            4         0.3
[6]       5            6         1.0
[7]       4            6         0.3
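Each row of the frame can be read as one weighted edge of a graph. The sketch below is plain Python and does not use the library; it copies the rows into a list and collects the neighbors of every vertex. The variable names, and the choice to treat each edge as undirected, are assumptions made for illustration.

```python
# Rows copied from the frame.inspect() output above:
# (source, destination, similarity).
rows = [
    (1, 2, 1.0), (1, 3, 0.3), (2, 3, 0.3), (3, 0, 0.03),
    (0, 5, 0.01), (5, 4, 0.3), (5, 6, 1.0), (4, 6, 0.3),
]

# Assumption: each row is an undirected edge, so it is stored in both directions.
neighbors = {}
for src, dst, sim in rows:
    neighbors.setdefault(src, {})[dst] = sim
    neighbors.setdefault(dst, {})[src] = sim

# Every vertex that appears as a source or a destination.
vertices = sorted(neighbors)
```

The eight rows describe a graph with seven distinct vertices (0 through 6), and the vertices are what get clustered.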
>>> model = ta.PowerIterationClusteringModel()
[===Job Progress===]
>>> predict_output = model.predict(frame, 'Source', 'Destination', 'Similarity', k=3)
[===Job Progress===]
>>> predict_output['predicted_frame'].inspect()
[#]  id  cluster
================
[0]   4        3
[1]   0        2
[2]   1        1
[3]   6        1
[4]   3        3
[5]   5        1
[6]   2        1
>>> predict_output['cluster_size']
{u'Cluster:1': 4, u'Cluster:2': 1, u'Cluster:3': 2}
>>> predict_output['number_of_clusters']
3
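The three outputs are consistent with one another, which can be checked by hand. The sketch below is plain Python rather than a library call; it assumes the (id, cluster) pairs shown in the predicted frame have been copied into a list, and the variable names are hypothetical.

```python
from collections import Counter

# (id, cluster) pairs copied from the predicted_frame output above.
predicted_rows = [(4, 3), (0, 2), (1, 1), (6, 1), (3, 3), (5, 1), (2, 1)]

# Count how many vertices were assigned to each cluster.
counts = Counter(cluster for _, cluster in predicted_rows)

# Rebuild the same summary shape as the 'cluster_size' output.
cluster_size = {"Cluster:%d" % c: n for c, n in sorted(counts.items())}
number_of_clusters = len(counts)
```

Tallying the cluster column this way gives four vertices in cluster 1, one in cluster 2, and two in cluster 3, matching the 'cluster_size' and 'number_of_clusters' values reported by predict.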