KMeansModel train

train(self, frame, observation_columns, column_scalings, k=2, max_iterations=20, epsilon=0.0001, initialization_mode='k-means||')

[BETA] Creates a KMeans model from a training frame.

Parameters:


frame : Frame

A frame to train the model on.

observation_columns : list

Columns containing the observations.

column_scalings : list

Column scalings for each of the observation columns. The scaling value is multiplied by the corresponding value in the observation column.
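As a rough illustration of what the scaling step amounts to, the sketch below multiplies each observation value by the scaling for its column. It is plain Python operating on lists, not the library's implementation; the helper name `scale_observations` and the sample data are hypothetical.

```python
def scale_observations(rows, column_scalings):
    """Multiply each observation value by the scaling for its column.

    `rows` is a list of observations, one value per observation column.
    This helper is hypothetical and only illustrates the arithmetic.
    """
    if any(len(row) != len(column_scalings) for row in rows):
        raise ValueError("each row needs exactly one value per scaling")
    return [[value * scale for value, scale in zip(row, column_scalings)]
            for row in rows]


# Two observation columns; the second is down-weighted by half.
rows = [[1.0, 10.0], [2.0, 20.0]]
scaled = scale_observations(rows, [1.0, 0.5])
print(scaled)  # [[1.0, 5.0], [2.0, 10.0]]
```

A scaling below 1 shrinks a column's contribution to the Euclidean distance, so that column has less influence on cluster assignment.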

k : int32 (default=2)

Desired number of clusters. Default is 2.

max_iterations : int32 (default=20)

Maximum number of iterations for which the algorithm should run. Default is 20.

epsilon : float64 (default=0.0001)

Distance threshold within which k-means is considered to have converged. Default is 1e-4. If all centers move less than this Euclidean distance in an iteration, the run stops iterating.
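To show how `max_iterations` and `epsilon` might interact, here is a minimal sketch of plain Lloyd iterations in pure Python. It is not the library's distributed implementation; the function names `lloyd` and `max_center_shift` and the sample points are assumptions made for illustration.

```python
import math


def max_center_shift(old_centers, new_centers):
    """Largest Euclidean distance any single center moved."""
    return max(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
        for old, new in zip(old_centers, new_centers)
    )


def lloyd(points, centers, max_iterations=20, epsilon=1e-4):
    """Run plain Lloyd iterations; return (centers, iterations_run)."""
    iterations_run = 0
    for iterations_run in range(1, max_iterations + 1):
        # Assign every point to its nearest current center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            groups[nearest].append(p)
        # Recompute each center as the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        shift = max_center_shift(centers, new_centers)
        centers = new_centers
        # Stop early once every center moved less than epsilon.
        if shift < epsilon:
            break
    return centers, iterations_run


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers, n = lloyd(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers, n)  # [(0.0, 1.0), (10.0, 11.0)] 2
```

In this sketch the loop ends on whichever condition is met first: the iteration cap, or all centers moving less than `epsilon`.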

initialization_mode : unicode (default=k-means||)

The initialization technique for the algorithm. It can be either “random”, which chooses random points as the initial cluster centers, or “k-means||”, which uses a parallel variant of k-means++. Default is “k-means||”.
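The two modes can be contrasted with a small pure-Python sketch. Note the substitution: the second function is the ordinary sequential k-means++ seeding, not the parallel k-means|| variant the library refers to, and both function names are hypothetical.

```python
import random


def init_random(points, k, rng):
    """Sketch of 'random' mode: pick k distinct points as initial centers."""
    return rng.sample(points, k)


def init_kmeans_pp(points, k, rng):
    """Sequential k-means++ seeding (k-means|| parallelizes this idea)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        if sum(d2) == 0:
            # All points coincide with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Sample the next center with probability proportional to d2,
            # so far-away points are more likely to be picked.
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


rng = random.Random(0)
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds_random = init_random(points, 2, rng)
seeds_pp = init_kmeans_pp(points, 2, rng)
```

Distance-weighted seeding tends to spread the initial centers across the data, which is the usual motivation for preferring it over purely random seeding.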

Returns:

dict

A dictionary describing the trained KMeans model, with the following keys:

‘cluster_size’ : dictionary with ‘Cluster:id’ as the key and the corresponding cluster size as the value.

‘within_set_sum_of_squared_error’ : the within-set sum of squared errors for the model.
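The shape of the returned dictionary can be sketched in pure Python by assigning points to their nearest center and accumulating the two quantities. The helper `summarize` is hypothetical, and the 1-based numbering in the 'Cluster:id' keys is an assumption, since the source does not state where ids start.

```python
def summarize(points, centers):
    """Build a dict shaped like the documented return value."""
    sizes = {}
    wssse = 0.0
    for p in points:
        # Squared Euclidean distance from this point to every center.
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        nearest = min(range(len(centers)), key=d2.__getitem__)
        key = "Cluster:%d" % (nearest + 1)  # 1-based ids are an assumption
        sizes[key] = sizes.get(key, 0) + 1
        # Within-set sum of squared errors: distance to the assigned center.
        wssse += d2[nearest]
    return {"cluster_size": sizes, "within_set_sum_of_squared_error": wssse}


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
centers = [(0.0, 1.0), (10.0, 10.0)]
result = summarize(points, centers)
print(result)
# {'cluster_size': {'Cluster:1': 2, 'Cluster:2': 1},
#  'within_set_sum_of_squared_error': 2.0}
```

A lower within-set sum of squared errors means points sit closer to their assigned centers, so the value is often compared across different choices of `k`.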

Creates a KMeans model using the observation columns.

Examples

See here for examples.
