Clustering

Hierarchical Clustering

The hierarchical clustering algorithm is derived from the R package ‘amap’ [Amap].

class mlpy.HCluster(method='euclidean', link='complete')

Hierarchical Cluster.

Initialize the hierarchical clustering object.

Input

  • method - [string] the distance measure to be used
    • ‘euclidean’
  • link - [string] the agglomeration method to be used
    • ‘single’
    • ‘complete’
    • ‘mcquitty’
    • ‘median’
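To illustrate what the distance measure and the agglomeration method control, the following is a minimal NumPy sketch, not mlpy's implementation. It covers only ‘single’ and ‘complete’ linkage, and the helper names `euclidean` and `linkage_distance` are hypothetical. It uses the data from this page's example, with one column per sample.

```python
import numpy as np

# Data from this page's example: rows are features, columns are samples.
data = np.array([[1.0, 1.1, 2.0, 3.2, 3.4],
                 [1.5, 1.8, 2.8, 3.1, 3.2]])

def euclidean(x, i, j):
    """Euclidean distance between samples (columns) i and j, 0-based."""
    return float(np.sqrt(np.sum((x[:, i] - x[:, j]) ** 2)))

def linkage_distance(x, a, b, link):
    """Distance between clusters a and b (lists of 0-based column indices)."""
    pairwise = [euclidean(x, i, j) for i in a for j in b]
    if link == "single":
        return min(pairwise)      # closest pair of members
    if link == "complete":
        return max(pairwise)      # farthest pair of members
    raise ValueError("only 'single' and 'complete' are sketched here")

# Distance between sample 3 and the cluster {4, 5} (1-based numbering).
single = linkage_distance(data, [2], [3, 4], "single")
complete = linkage_distance(data, [2], [3, 4], "complete")
print(single, complete)
```

The complete-linkage value (about 1.456) agrees with the third entry of hc.heights in the example, which uses the default link='complete'.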

Example:

>>> import numpy as np
>>> import mlpy
>>> data = np.array([[1.0, 1.1, 2.0, 3.2, 3.4],
...                  [1.5, 1.8, 2.8, 3.1, 3.2]])
>>> hc = mlpy.HCluster()
>>> hc.compute(data)
>>> hc.ia
array([-4, -1, -3,  2])
>>> hc.ib
array([-5, -2,  1,  3])
>>> hc.heights
array([ 0.2236068 ,  0.31622776,  1.4560219 ,  2.94108844])
>>> hc.cut(0.5)
array([1, 1, 2, 3, 3])

compute(x)

Compute Hierarchical Cluster.

Input

  • x - [2D numpy array float] (features x samples) input data, with one row per feature and one column per sample

Output

  • self.ia - [1D numpy array integer] first cluster of each merge

  • self.ib - [1D numpy array integer] second cluster of each merge

    Together, ia and ib describe the merge tree: the i-th elements of ia and ib identify the two clusters merged at step i of the clustering. A negative element j means that the single observation -j was merged at this step. A positive element j means that the merge involved the cluster formed at the earlier step j of the algorithm. Thus negative entries indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons. Observations and steps are numbered from 1.

  • self.heights - [1D numpy array float] the clustering heights: n-1 non-decreasing real values, where each height is the value of the criterion associated with the agglomeration method for the corresponding merge.
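The merge encoding can be easier to follow when expanded into explicit member sets. The sketch below is a hypothetical helper (`decode_merges` is not part of mlpy) that takes the ia and ib values shown in the example and lists the observations contained in the cluster formed at each step.

```python
import numpy as np

# Merge arrays copied from the example on this page (1-based encoding).
ia = np.array([-4, -1, -3, 2])
ib = np.array([-5, -2, 1, 3])

def decode_merges(ia, ib):
    """Return the set of observations in the cluster formed at each step."""
    formed = []  # formed[k] holds the members of the cluster built at step k + 1
    for a, b in zip(ia, ib):
        members = set()
        for j in (int(a), int(b)):
            if j < 0:
                members.add(-j)            # negative: singleton observation -j
            else:
                members |= formed[j - 1]   # positive: cluster from earlier step j
        formed.append(members)
    return formed

steps = decode_merges(ia, ib)
for k, members in enumerate(steps, start=1):
    print(k, sorted(members))
```

For the example data, step 1 joins observations 4 and 5, step 2 joins 1 and 2, step 3 adds observation 3 to the cluster from step 1, and step 4 joins the two remaining clusters.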

cut(ht)

Cuts the tree into several groups by specifying the cut height.

Input

  • ht - [float] height where the tree should be cut

Output

  • gm - [1D numpy array integer] group memberships. Groups are in 1, ..., N
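As a rough illustration of what cutting the tree means, the sketch below rebuilds group memberships from ia, ib and heights by applying only the merges whose height does not exceed ht. It is not mlpy's implementation: the function name `cut_tree`, the inclusive comparison at the cut height, and the numbering of groups in order of first appearance are all assumptions.

```python
import numpy as np

# Values copied from the example on this page.
ia = np.array([-4, -1, -3, 2])
ib = np.array([-5, -2, 1, 3])
heights = np.array([0.2236068, 0.31622776, 1.4560219, 2.94108844])

def cut_tree(ia, ib, heights, ht):
    """Group memberships after applying every merge with height <= ht (assumed)."""
    n = len(ia) + 1
    parent = list(range(n))  # union-find over 0-based observations

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    rep = []  # rep[k]: one observation inside the cluster formed at step k + 1
    for a, b, h in zip(ia, ib, heights):
        ra = -int(a) - 1 if a < 0 else rep[int(a) - 1]
        rb = -int(b) - 1 if b < 0 else rep[int(b) - 1]
        rep.append(ra)
        if h <= ht:
            parent[find(ra)] = find(rb)

    labels, seen = [], {}
    for i in range(n):
        root = find(i)
        if root not in seen:
            seen[root] = len(seen) + 1  # number groups by first appearance (assumed)
        labels.append(seen[root])
    return np.array(labels)

gm = cut_tree(ia, ib, heights, 0.5)
print(gm)  # [1 1 2 3 3]
```

With ht = 0.5 only the first two merges are applied, which reproduces the memberships returned by hc.cut(0.5) in the example.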

[Amap] amap: Another Multidimensional Analysis Package, http://cran.r-project.org/web/packages/amap/index.html
