Feature List Analysis

Canberra Indicator

Canberra stability indicator on top-k positions [Jurman08]

mlpy.canberra(lists, k, modules=None)

Compute mean Canberra distance indicator on top-k sublists.

Input

  • lists - [2D numpy array integer] position lists. Positions must be in [0, #elems-1].
  • k - [integer] length of the top-k sublists
  • modules - [list] modules (list of group indices)

Output

  • cd - [float] Canberra distance

Example:

>>> from numpy import *
>>> from mlpy import *
>>> lists = array([[2,4,1,3,0],  # first positions list
...                [3,4,1,2,0],  # second positions list
...                [2,4,3,0,1],  # third positions list
...                [0,1,4,2,3]]) # fourth positions list
>>> canberra(lists, 3)
1.0861983059292479
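As a rough illustration of what a top-k Canberra computation can look like, the sketch below uses NumPy only. It assumes the location-parameter form of the Canberra distance described in [Jurman08]: positions are shifted to 1-based ranks, every rank is capped at k + 1, and the distance is averaged over all pairs of lists. The function name canberra_topk_sketch is hypothetical, the modules argument is ignored, and the result is not expected to reproduce the value returned by mlpy.canberra, which may apply additional normalization.

```python
from itertools import combinations

import numpy as np


def canberra_topk_sketch(lists, k):
    """Mean pairwise Canberra distance on top-k capped ranks (illustrative only)."""
    # Shift 0-based positions to 1-based ranks so that no denominator is zero.
    ranks = np.asarray(lists, dtype=float) + 1.0
    # Cap every rank at k + 1: all elements outside the top-k are treated as tied.
    capped = np.minimum(ranks, k + 1)
    # Canberra distance for each unordered pair of lists, then the mean.
    dists = [np.sum(np.abs(a - b) / (a + b)) for a, b in combinations(capped, 2)]
    return float(np.mean(dists))


lists = np.array([[2, 4, 1, 3, 0],
                  [3, 4, 1, 2, 0],
                  [2, 4, 3, 0, 1],
                  [0, 1, 4, 2, 3]])
cd_sketch = canberra_topk_sketch(lists, 3)
```

Identical lists give a distance of 0, and larger values indicate less agreement among the top-k positions.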
mlpy.canberraq(lists, complete=True, normalize=False)

Compute mean Canberra distance indicator on generic lists.

Input

  • lists - [2D numpy array integer] position lists. Positions must be in [-1, #elems-1], where -1 indicates a feature not present in the list.
  • complete - [bool] complete
  • normalize - [bool] normalize

Output

  • cd - [float] Canberra distance

Example:

>>> from numpy import *
>>> from mlpy import *
>>> lists = array([[2,-1,1,-1,0],  # first positions list
...                [3,4,1,2,0],    # second positions list
...                [2,-1,3,0,1],   # third positions list
...                [0,1,4,2,3]])   # fourth positions list
>>> canberraq(lists)
1.0628570368721744
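The position lists above can be built from partial ranked feature-id lists. The sketch below assumes that entry i of a position list holds the position of feature i in the ranking, and that features missing from the ranking get -1. The helper name positions_from_ranking is hypothetical and is not part of mlpy.

```python
import numpy as np


def positions_from_ranking(ranked_ids, n_elems):
    """Convert a (possibly partial) ranked feature-id list into a position list."""
    # Start with every feature marked as absent (-1).
    positions = np.full(n_elems, -1, dtype=int)
    ranked_ids = np.asarray(ranked_ids, dtype=int)
    # The feature ranked first gets position 0, the next one position 1, and so on.
    positions[ranked_ids] = np.arange(len(ranked_ids))
    return positions


# Features 4, 2, 0 ranked in that order, out of 5 features in total.
first = positions_from_ranking([4, 2, 0], 5)
```

Under these assumptions, the ranking [4, 2, 0] yields the first position list of the example, [2, -1, 1, -1, 0].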
mlpy.normalizer(lists)

Compute the average length of the partial lists (nm) and the corresponding normalizing factor (nf). The factor is given by 1 - a / b, where a is the exact value computed on the average length and b is the exact value computed on the whole set of features.

Input

  • lists - [2D numpy array integer] position lists. Positions must be in [-1, #elems-1], where -1 indicates a feature not present in the list.

Output

  • (nm, nf) - (float, float)
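
The first returned value, nm, can be illustrated with NumPy alone. The sketch below assumes that the length of a partial list is the number of entries different from -1. The function name mean_partial_length is hypothetical. The normalizing factor nf is not reproduced here, because the exact values a and b are not specified in this section.

```python
import numpy as np


def mean_partial_length(lists):
    """Average number of features actually present in each position list."""
    lists = np.asarray(lists)
    # A feature is present in a list when its position is not -1.
    lengths = np.sum(lists != -1, axis=1)
    return float(np.mean(lengths))


lists = np.array([[2, -1, 1, -1, 0],
                  [3, 4, 1, 2, 0],
                  [2, -1, 3, 0, 1],
                  [0, 1, 4, 2, 3]])
nm_sketch = mean_partial_length(lists)  # list lengths are 3, 5, 4 and 5
```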

Borda Count, Extraction Indicator, Mean Position Indicator

Borda Count [Borda1781]

mlpy.borda(lists, k, modules=None)

Compute, for each element, the number of extractions on the top-k sublists and the mean position on the lists. Element ids are sorted by decreasing number of extractions; element ids with an equal number of extractions are sorted by increasing mean position.

Input

  • lists - [2D numpy array integer] ranked feature-id lists. Feature ids must be in [0, #elems-1].
  • k - [integer] length of the top-k sublists
  • modules - [list] modules (list of group indices)

Output

  • borda - (feature-id, number of extractions, mean positions)

Example:

>>> from numpy import *
>>> from mlpy import *
>>> lists = array([[2,4,1,3,0],  # first ranked feature-id list
...                [3,4,1,2,0],  # second ranked feature-id list
...                [2,4,3,0,1],  # third ranked feature-id list
...                [0,1,4,2,3]]) # fourth ranked feature-id list
>>> borda(lists, 3)
(array([4, 1, 2, 3, 0]), array([4, 3, 2, 2, 1]), array([ 1.25      ,  1.66666667,  0.        ,  1.        ,  0.        ]))

  • Element 4 is in the first position with 4 extractions and mean position 1.25.
  • Element 1 is in the second position with 3 extractions and mean position 1.67.
  • Element 2 is in the third position with 2 extractions and mean position 0.00.
  • Element 3 is in the fourth position with 2 extractions and mean position 1.00.
  • Element 0 is in the fifth position with 1 extraction and mean position 0.00.
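
The counting and sorting rule described above can be sketched with NumPy alone. This is an illustration, not mlpy's implementation: the name borda_sketch is hypothetical and the modules argument is ignored. It assumes that the mean position of an element is averaged only over the top-k sublists in which it appears, which is consistent with the example output, and that elements never extracted get an infinite mean position.

```python
import numpy as np


def borda_sketch(lists, k):
    """Count top-k extractions and mean top-k positions, then sort the element ids."""
    lists = np.asarray(lists)
    n_elems = lists.shape[1]
    counts = np.zeros(n_elems, dtype=int)
    pos_sum = np.zeros(n_elems, dtype=float)
    # Visit only the first k entries of every ranked list.
    for row in lists[:, :k]:
        for pos, fid in enumerate(row):
            counts[fid] += 1
            pos_sum[fid] += pos
    # Mean position over the sublists where the element appears (inf if never extracted).
    mean_pos = np.divide(pos_sum, counts,
                         out=np.full(n_elems, np.inf), where=counts > 0)
    # lexsort uses the last key as primary: decreasing counts, then increasing mean position.
    order = np.lexsort((mean_pos, -counts))
    return order, counts[order], mean_pos[order]


lists = np.array([[2, 4, 1, 3, 0],
                  [3, 4, 1, 2, 0],
                  [2, 4, 3, 0, 1],
                  [0, 1, 4, 2, 3]])
ids, extractions, mean_positions = borda_sketch(lists, 3)
```

For the example lists with k = 3, this sketch yields the same three arrays as the borda output shown above.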

[Jurman08] G Jurman, S Merler, A Barla, S Paoli, A Galea, and C Furlanello. Algebraic stability indicators for ranked lists in molecular profiling. Bioinformatics, 24(2):258-264, 2008.
[Borda1781] J C Borda. Mémoire sur les élections au scrutin. Histoire de l’Académie Royale des Sciences, 1781.
