Table Of Contents

Frame entropy


entropy(self, data_column, weights_column=None)

Calculate the Shannon entropy of a column.

Parameters:

data_column : unicode

The column whose entropy is to be calculated.

weights_column : unicode (default=None)

The column that provides weights (frequencies) for the entropy calculation. Must contain numerical data. Default is using uniform weights of 1 for all items.

Returns:

: float64

Entropy.

The data column is weighted via the weights column. All data elements of weight <= 0 are excluded from the calculation, as are all data elements whose weight is NaN or infinite. If there are no data elements with a finite weight greater than 0, the entropy is zero.

Examples

Consider the following sample data set in frame ‘frame’ containing several numbers.

Given a frame of coin flips, half heads and half tails, the entropy is simply ln(2):

>>> frame.inspect()
[#]  data  weight
=================
[0]     0       1
[1]     1       2
[2]     2       4
[3]     4       8
>>> entropy = frame.entropy("data", "weight")
[===Job Progress===]
>>> "%0.8f" % entropy
'1.13691659'

If we have more choices and weights, the computation is not as simple. An on-line search for “Shannon Entropy” will provide more detail.

Given a frame of coin flips, half heads and half tails, the entropy is simply ln(2):

>>> frame.inspect()
[#]  data
=========
[0]  H
[1]  T
[2]  H
[3]  T
[4]  H
[5]  T
[6]  H
[7]  T
[8]  H
[9]  T
>>> entropy = frame.entropy("data")
[===Job Progress===]
>>> "%0.8f" % entropy
'0.69314718'