Frame histogram


histogram(self, column_name, num_bins=None, weight_column_name=None, bin_type='equalwidth')

[BETA] Compute the histogram for a column in a frame.

Parameters:

column_name : unicode

Name of column to be evaluated.

num_bins : int32 (default=None)

Number of bins in the histogram. If omitted, the square-root choice is used, that is, math.floor(math.sqrt(frame.row_count)).

weight_column_name : unicode (default=None)

Name of the column containing weights. If omitted, all observations are weighted equally.

bin_type : unicode (default=equalwidth)

The type of binning algorithm to use: ["equalwidth"|"equaldepth"]. Default is "equalwidth".

Returns:

: dict

histogram

A Histogram object containing the result set. The data returned is composed of multiple components:

cutoffs : array of float

A list containing the edges of each bin.

hist : array of float

A list containing the weighted count of observations found in each bin.

density : array of float

A list containing, for each bin, the fraction of all observations that fall in that bin, expressed as a decimal.

Compute the histogram of the data in a column. The returned value is a Histogram object containing three lists, one each for the cutoff points of the bins, the size of each bin, and the density of each bin.

Notes

The num_bins parameter is treated as the maximum permissible number of bins, because the data may dictate fewer. With equal-depth binning, for example, if the column to be binned has 10 elements with only 2 distinct values and num_bins is greater than 2, then the actual number of bins will be only 2. This is due to a restriction that elements with an identical value must belong to the same bin.
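
The bin-count rules described on this page (the square-root default for num_bins and the distinct-value cap for equal-depth binning) can be illustrated with a small sketch. The helper names default_num_bins and equal_depth_bin_upper_bound are hypothetical and are not part of the frame API; the code only computes the upper bound implied by the note and does not claim to reproduce the library's binning algorithm.

```python
import math


def default_num_bins(row_count):
    """Square-root choice: floor(sqrt(row_count)), as described for num_bins."""
    return int(math.floor(math.sqrt(row_count)))


def equal_depth_bin_upper_bound(values, num_bins=None):
    """Upper bound on the number of equal-depth bins (illustrative helper).

    Identical values must share a bin, so there can never be more bins
    than distinct values, whatever num_bins asks for.
    """
    if num_bins is None:
        num_bins = default_num_bins(len(values))
    return min(num_bins, len(set(values)))


# Ten elements, two distinct values, five bins requested: at most 2 bins.
column = [4, 4, 4, 4, 4, 8, 8, 8, 8, 8]
print(equal_depth_bin_upper_bound(column, num_bins=5))  # prints 2
```

The actual number of bins may be lower still; this is only the ceiling that follows from the identical-value restriction.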

Examples

Consider the following sample data set:

>>> frame.inspect()
    [#]  a  b
    =========
    [0]  a  2
    [1]  b  7
    [2]  c  3
    [3]  d  9
    [4]  e  1

A simple call for 3 equal-width bins gives:

>>> hist = frame.histogram("b", num_bins=3)
[===Job Progress===]

>>> print hist
Histogram:
cutoffs: [1.0, 3.6666666666666665, 6.333333333333333, 9.0],
hist: [3.0, 0.0, 2.0],
density: [0.6, 0.0, 0.4]
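
To see where these numbers come from, the equal-width result can be recomputed by hand in plain Python. This is a minimal sketch for unweighted data, assuming the last bin includes the maximum value; equal_width_histogram is a hypothetical helper and not the library's implementation.

```python
def equal_width_histogram(values, num_bins):
    """Equal-width histogram of unweighted values (illustrative only).

    Assumes max(values) > min(values), so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(num_bins)
    # num_bins + 1 edges; the last edge is pinned to the maximum value.
    cutoffs = [lo + i * width for i in range(num_bins)] + [float(hi)]
    hist = [0.0] * num_bins
    for v in values:
        # The maximum value falls in the last bin rather than a new one.
        index = min(int((v - lo) / width), num_bins - 1)
        hist[index] += 1.0
    total = sum(hist)
    density = [count / total for count in hist]
    return cutoffs, hist, density


cutoffs, hist, density = equal_width_histogram([2, 7, 3, 9, 1], 3)
print(hist)     # [3.0, 0.0, 2.0]
print(density)  # [0.6, 0.0, 0.4]
```

The range 1 to 9 is split into three bins of width 8/3, so the values 1, 2, and 3 land in the first bin, none in the second, and 7 and 9 in the third.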

Switching to equal depth gives:

>>> hist = frame.histogram("b", num_bins=3, bin_type='equaldepth')
[===Job Progress===]

>>> print hist
Histogram:
cutoffs: [1.0, 2.0, 7.0, 9.0],
hist: [1.0, 2.0, 2.0],
density: [0.2, 0.4, 0.4]

Plot hist as a bar chart using matplotlib:
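
A minimal sketch of such a plot, assuming the cutoffs and counts are available as plain Python lists (here copied from the equal-depth output above). The helpers bar_geometry and plot_histogram and the output file name histogram.png are hypothetical names chosen for illustration; only standard matplotlib calls are used.

```python
def bar_geometry(cutoffs, heights):
    """Return (left_edges, widths) so that each bar spans exactly one bin."""
    if len(cutoffs) != len(heights) + 1:
        raise ValueError("expected one more cutoff than bars")
    left_edges = list(cutoffs[:-1])
    widths = [right - left for left, right in zip(cutoffs[:-1], cutoffs[1:])]
    return left_edges, widths


def plot_histogram(cutoffs, heights, path="histogram.png"):
    """Draw the histogram as a bar chart and save it to path."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend; drop this to show a window
    import matplotlib.pyplot as plt

    left_edges, widths = bar_geometry(cutoffs, heights)
    fig, ax = plt.subplots()
    # align="edge" places each bar's left side on its bin's lower cutoff.
    ax.bar(left_edges, heights, width=widths, align="edge", edgecolor="black")
    ax.set_xlabel("b")
    ax.set_ylabel("count")
    fig.savefig(path)
    plt.close(fig)
    return path


# Values copied from the equal-depth result above.
cutoffs = [1.0, 2.0, 7.0, 9.0]
counts = [1.0, 2.0, 2.0]
left_edges, widths = bar_geometry(cutoffs, counts)
print(left_edges)  # [1.0, 2.0, 7.0]
print(widths)      # [1.0, 5.0, 2.0]
```

Calling plot_histogram(cutoffs, counts) writes the chart to histogram.png. Passing hist.cutoffs and hist.hist directly would work only if the Histogram object exposes those lists as attributes, which this page does not confirm.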