Frame bin_column


bin_column(self, column_name, cutoffs, include_lowest=None, strict_binning=None, bin_column_name=None)

Classify data into user-defined groups.

Parameters:

column_name : unicode

Name of the column to bin.

cutoffs : list

Sequence of bin cutoff points, given as a list or tuple. Values must be in increasing order. All bin boundaries must be included, so N bins require N+1 values.

include_lowest : bool (default=None)

Specify how bin boundaries are handled. True makes the lower bound of each bin inclusive; False makes the upper bound of each bin inclusive. Default is True.

strict_binning : bool (default=None)

Specify how values outside of the cutoffs array should be binned. If set to True, each value less than cutoffs[0] or greater than cutoffs[-1] will be assigned a bin value of -1. If set to False, values less than cutoffs[0] will be included in the first bin while values greater than cutoffs[-1] will be included in the final bin.

bin_column_name : unicode (default=None)

The name for the new binned column. Default is <column_name>_binned.

Summarize rows of data based on the value in a single column by sorting them into bins, or groups, based on a list of bin cutoff points.
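
The cutoffs requirement described above (increasing values, N+1 cutoffs for N bins) can be checked in plain Python before calling bin_column. This is only a minimal sketch: count_bins is a hypothetical helper, not part of the Frame API, and it assumes that "progressively increasing" means strictly increasing.

```python
def count_bins(cutoffs):
    """Validate a cutoffs sequence and return the number of bins it defines.

    Hypothetical helper for illustration; not part of the Frame API.
    """
    if not isinstance(cutoffs, (list, tuple)):
        raise TypeError("cutoffs must be a list or tuple")
    if len(cutoffs) < 2:
        raise ValueError("at least two cutoffs are needed to define one bin")
    # Assumption: each cutoff must be strictly greater than the previous one.
    for lower, upper in zip(cutoffs, cutoffs[1:]):
        if upper <= lower:
            raise ValueError("cutoffs must be increasing")
    # N+1 boundaries define N bins.
    return len(cutoffs) - 1


print(count_bins([5, 12, 25, 60]))       # 3
print(count_bins((1, 5, 34, 55, 89)))    # 4
```

With N bins, the bin IDs written to the new column range from 0 to N-1.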

Notes

  1. Unicode in column names is not supported and will likely cause the drop_frames() method (and others) to fail!
  2. Bin IDs are 0-indexed; in other words, the lowest bin number is 0.
  3. The first and last cutoffs are always included in the bins. When include_lowest is True, the last bin includes both cutoffs. When include_lowest is False, the first bin (bin 0) includes both cutoffs.
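
The boundary rules in these notes, together with the include_lowest and strict_binning parameters, can be mirrored in a few lines of plain Python. The sketch below is not the library's implementation. bin_value is a hypothetical function written only to illustrate the documented behavior for a single value, and it assumes the cutoffs are already sorted in increasing order.

```python
from bisect import bisect_left, bisect_right


def bin_value(value, cutoffs, include_lowest=True, strict_binning=False):
    """Return the 0-indexed bin for value, or -1 if strict_binning rejects it.

    Hypothetical illustration of the documented rules; not the Frame API.
    """
    last_bin = len(cutoffs) - 2  # N+1 cutoffs define bins 0..N-1
    if value < cutoffs[0]:
        return -1 if strict_binning else 0
    if value > cutoffs[-1]:
        return -1 if strict_binning else last_bin
    if include_lowest:
        # Bins are [lower, upper); the last bin also includes its upper cutoff.
        index = bisect_right(cutoffs, value) - 1
    else:
        # Bins are (lower, upper]; the first bin also includes its lower cutoff.
        index = bisect_left(cutoffs, value) - 1
    # Clamp so the first and last cutoffs always land inside a bin.
    return min(max(index, 0), last_bin)


values = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Lower-inclusive bins, out-of-range values flagged with -1.
print([bin_value(v, [5, 12, 25, 60], True, True) for v in values])
# [-1, -1, -1, -1, 0, 0, 1, 1, 2, 2, -1]

# Upper-inclusive bins: 5 and 34 fall into the bin below the cutoff.
print([bin_value(v, [1, 5, 34, 55, 89], False, True) for v in values])
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3]
```

Using bisect_right for lower-inclusive bins and bisect_left for upper-inclusive bins decides which side a value sitting exactly on an interior cutoff falls to. The final clamp handles the first and last cutoffs.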

Examples

For these examples, we will use a Frame object my_frame that has a single column, a:

>>> my_frame.inspect( n=11 )
[##]  a
========
[0]    1
[1]    1
[2]    2
[3]    3
[4]    5
[5]    8
[6]   13
[7]   21
[8]   34
[9]   55
[10]  89

Modify the frame with a column showing what bin the data is in. The data values should use strict_binning:

>>> my_frame.bin_column('a', [5,12,25,60], include_lowest=True,
... strict_binning=True, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]  a   binned
================
[0]    1      -1
[1]    1      -1
[2]    2      -1
[3]    3      -1
[4]    5       0
[5]    8       0
[6]   13       1
[7]   21       1
[8]   34       2
[9]   55       2
[10]  89      -1

Modify the frame with a column showing what bin the data is in. The data values should not use strict_binning:

>>> my_frame.bin_column('a', [5,12,25,60], include_lowest=True,
... strict_binning=False, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]  a   binned
================
[0]    1       0
[1]    1       0
[2]    2       0
[3]    3       0
[4]    5       0
[5]    8       0
[6]   13       1
[7]   21       1
[8]   34       2
[9]   55       2
[10]  89       2

Modify the frame with a column showing what bin the data is in. The bins should be lower inclusive:

>>> my_frame.bin_column('a', [1,5,34,55,89], include_lowest=True,
... strict_binning=False, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]  a   binned
================
[0]    1       0
[1]    1       0
[2]    2       0
[3]    3       0
[4]    5       1
[5]    8       1
[6]   13       1
[7]   21       1
[8]   34       2
[9]   55       3
[10]  89       3

Modify the frame with a column showing what bin the data is in. The bins should be upper inclusive:

>>> my_frame.bin_column('a', [1,5,34,55,89], include_lowest=False,
... strict_binning=True, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]  a   binned
================
[0]    1       0
[1]    1       0
[2]    2       0
[3]    3       0
[4]    5       0
[5]    8       1
[6]   13       1
[7]   21       1
[8]   34       1
[9]   55       2
[10]  89       3