Frame bin_column
bin_column(self, column_name, cutoffs, include_lowest=None, strict_binning=None, bin_column_name=None)
Classify data into user-defined groups.
Parameters:
column_name : unicode
Name of the column to bin.
cutoffs : list
List or tuple of bin cutoff points, in increasing order. All bin boundaries must be included, so N bins require N+1 values.
include_lowest : bool (default=None)
Specify how the boundary conditions are handled. True indicates that the lower bound of the bin is inclusive. False indicates that the upper bound is inclusive. Default is True.
strict_binning : bool (default=None)
Specify how values outside of the cutoffs array should be binned. If set to True, each value less than cutoffs[0] or greater than cutoffs[-1] will be assigned a bin value of -1. If set to False, values less than cutoffs[0] will be included in the first bin while values greater than cutoffs[-1] will be included in the final bin.
bin_column_name : unicode (default=None)
The name for the new binned column. Default is <column_name>_binned.
Summarize rows of data based on the value in a single column by sorting them into bins, or groups, based on a list of bin cutoff points.
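The rules described for include_lowest and strict_binning can be sketched in plain Python. This is only an illustration of the documented behavior, not the toolkit's implementation; the helper name bin_value is hypothetical, and it assumes cutoffs is sorted in increasing order.

```python
from bisect import bisect_left, bisect_right


def bin_value(value, cutoffs, include_lowest=True, strict_binning=False):
    """Return the 0-indexed bin for value, or -1 if strict_binning rejects it.

    Hypothetical helper: mirrors the documented rules, not the real API.
    """
    n_bins = len(cutoffs) - 1  # N+1 cutoffs describe N bins

    # Values outside the cutoff range: -1 when strict, else the nearest outer bin.
    if value < cutoffs[0]:
        return -1 if strict_binning else 0
    if value > cutoffs[-1]:
        return -1 if strict_binning else n_bins - 1

    if include_lowest:
        # Lower-inclusive bins: a value equal to a cutoff goes to the bin above it.
        index = bisect_right(cutoffs, value) - 1
    else:
        # Upper-inclusive bins: a value equal to a cutoff goes to the bin below it.
        index = bisect_left(cutoffs, value) - 1

    # Clamp so the first and last cutoffs always stay inside the outer bins.
    return min(max(index, 0), n_bins - 1)
```

For instance, [bin_value(v, [5, 12, 25, 60], strict_binning=True) for v in (3, 5, 13, 89)] evaluates to [-1, 0, 1, -1] under these assumptions.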
Notes
- Unicode in column names is not supported and will likely cause the drop_frames() method (and others) to fail!
- Bin IDs are 0-indexed; in other words, the lowest bin number is 0.
- The first and last cutoffs are always included in the bins. When include_lowest is True, the last bin includes both of its cutoffs. When include_lowest is False, the first bin (bin 0) includes both of its cutoffs.
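The boundary rule in the last note may be easier to see in interval notation. The following is a minimal sketch in plain Python; describe_bins is a hypothetical helper written for illustration and is not part of the Frame API.

```python
def describe_bins(cutoffs, include_lowest=True):
    """Return one interval string per bin, e.g. '[1, 5)' (hypothetical helper)."""
    last = len(cutoffs) - 2  # index of the final bin
    labels = []
    for i, (lo, hi) in enumerate(zip(cutoffs, cutoffs[1:])):
        if include_lowest:
            # Lower-inclusive bins; only the last bin closes its upper end.
            left = "["
            right = "]" if i == last else ")"
        else:
            # Upper-inclusive bins; only bin 0 closes its lower end.
            left = "[" if i == 0 else "("
            right = "]"
        labels.append(f"{left}{lo}, {hi}{right}")
    return labels


print(describe_bins([1, 5, 34, 55, 89], include_lowest=True))
print(describe_bins([1, 5, 34, 55, 89], include_lowest=False))
```

With include_lowest=True the final label is '[55, 89]', and with include_lowest=False the first label is '[1, 5]', which matches the note that the outer cutoffs are never excluded.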
Examples
For these examples, we will use a frame with column a accessed by a Frame object my_frame:
>>> my_frame.inspect( n=11 )
[##]   a
========
[0]    1
[1]    1
[2]    2
[3]    3
[4]    5
[5]    8
[6]   13
[7]   21
[8]   34
[9]   55
[10]  89
Modify the frame with a column showing what bin the data is in. The data values should use strict_binning:
>>> my_frame.bin_column('a', [5,12,25,60], include_lowest=True,
...     strict_binning=True, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]   a  binned
================
[0]    1      -1
[1]    1      -1
[2]    2      -1
[3]    3      -1
[4]    5       0
[5]    8       0
[6]   13       1
[7]   21       1
[8]   34       2
[9]   55       2
[10]  89      -1
Modify the frame with a column showing what bin the data is in. The data values should not use strict_binning:
>>> my_frame.bin_column('a', [5,12,25,60], include_lowest=True,
...     strict_binning=False, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]   a  binned
================
[0]    1       0
[1]    1       0
[2]    2       0
[3]    3       0
[4]    5       0
[5]    8       0
[6]   13       1
[7]   21       1
[8]   34       2
[9]   55       2
[10]  89       2
Modify the frame with a column showing what bin the data is in. The bins should be lower inclusive:
>>> my_frame.bin_column('a', [1,5,34,55,89], include_lowest=True,
...     strict_binning=False, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]   a  binned
================
[0]    1       0
[1]    1       0
[2]    2       0
[3]    3       0
[4]    5       1
[5]    8       1
[6]   13       1
[7]   21       1
[8]   34       2
[9]   55       3
[10]  89       3
Modify the frame with a column showing what bin the data is in. The bins should be upper inclusive:
>>> my_frame.bin_column('a', [1,5,34,55,89], include_lowest=False,
...     strict_binning=True, bin_column_name='binned')
[===Job Progress===]
>>> my_frame.inspect( n=11 )
[##]   a  binned
================
[0]    1       0
[1]    1       0
[2]    2       0
[3]    3       0
[4]    5       0
[5]    8       1
[6]   13       1
[7]   21       1
[8]   34       1
[9]   55       2
[10]  89       3
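Once a column has been binned, a common next step is to count how many rows fall into each bin. The sketch below does this in plain Python with values copied by hand from the binned column of the upper-inclusive example above. It does not call the Frame API, and the variable names are illustrative only.

```python
from collections import Counter

# Values copied from the 'binned' column of the upper-inclusive example.
binned = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3]

# Tally the number of rows assigned to each bin ID.
rows_per_bin = Counter(binned)

for bin_id in sorted(rows_per_bin):
    print(f"bin {bin_id}: {rows_per_bin[bin_id]} rows")
```

When strict_binning is True, a bin ID of -1 would appear in the tally as its own entry, which makes out-of-range rows easy to spot.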