VertexFrame group_by
group_by(self, group_by_columns, *aggregation_arguments)
[BETA] Create summarized frame.
Parameters: group_by_columns : list
Column name or list of column names
*aggregation_arguments : dict
Aggregation function based on entire row, and/or dictionaries (one or more) of { column name str : aggregation function(s) }.
Returns: Frame
A new frame with the results of the group_by
Creates a new frame and returns a Frame object to access it. Takes a column or list of columns, finds each unique combination of their values, and creates one row per combination. The other columns are combined according to the aggregation argument(s).
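The grouping behavior described above can be sketched in plain Python. This is only an illustration of the semantics, not the library's implementation: the helpers group_rows and average_by are hypothetical names, rows are modeled as dictionaries, and the data is columns a, b and c of the example frame used in the Examples section.

```python
from collections import defaultdict


def group_rows(rows, key_columns):
    """Bucket rows (dicts) by the tuple of their values in key_columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in key_columns)
        groups[key].append(row)
    return groups


def average_by(rows, key_columns, value_column):
    """Return one row per unique key, carrying the mean of value_column."""
    result = []
    for key, members in group_rows(rows, key_columns).items():
        values = [member[value_column] for member in members]
        summary = dict(zip(key_columns, key))
        # Result column name: original name, '_', then the function name.
        summary[value_column + "_avg"] = sum(values) / len(values)
        result.append(summary)
    return result


# Columns a, b and c of the example frame (hypothetical in-memory copy).
rows = [
    {"a": 1, "b": "alpha", "c": 3.0},
    {"a": 1, "b": "bravo", "c": 5.0},
    {"a": 1, "b": "alpha", "c": 5.0},
    {"a": 2, "b": "bravo", "c": 8.0},
    {"a": 2, "b": "charlie", "c": 12.0},
    {"a": 2, "b": "bravo", "c": 7.0},
    {"a": 2, "b": "bravo", "c": 12.0},
]

summary = average_by(rows, ["a", "b"], "c")
for row in summary:
    print(row)
```

Seven input rows collapse to four output rows, one for each distinct (a, b) pair. As with the real method, the order of the output rows should not be relied on.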
Notes
- Column order is not guaranteed when columns are added
- The column names created by aggregation functions in the new frame are the original column name appended with the ‘_’ character and the aggregation function. For example, if the original column is a and the function is avg, the resultant column is named a_avg. (The example output in the Examples section shows the suffix in upper case, for instance c_AVG.)
- An aggregation argument of count results in a column named count.
- The aggregation function agg.count is the only full-row aggregation function supported at this time.
- Aggregation currently supports using the following functions:
- avg
- count
- count_distinct
- max
- min
- stdev
- sum
- var (see glossary Bias vs Variance)
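As a rough illustration of the naming rule and the function list above, the sketch below maps each function name to a standard-library stand-in and builds result column names by joining the column name, '_', and the function name. The names AGGREGATORS, result_column_name and aggregate_column are hypothetical. Using sample (n - 1) statistics for stdev and var is an assumption, since this page does not state which estimator the library uses.

```python
import statistics

# Standard-library stand-ins for the listed aggregation functions.
# ASSUMPTION: stdev and var use the sample (n - 1) form; the library's
# actual estimator is not stated on this page.
AGGREGATORS = {
    "avg": statistics.mean,
    "count": len,
    "count_distinct": lambda values: len(set(values)),
    "max": max,
    "min": min,
    "stdev": statistics.stdev,
    "sum": sum,
    "var": statistics.variance,
}


def result_column_name(column, function_name):
    """Original column name, then '_', then the aggregation function name."""
    return column + "_" + function_name


def aggregate_column(column, values, function_names):
    """Apply each named function to values, keyed by the result column name."""
    return {
        result_column_name(column, name): AGGREGATORS[name](values)
        for name in function_names
    }


# Values of column f for the rows where a == 1 in the example frame.
f_values = [3.0, 4.0, 8.0]
stats = aggregate_column("f", f_values, ["avg", "sum", "min"])
print(stats)
```

Note that statistics.stdev and statistics.variance raise StatisticsError for a group with fewer than two values; how the library treats single-row groups is not covered here.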
Examples
For setup, we will use a Frame frame with columns a through g:
>>> frame.inspect()
[#]  a  b        c     d       e  f    g
========================================
[0]  1  alpha     3.0  small   1  3.0  9
[1]  1  bravo     5.0  medium  1  4.0  9
[2]  1  alpha     5.0  large   1  8.0  8
[3]  2  bravo     8.0  large   1  5.0  7
[4]  2  charlie  12.0  medium  1  6.0  6
[5]  2  bravo     7.0  small   1  8.0  5
[6]  2  bravo    12.0  large   1  6.0  4

Count the groups in column 'b':

>>> b_count = frame.group_by('b', ta.agg.count)
[===Job Progress===]
>>> b_count.inspect()
[#]  b        count
===================
[0]  alpha        2
[1]  bravo        4
[2]  charlie      1

>>> avg1 = frame.group_by(['a', 'b'], {'c' : ta.agg.avg})
[===Job Progress===]
>>> avg1.inspect()
[#]  a  b        c_AVG
======================
[0]  2  bravo      9.0
[1]  1  alpha      4.0
[2]  2  charlie   12.0
[3]  1  bravo      5.0

>>> mix_frame = frame.group_by('a', ta.agg.count, {'f': [ta.agg.avg, ta.agg.sum, ta.agg.min], 'g': ta.agg.max})
[===Job Progress===]
>>> mix_frame.inspect()
[#]  a  count  g_MAX  f_AVG  f_SUM  f_MIN
=========================================
[0]  1      3      9    5.0   15.0    3.0
[1]  2      4      7   6.25   25.0    5.0
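The mix_frame call above passes both a full-row function and a dictionary whose values are either a single function or a list of functions. One way to read such arguments is to flatten them into (column, function) pairs, as in the sketch below. The helper flatten_aggregations is hypothetical and is not how the library parses its arguments; plain strings stand in for the ta.agg.* objects so the snippet runs without a server connection.

```python
def flatten_aggregations(*aggregation_arguments):
    """Return (column, function) pairs; column is None for full-row functions."""
    pairs = []
    for argument in aggregation_arguments:
        if isinstance(argument, dict):
            for column, functions in argument.items():
                # A dictionary value may be one function or a list of them.
                if not isinstance(functions, (list, tuple)):
                    functions = [functions]
                pairs.extend((column, function) for function in functions)
        else:
            # Anything that is not a dictionary is treated as a full-row function.
            pairs.append((None, argument))
    return pairs


# Strings are stand-ins for ta.agg.count, ta.agg.avg, and so on.
pairs = flatten_aggregations(
    "count",
    {"f": ["avg", "sum", "min"], "g": "max"},
)
for column, function in pairs:
    print(column, function)
```

Each pair then corresponds to one column in the summarized frame: the full-row count, three aggregates of f, and one aggregate of g.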
For further examples, see Group by (and aggregate).