Frame group_by

group_by(self, group_by_columns, *aggregation_arguments)

[BETA] Create summarized frame.

Parameters:

group_by_columns : list

Column name or list of column names

*aggregation_arguments : dict

Aggregation function based on entire row, and/or dictionaries (one or more) of { column name str : aggregation function(s) }.

Returns:

Frame

A new frame with the results of the group_by

Creates a new frame and returns a Frame object to access it. Takes a column or list of columns, finds each unique combination of values in those columns, and creates one row per combination. The remaining columns are combined within each group according to the aggregation argument(s).
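
The grouping step described above can be sketched in plain Python. This is only an illustration of the semantics, not the library's implementation: the helper group_rows, the sample rows, and the output key mean_c are all hypothetical names introduced here.

```python
from collections import defaultdict


def group_rows(rows, key_columns):
    """Bucket row dicts by the tuple of their values in key_columns.

    Hypothetical helper for illustration; not part of the Frame API.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row)
    return dict(groups)


# Illustrative rows only, not taken from a real frame.
rows = [
    {"a": 1, "b": "alpha", "c": 3.0},
    {"a": 1, "b": "alpha", "c": 5.0},
    {"a": 2, "b": "bravo", "c": 8.0},
]

groups = group_rows(rows, ["a", "b"])

# One output row per unique (a, b) combination, with column c averaged.
# "mean_c" is a local name chosen for this sketch.
summary = [
    {"a": key[0], "b": key[1],
     "mean_c": sum(r["c"] for r in members) / len(members)}
    for key, members in groups.items()
]
print(summary)
```

Three input rows collapse to two output rows because the first two share the same (a, b) values.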

Notes

Column order is not guaranteed when columns are added.
Columns created by aggregation functions in the new frame are named after the original column, followed by the '_' character and the aggregation function. For example, if the original column is a and the function is avg, the resulting column is named a_AVG (the examples below display the function name in upper case).
An aggregation argument of count results in a column named count.
The aggregation function agg.count is the only full row aggregation function supported at this time.
Aggregation currently supports using the following functions:
- avg
- count
- count_distinct
- max
- min
- stdev
- sum
- var (see glossary Bias vs Variance)
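
As a rough sketch of the naming rules in these notes, the hypothetical helper below predicts the column names of a result frame. It is not part of the library: aggregation functions are stood in for by plain strings such as "avg", and the upper-case suffix is an assumption taken from how the examples on this page display names such as c_AVG.

```python
def result_column_names(group_by_columns, *aggregation_arguments):
    """Hypothetical helper: predict output column names of a group_by call.

    Aggregation functions are represented here by plain strings
    (for example "avg"), not by the library's agg objects.
    """
    if isinstance(group_by_columns, str):
        group_by_columns = [group_by_columns]
    names = list(group_by_columns)
    for arg in aggregation_arguments:
        if isinstance(arg, dict):
            # {column name: aggregation function(s)}
            for column, functions in arg.items():
                if isinstance(functions, str):
                    functions = [functions]
                # Assumed convention: <column>_<FUNCTION>, upper-cased.
                names.extend(f"{column}_{fn.upper()}" for fn in functions)
        else:
            # Full-row aggregation; per the notes, count yields "count".
            names.append(arg)
    return names


names = result_column_names(
    "a", "count", {"f": ["avg", "sum", "min"], "g": "max"}
)
print(names)
```

Because column order is not guaranteed, only the set of names should be relied on, not their position.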

Examples

For setup, we will use a Frame frame with columns a through g:

>>> frame.inspect()
[#]  a  b        c     d       e  f    g
========================================
[0]  1  alpha     3.0  small   1  3.0  9
[1]  1  bravo     5.0  medium  1  4.0  9
[2]  1  alpha     5.0  large   1  8.0  8
[3]  2  bravo     8.0  large   1  5.0  7
[4]  2  charlie  12.0  medium  1  6.0  6
[5]  2  bravo     7.0  small   1  8.0  5
[6]  2  bravo    12.0  large   1  6.0  4

Count the rows in each group of column 'b':

>>> b_count = frame.group_by('b', ta.agg.count)
[===Job Progress===]
>>> b_count.inspect()
[#]  b        count
===================
[0]  alpha        2
[1]  bravo        4
[2]  charlie      1

Group by columns 'a' and 'b', averaging column 'c' within each group:

>>> avg1 = frame.group_by(['a', 'b'], {'c' : ta.agg.avg})
[===Job Progress===]
>>> avg1.inspect()
[#]  a  b        c_AVG
======================
[0]  2  bravo      9.0
[1]  1  alpha      4.0
[2]  2  charlie   12.0
[3]  1  bravo      5.0

Group by column 'a', combining a full-row count with several aggregations of columns 'f' and 'g':

>>> mix_frame = frame.group_by('a', ta.agg.count, {'f': [ta.agg.avg, ta.agg.sum, ta.agg.min], 'g': ta.agg.max})
[===Job Progress===]
>>> mix_frame.inspect()
[#]  a  count  g_MAX  f_AVG  f_SUM  f_MIN
=========================================
[0]  1      3      9    5.0   15.0    3.0
[1]  2      4      7   6.25   25.0    5.0
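
The numbers in the last result can be cross-checked without the library. The sketch below copies columns a, f and g from the setup frame and recomputes each aggregate in plain Python; the dictionary keys simply mirror the column names displayed above.

```python
from statistics import mean

# (a, f, g) values copied from the setup frame on this page.
rows = [
    (1, 3.0, 9), (1, 4.0, 9), (1, 8.0, 8),
    (2, 5.0, 7), (2, 6.0, 6), (2, 8.0, 5), (2, 6.0, 4),
]

summary = {}
for a in sorted({row[0] for row in rows}):
    f_values = [f for key, f, _ in rows if key == a]
    g_values = [g for key, _, g in rows if key == a]
    summary[a] = {
        "count": len(f_values),   # rows in the group
        "g_MAX": max(g_values),
        "f_AVG": mean(f_values),
        "f_SUM": sum(f_values),
        "f_MIN": min(f_values),
    }

for a, values in summary.items():
    print(a, values)
```

Each printed line corresponds to one row of mix_frame.inspect().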

For further examples, see Group by (and aggregate).
