VertexFrame group_by


group_by(self, group_by_columns, *aggregation_arguments)

[BETA] Create summarized frame.

Parameters:

group_by_columns : list

Column name or list of column names to group by.

*aggregation_arguments : dict

A full-row aggregation function, and/or one or more dictionaries of the form { column name (str) : aggregation function or list of aggregation functions }.

Returns:

: Frame

A new frame with the results of the group_by

Creates a new frame and returns a Frame object to access it. Takes a column or list of columns, finds each unique combination of their values, and creates one row per combination. The remaining columns are combined according to the aggregation argument(s).
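The grouping semantics described above can be sketched in plain Python. This is only an illustration under assumptions, not the library's implementation: the helper name group_rows is hypothetical, and the rows are columns a, b and c copied from the example frame in the Examples section.

```python
from collections import defaultdict
from statistics import mean

# Columns a, b, c of the example frame shown under Examples.
rows = [
    {"a": 1, "b": "alpha", "c": 3.0},
    {"a": 1, "b": "bravo", "c": 5.0},
    {"a": 1, "b": "alpha", "c": 5.0},
    {"a": 2, "b": "bravo", "c": 8.0},
    {"a": 2, "b": "charlie", "c": 12.0},
    {"a": 2, "b": "bravo", "c": 7.0},
    {"a": 2, "b": "bravo", "c": 12.0},
]


def group_rows(rows, key_columns):
    """Hypothetical helper: bucket rows by their unique key-column values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in key_columns)
        groups[key].append(row)
    return groups


# One output entry per unique (a, b) combination; column c is averaged.
groups = group_rows(rows, ["a", "b"])
c_avg = {key: mean(r["c"] for r in members) for key, members in groups.items()}
```

Here c_avg maps each (a, b) key to the mean of c for that group, which corresponds to the c_AVG column in the avg1 example.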

Notes

  • Column order is not guaranteed when columns are added.
  • The column names created by aggregation functions in the new frame are the original column name followed by an underscore ('_') and the aggregation function name. For example, if the original column is a and the function is avg, the resulting column is named a_avg. The example output below shows the suffix in upper case (for example, c_AVG).
  • An aggregation argument of count results in a column named count.
  • The aggregation function agg.count is the only full row aggregation function supported at this time.
  • Aggregation currently supports using the following functions:
    • avg
    • count
    • count_distinct
    • max
    • min
    • stdev
    • sum
    • var (see glossary Bias vs Variance)
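As a rough guide to what the listed functions compute, the sketch below maps each name to a plain-Python counterpart and applies it to the f values of the a = 1 group from the example frame. The mapping is an assumption made for illustration. In particular, this page does not say whether stdev and var use the sample or the population formula; the sketch arbitrarily uses the sample versions from Python's statistics module.

```python
import statistics

# Assumed plain-Python counterparts of the aggregation names listed above.
# stdev/var use the sample (n - 1) formulas here; the library may differ.
counterparts = {
    "avg": statistics.mean,
    "count": len,
    "count_distinct": lambda values: len(set(values)),
    "max": max,
    "min": min,
    "stdev": statistics.stdev,
    "sum": sum,
    "var": statistics.variance,
}

# Column f for the rows where a == 1 in the example frame.
f_values = [3.0, 4.0, 8.0]

# Apply every counterpart to the same list of values.
summary = {name: func(f_values) for name, func in counterparts.items()}
```

The avg, sum and min entries of summary can be compared with the f_AVG, f_SUM and f_MIN values for a = 1 in the last example.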

Examples

For setup, we will use a Frame frame with columns a through g:

>>> frame.inspect()
[#]  a  b        c     d       e  f    g
========================================
[0]  1  alpha     3.0  small   1  3.0  9
[1]  1  bravo     5.0  medium  1  4.0  9
[2]  1  alpha     5.0  large   1  8.0  8
[3]  2  bravo     8.0  large   1  5.0  7
[4]  2  charlie  12.0  medium  1  6.0  6
[5]  2  bravo     7.0  small   1  8.0  5
[6]  2  bravo    12.0  large   1  6.0  4

Count the rows in each group of column 'b':

>>> b_count = frame.group_by('b', ta.agg.count)
[===Job Progress===]
>>> b_count.inspect()
[#]  b        count
===================
[0]  alpha        2
[1]  bravo        4
[2]  charlie      1

Average column 'c' for each unique combination of 'a' and 'b':

>>> avg1 = frame.group_by(['a', 'b'], {'c' : ta.agg.avg})
[===Job Progress===]
>>> avg1.inspect()
[#]  a  b        c_AVG
======================
[0]  2  bravo      9.0
[1]  1  alpha      4.0
[2]  2  charlie   12.0
[3]  1  bravo      5.0

Combine a full-row count with several aggregations of columns 'f' and 'g', grouped by 'a':

>>> mix_frame = frame.group_by('a', ta.agg.count, {'f': [ta.agg.avg, ta.agg.sum, ta.agg.min], 'g': ta.agg.max})
[===Job Progress===]
>>> mix_frame.inspect()
[#]  a  count  g_MAX  f_AVG  f_SUM  f_MIN
=========================================
[0]  1      3      9    5.0   15.0    3.0
[1]  2      4      7   6.25   25.0    5.0
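The mixed call above passes a full-row function followed by a dictionary whose values are either a single function or a list of functions. A minimal sketch of how such arguments could be interpreted is shown below. The function aggregate and the string function names are hypothetical stand-ins, not part of the ta API, and the upper-case suffixes simply mirror the output shown above.

```python
from collections import defaultdict
from statistics import mean

# Columns a, f, g of the example frame.
rows = [
    {"a": 1, "f": 3.0, "g": 9},
    {"a": 1, "f": 4.0, "g": 9},
    {"a": 1, "f": 8.0, "g": 8},
    {"a": 2, "f": 5.0, "g": 7},
    {"a": 2, "f": 6.0, "g": 6},
    {"a": 2, "f": 8.0, "g": 5},
    {"a": 2, "f": 6.0, "g": 4},
]

# Stand-ins for the ta.agg functions, keyed by name (assumption).
FUNCS = {"avg": mean, "sum": sum, "min": min, "max": max}


def aggregate(rows, key_column, *aggregation_arguments):
    """Hypothetical sketch of the argument handling, not the library code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row)

    result = {}
    for key, members in groups.items():
        out = {}
        for arg in aggregation_arguments:
            if arg == "count":  # full-row aggregation
                out["count"] = len(members)
                continue
            for column, names in arg.items():
                # A single function name is treated as a one-element list.
                if isinstance(names, str):
                    names = [names]
                values = [m[column] for m in members]
                for name in names:
                    out[f"{column}_{name.upper()}"] = FUNCS[name](values)
        result[key] = out
    return result


mix = aggregate(rows, "a", "count", {"f": ["avg", "sum", "min"], "g": "max"})
```

Each entry of mix holds the same values as the corresponding row of mix_frame, keyed by the value of a.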

For further examples, see Group by (and aggregate).