
Frame top_k


top_k(self, column_name, k, weights_column=None)

Most or least frequent column values.

Parameters:

column_name : unicode

The column whose top (or bottom) K distinct values are to be calculated.

k : int32

Number of entries to return. If k is negative, the bottom k entries are returned instead.

weights_column : unicode (default=None)

The column that provides weights (frequencies) for the topK calculation. Must contain numerical data. Default is 1 for all items.

Returns:

: Frame

An object with access to the frame of data.

Calculate the top (or bottom) K distinct values by count of a column. The column can be weighted. All data elements of weight <= 0 are excluded from the calculation, as are all data elements whose weight is NaN or infinite. If there are no data elements of finite weight > 0, then topK is empty.
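The weighting and exclusion rules above can be illustrated with a small pure-Python sketch. This is not the library's implementation: `top_k_sketch` and the sample `rows` are hypothetical names introduced here, rows are modeled as plain dictionaries, and the ordering of results for negative k (smallest total first) is an assumption, since the text does not specify it.

```python
import math
from collections import defaultdict


def top_k_sketch(rows, column_name, k, weights_column=None):
    """Illustrative stand-in for the documented behaviour (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        # Every item has weight 1 when no weights column is given.
        weight = 1.0 if weights_column is None else float(row[weights_column])
        # Exclude weights that are NaN, infinite, or <= 0.
        if not math.isfinite(weight) or weight <= 0:
            continue
        totals[row[column_name]] += weight
    # Positive k: largest totals first. Negative k: smallest totals first (assumed).
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=(k > 0))
    return ranked[:abs(k)]


# Hypothetical sample data, not taken from the example frame below.
rows = [
    {"county": "Washington", "weight": 2.0},
    {"county": "Washington", "weight": 1.0},
    {"county": "Lane", "weight": 5.0},
    {"county": "Marion", "weight": float("nan")},  # excluded: NaN weight
    {"county": "Yamhill", "weight": -1.0},         # excluded: weight <= 0
    {"county": "Multnomah", "weight": 0.5},
]

print(top_k_sketch(rows, "county", 2, weights_column="weight"))
print(top_k_sketch(rows, "county", -1, weights_column="weight"))
```

In the sketch, a value whose rows all carry invalid weights never enters the totals, so an input with no finite positive weights yields an empty result, matching the last sentence above. How the library breaks ties between equal totals is not stated in this page, so the sketch makes no promise about it.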

Examples

For this example, we calculate the top 2 counties in a data frame. Consider the following frame containing six columns. The listing shows only some of the frame's rows, as the gap in the rank column indicates.

>>> frame.inspect()
    [#]  rank  city         population_2013  population_2010  change  county
    ============================================================================
    [0]     1  Portland              609456           583776  4.40%   Multnomah
    [1]     2  Salem                 160614           154637  3.87%   Marion
    [2]     3  Eugene                159190           156185  1.92%   Lane
    [3]     4  Gresham               109397           105594  3.60%   Multnomah
    [4]     5  Hillsboro              97368            91611  6.28%   Washington
    [5]     6  Beaverton              93542            89803  4.16%   Washington
    [6]    15  Grants Pass            35076            34533  1.57%   Josephine
    [7]    16  Oregon City            34622            31859  8.67%   Clackamas
    [8]    17  McMinnville            33131            32187  2.93%   Yamhill
    [9]    18  Redmond                27427            26215  4.62%   Deschutes
>>> top_frame = frame.top_k("county", 2)
[===Job Progress===]
>>> top_frame.inspect()
    [#]  county      count
    ======================
    [0]  Washington    4.0
    [1]  Clackamas     3.0