VertexFrame flatten_columns


flatten_columns(self, columns, delimiters=None)

Spread data to multiple rows based on cell data.

Parameters:

columns : list

The columns to be flattened.

delimiters : list (default=None)

The delimiter string for each column being flattened. A single delimiter applies to all columns. Default is comma (,).

Splits cells in the specified columns into multiple rows according to a string delimiter. Each new row is a full copy of the original row, except that each specified column contains only one of the split values. The original row is removed.

Examples

Given a data file:

1-solo,mono,single-green,yellow,red
2-duo,double-orange,black

Load the data into a frame so it can be worked on:

>>> my_csv = ta.CsvFile("original_data.csv", schema=[('a', ta.int32), ('b', str), ('c', str)], delimiter='-')
>>> frame = ta.Frame(source=my_csv)

Looking at it:

>>> frame.inspect()
[#]  a  b                 c
==========================================
[0]  1  solo,mono,single  green,yellow,red
[1]  2  duo,double        orange,black

Now, spread out the sub-strings in columns b and c:

>>> frame.flatten_columns(['b','c'], ',')
[===Job Progress===]

Note that the delimiters parameter is optional. If no delimiter is specified, the default is a comma (,), so in the above example the delimiters parameter could be omitted. If a single delimiter is provided, it is used for all columns being flattened. If the columns use different delimiters, provide a list of delimiters; in that case the number of delimiters must match the number of string columns being flattened.
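The delimiter rules above can be sketched in plain Python. This is only an illustration of the rules as described here, not the library's code; the helper name resolve_delimiters is hypothetical, and treating a one-element list the same as a single string is an assumption.

```python
def resolve_delimiters(columns, delimiters=None):
    """Return one delimiter per column (hypothetical helper, not part of the frame API)."""
    # A single column name is treated like a one-element list.
    if isinstance(columns, str):
        columns = [columns]
    # No delimiter given: fall back to a comma for every column.
    if delimiters is None:
        return [","] * len(columns)
    # A single delimiter string applies to all columns.
    if isinstance(delimiters, str):
        return [delimiters] * len(columns)
    delimiters = list(delimiters)
    # Assumption: a one-element list also means "same delimiter everywhere".
    if len(delimiters) == 1:
        return delimiters * len(columns)
    # Otherwise the counts must match exactly.
    if len(delimiters) != len(columns):
        raise ValueError(
            "expected %d delimiters, got %d" % (len(columns), len(delimiters))
        )
    return delimiters
```

Under this sketch, resolve_delimiters(['b', 'c']) and resolve_delimiters(['b', 'c'], ',') both yield a comma for each column, while passing three delimiters for two columns raises an error.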

Check again:

>>> frame.inspect()
[#]  a  b       c
======================
[0]  1  solo    green
[1]  1  mono    yellow
[2]  1  single  red
[3]  2  duo     orange
[4]  2  double  black

Alternatively, flatten_columns also accepts a single column name (instead of a list) if just one column is being flattened. For example, starting from the original frame, we could have called flatten_columns on just column b:

>>> frame.flatten_columns('b', ',')
[===Job Progress===]

Check again:

>>> frame.inspect()
[#]  a  b       c
================================
[0]  1  solo    green,yellow,red
[1]  1  mono    green,yellow,red
[2]  1  single  green,yellow,red
[3]  2  duo     orange,black
[4]  2  double  orange,black
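To make the row-splitting behavior concrete, both results above can be reproduced with a small plain-Python sketch that works on tuples rather than on a frame. The function flatten_rows is hypothetical and is not how the library implements the operation. Padding with None when the flattened columns of a row hold different numbers of values is an assumption, since the examples here only cover equal counts.

```python
from itertools import zip_longest


def flatten_rows(rows, column_indexes, delimiter=","):
    """Split the given columns of each row and emit one new row per value position.

    Hypothetical illustration only; operates on tuples, not on a frame.
    """
    flattened = []
    for row in rows:
        # Split every selected cell into its list of sub-strings.
        parts = [row[i].split(delimiter) for i in column_indexes]
        # Pair values by position. Assumption: shorter lists are padded with None.
        for values in zip_longest(*parts):
            new_row = list(row)  # full copy of the original row
            for i, value in zip(column_indexes, values):
                new_row[i] = value  # keep a single value in each flattened column
            flattened.append(tuple(new_row))
    return flattened


rows = [
    (1, "solo,mono,single", "green,yellow,red"),
    (2, "duo,double", "orange,black"),
]

both = flatten_rows(rows, [1, 2])  # like flatten_columns(['b', 'c'], ',')
only_b = flatten_rows(rows, [1])   # like flatten_columns('b', ',')
```

Here both holds five rows matching the first result table, and only_b holds five rows in which column c still carries its full comma-separated string.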