
Frame drop_duplicates


drop_duplicates(self, unique_columns=None)

Modify the current frame, removing duplicate rows.

Parameters: unique_columns : None (default=None)

Remove rows that duplicate other rows. By default (unique_columns=None) the entire row is compared; pass a list of column names as unique_columns to limit the comparison to those columns. This modifies the current frame.

Examples

Given a frame with data:

>>> frame.inspect()
[#]  a    b  c
===============
[0]  200  4  25
[1]  200  5  25
[2]  200  4  25
[3]  200  5  35
[4]  200  6  25
[5]  200  8  35
[6]  200  4  45
[7]  200  4  25
[8]  200  5  25
[9]  201  4  25

Remove any rows that are identical to a previous row. The result is a frame of unique rows. Note that row order may change.

>>> frame.drop_duplicates()
[===Job Progress===]
>>> frame.inspect()
[#]  a    b  c
===============
[0]  201  4  25
[1]  200  4  25
[2]  200  5  25
[3]  200  8  35
[4]  200  6  25
[5]  200  5  35
[6]  200  4  45

Now remove any rows that have the same data in columns a and c as a previously checked row:

>>> frame.drop_duplicates(["a", "c"])
[===Job Progress===]

The result is a frame with unique values for the combination of columns a and c.

>>> frame.inspect()
[#]  a    b  c
===============
[0]  201  4  25
[1]  200  4  45
[2]  200  4  25
[3]  200  8  35
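
To make the comparison rule concrete, here is a minimal plain-Python sketch of the same idea, applied to the example data above. It is an illustration only, not the library's implementation: the helper name drop_duplicate_rows and its rows/columns arguments are hypothetical, and it assumes the first occurrence of each key is the one kept. The frame method makes no such promise, so the surviving rows and their order may differ from the inspect() output shown above.

```python
def drop_duplicate_rows(rows, columns, unique_columns=None):
    """Return the rows with duplicates removed (hypothetical helper).

    rows           -- list of tuples, one tuple per row
    columns        -- list of column names, in row order
    unique_columns -- column names to compare, or None to compare whole rows
    """
    if unique_columns is None:
        key_indexes = list(range(len(columns)))
    else:
        key_indexes = [columns.index(name) for name in unique_columns]

    seen = set()
    unique_rows = []
    for row in rows:
        # The key holds only the values that take part in the comparison.
        key = tuple(row[i] for i in key_indexes)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)  # assumption: keep the first occurrence
    return unique_rows


columns = ["a", "b", "c"]
rows = [
    (200, 4, 25), (200, 5, 25), (200, 4, 25), (200, 5, 35), (200, 6, 25),
    (200, 8, 35), (200, 4, 45), (200, 4, 25), (200, 5, 25), (201, 4, 25),
]

# Compare entire rows, then compare only columns a and c.
unique_rows = drop_duplicate_rows(rows, columns)
unique_ac = drop_duplicate_rows(unique_rows, columns, ["a", "c"])

print(len(unique_rows), len(unique_ac))
```

As in the frame examples, the whole-row pass leaves seven rows and the pass over columns a and c leaves four, one for each distinct (a, c) pair. Which value of b survives for a given pair depends on which duplicate is kept.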