chicken_turtle_util.data_frame
Extensions to pandas.DataFrame
Warning
Module contents have only been tested on DataFrames with an Index. DataFrames using a MultiIndex may not work with this module’s functions.
assert_equals | Assert 2 data frames are equal
equals | Get whether 2 data frames are equal
replace_na_with_none | Replace NaN values in pd.DataFrame with None
split_array_like | Split cells with array_like values along row axis.

chicken_turtle_util.data_frame.assert_equals(df1, df2, ignore_order=set(), ignore_indices=set(), all_close=False, _return_reason=False)
Assert 2 data frames are equal
Like assert equals(df1, df2, ...), but with better hints at where the data frames differ. See chicken_turtle_util.data_frame.equals() for detailed parameter doc.
Parameters: df1, df2 : pd.DataFrame
ignore_order : {int}
ignore_indices : {int}
all_close : bool

chicken_turtle_util.data_frame.equals(df1, df2, ignore_order=set(), ignore_indices=set(), all_close=False, _return_reason=False)
Get whether 2 data frames are equal
NaNs are considered equal (which is consistent with pandas.DataFrame.equals). None is considered equal to NaN.
Parameters: df1, df2 : pd.DataFrame
Data frames to compare
ignore_order : {int}
Axes along which to ignore order
ignore_indices : {int}
Axes of which to ignore the index. E.g. {1} allows df.columns.name to differ and df.columns.equals(df2.columns) to be False.
all_close : bool
If False, values must match exactly; if True, floats are compared as if with np.isclose.
_return_reason : bool
Internal. If True, equals returns a tuple that includes the reason; otherwise it returns only a bool indicating equality (or rather, equivalence).
Returns: equal : bool
Whether they’re equal (after ignoring according to the parameters)
reason : str or None
If equal, None; otherwise a short explanation of why the data frames aren’t equal. Omitted if _return_reason is False.
Notes
All values (including those of indices) must be copyable, and __eq__ must be such that a copy equals its original. A value must equal itself unless it’s np.nan. Values needn’t be orderable or hashable (however, pandas requires index values to be orderable and hashable). As a consequence, this function is flexible but not efficient.
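The NaN handling and the all_close option described above can be illustrated with plain pandas and NumPy. This is only a sketch of the semantics, not this module's implementation; in particular, the use of equal_nan=True is an assumption made here to mirror the "NaNs are considered equal" rule.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [0.1 + 0.2, np.nan]})
right = pd.DataFrame({'x': [0.3, np.nan]})

# Exact comparison: 0.1 + 0.2 is not exactly 0.3 in floating point,
# so the frames are reported as different.
exact = left.equals(right)

# Tolerant comparison in the spirit of all_close=True.
# equal_nan=True (an assumption of this sketch) treats NaN as equal to NaN.
close = bool(np.isclose(left.values, right.values, equal_nan=True).all())

# pandas.DataFrame.equals already treats NaNs in the same position as equal.
nan_equal = pd.DataFrame({'x': [np.nan]}).equals(pd.DataFrame({'x': [np.nan]}))
```

Here exact is False while close and nan_equal are True, which is the distinction the all_close parameter is meant to expose.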
Examples
>>> from chicken_turtle_util import data_frame as df_
>>> import pandas as pd
>>> df = pd.DataFrame([
...         [1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]
...     ],
...     index=pd.Index(('i1', 'i2', 'i3'), name='index1'),
...     columns=pd.Index(('c1', 'c2', 'c3'), name='columns1')
... )
>>> df
columns1  c1  c2  c3
index1
i1         1   2   3
i2         4   5   6
i3         7   8   9
>>> df2 = df.reindex(('i3', 'i1', 'i2'), columns=('c2', 'c1', 'c3'))
>>> df2
columns1  c2  c1  c3
index1
i3         8   7   9
i1         2   1   3
i2         5   4   6
>>> df_.equals(df, df2)
False
>>> df_.equals(df, df2, ignore_order=(0,1))
True
>>> df2 = df.copy()
>>> df2.index = [1,2,3]
>>> df2
columns1  c1  c2  c3
1          1   2   3
2          4   5   6
3          7   8   9
>>> df_.equals(df, df2)
False
>>> df_.equals(df, df2, ignore_indices={0})
True
>>> df2 = df.reindex(('i3', 'i1', 'i2'))
>>> df2
columns1  c1  c2  c3
index1
i3         7   8   9
i1         1   2   3
i2         4   5   6
>>> df_.equals(df, df2, ignore_indices={0})  # does not ignore row order!
False
>>> df_.equals(df, df2, ignore_order={0})
True
>>> df2 = df.copy()
>>> df2.index.name = 'other'
>>> df_.equals(df, df2)  # df.index.name must match as well, same goes for df.columns.name
False
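For comparison, the idea behind ignore_order can be approximated with pandas alone by sorting both axes before comparing. The helper name equals_ignoring_order is hypothetical and is not part of this module. Unlike equals, this sketch relies on sortable labels and does not compare index or column names.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=pd.Index(['i1', 'i2', 'i3'], name='index1'),
    columns=pd.Index(['c1', 'c2', 'c3'], name='columns1'),
)
# Same data, rows and columns in a different order.
df2 = df.reindex(index=['i3', 'i1', 'i2'], columns=['c2', 'c1', 'c3'])


def equals_ignoring_order(a, b):
    """Hypothetical helper: compare two frames after sorting both axes by label."""
    a_sorted = a.sort_index(axis=0).sort_index(axis=1)
    b_sorted = b.sort_index(axis=0).sort_index(axis=1)
    return a_sorted.equals(b_sorted)


order_sensitive = df.equals(df2)                 # order matters here
order_insensitive = equals_ignoring_order(df, df2)  # order ignored on both axes
```

The order-sensitive comparison reports a difference, while the sorted comparison does not, roughly matching ignore_order={0, 1} for frames with unique, sortable labels.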

chicken_turtle_util.data_frame.replace_na_with_none(df)
Replace NaN values in pd.DataFrame with None
Parameters: df : pd.DataFrame
DataFrame whose NaN values to replace
Returns: pd.DataFrame
df with NaN values replaced by None
Notes
Like DataFrame.fillna, but replaces NaN values with None, which DataFrame.fillna cannot do.
These None values will not be treated as NA by DataFrame, as the dtypes will be set to object.
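A similar effect can be sketched with pandas alone by casting to object and masking. This is one common idiom and an assumption about how such a replacement could be done, not necessarily how replace_na_with_none is implemented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', 'y']})

# Cast to object first so that None can be stored as-is,
# then keep the original value wherever it is not NaN.
result = df.astype(object).where(df.notna(), None)

replaced = result.loc[1, 'a']   # the former NaN cell
untouched = result.loc[0, 'a']  # a cell that held a real value
```

After this, every column of result has dtype object, which is the trade-off the note above refers to.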

chicken_turtle_util.data_frame.split_array_like(df, columns=None)
Split cells with array_like values along row axis.
Column names are maintained. The index is dropped, but this may change in the future.
Parameters: df : pd.DataFrame
Data frame. df[columns] should have cell values of type np.array_like.
columns : iterable(str) or str or None
Columns (or column) whose values to split. If None, df.columns is used.
Returns: pd.DataFrame
Data frame with array_like values in df[columns] split across rows, and corresponding values in other columns repeated.
Examples
>>> import pandas as pd
>>> from chicken_turtle_util.data_frame import split_array_like
>>> df = pd.DataFrame([[1,[1,2],[1]],[1,[1,2],[3,4,5]],[2,[1],[1,2]]], columns=('check', 'a', 'b'))
>>> df
   check       a          b
0      1  [1, 2]        [1]
1      1  [1, 2]  [3, 4, 5]
2      2     [1]     [1, 2]
>>> split_array_like(df, ['a', 'b'])
   check  a  b
0      1  1  1
1      1  2  1
2      1  1  3
3      1  1  4
4      1  1  5
5      1  2  3
6      1  2  4
7      1  2  5
8      2  1  1
9      2  1  2
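For readers without this module at hand, the same row-wise split can be sketched with pandas' own DataFrame.explode (available since pandas 0.25), applied once per column. This is an illustration of the behaviour shown above, not the module's implementation, and the resulting dtypes may differ.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, [1, 2], [1]], [1, [1, 2], [3, 4, 5]], [2, [1], [1, 2]]],
    columns=['check', 'a', 'b'],
)

# Explode one column at a time; each step repeats the values of the other columns.
result = df
for column in ['a', 'b']:
    result = result.explode(column)

# Drop the repeated original index, mirroring "the index is dropped".
result = result.reset_index(drop=True)

rows = result.values.tolist()
```

Exploding 'a' and then 'b' yields, per original row, every combination of the values in 'a' and 'b', in the same row order as the example output above.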