chicken_turtle_util.data_frame
Extensions to pandas.DataFrame
Warning
Module contents have only been tested on DataFrames with a plain Index; DataFrames using a MultiIndex may not work with this module’s functions.
assert_equals        | Assert 2 data frames are equal
equals               | Get whether 2 data frames are equal
replace_na_with_none | Replace NaN values in pd.DataFrame with None
split_array_like     | Split cells with array_like values along the row axis
-
chicken_turtle_util.data_frame.assert_equals(df1, df2, ignore_order=set(), ignore_indices=set(), all_close=False, _return_reason=False)
Assert 2 data frames are equal
Like assert equals(df1, df2, ...), but with better hints at where the data frames differ. See chicken_turtle_util.data_frame.equals() for detailed parameter documentation.
Parameters: df1, df2 : pd.DataFrame
ignore_order : {int}
ignore_indices : {int}
all_close : bool
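The pattern behind assert_equals (an equality check that can also return a reason, wrapped by an assertion that reports that reason) can be sketched as follows. This is a minimal sketch, not the library's implementation: frames_equal and assert_frames_equal are hypothetical names, and frames_equal only performs an exact, dtype-sensitive comparison via pandas.DataFrame.equals, without the ignore_order, ignore_indices or all_close options.

```python
import pandas as pd


def frames_equal(df1, df2, _return_reason=False):
    """Hypothetical, simplified stand-in for equals(): exact comparison only."""
    if df1.shape != df2.shape:
        reason = f"shapes differ: {df1.shape} != {df2.shape}"
    elif not df1.index.equals(df2.index) or df1.index.name != df2.index.name:
        # Index.equals ignores names, so the names are compared separately
        reason = "row indices differ"
    elif not df1.columns.equals(df2.columns) or df1.columns.name != df2.columns.name:
        reason = "column indices differ"
    elif not df1.equals(df2):
        # DataFrame.equals treats NaNs in the same location as equal
        reason = "values (or dtypes) differ"
    else:
        reason = None
    equal = reason is None
    return (equal, reason) if _return_reason else equal


def assert_frames_equal(df1, df2):
    """Hypothetical assert wrapper that reports *why* the frames differ."""
    equal, reason = frames_equal(df1, df2, _return_reason=True)
    if not equal:
        raise AssertionError(f"data frames differ: {reason}")
```

Raising AssertionError explicitly, rather than using an assert statement, keeps the check active even when Python runs with -O.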
-
chicken_turtle_util.data_frame.equals(df1, df2, ignore_order=set(), ignore_indices=set(), all_close=False, _return_reason=False)
Get whether 2 data frames are equal
NaNs are considered equal (which is consistent with pandas.DataFrame.equals). None is considered equal to NaN.
Parameters: df1, df2 : pd.DataFrame
Data frames to compare
ignore_order : {int}
Axes along which to ignore order
ignore_indices : {int}
Axes of which to ignore the index. E.g. {1} allows df1.columns.name to differ and df1.columns.equals(df2.columns) to be False.
all_close : bool
If False, values must match exactly; if True, floats are compared as if with np.isclose.
_return_reason : bool
Internal. If True, equals returns a tuple (equal, reason); otherwise it returns only a bool indicating equality (or rather, equivalence).
Returns: equal : bool
Whether they’re equal (after ignoring according to the parameters)
reason : str or None
If equal, None; otherwise a short explanation of why the data frames aren’t equal. Omitted if not _return_reason.
Notes
All values (including those of indices) must be copyable, and __eq__ must be such that a copy equals its original. A value must equal itself unless it’s np.nan. Values needn’t be orderable or hashable (however, pandas requires index values to be orderable and hashable). Because it makes so few assumptions about values, this function is not efficient, but it is flexible.
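The cell-level rules described above (NaN equals NaN, None equals NaN, and np.isclose-style comparison of floats when all_close is True) can be sketched as a single comparison function. values_equal is a hypothetical helper written for illustration, not part of the module; the tolerances rtol=1e-05 and atol=1e-08 are an assumption borrowed from np.isclose's defaults, since this page does not say which tolerances equals uses.

```python
import math


def values_equal(a, b, all_close=False, rtol=1e-05, atol=1e-08):
    """Hypothetical helper: compare two cell values under the rules above."""

    def is_missing(x):
        # None and float NaN (np.float64 subclasses float) both count as missing
        return x is None or (isinstance(x, float) and math.isnan(x))

    if is_missing(a) or is_missing(b):
        # Missing only equals missing, so None == NaN but None != 0.0
        return is_missing(a) and is_missing(b)
    if all_close and isinstance(a, float) and isinstance(b, float):
        # Same test as np.isclose: |a - b| <= atol + rtol * |b|
        return abs(a - b) <= atol + rtol * abs(b)
    # Everything else relies on __eq__, as the Notes require
    return a == b
```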
Examples
>>> from chicken_turtle_util import data_frame as df_
>>> import pandas as pd
>>> df = pd.DataFrame([
...     [1, 2, 3],
...     [4, 5, 6],
...     [7, 8, 9]
...   ],
...   index=pd.Index(('i1', 'i2', 'i3'), name='index1'),
...   columns=pd.Index(('c1', 'c2', 'c3'), name='columns1')
... )
>>> df
columns1  c1  c2  c3
index1
i1         1   2   3
i2         4   5   6
i3         7   8   9
>>> df2 = df.reindex(('i3', 'i1', 'i2'), columns=('c2', 'c1', 'c3'))
>>> df2
columns1  c2  c1  c3
index1
i3         8   7   9
i1         2   1   3
i2         5   4   6
>>> df_.equals(df, df2)
False
>>> df_.equals(df, df2, ignore_order=(0,1))
True
>>> df2 = df.copy()
>>> df2.index = [1,2,3]
>>> df2
columns1  c1  c2  c3
1          1   2   3
2          4   5   6
3          7   8   9
>>> df_.equals(df, df2)
False
>>> df_.equals(df, df2, ignore_indices={0})
True
>>> df2 = df.reindex(('i3', 'i1', 'i2'))
>>> df2
columns1  c1  c2  c3
index1
i3         7   8   9
i1         1   2   3
i2         4   5   6
>>> df_.equals(df, df2, ignore_indices={0})  # does not ignore row order!
False
>>> df_.equals(df, df2, ignore_order={0})
True
>>> df2 = df.copy()
>>> df2.index.name = 'other'
>>> df_.equals(df, df2)  # df.index.name must match as well; the same goes for df.columns.name
False
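For orderable labels, the effect of ignore_order and ignore_indices={0} in the examples above can be approximated with plain pandas. This sketch is an assumption, not how equals works (equals does not require values to be orderable or hashable): equals_ignoring_order and equals_ignoring_row_index are hypothetical helpers that sort by label or drop the row labels and then defer to pandas.DataFrame.equals, which is stricter about dtypes than the rules described above.

```python
import pandas as pd


def _same_axis_names(df1, df2):
    # The examples require df.index.name and df.columns.name to match as well
    return df1.index.name == df2.index.name and df1.columns.name == df2.columns.name


def equals_ignoring_order(df1, df2, axes=(0, 1)):
    """Hypothetical sketch of ignore_order: sort the given axes by label, then compare.

    Assumes the labels are orderable, which sort_index needs.
    """
    for axis in axes:
        df1 = df1.sort_index(axis=axis)
        df2 = df2.sort_index(axis=axis)
    return _same_axis_names(df1, df2) and df1.equals(df2)


def equals_ignoring_row_index(df1, df2):
    """Hypothetical sketch of ignore_indices={0}: drop row labels and their name, keep row order."""
    return (df1.columns.name == df2.columns.name
            and df1.reset_index(drop=True).equals(df2.reset_index(drop=True)))
```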
-
chicken_turtle_util.data_frame.replace_na_with_none(df)
Replace NaN values in pd.DataFrame with None
Parameters: df : pd.DataFrame
DataFrame in which to replace NaN values
Returns: pd.DataFrame
df with NaN values replaced by None
Notes
Like DataFrame.fillna, but replaces NaN values with None, which DataFrame.fillna cannot do. These None values will not be treated as NA by DataFrame, as the dtypes will be set to object.
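The same idea can be sketched with plain pandas, assuming "NaN values" means every cell pd.isna flags (NaN, None, NaT). replace_na_sketch is a hypothetical name, not the library's function. Converting to an object array first is what lets a cell hold None; in a float64 column, None would be coerced back to NaN.

```python
import pandas as pd


def replace_na_sketch(df):
    """Hypothetical re-implementation: return a copy of df with NA cells set to None."""
    # An object array can hold None; copy=True leaves the input frame untouched
    values = df.to_numpy(dtype=object, copy=True)
    values[pd.isna(values)] = None
    # dtype=object stops pandas from re-inferring (and re-coercing) column dtypes
    return pd.DataFrame(values, index=df.index, columns=df.columns, dtype=object)
```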
-
chicken_turtle_util.data_frame.split_array_like(df, columns=None)
Split cells with array_like values along the row axis.
Column names are maintained. The index is dropped, but this may change in the future.
Parameters: df : pd.DataFrame
Data frame. df[columns] should have array_like cell values (in the NumPy sense, e.g. lists, tuples or arrays).
columns : iterable(str) or str or None
Columns (or column) whose values to split. If None, df.columns is used.
Returns: pd.DataFrame
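Data frame with array_like values in df[columns] split across rows, and corresponding values in other columns repeated. When several columns are split, each original row yields one row per combination of their elements (a per-row Cartesian product).
Examples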
>>> import pandas as pd
>>> from chicken_turtle_util.data_frame import split_array_like
>>> df = pd.DataFrame([[1,[1,2],[1]],[1,[1,2],[3,4,5]],[2,[1],[1,2]]], columns=('check', 'a', 'b'))
>>> df
   check       a          b
0      1  [1, 2]        [1]
1      1  [1, 2]  [3, 4, 5]
2      2     [1]     [1, 2]
>>> split_array_like(df, ['a', 'b'])
   check  a  b
0      1  1  1
1      1  2  1
2      1  1  3
3      1  1  4
4      1  1  5
5      1  2  3
6      1  2  4
7      1  2  5
8      2  1  1
9      2  1  2
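The per-row Cartesian product in the example above matches what you get by exploding the split columns one after another with pandas.DataFrame.explode (available since pandas 0.25). The sketch below relies on that; split_array_like_sketch is a hypothetical name, not the library's implementation, and edge cases such as empty arrays (which explode turns into a single NaN row) may behave differently from split_array_like.

```python
import pandas as pd


def split_array_like_sketch(df, columns=None):
    """Hypothetical re-implementation of split_array_like using DataFrame.explode."""
    if columns is None:
        columns = list(df.columns)
    elif isinstance(columns, str):
        columns = [columns]
    result = df
    for column in columns:
        # Each explode repeats the other cells once per element of this column,
        # so exploding several columns in turn yields a per-row Cartesian product;
        # scalar cells are left unchanged by explode
        result = result.explode(column)
    # Drop the (now duplicated) original index and keep the original column order
    return result.reset_index(drop=True)[list(df.columns)]
```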