ineqpy package

Submodules

ineqpy.ineqpy module

A PYTHON PACKAGE FOR QUANTITATIVE ANALYSIS OF INEQUALITY.

Collection of estimators for stratified samples of individual-level data. This module provides calculations such as the mean, variance, quasivariance and population variance of a stratified sample.

Todo

Rethink this module; it may be better implemented as a class.

ineqpy.ineqpy.atk(income, weights=None, e=0.5, data=None)

Calculate the Atkinson index.

Parameters:
  • income (array or str) – If data is None, income must be a 1D array; when data is a pd.DataFrame, pass the name of the income column as a string.
  • weights (array or str, optional) – If data is None, weights must be a 1D array; when data is a pd.DataFrame, pass the name of the weights column as a string.
  • e (float, optional) – Epsilon parameter, interpreted by the Atkinson index as inequality aversion; must be between 0 and 1.
  • data (pd.DataFrame, optional) – DataFrame that contains the variables.
Returns:

atkinson

Return type:

float

Notes

Source: https://en.wikipedia.org/wiki/Atkinson_index

Todo

Warning

The results differ from those of Stata; there may be a bug.
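For reference, the Atkinson index for ε ≠ 1 is A = 1 − (Σ p_i y_i^(1−ε))^(1/(1−ε)) / μ, and for ε = 1 it is 1 − (geometric mean) / μ, where p_i are the normalised weights and μ the weighted mean. The sketch below computes it with NumPy; `atkinson_sketch` is a hypothetical helper, not ineqpy's `atk`, and it assumes strictly positive incomes.

```python
import numpy as np


def atkinson_sketch(income, weights=None, e=0.5):
    """Weighted Atkinson index (hypothetical helper, not the ineqpy API)."""
    y = np.asarray(income, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()              # normalise weights to probabilities p_i
    mu = np.sum(w * y)           # weighted mean income
    if e == 1:
        # Limit case: the equally distributed equivalent income is the
        # weighted geometric mean (requires strictly positive incomes)
        ede = np.exp(np.sum(w * np.log(y)))
    else:
        ede = np.sum(w * y ** (1 - e)) ** (1 / (1 - e))
    return 1 - ede / mu
```

For incomes [1, 4] with equal weights and e=0.5, the equally distributed equivalent income is 1.5² = 2.25, so the index is 1 − 2.25 / 2.5 = 0.1.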

ineqpy.ineqpy.atk_h(income, weights, group, data=None, e=0.5)

Parameters:
  • income (str or np.array) – Income variable; pass either a column name in data or an array-like.
  • weights (str or np.array) – Probabilities or weights; pass either a column name in data or an array-like.
  • group (str or np.array) – Stratum; pass either a column name in data or an array-like.
  • e (float, optional) – Value of the epsilon parameter.
  • data (pd.DataFrame, optional) – DataFrame that contains the variables above.
Returns:

atkinson_by_group

Return type:

float

Notes

Source: https://en.wikipedia.org/wiki/Atkinson_index

Todo

Review this function; its results differ from Stata's.

Examples

ineqpy.ineqpy.cmoment(x, weights=None, order=2, param=None, ddof=0)

Calculate the central moment of the given order of x with respect to param, given the weights.

Parameters:
  • x (1d-array) – Variable
  • weights (1d-array) – Weights
  • order (int, optional) – Moment order, 2 by default (variance)
  • param (int or array, optional) – Value about which the moment is calculated; the default, None, uses the mean.
  • ddof (int, optional) – Delta degrees of freedom, zero by default.
Returns:

central_moment

Return type:

float

Notes

  • The central moment of order 1 about the mean is 0.
  • The central moment of order 2 is the variance.

Source: https://en.wikipedia.org/wiki/Moment_(mathematics)
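A minimal sketch of a weighted central moment, Σ w_i (x_i − c)^k / (Σ w_i − ddof), where c is param (the weighted mean by default). `central_moment` is a hypothetical name, and subtracting ddof from the sum of the weights (i.e. treating them as frequency weights) is an assumption about how ddof applies; ineqpy may handle it differently.

```python
import numpy as np


def central_moment(x, weights=None, order=2, param=None, ddof=0):
    """Weighted central moment (hypothetical helper, not the ineqpy API)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    if param is None:
        param = np.sum(w * x) / np.sum(w)  # default centre: weighted mean
    # Assumption: ddof is subtracted from the total weight (frequency weights)
    return np.sum(w * (x - param) ** order) / (np.sum(w) - ddof)
```

With the default param, order=1 returns 0 and order=2 returns the (population) variance, matching the notes above.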

ineqpy.ineqpy.gini(income='x', weights='w', data=None, sorted=False)

Calculate the Gini index.

Parameters:
  • data (pandas.DataFrame, optional) – DataFrame that contains the data.
  • income (str or np.array, optional) – Name of the monetary variable in data, or an array.
  • weights (str or np.array, optional) – Name of the weights column in data, or an array.
  • sorted (bool, optional) – Pass True if the data are already sorted by income; False by default.
Returns:

gini – Gini Index Value.

Return type:

float

Notes

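The index is calculated as follows (discrete probability distribution):

G = 1 - \frac{\sum_{i=1}^{n} f(y_i)\,(S_{i-1} + S_i)}{S_n}

where:

  • y_i = income of observation i, with incomes sorted in ascending order
  • f(y_i) = weight (probability) of y_i
  • S_i = \sum_{j=1}^{i} y_j\, f(y_j), with S_0 = 0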
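A sketch of the discrete formula for the Gini index, in its standard form normalised by S_n (the weighted mean). `gini_sketch` is a hypothetical helper: it always sorts the incomes itself (the sorted=True shortcut is not modelled) and normalises the weights to probabilities f(y_i).

```python
import numpy as np


def gini_sketch(income, weights=None):
    """Discrete Gini index (hypothetical helper, not the ineqpy API)."""
    y = np.asarray(income, dtype=float)
    f = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(y)                     # the formula needs ascending incomes
    y, f = y[order], f[order] / f.sum()       # f(y_i) as probabilities
    s = np.cumsum(y * f)                      # S_i = sum_{j<=i} y_j f(y_j)
    s_prev = np.concatenate(([0.0], s[:-1]))  # S_{i-1}, with S_0 = 0
    return 1 - np.sum(f * (s_prev + s)) / s[-1]
```

For example, incomes [1, 2, 3, 4] with equal weights give G = 0.25, and equal incomes give G = 0.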

Source:

Todo

  • Implement the calculation of the sampling variance, VAR(GINI)
  • Clean up comments
  • Rename output

Examples

ineqpy.ineqpy.kurt(x, weights)

Calculate the kurtosis coefficient.

Parameters:
  • x (1d-array) – Variable
  • weights (1d-array) – Weights
Returns:

kurt – Kurtosis coefficient.

Return type:

float

Notes

It is an alias of the standardized fourth-order moment.

ineqpy.ineqpy.lorenz(income, weights, data=None)

Compute the Lorenz curve and return a DataFrame with two columns, for the x and y axes.

Parameters:
  • data (pandas.DataFrame, optional) – DataFrame that contains the data.
  • income (str or 1d-array) – Income (monetary variable). If a DataFrame is passed, this is the name of its column; otherwise pass a pandas.Series or an array.
  • weights (str or 1d-array) – Population weights. If a DataFrame is passed, this is the name of its column; otherwise pass a pandas.Series or an array.
Returns:

lorenz – Lorenz curve as a DataFrame with two columns, labeled x and y, corresponding to the plot axes.

Return type:

pandas.DataFrame
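The Lorenz curve pairs the cumulative population share (x axis) with the cumulative income share (y axis) after sorting by income. The sketch below returns the two axes as NumPy arrays instead of a DataFrame; `lorenz_sketch` is a hypothetical name, and the prepended (0, 0) origin is a plotting convenience that ineqpy's output may not include.

```python
import numpy as np


def lorenz_sketch(income, weights=None):
    """Lorenz curve axes (hypothetical helper, not the ineqpy API)."""
    y = np.asarray(income, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(y)        # sort observations by income
    y, w = y[order], w[order]
    # x: cumulative population share; y: cumulative income share
    x_axis = np.concatenate(([0.0], np.cumsum(w) / w.sum()))
    y_axis = np.concatenate(([0.0], np.cumsum(w * y) / np.sum(w * y)))
    return x_axis, y_axis
```

With perfectly equal incomes the curve coincides with the diagonal (x equals y at every point).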

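ineqpy.ineqpy.moment_h(x='x', weights='w', group='h', data=None, order=2)

Calculate the moment of the given order for each stratum h.

Parameters:
  • x (array or str) – Variable, as an array or as a column name in data
  • weights (array or str) – Weights, as an array or as a column name in data
  • group (array or str) – Stratum variable, as an array or as a column name in data
  • data (pd.DataFrame, optional) – DataFrame that contains the variables
  • order (int, optional) – Moment order, 2 by default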
Returns:

moment_of_order

Return type:

float

Todo

Review the calculations; they do not appear to be correct. Attempt to generalize vhat_h to any estimator.

Warning

Currently does not work!

ineqpy.ineqpy.shat2_h(x, weights, group, data=None)

Sample variance of x in each group, calculated as the second-order central moment.

Parameters:
  • x (array or str) – Variable to which the statistic is applied. If data is None, pass an array; otherwise pass its column name in data.
  • weights (array or str) – Weights of each element of x; they can be interpreted as frequencies, probabilities or density values of x. If data is None, pass an array; otherwise pass its column name in data.
  • group (array or str) – Categorical variable; the statistic is calculated for each group. If data is None, pass an array; otherwise pass its column name in data.
  • data (pd.DataFrame, optional) – DataFrame that contains all the variables needed.
Returns:

shat2_h

Return type:

array or pd.Series

Notes

This function is useful to calculate the variance of the mean.

Todo

Review function
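A minimal sketch of a per-group weighted variance, computed as the second-order central moment about each group's weighted mean. It returns a plain dict keyed by group rather than an array or pd.Series, applies no ddof correction, and `shat2_by_group` is a hypothetical name.

```python
import numpy as np


def shat2_by_group(x, weights, group):
    """Weighted variance per group (hypothetical helper, not the ineqpy API)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    g = np.asarray(group)
    result = {}
    for h in np.unique(g):
        mask = g == h                       # observations in group h
        xh, wh = x[mask], w[mask]
        mean_h = np.sum(wh * xh) / np.sum(wh)
        # second-order central moment within the group
        result[h] = np.sum(wh * (xh - mean_h) ** 2) / np.sum(wh)
    return result
```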

ineqpy.ineqpy.skew(x, weights)

Return the skewness (asymmetry coefficient) of a sample.

Parameters:
  • x (1d-array) – Variable
  • weights (1d-array) – Weights
Returns:

skew

Return type:

float

Notes

It is an alias of the standardized third-order moment.

ineqpy.ineqpy.stdmoment(x, weights=None, param=None, order=3, ddof=0)

Calculate the standardized moment of the given order for the variable x with respect to param.

Parameters:
  • x (1d-array) – Random Variable
  • weights (1d-array, optional) – Weights or probability
  • order (int, optional) – Order of Moment, three by default
  • param (int or float or array, optional) – Measure of central tendency about which the moment is calculated; the mean by default.
  • ddof (int, optional) – Delta degrees of freedom.
Returns:

stdmoment – The standardized moment of the given order.

Return type:

float

Notes

Source:

Todo

It is the general case of the raw and central moments. Review implementation.
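A standardized moment divides the central moment of order k by σ^k; the skewness and kurtosis described above correspond to k = 3 and k = 4. The sketch below mirrors the documented parameters, but `standardized_moment` is a hypothetical helper: it assumes ddof is subtracted from the total weight, and when param is not the mean, σ² is taken as the second moment about param.

```python
import numpy as np


def standardized_moment(x, weights=None, order=3, param=None, ddof=0):
    """Weighted standardized moment (hypothetical helper, not the ineqpy API)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    if param is None:
        param = np.sum(w * x) / np.sum(w)   # default centre: weighted mean
    denom = np.sum(w) - ddof                # assumption: frequency-style ddof
    central = np.sum(w * (x - param) ** order) / denom
    second = np.sum(w * (x - param) ** 2) / denom  # sigma^2 about param
    return central / second ** (order / 2)
```

For [1, 2, 3], order=3 gives 0 (a symmetric sample) and order=4 gives 1.5.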

ineqpy.ineqpy.var(x, weights=None, ddof=0)

Calculate the population variance of x given the weights, for a homogeneous population.

Parameters:
  • x (1d-array or pd.Series or pd.DataFrame) – Variable whose variance is estimated
  • weights (1d-array or pd.Series or pd.DataFrame, optional) – Weights of the variable x
  • ddof (int, optional) – Delta degrees of freedom, zero by default.
Returns:

Shat2 – Estimated variance of x (the quasivariance when ddof=1)

Return type:

1d-array or pd.Series or float

Notes

For a stratified sample, apply this function to each stratum separately (e.g., with groupby).

ineqpy.ineqpy.vhat_h(x='x', weights='w', group='h', data=None)

Given a DataFrame, calculate the sample variance for each stratum. The purpose of this function is to make it easy to calculate the moments of the sampling distribution of an estimator; e.g., it can be used to calculate the variance of the mean.

Parameters:
  • data (pandas.DataFrame) – DataFrame containing the series needed for the calculation
  • x (str) – Name of the variable x in the DataFrame
  • weights (str) – Name of the weights w in the DataFrame
  • group (str) – Name of the stratum variable h in the DataFrame
Returns:

vhat_h – A series with the values of the variance of each h stratum.

Return type:

pandas.Series

Notes

Todo

Review possible improvements.

Examples

>>> # Compute the variance of the mean
>>> import pandas as pd
>>> from ineqpy.ineqpy import vhat_h
>>> # renta (income), peso (weight) and estrato (stratum) are existing 1D arrays
>>> data = pd.DataFrame({"renta": renta, "peso": peso, "estrato": estrato})
>>> v = vhat_h(x="renta", weights="peso", group="estrato", data=data)
>>> v
stratum
1                700.917.728,64
2              9.431.897.980,96
3            317.865.839.789,10
4            741.304.873.092,88
5            535.275.436.859,10
6            225.573.783.240,68
7            142.048.272.010,63
8             40.136.989.131,06
9             18.501.808.022,56
dtype: float64
>>> # the variance of the mean:
>>> v_total = v.sum() / peso.sum() ** 2
>>> v_total
24662655225.947945

ineqpy.ineqpy.xbar(x, weights=None)

Calculate the mean of x given the weights.

Parameters:
  • x (1d-array or pd.Series or pd.DataFrame) – Variable on which the mean is estimated
  • weights (1d-array or pd.Series or pd.DataFrame, optional) – Weights of the variable x
Returns:

xbar

Return type:

1d-array or pd.Series or float
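The weighted mean is Σ w_i x_i / Σ w_i. A sketch for the 1D case only (Series and DataFrame inputs are not handled); `weighted_mean` is a hypothetical name, and its result can be cross-checked against NumPy's `np.average`, which accepts a `weights` argument.

```python
import numpy as np


def weighted_mean(x, weights=None):
    """Weighted mean of a 1D array (hypothetical helper, not the ineqpy API)."""
    x = np.asarray(x, dtype=float)
    if weights is None:
        return x.mean()              # unweighted case
    w = np.asarray(weights, dtype=float)
    return np.sum(w * x) / np.sum(w)
```

For x = [1, 2, 3] with weights [1, 1, 2], this gives 9 / 4 = 2.25.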

Module contents

IneqPy