ineqpy package
Submodules
ineqpy.ineqpy module
A PYTHON PACKAGE FOR QUANTITATIVE ANALYSIS OF INEQUALITY.
Collection of estimators for a stratified sample of individual observations. This module provides calculations such as the mean, variance, quasivariance, and population variance of a stratified sample.
Todo
Rethink this module; it may be better implemented as a class.
ineqpy.ineqpy.atk(income, weights=None, e=0.5, data=None)
Calculate the Atkinson index.
Parameters: - income (array or str) – If data is None, income must be a 1D array; when data is a pd.DataFrame, pass the name of the income column as a string.
- weights (array or str, optional) – If data is None, weights must be a 1D array; when data is a pd.DataFrame, pass the name of the weights column as a string.
- e (float, optional) – Epsilon parameter of the Atkinson index, interpreted as inequality aversion; must be between 0 and 1.
- data (pd.DataFrame, optional) – DataFrame that contains the variables.
Returns: atkinson
Return type: float
Notes
Source: https://en.wikipedia.org/wiki/Atkinson_index
Todo
- Implement: CALCULATING INCOME DISTRIBUTION INDICES FROM MICRO-DATA http://www.jstor.org/stable/41788716
Warning
Results differ from Stata's; there may be a bug.
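The Atkinson index is one minus the ratio of the equally distributed equivalent income to the mean μ: A_e = 1 - (Σ f_i y_i^(1-e))^(1/(1-e)) / μ for e ≠ 1, with the weighted geometric mean taking the place of the power mean when e = 1. A minimal NumPy sketch, assuming weights are normalized to population shares f_i and all incomes are positive; atkinson_sketch is a hypothetical name, not ineqpy's implementation:

```python
import numpy as np


def atkinson_sketch(income, weights=None, e=0.5):
    """Weighted Atkinson index (hypothetical helper, not ineqpy's code)."""
    y = np.asarray(income, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    f = w / w.sum()  # population shares f_i
    mu = np.sum(f * y)  # weighted mean income
    if e == 1:
        # Limit case: the equally distributed equivalent is the geometric mean.
        ede = np.exp(np.sum(f * np.log(y)))
    else:
        # Power mean of order 1 - e.
        ede = np.sum(f * y ** (1 - e)) ** (1 / (1 - e))
    return 1 - ede / mu


print(atkinson_sketch([1, 4]))       # approx. 0.1
print(atkinson_sketch([1, 4], e=1))  # approx. 0.2
```

Equal incomes give an index of 0, and larger e gives more weight to the bottom of the distribution.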
ineqpy.ineqpy.atk_h(income, weights, group, data=None, e=0.5)
Parameters: - income (str or np.array) – Income variable; pass a column name in data or an array-like.
- weights (str or np.array) – Probabilities or weights; pass a column name in data or an array-like.
- group (str or np.array) – Stratum; pass a column name in data or an array-like.
- e (float, optional) – Value of the epsilon (inequality aversion) parameter.
- data (pd.DataFrame, optional) – DataFrame that contains the variables above.
Returns: atkinson_by_group
Return type: float
Notes
Source: https://en.wikipedia.org/wiki/Atkinson_index
Todo
Review function; results differ from Stata's.
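Assuming atk_h computes the Atkinson index separately within each stratum (the docstring does not define atkinson_by_group, so this is an assumption), a pandas sketch could look like the following. Both helper names are hypothetical, only e ≠ 1 is handled, and the sketch returns a pd.Series with one value per group rather than a single float:

```python
import numpy as np
import pandas as pd


def atkinson(y, w, e=0.5):
    """Weighted Atkinson index for e != 1 (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(w, dtype=float)
    f = f / f.sum()  # population shares within the group
    ede = np.sum(f * y ** (1 - e)) ** (1 / (1 - e))  # equally distributed equivalent
    return 1 - ede / np.sum(f * y)


def atk_by_group_sketch(data, income, weights, group, e=0.5):
    """Atkinson index computed separately within each stratum."""
    return pd.Series(
        {h: atkinson(g[income], g[weights], e) for h, g in data.groupby(group)}
    )


df = pd.DataFrame({"x": [1, 4, 3, 3], "w": [1, 1, 2, 2], "h": ["a", "a", "b", "b"]})
print(atk_by_group_sketch(df, "x", "w", "h"))
```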
ineqpy.ineqpy.cmoment(x, weights=None, order=2, param=None, ddof=0)
Calculate the central moment of x of the given order with respect to param, given the weights.
Parameters: - x (1d-array) – Variable
- weights (1d-array) – Weights
- order (int, optional) – Moment order, 2 by default (variance)
- param (int or array, optional) – Point about which the moment is calculated; the default, None, means the mean is used.
- ddof (int, optional) – Delta degrees of freedom, zero by default.
Returns: central_moment
Return type: float
Notes
- The central moment of order 1 is 0 (when param is the mean).
- The central moment of order 2 is the variance.
Source: https://en.wikipedia.org/wiki/Moment_(mathematics)
Todo
Implement: https://en.wikipedia.org/wiki/L-moment#cite_note-wang:96-6
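The central moment of order k is m_k = Σ w_i (x_i - c)^k / (Σ w_i - ddof), where c is param or, by default, the weighted mean. A minimal sketch, assuming ddof is subtracted from the sum of weights in the denominator (the docstring does not say how ddof enters); cmoment_sketch is a hypothetical name:

```python
import numpy as np


def cmoment_sketch(x, weights=None, order=2, param=None, ddof=0):
    """Weighted central moment of x about param (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    if param is None:
        param = np.sum(w * x) / np.sum(w)  # default centre: the weighted mean
    return np.sum(w * (x - param) ** order) / (np.sum(w) - ddof)


x = [1, 2, 3, 4]
print(cmoment_sketch(x))           # 1.25, the population variance
print(cmoment_sketch(x, order=1))  # 0.0, the first central moment
print(cmoment_sketch(x, ddof=1))   # same as np.var(x, ddof=1)
```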
ineqpy.ineqpy.gini(income='x', weights='w', data=None, sorted=False)
Calculate the Gini index.
Parameters: - data (pandas.DataFrame, optional) – DataFrame that contains the data.
- income (str or np.array, optional) – Name of the monetary variable x in `data`, or an array.
- weights (str or np.array, optional) – Name of the column containing the weights w in `data`, or an array.
- sorted (bool, optional) – Pass True if the data are already sorted by the income variable x; False by default.
Returns: gini – Gini index value.
Return type: float
Notes
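The calculation follows the formula for a discrete probability distribution:
G = 1 - \frac{\sum_{i=1}^{n} f(y_i)\,(S_{i-1} + S_i)}{S_n}
where:
- y_i = income, sorted in ascending order
- f(y_i) = population share (normalized weight) of observation i
- S_i = \sum_{j=1}^{i} f(y_j)\, y_j, with S_0 = 0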
Source:
- https://en.wikipedia.org/wiki/Gini_coefficient
- CALCULATING INCOME DISTRIBUTION INDICES FROM MICRO-DATA - STEPHEN JENKINS
Todo
- Implement the variance of the Gini estimator, VAR(GINI)
- Clean up comments
- Rename output
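The discrete formula translates almost line by line into NumPy. A minimal sketch, assuming weights are normalized to population shares f(y_i) and the observations are sorted by income first (the work sorted=False implies); gini_sketch is a hypothetical name, not ineqpy's implementation:

```python
import numpy as np


def gini_sketch(income, weights=None):
    """Gini index via G = 1 - sum f_i (S_{i-1} + S_i) / S_n (hypothetical helper)."""
    y = np.asarray(income, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(y)  # sort ascending by income
    y, w = y[order], w[order]
    f = w / w.sum()  # population shares f(y_i)
    S = np.cumsum(f * y)  # S_i = sum_{j<=i} f(y_j) y_j
    S_prev = np.concatenate(([0.0], S[:-1]))  # S_{i-1}, with S_0 = 0
    return 1 - np.sum(f * (S_prev + S)) / S[-1]


print(gini_sketch([1, 2, 3, 4]))  # approx. 0.25
print(gini_sketch([0, 0, 0, 1]))  # approx. 0.75, i.e. 1 - 1/n
```

With frequency weights, gini_sketch([1, 2], [2, 1]) gives the same value as gini_sketch([1, 1, 2]).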
ineqpy.ineqpy.kurt(x, weights)
Calculate the kurtosis coefficient.
Parameters: - x (1d-array) – Variable.
- weights (1d-array) – Weights.
Returns: kurt – Kurtosis coefficient.
Return type: float
Notes
It is an alias of the standardized fourth-order moment.
ineqpy.ineqpy.lorenz(income, weights, data=None)
Compute the Lorenz curve and return a DataFrame with two columns, x and y, for the plot axes.
Parameters: - data (pandas.DataFrame, optional) – DataFrame that contains the data.
- income (str or 1d-array) – Income (monetary variable). If a DataFrame is passed, income should be the name of one of its columns; otherwise pass a pandas.Series or array.
- weights (str or 1d-array) – Population weights. If a DataFrame is passed, weights should be the name of one of its columns; otherwise pass a pandas.Series or array.
Returns: lorenz – Lorenz curve as a DataFrame with two columns, labeled x and y, corresponding to the plot axes.
Return type: pandas.DataFrame
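A minimal sketch of the Lorenz curve, assuming the x axis is the cumulative population share and the y axis the cumulative income share, with observations sorted by income and the curve anchored at (0, 0). Whether ineqpy includes that origin point is not stated here, and lorenz_sketch is a hypothetical name:

```python
import numpy as np
import pandas as pd


def lorenz_sketch(income, weights):
    """Lorenz curve as a DataFrame with columns x and y (hypothetical helper)."""
    y = np.asarray(income, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(y)  # poorest first
    y, w = y[order], w[order]
    # Cumulative population share and cumulative income share, starting at 0.
    x_axis = np.concatenate(([0.0], np.cumsum(w) / w.sum()))
    y_axis = np.concatenate(([0.0], np.cumsum(w * y) / np.sum(w * y)))
    return pd.DataFrame({"x": x_axis, "y": y_axis})


print(lorenz_sketch([3, 1], [1, 1]))  # x: 0, 0.5, 1; y: 0, 0.25, 1
```

Perfect equality puts every point on the diagonal y = x; the further the curve sags below it, the more unequal the distribution.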
ineqpy.ineqpy.moment_h(x='x', weights='w', group='h', data=None, order=2)
Calculate the moment of the given order within each stratum h.
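Parameters: - x (array or str) – Variable.
- weights (array or str) – Weights.
- group (array or str) – Stratum variable h.
- data (pd.DataFrame, optional) – DataFrame that contains the variables.
- order (int, optional) – Moment order, 2 by default.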
Returns: moment_of_order
Return type: float
Todo
Review calculations; they do not appear to be correct. Attempt to generalize vhat_h to any estimator.
Warning
Currently does not work!
ineqpy.ineqpy.shat2_h(x, weights, group, data=None)
Sample variance of x within each group, calculated as the second-order central moment.
Parameters: - x (array or str) – Variable to which the statistic is applied. If data is None, pass an array; otherwise pass the column name in data.
- weights (array or str) – Weights of each element of x, which can be interpreted as frequencies, probabilities, or density values. If data is None, pass an array; otherwise pass the column name in data.
- group (array or str) – Categorical variable; the statistic is calculated for each group. If data is None, pass an array; otherwise pass the column name in data.
- data (pd.DataFrame, optional) – DataFrame containing all the variables needed.
Returns: shat2_h
Return type: array or pd.Series
Notes
This function is useful to calculate the variance of the mean.
Todo
Review function
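A minimal pandas sketch of a per-group weighted variance, computed as the second-order central moment about each group's weighted mean. It assumes the divisor is the sum of weights (no degrees-of-freedom correction), and shat2_h_sketch is a hypothetical name:

```python
import numpy as np
import pandas as pd


def shat2_h_sketch(x, weights, group):
    """Weighted second central moment of x within each group (hypothetical helper)."""
    df = pd.DataFrame({"x": x, "w": weights, "h": group})

    def weighted_var(g):
        mean = np.average(g["x"], weights=g["w"])
        return np.average((g["x"] - mean) ** 2, weights=g["w"])

    # One variance per group, indexed by the group labels.
    return pd.Series({h: weighted_var(g) for h, g in df.groupby("h")})


print(shat2_h_sketch([1, 3, 10, 10], [1, 1, 2, 2], ["a", "a", "b", "b"]))
```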
ineqpy.ineqpy.skew(x, weights)
Return the skewness (asymmetry coefficient) of a sample.
Parameters: - x (1d-array) – Variable.
- weights (1d-array) – Weights.
Returns: skew
Return type: float
Notes
It is an alias of the standardized third-order moment.
ineqpy.ineqpy.stdmoment(x, weights=None, param=None, order=3, ddof=0)
Calculate the standardized moment of the given order for the variable `x` with respect to param.
Parameters: - x (1d-array) – Random variable.
- weights (1d-array, optional) – Weights or probabilities.
- order (int, optional) – Order of the moment, three by default.
- param (int or float or array, optional) – Central tendency about which the moment is calculated; the mean by default.
- ddof (int, optional) – Delta degrees of freedom, zero by default.
Returns: stdmoment – Standardized moment of the given order.
Return type: float
Notes
Source:
- https://en.wikipedia.org/wiki/Moment_(mathematics)#Significance_of_the_moments
- https://en.wikipedia.org/wiki/Standardized_moment
Todo
It is the general case of the raw and central moments. Review implementation.
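The kurt and skew entries describe those coefficients as the standardized fourth- and third-order moments. A sketch of the general case under stated assumptions: the moment is taken about param (the weighted mean by default), the standard deviation is always taken about the weighted mean, and ddof is subtracted from the sum of weights in both denominators. The docstring does not confirm these choices, and stdmoment_sketch is a hypothetical name:

```python
import numpy as np


def stdmoment_sketch(x, weights=None, param=None, order=3, ddof=0):
    """Central moment about param divided by sigma**order (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean = np.sum(w * x) / np.sum(w)
    centre = mean if param is None else param
    denom = np.sum(w) - ddof
    moment = np.sum(w * (x - centre) ** order) / denom
    sigma = np.sqrt(np.sum(w * (x - mean) ** 2) / denom)  # std. dev. about the mean
    return moment / sigma ** order


print(stdmoment_sketch([1, 2, 3, 4], order=3))  # 0.0: symmetric sample, no skew
print(stdmoment_sketch([1, 2, 3, 4], order=4))  # approx. 1.64 (not excess kurtosis)
print(stdmoment_sketch([0, 0, 3], order=3))     # approx. 0.7071, right-skewed
```

Under this definition order=4 yields plain kurtosis (3 for a normal distribution); whether ineqpy's kurt subtracts 3 is not stated here.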
ineqpy.ineqpy.var(x, weights=None, ddof=0)
Calculate the population variance of x given weights w, for a homogeneous population.
Parameters: - x (1d-array or pd.Series or pd.DataFrame) – Variable whose variance is estimated.
- weights (1d-array or pd.Series or pd.DataFrame, optional) – Weights of x, with the same dimension as x.
- ddof (int, optional) – Delta degrees of freedom, zero by default.
Returns: Shat2 – Estimated variance of x (the population variance with the default ddof=0).
Return type: 1d-array or pd.Series or float
Notes
For a stratified sample, apply the function to each stratum separately (e.g. with groupby).
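A sketch of the weighted variance, assuming weights act as frequency weights and ddof is subtracted from their sum; under that reading, integer weights give the same result as np.var on the expanded sample. var_sketch is a hypothetical name:

```python
import numpy as np


def var_sketch(x, weights=None, ddof=0):
    """Weighted variance, treating weights as frequencies (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean = np.sum(w * x) / np.sum(w)
    return np.sum(w * (x - mean) ** 2) / (np.sum(w) - ddof)


# Weights [1, 3] behave like repeating the value 2 three times.
print(var_sketch([1, 2], [1, 3]), np.var([1, 2, 2, 2]))                  # 0.1875 0.1875
print(var_sketch([1, 2], [1, 3], ddof=1), np.var([1, 2, 2, 2], ddof=1))  # 0.25 0.25
```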
ineqpy.ineqpy.vhat_h(x='x', weights='w', group='h', data=None)
Given a DataFrame, calculate the sample variance for each stratum. The purpose of this function is to make it easy to calculate the moments of the sampling distribution of an estimator; for example, it can be used to calculate the variance of the mean.
Parameters: - data (pandas.DataFrame) – DataFrame containing the series needed for the calculation.
- x (str) – Name of the variable x in the DataFrame.
- weights (str) – Name of the weights w in the DataFrame
- group (str) – Name of the stratum variable h in the DataFrame
Returns: vhat_h – A series with the values of the variance of each h stratum.
Return type: pandas.Series
Todo
Review improvements.
Examples
>>> import pandas as pd
>>> from ineqpy.ineqpy import vhat_h
>>> # Compute the variance of the mean; renta, peso and estrato are 1-D arrays of equal length
>>> data = pd.DataFrame({"renta": renta, "peso": peso, "estrato": estrato})
>>> v = vhat_h(x="renta", weights="peso", group="estrato", data=data)
>>> v
stratum
1        700917728.64
2       9431897980.96
3     317865839789.10
4     741304873092.88
5     535275436859.10
6     225573783240.68
7     142048272010.63
8      40136989131.06
9      18501808022.56
dtype: float64
>>> # The variance of the mean:
>>> v_total = v.sum() / peso.sum() ** 2
>>> v_total
24662655225.947945
ineqpy.ineqpy.xbar(x, weights=None)
Calculate the mean of x given weights w.
Parameters: - x (1d-array or pd.Series or pd.DataFrame) – Variable on which the mean is estimated
- weights (1d-array or pd.Series or pd.DataFrame, optional) – Weights for each element of x.
Returns: xbar
Return type: 1d-array or pd.Series or float
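NumPy's np.average already computes a weighted mean, sum(w * x) / sum(w). A minimal sketch wrapping it, where xbar_sketch is a hypothetical name and axis=0 is an assumption about how a DataFrame should be averaged (one mean per column); unlike the documented return types, it returns NumPy values rather than a pd.Series:

```python
import numpy as np
import pandas as pd


def xbar_sketch(x, weights=None):
    """Weighted mean along the first axis (hypothetical helper)."""
    # np.average divides sum(w * x) by sum(w); weights=None gives the plain mean.
    return np.average(np.asarray(x, dtype=float), weights=weights, axis=0)


print(xbar_sketch([1, 2], [1, 3]))  # 1.75
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
print(xbar_sketch(df, [1, 3]))  # one weighted mean per column
```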
Module contents
IneqPy