dautil.stats¶

Statistical functions and utilities.

class dautil.stats.Box(arr)¶

Implements the concept of a box we know from box plots.

calc_iqr()¶

Calculates the interquartile range (IQR).

Returns:	The interquartile range of the data.

>>> import numpy as np
>>> from dautil import stats
>>> arr = np.array([7, 15, 36, 39, 40, 41])
>>> box = stats.Box(arr)
>>> box.calc_iqr()
25
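The value 25 in the example above is what results when the quartiles are taken as the medians of the lower and upper halves of the sorted data. The sketch below illustrates that convention and contrasts it with NumPy's default linear interpolation. It is an assumption about the quartile convention, not a description of how Box computes the result, and `iqr_median_halves` is a hypothetical helper.

```python
import numpy as np


def iqr_median_halves(values):
    """IQR with quartiles taken as medians of the lower and upper halves."""
    data = np.sort(np.asarray(values, dtype=float))
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return np.median(upper) - np.median(lower)


arr = np.array([7, 15, 36, 39, 40, 41])
iqr_halves = iqr_median_halves(arr)   # medians of [7, 15, 36] and [39, 40, 41]

# For comparison: linear interpolation between order statistics (NumPy default)
q1, q3 = np.percentile(arr, [25, 75])
iqr_linear = q3 - q1
```

The two conventions give 25.0 and 19.5 for this array, so the quartile method matters for small samples.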

iqr_from_box()¶

Calculates the distance from the box formed by the first and third quartiles, expressed as a number (0 or more) of interquartile ranges (IQRs).

Returns:	A float representing the number of IQRs

class dautil.stats.Distribution(data, dist, nbins=20, cutoff=0.75, range=None)¶

Wraps scipy.stats distribution classes. Most of the methods are only appropriate for continuous distributions.

Variables:	nbins – The number of bins for a histogram plot. train – The train data. test – The test data.

describe_residuals(*args, **kwds)¶

Describes the residuals of a fit to a distribution. Only appropriate for continuous distributions.

Returns:	Statistics for the residuals as a dict.

error(*args, **kwds)¶

Computes the residuals of a fit to a distribution. Only appropriate for continuous distributions.

Returns:	The residuals of the fit.

fit(*args)¶

Fits data to a distribution. Only appropriate for continuous distributions.

Returns:	The result of the fit.

mean_ad(*args, **kwds)¶

Computes the mean absolute deviation.

Returns:	The mean absolute deviation.

pdf(*args, **kwds)¶

Computes the probability density function (PDF). Only appropriate for continuous distributions.

Returns:	The probability density function.

plot(ax)¶

Plots a histogram of the data and a fit. Only appropriate for continuous distributions.

rmse(*args, **kwds)¶

Computes the root mean square error of the distribution fit. Only appropriate for continuous distributions.

Returns:	The RMSE of the fit.
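As a rough illustration of the fit, PDF, residual and RMSE steps described above, the following sketch uses scipy.stats directly rather than the Distribution class. Defining the residuals as the normalized histogram minus the fitted density at the bin centers is an assumption made for this sketch; the class may define them differently.

```python
import numpy as np
from scipy import stats as sps

rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=1000)

# Fit a continuous distribution; norm.fit returns (loc, scale)
loc, scale = sps.norm.fit(data)

# Normalized histogram compared with the fitted density at the bin centers
counts, edges = np.histogram(data, bins=20, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
fitted_pdf = sps.norm.pdf(centers, loc=loc, scale=scale)

# Assumed residual definition: observed density minus fitted density
residuals = counts - fitted_pdf
rmse = np.sqrt(np.mean(residuals ** 2))
```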

rvs(*args, **kwds)¶

Generates random values.

Returns:	The generated values.

split(data, cutoff)¶

Splits data into test and train data.

Parameters:	data – An array containing numbers. cutoff – A value in the range 0 - 1 used to split the data.
Returns:	Two arrays - the train data and the test data.
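One plausible reading of the cutoff is a fraction of the data, in order, that goes to the train set. The sketch below assumes that interpretation; `split_by_cutoff` is a hypothetical helper, not the method itself.

```python
import numpy as np


def split_by_cutoff(data, cutoff):
    """Send the first `cutoff` fraction of the data to train, the rest to test."""
    data = np.asarray(data)
    n_train = int(cutoff * len(data))
    return data[:n_train], data[n_train:]


train, test = split_by_cutoff(np.arange(8), 0.75)
```

With eight values and a cutoff of 0.75, six values land in the train array and two in the test array.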

dautil.stats.ci(arr, alpha=0.95)¶

Computes the confidence interval.

Parameters:	arr – An array containing numbers. alpha – A value in the range 0 - 1 that determines the percentiles used as the interval limits.
Returns:	The confidence interval.
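A percentile-based interval can be sketched as follows. This assumes alpha is a coverage level split evenly between the two tails, which the description above does not confirm; `percentile_interval` is a hypothetical name.

```python
import numpy as np


def percentile_interval(arr, alpha=0.95):
    """Central interval covering a fraction `alpha` of the values."""
    tail = (1.0 - alpha) / 2.0
    lower, upper = np.percentile(arr, [100 * tail, 100 * (1 - tail)])
    return lower, upper


# For the integers 0..100 the limits fall at the 2.5th and 97.5th percentiles
low, high = percentile_interval(np.arange(101))
```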

dautil.stats.clip_outliers(arr)¶

Clips values to the limits beyond which they are considered outliers.

Parameters:	arr – An array containing numbers.
Returns:	An array of clipped values.

>>> import numpy as np
>>> from dautil import stats
>>> arr = list(range(5))
>>> outliers = [-100, 100]
>>> arr.extend(outliers)
>>> arr
[0, 1, 2, 3, 4, -100, 100]
>>> stats.clip_outliers(arr)
array([ 0.,  1.,  2.,  3.,  4., -4.,  8.])
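The output above is consistent with clipping to the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, with quartiles computed by linear interpolation. The sketch below reproduces the numbers with NumPy alone; it is an illustration under that assumption rather than the library's code.

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, -100, 100], dtype=float)

# Quartiles by linear interpolation: Q1 = 0.5, Q3 = 3.5
q1, q3 = np.percentile(arr, [25, 75])
iqr = q3 - q1

# Fences at 1.5 IQRs beyond the quartiles: -4.0 and 8.0
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are replaced by the nearest fence
clipped = np.clip(arr, lower, upper)
```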

dautil.stats.jackknife(arr, func, alpha=0.95)¶

Jackknifes an array with a supplied function.

Parameters:	arr – An array containing numbers. func – The function to apply. alpha – A number in the range 0 - 1 used to calculate the confidence interval.
Returns:	Three numbers - the function value for arr, the lower limit of the confidence interval and the upper limit of the confidence interval.
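The leave-one-out idea behind the jackknife can be sketched as below. Taking the interval from percentiles of the leave-one-out values is an assumption made here for brevity (the classical jackknife derives it from a variance estimate), and `jackknife_sketch` is a hypothetical name.

```python
import numpy as np


def jackknife_sketch(arr, func, alpha=0.95):
    """Return func(arr) plus a percentile interval of leave-one-out values."""
    arr = np.asarray(arr)
    # Recompute the statistic with each observation left out in turn
    leave_one_out = np.array(
        [func(np.delete(arr, i)) for i in range(len(arr))]
    )
    tail = (1.0 - alpha) / 2.0
    low, high = np.percentile(leave_one_out, [100 * tail, 100 * (1 - tail)])
    return func(arr), low, high


value, low, high = jackknife_sketch(np.arange(1, 11), np.mean)
```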

dautil.stats.mape(actual, forecast)¶

Computes the mean absolute percentage error. Non-finite values due to zero division are ignored.

\[\mbox{MAPE} = \frac{1}{n}\sum_{t=1}^n \left|\frac{A_t-F_t}{A_t}\right|\]

Parameters:	actual – The target values. forecast – Predicted values.
Returns:	The MAPE.
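A minimal NumPy sketch of the formula above, dropping the non-finite ratios produced by zero actual values. It returns a fraction, as the formula is written, and `mape_sketch` is a hypothetical name.

```python
import numpy as np


def mape_sketch(actual, forecast):
    """Mean absolute percentage error over the finite ratios only."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Division by zero yields inf or nan; silence the warning and filter below
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.abs((actual - forecast) / actual)
    return ratios[np.isfinite(ratios)].mean()


# The third pair has a zero actual value and is ignored
mape_value = mape_sketch([100, 200, 0], [110, 180, 5])
```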

dautil.stats.mpe(actual, forecast)¶

Computes the mean percentage error. Non-finite values due to zero division are ignored.

\[\text{MPE} = \frac{100\%}{n}\sum_{t=1}^n \frac{a_t-f_t}{a_t}\]

Parameters:	actual – The target values. forecast – Predicted values.
Returns:	The MPE.
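The signed counterpart can be sketched the same way. Unlike the absolute version, positive and negative errors cancel, so the result indicates bias. The factor of 100 follows the formula above; `mpe_sketch` is a hypothetical name.

```python
import numpy as np


def mpe_sketch(actual, forecast):
    """Mean percentage error (in percent) over the finite ratios only."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Division by zero yields inf or nan; silence the warning and filter below
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = (actual - forecast) / actual
    return 100.0 * ratios[np.isfinite(ratios)].mean()


# Ratios are -0.1 and 0.05; the zero actual value is ignored
mpe_value = mpe_sketch([100, 200, 0], [110, 190, 5])
```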

dautil.stats.mse(actual, forecast)¶

Computes the mean squared error.

\[\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(\hat{Y_i} - Y_i)^2\]

Parameters:	actual – The target values. forecast – Predicted values.
Returns:	The MSE.
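The formula above amounts to a single NumPy expression. The arrays below are made-up illustration values.

```python
import numpy as np

actual = np.array([3.0, 5.0, 2.0])
forecast = np.array([2.0, 5.0, 4.0])

# Squared errors are 1, 0 and 4, so the mean is 5 / 3
mse_value = np.mean((forecast - actual) ** 2)
```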

dautil.stats.outliers(arr, method='IQR', factor=1.5, percentiles=(5, 95))¶

Gets the limits beyond which values in an array are considered outliers.

Parameters:	arr – An array containing numbers. method – IQR (default) or percentiles. factor – Factor for the IQR method. percentiles – A tuple of percentiles.
Returns:	A namedtuple with upper and lower limits.

>>> import numpy as np
>>> from dautil import stats
>>> stats.outliers(a)
Outlier(min=-48.5, max=149.5)
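The two methods can be sketched with NumPy as below. The IQR branch assumes fences at `factor` IQRs beyond the quartiles, and the percentiles branch assumes the limits are the percentiles themselves. `outlier_limits` is a hypothetical helper; the field names `min` and `max` mirror the output above. Under these assumptions the integers 1 to 100 happen to give the same limits as the example output, although the input used there is not shown.

```python
from collections import namedtuple

import numpy as np

Outlier = namedtuple("Outlier", ["min", "max"])


def outlier_limits(arr, method="IQR", factor=1.5, percentiles=(5, 95)):
    """Limits beyond which values would be treated as outliers."""
    if method == "IQR":
        q1, q3 = np.percentile(arr, [25, 75])
        iqr = q3 - q1
        return Outlier(q1 - factor * iqr, q3 + factor * iqr)
    # Otherwise use the given percentiles directly as the limits
    low, high = np.percentile(arr, percentiles)
    return Outlier(low, high)


iqr_limits = outlier_limits(np.arange(1, 101))
pct_limits = outlier_limits(np.arange(1, 101), method="percentiles")
```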

dautil.stats.rss(actual, forecast)¶

Computes the residual sum of squares (RSS).

\[RSS = \sum_{i=1}^n (y_i - f(x_i))^2\]

Parameters:	actual – The target values. forecast – Predicted values.
Returns:	The RSS.
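In NumPy terms the sum above is one line; dividing it by the number of observations gives the mean squared error. The arrays are made-up illustration values.

```python
import numpy as np

actual = np.array([3.0, 5.0, 2.0])
forecast = np.array([2.0, 5.0, 4.0])

# Squared residuals are 1, 0 and 4
rss_value = np.sum((actual - forecast) ** 2)

# RSS divided by n is the mean squared error
mean_squared = rss_value / len(actual)
```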

dautil.stats.sqrt_bins(arr)¶

Uses the square-root rule of thumb to calculate an appropriate number of bins for an array.

Parameters:	arr – An array containing numbers.
Returns:	An integer to serve as the number of bins.
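Going by the function name, the rule is presumably the square-root choice, where the number of bins is about the square root of the number of observations. The sketch below truncates to an integer; how the library rounds is not stated, so the example uses a perfect square. `sqrt_bins_sketch` is a hypothetical name.

```python
import numpy as np


def sqrt_bins_sketch(arr):
    """Square-root choice: roughly sqrt(n) bins for n observations."""
    return int(np.sqrt(len(arr)))


nbins = sqrt_bins_sketch(np.arange(100))  # 100 observations
```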

dautil.stats.trimean(arr)¶

Calculates the trimean.

\[TM=\frac{Q_1 + 2Q_2 + Q_3}{4}\]

Parameters:	arr – An array containing numbers.
Returns:	The trimean for the array.

>>> import numpy as np
>>> from dautil import stats
>>> stats.trimean(np.arange(9))
4.0
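The result above can be checked against the formula with NumPy percentiles, assuming quartiles by linear interpolation (NumPy's default).

```python
import numpy as np

# Quartiles of 0..8 are 2, 4 and 6
q1, q2, q3 = np.percentile(np.arange(9), [25, 50, 75])

# Weighted average with the median counted twice
tm = (q1 + 2 * q2 + q3) / 4
```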

dautil.stats.wssse(point, center)¶

Computes the Within Set Sum of Squared Error (WSSSE).

Parameters:	point – A point for which to calculate the error. center – The center of a cluster.
Returns:	The WSSSE.
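For a single point, the contribution to a within-set sum of squared errors is commonly the squared Euclidean distance to its cluster center. The sketch below assumes that definition; whether the function returns the squared distance or its square root is not stated here. `squared_error` is a hypothetical helper.

```python
import numpy as np


def squared_error(point, center):
    """Squared Euclidean distance between a point and a cluster center."""
    diff = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    return float(np.sum(diff ** 2))


points = np.array([[1.0, 2.0], [3.0, 4.0]])
center = points.mean(axis=0)  # centroid of the two points: [2.0, 3.0]

# Summing the per-point errors gives the total for the cluster
total = sum(squared_error(p, center) for p in points)
```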
