dautil.stats

Statistical functions and utilities.

class dautil.stats.Box(arr)

Implements the concept of a box we know from box plots.

calc_iqr()

Calculates the interquartile range (IQR).

Returns: The interquartile range of the data.
>>> import numpy as np
>>> from dautil import stats
>>> arr = np.array([7, 15, 36, 39, 40, 41])
>>> box = stats.Box(arr)
>>> box.calc_iqr()
25
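
The result 25 in the example above matches the convention in which the quartiles are the medians of the lower and upper halves of the sorted data (15 and 40). NumPy's default linear interpolation gives a different value for the same array. The sketch below compares the two conventions. Both helper functions are hypothetical and not part of dautil, and which convention Box uses internally is not stated here.

```python
import numpy as np


def iqr_linear(arr):
    """IQR using NumPy's default (linear interpolation) percentiles."""
    q1, q3 = np.percentile(arr, [25, 75])
    return q3 - q1


def iqr_median_of_halves(arr):
    """IQR using medians of the lower and upper halves of the sorted data.

    For an odd number of values the middle value is left out of both halves.
    """
    s = np.sort(np.asarray(arr))
    half = len(s) // 2
    lower, upper = s[:half], s[len(s) - half:]
    return np.median(upper) - np.median(lower)


arr = np.array([7, 15, 36, 39, 40, 41])
linear = iqr_linear(arr)             # 39.75 - 20.25 = 19.5
halves = iqr_median_of_halves(arr)   # 40 - 15 = 25
```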
iqr_from_box()

Calculates the number (0 or more) of interquartile ranges (IQRs) from the box formed by the first and third quartiles.

Returns: A float representing the number of IQRs.
class dautil.stats.Distribution(data, dist, nbins=20, cutoff=0.75, range=None)

Wraps scipy.stats distribution classes. Most of the methods are only appropriate for continuous distributions.

Variables:
  • nbins – The number of bins for a histogram plot.
  • train – The train data.
  • test – The test data.
describe_residuals(*args, **kwds)

Describes the residuals of a fit to a distribution. Only appropriate for continuous distributions.

Returns: Statistics for the residuals as a dict.
error(*args, **kwds)

Computes the residuals of a fit to a distribution. Only appropriate for continuous distributions.

Returns: The residuals of the fit.
fit(*args)

Fits data to a distribution. Only appropriate for continuous distributions.

Returns: The result of the fit.
mean_ad(*args, **kwds)

Computes the mean absolute deviation.

Returns: The mean absolute deviation.
pdf(*args, **kwds)

Computes the probability density function. Only appropriate for continuous distributions.

Returns: The probability density function.
plot(ax)

Plots a histogram of the data and a fit. Only appropriate for continuous distributions.

rmse(*args, **kwds)

Computes the root mean square error of the distribution fit. Only appropriate for continuous distributions.

Returns: The RMSE of the fit.
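
A rough sketch of the workflow that fit, pdf, error and rmse describe, written directly against scipy.stats and NumPy. It assumes a normal distribution, that the residuals are the differences between a normalized histogram and the fitted density at the bin centers, and that the RMSE is computed from those residuals. These definitions are assumptions for illustration and are not taken from the dautil source.

```python
import numpy as np
from scipy import stats as sps

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=2000)

# Fit: for scipy.stats.norm the result is the pair (loc, scale).
loc, scale = sps.norm.fit(data)

# Normalized histogram as the empirical density (20 bins, as in nbins=20).
density, edges = np.histogram(data, bins=20, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Assumed residual definition: empirical density minus fitted density.
residuals = density - sps.norm.pdf(centers, loc=loc, scale=scale)

# Root mean square error of those residuals.
rmse = np.sqrt(np.mean(residuals ** 2))
```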
rvs(*args, **kwds)

Generates random values.

Returns: The generated values.
split(data, cutoff)

Splits data into test and train data.

Parameters:
  • data – An array containing numbers.
  • cutoff – A value in the range 0 - 1 used to split the data.
Returns:

Two arrays - the train data and the test data.
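
One plausible reading of cutoff, sketched below, is that it gives the fraction of the data assigned to the train set, taken in order without shuffling. That reading, and the helper name split_sketch, are assumptions and not confirmed by this page.

```python
import numpy as np


def split_sketch(data, cutoff):
    """Split data in order: the first cutoff fraction is train, the rest test."""
    data = np.asarray(data)
    n_train = int(cutoff * len(data))
    return data[:n_train], data[n_train:]


# With 8 values and cutoff=0.75, 6 values go to train and 2 to test.
train, test = split_sketch(np.arange(8), 0.75)
```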

dautil.stats.ci(arr, alpha=0.95)

Computes the confidence interval.

Parameters:
  • arr – An array containing numbers.
  • alpha – A value in the range 0 - 1 that determines the percentiles used as the interval limits.
Returns:

The confidence interval.
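A minimal sketch of a percentile-based interval, assuming alpha is the central coverage, so that alpha=0.95 corresponds to the 2.5th and 97.5th percentiles. The helper ci_sketch is hypothetical, and the actual computation in dautil may differ.

```python
import numpy as np


def ci_sketch(arr, alpha=0.95):
    """Central interval covering a fraction alpha of the values in arr."""
    lower = 100 * (1 - alpha) / 2   # percentile cut off in the lower tail
    upper = 100 - lower             # matching percentile in the upper tail
    return np.percentile(arr, [lower, upper])


# For the integers 0..100 and alpha=0.9 the limits are close to 5 and 95.
low, high = ci_sketch(np.arange(101), alpha=0.9)
```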

dautil.stats.clip_outliers(arr)

Clips values to the limits beyond which they are considered outliers.

Parameters: arr – An array containing numbers.
Returns: A clipped values array.
>>> import numpy as np
>>> from dautil import stats
>>> arr = list(range(5))
>>> outliers = [-100, 100]
>>> arr.extend(outliers)
>>> arr
[0, 1, 2, 3, 4, -100, 100]
>>> stats.clip_outliers(arr)
array([ 0.,  1.,  2.,  3.,  4., -4.,  8.])
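
The output above is consistent with clipping to Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, with quartiles from NumPy's default percentile interpolation. The sketch below reproduces it under that assumption. The helper clip_outliers_sketch is hypothetical and is not the dautil implementation.

```python
import numpy as np


def clip_outliers_sketch(arr, factor=1.5):
    """Clip values to [Q1 - factor * IQR, Q3 + factor * IQR]."""
    arr = np.asarray(arr, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    return np.clip(arr, q1 - factor * iqr, q3 + factor * iqr)


# Q1 = 0.5, Q3 = 3.5, IQR = 3, so the limits are -4 and 8.
clipped = clip_outliers_sketch([0, 1, 2, 3, 4, -100, 100])
```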
dautil.stats.jackknife(arr, func, alpha=0.95)

Jackknifes an array with a supplied function.

Parameters:
  • arr – An array containing numbers.
  • func – The function to apply.
  • alpha – A number in the range 0 - 1 used to calculate the confidence interval.
Returns:

Three numbers - the function value for arr, the lower limit of the confidence interval and the upper limit of the confidence interval.
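
A sketch of the leave-one-out idea behind the jackknife: func is recomputed on every copy of arr with one element removed. It assumes the interval limits are percentiles of those leave-one-out values, which is a guess about the behavior here; textbook jackknife intervals are usually built from a standard error instead. The helper jackknife_sketch is hypothetical.

```python
import numpy as np


def jackknife_sketch(arr, func, alpha=0.95):
    """Return func(arr) and a percentile interval of leave-one-out values."""
    arr = np.asarray(arr)
    idx = np.arange(len(arr))
    # Apply func to each sample that omits exactly one element.
    loo = np.array([func(arr[idx != i]) for i in idx])
    tail = 100 * (1 - alpha) / 2
    low, high = np.percentile(loo, [tail, 100 - tail])
    return func(arr), low, high


value, low, high = jackknife_sketch(np.arange(10.0), np.mean)
```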

dautil.stats.mape(actual, forecast)

Computes the mean absolute percentage error. Non-finite values due to zero division are ignored.

\[\mbox{MAPE} = \frac{1}{n}\sum_{t=1}^n \left|\frac{A_t-F_t}{A_t}\right|\]
Parameters:
  • actual – The target values.
  • forecast – Predicted values.
Returns:

The MAPE.
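
A minimal sketch of the formula above, dropping non-finite ratios that arise where the actual value is zero. It follows the formula as written and returns a fraction rather than a percentage; whether dautil scales the result is not stated here. The helper mape_sketch is hypothetical.

```python
import numpy as np


def mape_sketch(actual, forecast):
    """Mean of |A_t - F_t| / |A_t| over the finite ratios."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.abs((actual - forecast) / actual)
    # Division by a zero actual value gives inf or nan; those are ignored.
    return ratios[np.isfinite(ratios)].mean()


# The ratios are 0.1, 0.1 and inf; the inf is dropped.
mape_value = mape_sketch([100, 200, 0], [110, 180, 5])
```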

dautil.stats.mpe(actual, forecast)

Computes the mean percentage error. Non-finite values due to zero division are ignored.

\[\text{MPE} = \frac{100\%}{n}\sum_{t=1}^n \frac{A_t-F_t}{A_t}\]
Parameters:
  • actual – The target values.
  • forecast – Predicted values.
Returns:

The MPE.
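
A minimal sketch of the formula above, again ignoring non-finite ratios. Because the MPE keeps the sign of each error, over- and under-forecasts can cancel, which the example illustrates. The helper mpe_sketch is hypothetical.

```python
import numpy as np


def mpe_sketch(actual, forecast):
    """100 times the mean of the signed relative errors (finite ones only)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = (actual - forecast) / actual
    return 100 * ratios[np.isfinite(ratios)].mean()


# Signed ratios are -0.1 and 0.1 (the zero actual is ignored), so they cancel.
mpe_value = mpe_sketch([100, 200, 0], [110, 180, 5])
```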

dautil.stats.mse(actual, forecast)

Computes the mean squared error.

\[\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(\hat{Y}_i - Y_i)^2\]
Parameters:
  • actual – The target values.
  • forecast – Predicted values.
Returns:

The MSE.
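
A short sketch of the formula above, assuming the predicted values in the formula correspond to forecast and the observed values to actual. The helper mse_sketch is hypothetical.

```python
import numpy as np


def mse_sketch(actual, forecast):
    """Mean of the squared differences between forecast and actual."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean((forecast - actual) ** 2)


# Squared errors are 0, 0 and 4, so the mean is 4 / 3.
mse_value = mse_sketch([1, 2, 3], [1, 2, 5])
```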

dautil.stats.outliers(arr, method='IQR', factor=1.5, percentiles=(5, 95))

Gets the limits beyond which values in an array are considered outliers.

Parameters:
  • arr – An array containing numbers.
  • method – IQR (default) or percentiles.
  • factor – Factor for the IQR method.
  • percentiles – A tuple of lower and upper percentiles for the percentiles method.
Returns:

A namedtuple with the lower and upper limits (min, max).

>>> import numpy as np
>>> from dautil import stats
>>> stats.outliers(a)
Outlier(min=-48.5, max=149.5)
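
A sketch of how such limits can be computed, assuming the IQR method uses Q1 - factor * IQR and Q3 + factor * IQR and the percentiles method uses the given percentiles directly. The helper outliers_sketch and the input np.arange(1, 101) are illustrative choices; with that input the IQR limits happen to equal the values shown in the example above, but the array used there is not given.

```python
from collections import namedtuple

import numpy as np

Outlier = namedtuple("Outlier", ["min", "max"])


def outliers_sketch(arr, method="IQR", factor=1.5, percentiles=(5, 95)):
    """Return lower and upper limits for values to count as outliers."""
    if method == "IQR":
        q1, q3 = np.percentile(arr, [25, 75])
        iqr = q3 - q1
        return Outlier(q1 - factor * iqr, q3 + factor * iqr)
    # Any other method value falls back to plain percentile limits.
    low, high = np.percentile(arr, percentiles)
    return Outlier(low, high)


data = np.arange(1, 101)
iqr_limits = outliers_sketch(data)                        # min=-48.5, max=149.5
pct_limits = outliers_sketch(data, method="percentiles")
```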
dautil.stats.rss(actual, forecast)

Computes the residual sum of squares (RSS).

\[RSS = \sum_{i=1}^n (y_i - f(x_i))^2\]
Parameters:
  • actual – The target values.
  • forecast – Predicted values.
Returns:

The RSS.
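
A short sketch of the formula above, taking forecast as the fitted values f(x_i). It also shows that the RSS divided by the number of observations equals the mean squared error. The helper rss_sketch is hypothetical.

```python
import numpy as np


def rss_sketch(actual, forecast):
    """Sum of the squared differences between actual and forecast."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.sum((actual - forecast) ** 2)


actual = [1, 2, 3]
forecast = [1, 2, 5]
rss_value = rss_sketch(actual, forecast)      # 0 + 0 + 4 = 4
mse_from_rss = rss_value / len(actual)        # RSS / n is the MSE
```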

dautil.stats.sqrt_bins(arr)

Uses a rule of thumb to calculate the appropriate number of bins for an array.

Parameters: arr – An array containing numbers.
Returns: An integer to serve as the number of bins.
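
The function name suggests the square-root rule, in which the number of bins is the square root of the number of values. The sketch below assumes that rule and rounds up; the rounding direction is a guess, and the helper sqrt_bins_sketch is hypothetical.

```python
import math


def sqrt_bins_sketch(arr):
    """Number of bins as the square root of the sample size, rounded up."""
    return math.ceil(math.sqrt(len(arr)))


bins_100 = sqrt_bins_sketch(range(100))   # sqrt(100) = 10
bins_50 = sqrt_bins_sketch(range(50))     # sqrt(50) is about 7.07, rounded up
```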
dautil.stats.trimean(arr)

Calculates the trimean.

\[TM=\frac{Q_1 + 2Q_2 + Q_3}{4}\]
Parameters: arr – An array containing numbers.
Returns: The trimean for the array.
>>> import numpy as np
>>> from dautil import stats
>>> stats.trimean(np.arange(9))
4.0
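
The formula can be sketched with NumPy percentiles as follows, assuming the default linear interpolation for the quartiles. The helper trimean_sketch is hypothetical; for np.arange(9) the quartiles are 2, 4 and 6, which gives the 4.0 shown above.

```python
import numpy as np


def trimean_sketch(arr):
    """Weighted average of the quartiles: (Q1 + 2 * Q2 + Q3) / 4."""
    q1, q2, q3 = np.percentile(arr, [25, 50, 75])
    return (q1 + 2 * q2 + q3) / 4


tm = trimean_sketch(np.arange(9))   # (2 + 2 * 4 + 6) / 4 = 4.0
```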
dautil.stats.wssse(point, center)

Computes the Within Set Sum of Squared Errors (WSSSE).

Parameters:
  • point – A point for which to calculate the error.
  • center – The center of a cluster.
Returns:

The WSSSE.
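
A sketch of how a per-point error can be accumulated into a WSSSE for one cluster. It assumes the per-point contribution is the squared Euclidean distance to the center; whether dautil returns the squared distance or its square root is not stated here. The helper squared_error_sketch and the sample points are hypothetical.

```python
import numpy as np


def squared_error_sketch(point, center):
    """Squared Euclidean distance between a point and a cluster center."""
    diff = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    return float(np.dot(diff, diff))


points = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
center = np.mean(points, axis=0)   # the centroid (2, 2)

# Per-point contributions are 2, 1 and 1.
total = sum(squared_error_sketch(p, center) for p in points)
```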
