Statistical functions and utilities.
Implements the concept of a box as known from box plots: the interval between the first and third quartiles of the data.
Calculates the interquartile range (IQR).
Returns: The interquartile range of the data.
>>> import numpy as np
>>> from dautil import stats
>>> arr = np.array([7, 15, 36, 39, 40, 41])
>>> box = stats.Box(arr)
>>> box.calc_iqr()
25
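Quartile conventions differ, and the IQR changes with them. The value 25 above is consistent with taking the first and third quartiles as the medians of the lower and upper halves of the sorted data (15 and 40). Linear interpolation between order statistics, the default of `numpy.percentile`, gives quartiles of 20.25 and 39.75 and therefore an IQR of 19.5. The standard-library sketch below computes both; `iqr_linear` and `iqr_halves` are hypothetical names, not the dautil implementation.

```python
import statistics


def iqr_linear(data):
    """IQR using linear interpolation between order statistics
    (the same convention as numpy.percentile's default)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


def iqr_halves(data):
    """IQR as the difference between the medians of the upper and lower halves.

    For an odd number of values the middle value belongs to neither half."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]  # skip the middle element for odd lengths
    return statistics.median(upper) - statistics.median(lower)


arr = [7, 15, 36, 39, 40, 41]
print(iqr_halves(arr))  # 25
print(iqr_linear(arr))  # 19.5
```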
Calculates the number (0 or more) of interquartile ranges (IQRs) from the box formed by the first and third quartiles.
Returns: A float representing the number of IQRs.
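One plausible reading of this measure, assumed in the sketch below: a value inside the box is 0 IQRs away, and a value outside it is its distance to the nearest quartile divided by the IQR. The helper `iqrs_from_box` and its signature are hypothetical.

```python
def iqrs_from_box(value, q1, q3):
    """Hypothetical helper: how many IQRs `value` lies outside the box [q1, q3].

    Values inside the box (including its edges) are 0 IQRs away."""
    iqr = q3 - q1
    if iqr <= 0:
        raise ValueError("q3 must be greater than q1")
    if value < q1:
        return (q1 - value) / iqr
    if value > q3:
        return (value - q3) / iqr
    return 0.0


# A box from 15 to 40 has an IQR of 25.
print(iqrs_from_box(100, 15, 40))  # 2.4
print(iqrs_from_box(20, 15, 40))   # 0.0
```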
Wraps scipy.stats distribution classes. Most of the methods are only appropriate for continuous distributions.
Variables:
Describes the residuals of a fit to a distribution. Only appropriate for continuous distributions.
Returns: Statistics for the residuals as a dict.
Computes the residuals of a fit to a distribution. Only appropriate for continuous distributions.
Returns: The residuals of the fit.
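The text does not define the residuals. One common choice, assumed in this sketch, is the difference between the normalized histogram of the data and the fitted PDF at each bin center. A normal distribution fitted with the standard library's `statistics.NormalDist.from_samples` stands in for a `scipy.stats` distribution, and `histogram_density` and `fit_residuals` are hypothetical names.

```python
import random
from statistics import NormalDist


def histogram_density(data, nbins):
    """Return bin centers, densities and the bin width of an equal-width histogram.

    Densities are normalized so that sum(density * width) == 1.
    Assumes the data are not all equal (otherwise the width would be 0)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        # The maximum lies on the right edge; keep it in the last bin.
        i = min(int((x - lo) / width), nbins - 1)
        counts[i] += 1
    centers = [lo + (i + 0.5) * width for i in range(nbins)]
    densities = [c / (len(data) * width) for c in counts]
    return centers, densities, width


def fit_residuals(data, nbins=10):
    """Histogram density minus the fitted normal PDF at each bin center."""
    dist = NormalDist.from_samples(data)  # fit by sample mean and standard deviation
    centers, densities, _ = histogram_density(data, nbins)
    return [d - dist.pdf(c) for c, d in zip(centers, densities)]


random.seed(0)
sample = [random.gauss(0, 1) for _ in range(1000)]
residuals = fit_residuals(sample)
print(len(residuals))  # 10
```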
Fits data to a distribution. Only appropriate for continuous distributions.
Returns: The result of the fit.
Computes the mean absolute deviation.
Returns: The mean absolute deviation.
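A minimal sketch, assuming the deviation is measured from the mean (some definitions use the median instead); `mean_absolute_deviation` is a hypothetical name.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of the values from their mean."""
    values = list(data)
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


print(mean_absolute_deviation([1, 2, 3, 4, 5]))  # 1.2
```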
Computes the probability density function (PDF). Only appropriate for continuous distributions.
Returns: The probability density function.
Plots a histogram of the data and a fit. Only appropriate for continuous distributions.
Computes the root mean square error (RMSE) of the distribution fit. Only appropriate for continuous distributions.
Returns: The RMSE of the fit.
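Given a list of fit residuals, the RMSE is the square root of their mean square. A minimal sketch, assuming the RMSE is taken over such residuals; `rmse` is a hypothetical name.

```python
import math


def rmse(residuals):
    """Root mean square of a sequence of residuals."""
    values = list(residuals)
    return math.sqrt(sum(r * r for r in values) / len(values))


print(rmse([3, -4]))  # sqrt(12.5)
```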
Generates random values.
Returns: The generated values.
Splits data into training and test data.
Parameters:
Returns: Two arrays, the training data and the test data.
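The split strategy and its parameters are not documented here. This sketch assumes a random shuffle followed by a cut at a fraction of the data; `train_test_split`, `train_fraction` and `seed` are hypothetical names.

```python
import random


def train_test_split(data, train_fraction=0.75, seed=None):
    """Shuffle a copy of data and split it into (train, test) lists."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # local RNG: global state is untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(20), train_fraction=0.75, seed=42)
print(len(train), len(test))  # 15 5
```

Using a dedicated `random.Random` instance keeps the split reproducible for a given seed without affecting other code that uses the module-level generator.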
Computes the confidence interval.
Parameters:
Returns: The confidence interval.
Clips values to the limits beyond which values are considered outliers.
Parameters: arr – An array containing numbers.
Returns: An array of clipped values.
>>> import numpy as np
>>> from dautil import stats
>>> arr = list(range(5))
>>> outliers = [-100, 100]
>>> arr.extend(outliers)
>>> arr
[0, 1, 2, 3, 4, -100, 100]
>>> stats.clip_outliers(arr)
array([ 0.,  1.,  2.,  3.,  4., -4.,  8.])
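The output above is consistent with the common 1.5 × IQR rule (Tukey's fences) using linearly interpolated quartiles: for this data the quartiles are 0.5 and 3.5, the IQR is 3, so the limits are 0.5 - 4.5 = -4 and 3.5 + 4.5 = 8. The standard-library sketch below reproduces it; the factor 1.5 and the names `outlier_limits` and `clip_to_limits` are assumptions, not necessarily how dautil implements it.

```python
import statistics


def outlier_limits(data, k=1.5):
    """Return (lower, upper) limits k IQRs beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_to_limits(data, k=1.5):
    """Replace values outside the outlier limits with the nearest limit."""
    lower, upper = outlier_limits(data, k)
    return [min(max(x, lower), upper) for x in data]


arr = [0, 1, 2, 3, 4, -100, 100]
print(outlier_limits(arr))  # (-4.0, 8.0)
print(clip_to_limits(arr))  # [0, 1, 2, 3, 4, -4.0, 8.0]
```

Under the same assumptions, `outlier_limits(range(1, 101))` returns `(-48.5, 149.5)`.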
Jackknifes an array with a supplied function.
Parameters:
Returns: Three numbers: the function value for arr, the lower limit of the confidence interval and the upper limit of the confidence interval.
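A sketch of the jackknife: the function is re-evaluated on each leave-one-out subsample, and the spread of those replicates gives a standard error. The interval here assumes a normal approximation with z = 1.96 (about 95%); the function name and the `z` parameter are hypothetical. For the mean, the jackknife standard error equals the usual s/√n, which the test below checks.

```python
import math
import statistics


def jackknife(arr, func, z=1.96):
    """Return (func(arr), lower, upper) using the jackknife standard error.

    The standard error is sqrt((n - 1) / n * sum((theta_i - theta_bar) ** 2)),
    where theta_i is func evaluated with element i left out."""
    values = list(arr)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    estimate = func(values)
    replicates = [func(values[:i] + values[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    se = math.sqrt((n - 1) / n * sum((t - mean_rep) ** 2 for t in replicates))
    return estimate, estimate - z * se, estimate + z * se


est, lo, hi = jackknife([1, 2, 3, 4, 5], statistics.mean)
print(est, lo, hi)
```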
Computes the mean absolute percentage error (MAPE). Non-finite values caused by division by zero are ignored.
Parameters:
Returns: The MAPE.
Computes the mean percentage error (MPE). Non-finite values caused by division by zero are ignored.
Parameters:
Returns: The MPE.
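A plain-Python sketch of both percentage errors. Python raises `ZeroDivisionError` rather than producing non-finite values, so the sketch skips pairs whose actual value is zero, which mirrors ignoring those terms. The `(actual, predicted)` argument order, the sign convention (actual - predicted) / actual and the scaling by 100 are assumptions; all names are hypothetical.

```python
def _percentage_errors(actual, predicted):
    """Yield (a - p) / a for each pair, skipping a == 0 (division by zero)."""
    for a, p in zip(actual, predicted):
        if a != 0:
            yield (a - p) / a


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(e) for e in _percentage_errors(actual, predicted)]
    return 100 * sum(errors) / len(errors)  # fails if every actual value is 0


def mpe(actual, predicted):
    """Mean percentage error, in percent; errors of opposite sign cancel."""
    errors = list(_percentage_errors(actual, predicted))
    return 100 * sum(errors) / len(errors)


actual = [100, 200, 0, 50]
predicted = [110, 190, 5, 50]
print(mape(actual, predicted), mpe(actual, predicted))
```

Unlike the MAPE, the MPE keeps the sign of each error, so it reveals systematic bias but can be close to zero even when individual errors are large.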
Computes the mean squared error (MSE).
Parameters:
Returns: The MSE.
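A minimal sketch of the mean squared error for paired sequences; the `(actual, predicted)` signature is an assumption and `mse` is a hypothetical name.

```python
def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    pairs = list(zip(actual, predicted))
    return sum((a - p) ** 2 for a, p in pairs) / len(pairs)


print(mse([1, 2, 3], [1, 2, 5]))  # 4 / 3
```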
Gets the limits beyond which values in an array are considered outliers.
Parameters:
Returns: A namedtuple with the lower (min) and upper (max) limits.
>>> import numpy as np
>>> from dautil import stats
>>> a = np.arange(1, 101)
>>> stats.outliers(a)
Outlier(min=-48.5, max=149.5)
Computes the residual sum of squares (RSS).
Parameters:
Returns: The RSS.
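The RSS is the sum of the squared residuals, so for n pairs it equals n times the MSE. A minimal sketch, assuming an `(actual, predicted)` signature; `rss` is a hypothetical name.

```python
def rss(actual, predicted):
    """Residual sum of squares: sum of (actual - predicted) ** 2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


print(rss([1, 2, 3], [1, 2, 5]))  # 4
```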
Uses a rule of thumb to calculate an appropriate number of histogram bins for an array.
Parameters: arr – An array containing numbers.
Returns: An integer to serve as the number of bins.
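The text does not say which rule of thumb is used. As one well-known example, Sturges' rule uses ⌈log2 n⌉ + 1 bins for n observations; the sketch below assumes that rule, and `sturges_bins` is a hypothetical name.

```python
import math


def sturges_bins(data):
    """Sturges' rule: ceil(log2(n)) + 1 bins for n observations."""
    n = len(data)
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


print(sturges_bins(range(100)))  # 8
```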
Calculates the trimean.
Parameters: arr – An array containing numbers.
Returns: The trimean of the array.
>>> import numpy as np
>>> from dautil import stats
>>> stats.trimean(np.arange(9))
4.0
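The trimean is a weighted average of the quartiles, (Q1 + 2 × median + Q3) / 4. This standard-library sketch assumes linearly interpolated quartiles and reproduces the 4.0 above (quartiles 2, 4 and 6); `trimean` here is a stand-in, not the dautil function.

```python
import statistics


def trimean(data):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


print(trimean(range(9)))  # 4.0
```

Because it depends only on the quartiles, the trimean is robust to extreme values: `trimean([1, 2, 3, 4, 100])` is 3.0.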
Computes the Within Set Sum of Squared Errors (WSSSE).
Parameters:
Returns: The WSSSE.
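The WSSSE is typically used to evaluate a clustering such as k-means: each point contributes its squared Euclidean distance to the nearest cluster center. A minimal sketch, assuming points and centers are given as coordinate tuples; `wssse` and `squared_distance` are hypothetical names.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def wssse(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers = [(0, 1), (10, 11)]
print(wssse(points, centers))  # 4
```

A lower WSSSE means tighter clusters, but it always decreases as centers are added, so it is usually compared across numbers of clusters (for example with the elbow method) rather than minimized outright.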