Author: Pierre Barbier de Reuille <pierre.barbierdereuille@gmail.com>
Module implementing kernel-based estimation of probability densities.
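Given a kernel \(K\), the density function is estimated from a sampling \(X = \{X_i \in \mathbb{R}^n\}_{i\in\{1,\ldots,m\}}\) as:
\[ f(x) \triangleq \frac{1}{W} \sum_{i=1}^{m} \frac{w_i}{(h\lambda_i)^n} K\left(\frac{x - X_i}{h\lambda_i}\right), \qquad W = \sum_{i=1}^{m} w_i \]
where \(h\) is the bandwidth of the kernel, \(w_i\) are the weights of the data points and \(\lambda_i\) are the adaptation factors of the kernel width.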
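As an illustration, the estimator can be sketched in plain Python for the 1D case. This is a minimal sketch rather than the module's implementation: it assumes a Gaussian kernel and the usual normalisation by \(h \sum_i w_i\), and the names `gaussian_kernel` and `kde_1d` are hypothetical.

```python
import math


def gaussian_kernel(z):
    """Standard normal density: unit integral, zero mean, unit variance."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kde_1d(x, xdata, h, weights=None, lambdas=None):
    """Weighted, adaptive 1D kernel density estimate at a single point x.

    Hypothetical helper: weights default to 1 (unweighted) and lambdas
    default to 1 (no per-point adaptation of the bandwidth).
    """
    m = len(xdata)
    weights = [1.0] * m if weights is None else weights
    lambdas = [1.0] * m if lambdas is None else lambdas
    total_weight = sum(weights)  # total weight of the sample
    acc = 0.0
    for xi, wi, li in zip(xdata, weights, lambdas):
        # Each point contributes a kernel of width h * lambda_i.
        acc += wi / li * gaussian_kernel((x - xi) / (h * li))
    return acc / (h * total_weight)


sample = [-1.0, 0.0, 0.5, 2.0]
density_at_zero = kde_1d(0.0, sample, h=0.5)
```

Because each term is divided by its own \(\lambda_i\), the estimate still integrates to one when the adaptation factors differ from point to point.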
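The kernel is a function of \(\mathbb{R}^n\) such that:
\[ \int_{\mathbb{R}^n} K(z)\,dz = 1, \qquad \int_{\mathbb{R}^n} z\,K(z)\,dz = 0, \qquad \int_{\mathbb{R}^n} z z^{T} K(z)\,dz = I_n \]
The constraint on the covariance (the last condition) is only required to provide a uniform meaning for the bandwidth of the kernel.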
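If the domain of the density estimation is bounded to the interval \([L,U]\), the density is then estimated with:
\[ f(x) \triangleq \frac{1}{hW} \sum_{i=1}^{m} \frac{w_i}{\lambda_i} \hat{K}(x; X_i, h\lambda_i, L, U) \]
where \(W = \sum_{i=1}^{m} w_i\) is the total weight and \(\hat{K}\) is a modified kernel that depends on the exact method used. Currently, only 1D KDE supports bounded domains.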
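One common way to build a modified kernel on \([L,U]\) is to reflect the data about each bound. The sketch below assumes that choice purely for illustration; it is not necessarily how any method of this module works, and `reflected_kde_1d` is a hypothetical name. It also assumes unweighted data, a Gaussian kernel, and a bandwidth small compared with \(U - L\), so that a single reflection per bound is enough.

```python
import math


def gaussian_kernel(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def reflected_kde_1d(x, xdata, h, lower, upper):
    """Unweighted 1D density estimate on [lower, upper] using reflection.

    Mass that an ordinary kernel would place outside the domain is folded
    back inside by adding mirror images of every data point.
    """
    if x < lower or x > upper:
        return 0.0  # the density is zero outside the bounded domain
    acc = 0.0
    for xi in xdata:
        acc += gaussian_kernel((x - xi) / h)                # original point
        acc += gaussian_kernel((x - (2 * lower - xi)) / h)  # mirror about lower
        acc += gaussian_kernel((x - (2 * upper - xi)) / h)  # mirror about upper
    return acc / (len(xdata) * h)


sample = [0.05, 0.2, 0.4, 0.9]
density_at_lower_bound = reflected_kde_1d(0.0, sample, h=0.1, lower=0.0, upper=1.0)
```

With this construction the estimate integrates to approximately one over \([L,U]\), instead of leaking mass past the bounds.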
Perform a kernel-based density estimation in 1D, possibly on a bounded domain \([L,U]\).
The calculation is separated into three parts:
- The kernel (kernel)
- The bandwidth or covariance estimation (bandwidth, covariance)
- The estimation method (method)
Bandwidth of the kernel. Can be set either as a fixed value or using a bandwidth calculator, that is a function of signature w(xdata) that returns a single value.
Note
An ndarray with a single value will be converted to a floating point value.
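The "fixed value or calculator" convention can be sketched as follows. Both function names are hypothetical, and the normal reference rule \(1.06\,\sigma\,m^{-1/5}\) is used only as an example of a calculator with signature w(xdata); it is not claimed to be one shipped with the module.

```python
import statistics


def normal_reference_bandwidth(xdata):
    """Example calculator with signature w(xdata) returning a single value."""
    m = len(xdata)
    return 1.06 * statistics.stdev(xdata) * m ** (-1.0 / 5.0)


def resolve_bandwidth(bandwidth, xdata):
    """Return a float from either a fixed value or a calculator.

    A callable is evaluated on the data; anything else is treated as a
    fixed value. float() also converts a one-element array-like value.
    """
    if callable(bandwidth):
        return float(bandwidth(xdata))
    return float(bandwidth)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
fixed = resolve_bandwidth(0.3, data)
computed = resolve_bandwidth(normal_reference_bandwidth, data)
```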
Compute the cdf from the lower bound to the points given as argument.
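For an unbounded domain and a Gaussian kernel, the cdf has a closed form: each kernel contributes a normal cdf centred on its data point. The sketch below assumes exactly that setting (lower bound at \(-\infty\)); `normal_cdf` and `kde_cdf` are hypothetical names, not part of the module.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_cdf(x, xdata, h, weights=None, lambdas=None):
    """CDF of a weighted, adaptive Gaussian KDE on an unbounded domain."""
    m = len(xdata)
    weights = [1.0] * m if weights is None else weights
    lambdas = [1.0] * m if lambdas is None else lambdas
    acc = 0.0
    for xi, wi, li in zip(xdata, weights, lambdas):
        # Integrating a kernel of width h * lambda_i gives a normal cdf.
        acc += wi * normal_cdf((x - xi) / (h * li))
    return acc / sum(weights)


sample = [-1.0, 0.0, 1.0]
cdf_at_zero = kde_cdf(0.0, sample, h=0.5)
```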
Covariance of the Gaussian kernel. Can be set either as a fixed value or using a bandwidth calculator, that is a function of signature w(xdata) that returns a single value.
Note
An ndarray with a single value will be converted to a floating point value.
Evaluate the density on a grid of N points spanning the whole dataset.
Returns: a tuple with the mesh on which the density is evaluated and the density itself
Compute the inverse cumulative distribution (quantile) function on a grid.
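A grid-based quantile function can be obtained by tabulating the cdf on a mesh and inverting it by linear interpolation. The sketch below assumes an unweighted Gaussian KDE on an unbounded domain; the names `kde_cdf` and `quantile_on_grid`, and the `n_points` and `cut` parameters, are hypothetical.

```python
import math
from bisect import bisect_left


def kde_cdf(x, xdata, h):
    """CDF of an unweighted Gaussian KDE on an unbounded domain."""
    scale = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in xdata) / len(xdata)


def quantile_on_grid(probabilities, xdata, h, n_points=1024, cut=5.0):
    """Invert the cdf tabulated on a regular mesh by linear interpolation."""
    lo = min(xdata) - cut * h  # extend the mesh beyond the data by cut * h
    hi = max(xdata) + cut * h
    step = (hi - lo) / (n_points - 1)
    mesh = [lo + k * step for k in range(n_points)]
    cdf = [kde_cdf(x, xdata, h) for x in mesh]
    result = []
    for p in probabilities:
        k = bisect_left(cdf, p)  # first mesh index whose cdf value is >= p
        if k == 0:
            result.append(mesh[0])   # p is below the tabulated range
        elif k == n_points:
            result.append(mesh[-1])  # p is above the tabulated range
        else:
            # Here cdf[k - 1] < p <= cdf[k], so the denominator is positive.
            t = (p - cdf[k - 1]) / (cdf[k] - cdf[k - 1])
            result.append(mesh[k - 1] + t * step)
    return result


sample = [-1.0, 0.0, 1.0]
q10, q50, q90 = quantile_on_grid([0.1, 0.5, 0.9], sample, h=0.5)
```

The accuracy of the result is limited by the mesh spacing, so a finer mesh trades computation time for precision.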
Kernel object. This must be an object modeled on pyqt_fit.kernels.Kernel1D. It is recommended to inherit from this class, as it provides numerical approximations for all methods.
By default, the kernel is an instance of pyqt_fit.kernels.normal_kernel1d.
Scaling of the bandwidth, per data point. It can be either a single value or an array with one value per data point.
When deleted, the lambdas are reset to 1.
Select the method to use. The method should be an object modeled on pyqt_fit.kde_methods.KDE1DMethod, and it is recommended to inherit from that class.
The available methods are found in the pyqt_fit.kde_methods sub-module.
Default: pyqt_fit.kde_methods.default_method
Returns the covariance matrix:
\[ C = \tau^2 \operatorname{cov}(X) \]
where \(\tau\) is a correcting factor that depends on the method.
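The Silverman bandwidth is defined as a variance bandwidth with factor:
\[ \tau = \left( \frac{m(n+2)}{4} \right)^{-\frac{1}{n+4}} \]
where \(m\) is the number of data points and \(n\) is their dimension.
The Scott bandwidth is defined as a variance bandwidth with factor:
\[ \tau = m^{-\frac{1}{n+4}} \]
where \(m\) is the number of data points and \(n\) is their dimension.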
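The two factors and the resulting variance bandwidth can be sketched in 1D as follows. This assumes the usual rule-of-thumb expressions for the Scott and Silverman factors (the same ones used by SciPy's `gaussian_kde`) and a covariance equal to the squared factor times the sample variance; all function names here are hypothetical.

```python
import statistics


def scott_factor(m, n=1):
    """Scott factor for m data points of dimension n."""
    return m ** (-1.0 / (n + 4))


def silverman_factor(m, n=1):
    """Silverman factor for m data points of dimension n."""
    return (m * (n + 2) / 4.0) ** (-1.0 / (n + 4))


def variance_bandwidth_1d(factor, xdata):
    """1D case of the covariance: squared factor times the sample variance."""
    return factor ** 2 * statistics.variance(xdata)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
scott_cov = variance_bandwidth_1d(scott_factor(len(data)), data)
silverman_cov = variance_bandwidth_1d(silverman_factor(len(data)), data)
```

In 1D the Silverman factor is always the Scott factor multiplied by \((3/4)^{-1/5} \approx 1.06\), so the two rules differ only by a constant.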
Implementation of the KDE bandwidth selection method outlined in:
Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, 2010.
Based on the implementation of Daniel B. Smith, PhD.
The object is a callable returning the bandwidth for a 1D kernel.