\[\DeclareMathOperator{\erf}{erf} \DeclareMathOperator{\argmin}{argmin} \newcommand{\R}{\mathbb{R}} \newcommand{\n}{\boldsymbol{n}}\]

Module pyqt_fit.npr_methods

Author: Pierre Barbier de Reuille <pierre.barbierdereuille@gmail.com>

Module implementing non-parametric regressions using kernel methods.

Non-Parametric Regression Methods

Methods must either inherit or follow the same definition as the pyqt_fit.npr_methods.RegressionKernelMethod.


Compute the bandwidth and covariance for the model, based on its xdata attribute.

class pyqt_fit.npr_methods.RegressionKernelMethod[source]

Base class for regression kernel methods

The following methods are interface methods that should be overridden with ones specific to the implemented method.


Fit the method and return the fitted object that will be used for actual evaluation.

The object needs to call the pyqt_fit.nonparam_regression.NonParamRegression.set_actual_bandwidth() method with the computed bandwidth and covariance.

Default: Compute the bandwidth from the real data and set it in the regression object.
evaluate(points, out)[source]

Evaluate the regression at the provided points.

  • points (ndarray) – 2d-array of points to compute the regression on. Each column is a point.
  • out (ndarray) – 1d-array in which to store the result
Return type: ndarray

The method must return the out array, updated with the regression values

Provided methods

Only extra methods will be described:

class pyqt_fit.npr_methods.SpatialAverage[source]

Perform a Nadaraya-Watson regression on the data (also called local-constant regression) using a Gaussian kernel.

The Nadaraya-Watson estimate is given by:

\[f_n(x) \triangleq \frac{\sum_i K\left(\frac{x-X_i}{h}\right) Y_i} {\sum_i K\left(\frac{x-X_i}{h}\right)}\]

Where \(K(x)\) is the kernel, which must have zero mean (\(\int_\mathbb{R} x K(x)\,dx = 0\)), and \(h\) is the bandwidth of the method.
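
The estimate above is a kernel-weighted average of the observed values. The following is a minimal sketch in plain Python for the 1D case, assuming a standard normal kernel. The names gaussian_kernel and nadaraya_watson are illustrative and are not part of pyqt_fit.

```python
import math


def gaussian_kernel(u):
    """Standard normal density: zero mean, unit variance."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xdata, ydata, h):
    """Kernel-weighted average of ydata at the position x (1D sketch)."""
    # One weight per observation: K((x - X_i) / h)
    weights = [gaussian_kernel((x - xi) / h) for xi in xdata]
    total = sum(weights)
    # Ratio of the two sums in the Nadaraya-Watson formula
    return sum(w * yi for w, yi in zip(weights, ydata)) / total


xdata = [0.0, 1.0, 2.0, 3.0, 4.0]
ydata = [1.0, 3.0, 2.0, 5.0, 4.0]
estimate = nadaraya_watson(2.0, xdata, ydata, h=1.0)
```

Because the weights are positive and normalised by their sum, the estimate always lies between the smallest and largest observed value, and constant data is reproduced exactly.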

  • xdata (ndarray) – Explaining variables (at most 2D array)
  • ydata (ndarray) – Explained variables (should be 1D array)
  • cov (ndarray or callable) – If an ndarray, it should be a 2D array giving the covariance matrix of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the covariance matrix.

Degree of the fitted polynomial


The correction coefficient changes the width of the kernel depending on the point considered. It can be either a constant (to correct the kernel width globally) or a 1D array of the same size as the input.


Add a correction coefficient depending on the density of the input

class pyqt_fit.npr_methods.LocalLinearKernel1D[source]

Perform a local-linear regression using a gaussian kernel.

The local-linear regression estimate is the value of \(a_0\) obtained by minimising, for each position \(x\), the following sum over both \(a_0\) and \(a_1\):

\[f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - a_1(x-X_i)\right)^2\]

Where \(K(x)\) is the kernel, which must have zero mean (\(\int_\mathbb{R} x K(x)\,dx = 0\)), and \(h\) is the bandwidth of the method.


Degree of the fitted polynomial

This class uses the following function:

pyqt_fit.py_local_linear.local_linear_1d(bw, xdata, ydata, points, kernel, out)[source]

We are trying to find the fitted value at the points \(x\), given a Gaussian kernel. Given the following definitions:

\[x_0 = x - x_i\]
\[\begin{array}{rcl|rcl} w_i &=& \mathcal{K}\left(\frac{x_0}{h}\right) & W &=& \sum_i w_i \\ X &=& \sum_i w_i x_0 & X_2 &=& \sum_i w_i x_0^2 \\ Y &=& \sum_i w_i y_i & Y_2 &=& \sum_i w_i y_i x_0 \end{array}\]

The fitted value is given by:

\[f(x) = \frac{X_2 Y - X Y_2}{W X_2 - X^2}\]
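
The closed form can be checked with a short plain-Python sketch. It accumulates the weighted sums defined above and reads the numerator as \(X_2 Y - X Y_2\), which is the solution of the weighted normal equations for \(a_0\). The function name local_linear_at and the standard normal kernel are assumptions made for this illustration. This is not the code of local_linear_1d.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the weighting kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear_at(x, xdata, ydata, h):
    """Local-linear estimate at a single position x (1D sketch)."""
    W = X = X2 = Y = Y2 = 0.0
    for xi, yi in zip(xdata, ydata):
        x0 = x - xi                   # signed distance to the observation
        w = gaussian_kernel(x0 / h)   # kernel weight w_i
        W += w
        X += w * x0
        X2 += w * x0 * x0
        Y += w * yi
        Y2 += w * yi * x0
    # Intercept of the weighted least-squares line, evaluated at x
    return (X2 * Y - X * Y2) / (W * X2 - X * X)


xs = [0.0, 0.5, 1.2, 2.0, 3.1]
ys = [2.0 * v + 1.0 for v in xs]      # exactly linear data
value = local_linear_at(1.7, xs, ys, h=0.8)
```

On exactly linear data the estimate reproduces the line for any bandwidth, which is a convenient sanity check. The denominator is non-zero as soon as the data contains at least two distinct positions.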
class pyqt_fit.npr_methods.LocalPolynomialKernel1D(q=3)[source]

Perform a local-polynomial regression using a user-provided kernel (Gaussian by default).

The local-polynomial regression estimate is the value of \(a_0\) obtained by minimising, for each position \(x\), the following sum over all the coefficients \(a_0, \ldots, a_q\):

\[f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - a_1(x-X_i) - \ldots - a_q \frac{(x-X_i)^q}{q!}\right)^2\]

Where \(K(x)\) is the kernel, which must have zero mean (\(\int_\mathbb{R} x K(x)\,dx = 0\)), \(q\) is the order of the fitted polynomial and \(h\) is the bandwidth of the method. It is also recommended to have \(\int_\mathbb{R} x^2K(x)dx = 1\) (i.e. the variance of the kernel is 1); otherwise the effective bandwidth is scaled by the square root of this integral (i.e. the standard deviation of the kernel).
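
The minimisation is a weighted least-squares problem, so it can also be written in matrix form. The symbols \(Z\), \(\Omega\) and \(a\) below are introduced only for this illustration and do not come from the library:

\[Z_{ik} = \frac{(x-X_i)^k}{k!} \quad (k = 0,\ldots,q), \qquad \Omega = \operatorname{diag}\left(K\left(\frac{x-X_i}{h}\right)\right)\]
\[\hat{a} = \argmin_{a\in\mathbb{R}^{q+1}} (Y - Za)^T \Omega (Y - Za) = \left(Z^T \Omega Z\right)^{-1} Z^T \Omega Y, \qquad f_n(x) = \hat{a}_0\]

This assumes \(Z^T \Omega Z\) is invertible, which requires at least \(q+1\) distinct data positions. For \(q = 1\) the system is \(2 \times 2\), and its first component is the closed-form expression given for local_linear_1d.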

  • xdata (ndarray) – Explaining variables (at most 2D array)
  • ydata (ndarray) – Explained variables (should be 1D array)
  • q (int) – Order of the polynomial to fit. Default: 3
  • cov (float or callable) – If a float, it should be the variance of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the variance. Default: scotts_covariance
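
To illustrate the callable form of cov, here is a hypothetical function with the cov(xdata, ydata) call shape that returns a kernel variance from the textbook Scott's rule of thumb in 1D. The name scott_variance_1d is an assumption for this sketch, and the result is not guaranteed to match what the library's scotts_covariance computes.

```python
import statistics


def scott_variance_1d(xdata, ydata):
    """Kernel variance from Scott's rule of thumb for 1D data.

    ydata is unused; it is accepted only to match the
    cov(xdata, ydata) call shape described above.
    """
    n = len(xdata)
    # Scott's rule: bandwidth h = sigma * n**(-1/5), so variance = h**2
    return statistics.variance(xdata) * n ** (-2.0 / 5.0)


xdata = [float(i) for i in range(32)]
ydata = [0.0] * 32
variance = scott_variance_1d(xdata, ydata)
```

The function returns a variance (the squared bandwidth), not a standard deviation, in line with the parameter description above.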

Degree of the fitted polynomials

class pyqt_fit.npr_methods.LocalPolynomialKernel(q=3)[source]

Perform a local-polynomial regression in N-D using a user-provided kernel (Gaussian by default).

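The local-polynomial regression estimate is the value of \(a_0\) obtained by minimising, for each position \(x\), the following sum over \(a_0\) and the coefficients of \(\mathcal{P}_q\):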

\[f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - \mathcal{P}_q(X_i-x)\right)^2\]

Where \(K(x)\) is the kernel, which must have zero mean (\(\int_{\mathbb{R}^d} x K(x)\,dx = 0\)), \(q\) is the order of the fitted polynomial, \(\mathcal{P}_q(x)\) is a polynomial of order \(q\) in \(x \in \mathbb{R}^d\) and \(h\) is the bandwidth of the method.

The polynomial \(\mathcal{P}_q(x)\) is of the form:

\[\mathcal{F}_d(k) = \left\{ \n \in \mathbb{N}^d \middle| \sum_{i=1}^d n_i = k \right\}\]
\[\mathcal{P}_q(x_1,\ldots,x_d) = \sum_{k=1}^q \sum_{\n\in\mathcal{F}_d(k)} a_{k,\n} \prod_{i=1}^d x_i^{n_i}\]

For example we have:

\[\mathcal{P}_2(x,y) = a_{110} x + a_{101} y + a_{220} x^2 + a_{211} xy + a_{202} y^2\]
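
The index sets \(\mathcal{F}_d(k)\) can be enumerated directly, which makes it easy to list the coefficients of \(\mathcal{P}_q\) for a given dimension. The helper names multi_indices and polynomial_terms below are illustrative and are not part of pyqt_fit.

```python
from itertools import product


def multi_indices(d, k):
    """All n in N^d whose components sum to k, i.e. the set F_d(k)."""
    return [n for n in product(range(k, -1, -1), repeat=d) if sum(n) == k]


def polynomial_terms(d, q):
    """Labels (k, n) of the coefficients a_{k,n} of P_q in d variables."""
    return [(k, n) for k in range(1, q + 1) for n in multi_indices(d, k)]


# Two variables, order 2: the labels of a_110, a_101, a_220, a_211, a_202
terms = polynomial_terms(2, 2)
```

Each label (k, n) corresponds to the monomial with exponents n, so (2, (1, 1)) stands for the \(xy\) term of the example above. The constant term is not included because \(a_0\) is kept separate from \(\mathcal{P}_q\).
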
  • xdata (ndarray) – Explaining variables (at most 2D array). The shape should be (N,D) with D the dimension of the problem and N the number of points. For 1D array, the shape can be (N,), in which case it will be converted to (N,1) array.
  • ydata (ndarray) – Explained variables (should be 1D array). The shape must be (N,).
  • q (int) – Order of the polynomial to fit. Default: 3
  • kernel (callable) – Kernel to use for the weights. Call is kernel(points) and should return an array of values the same size as points. If None, the kernel will be normal_kernel(D).
  • cov (float or callable) – If a float, it should be the variance of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the variance. Default: scotts_covariance

Degree of the fitted polynomials


Default non-parametric regression method. Default: LocalPolynomialKernel(q=1)

Utility functions and classes

class pyqt_fit.npr_methods.PolynomialDesignMatrix1D(degree)[source]
class pyqt_fit.npr_methods.PolynomialDesignMatrix(dim, deg)[source]

Class used to create a design matrix for polynomial regression

__call__(x, out=None)[source]

Creates the design matrix for polynomial fitting using the points x.

  • x (ndarray) – Points used to create the design matrix. Shape must be (D,N) or (N,), where D is the dimension of the problem (taken to be 1 when the array is one-dimensional).
  • deg (int) – Degree of the fitting polynomial
  • factors (ndarray) – Scaling factor for the columns of the design matrix. The shape should be (M,) or (M,1), where M is the number of columns of the out. This value can be obtained using the designMatrixSize() function.

The design matrix as a (M,N) matrix.
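
As an illustration of the (M,N) layout, the sketch below builds a 1D polynomial design matrix in plain Python, with one row per power and one column per point. It assumes that the constant term is included, so that M = degree + 1. The function name design_matrix_1d is hypothetical, and the library classes may order or scale their entries differently.

```python
def design_matrix_1d(x, degree):
    """Return the (M, N) design matrix with M = degree + 1 rows.

    Row m holds x_j ** m for every point x_j, so row 0 is all ones.
    """
    return [[xj ** m for xj in x] for m in range(degree + 1)]


# Three points, degree 2: rows are the powers 0, 1 and 2
matrix = design_matrix_1d([0.5, 1.0, 2.0], degree=2)
```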

