
# Module pyqt_fit.npr_methods

Author: Pierre Barbier de Reuille

Module implementing non-parametric regressions using kernel methods.

## Non-Parametric Regression Methods

Methods must either inherit from pyqt_fit.npr_methods.RegressionKernelMethod or follow the same interface.

pyqt_fit.npr_methods.compute_bandwidth(reg)

Compute the bandwidth and covariance for the model, based on its xdata attribute.

class pyqt_fit.npr_methods.RegressionKernelMethod

Base class for regression kernel methods.

The following methods are interface methods that should be overridden with ones specific to the implemented method.

fit(reg)

Fits the method and returns the fitted object that will be used for the actual evaluation.

The object needs to call the pyqt_fit.nonparam_regression.NonParamRegression.set_actual_bandwidth() method with the computed bandwidth and covariance.

Default: computes the bandwidth based on the real data and sets it in the regression object.

evaluate(points, out)

Evaluates the regression at the provided points.

Parameters:

- points (ndarray) – 2D array of points at which to compute the regression. Each column is a point.
- out (ndarray) – 1D array in which to store the result.

Returns: ndarray – the method must return the out array, updated with the regression values.
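As an illustration of the interface described above, here is a minimal sketch of a duck-typed method that does not import pyqt_fit. Everything in it is hypothetical: the `GlobalMeanMethod` class, the `StandInRegression` stand-in for the regression object, the assumption that the regression object exposes a `ydata` attribute, and the placeholder bandwidth values. Only the `fit(reg)` / `evaluate(points, out)` shape and the call to `set_actual_bandwidth()` come from the text above.

```python
import numpy as np


class GlobalMeanMethod:
    """Toy method (hypothetical): predicts the mean of ydata everywhere."""

    def fit(self, reg):
        fitted = GlobalMeanMethod()
        # Assumption: the regression object exposes its data as reg.ydata.
        fitted.mean = float(np.mean(reg.ydata))
        # Report a bandwidth and covariance back to the regression object.
        # The values are placeholders; this toy method does not use them.
        reg.set_actual_bandwidth(1.0, 1.0)
        return fitted

    def evaluate(self, points, out):
        # points has one column per point; out has one entry per point.
        out[:] = self.mean
        return out


class StandInRegression:
    """Hypothetical stand-in for the regression object, for this sketch only."""

    def __init__(self, xdata, ydata):
        self.xdata = np.atleast_2d(xdata)
        self.ydata = np.asarray(ydata, dtype=float)
        self.bandwidth = None
        self.covariance = None

    def set_actual_bandwidth(self, bandwidth, covariance):
        self.bandwidth = bandwidth
        self.covariance = covariance


reg = StandInRegression([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
fitted = GlobalMeanMethod().fit(reg)
result = fitted.evaluate(np.array([[0.5, 1.5]]), np.empty(2))
```

The fitted object is returned separately from the method so that the same method instance could, in principle, be fitted against several regression objects.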

## Provided methods

Only extra methods will be described:

class pyqt_fit.npr_methods.SpatialAverage

Perform a Nadaraya-Watson regression on the data (also called local-constant regression) using a Gaussian kernel.

The Nadaraya-Watson estimate is given by:

$f_n(x) \triangleq \frac{\sum_i K\left(\frac{x-X_i}{h}\right) Y_i} {\sum_i K\left(\frac{x-X_i}{h}\right)}$

Where $$K(x)$$ is the kernel, which must have zero mean (written here as $$E(K(x)) = 0$$), and $$h$$ is the bandwidth of the method.
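The estimate can be written directly in NumPy. The following is a minimal sketch for 1D data only, not the implementation used by SpatialAverage: the function names `gaussian_kernel` and `nadaraya_watson` are made up for this example, and the bandwidth is passed as a plain scalar `h` rather than as a covariance.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def nadaraya_watson(x, xdata, ydata, h):
    """Local-constant estimate at each point of x (1D data only)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    # weights[j, i] = K((x_j - X_i) / h)
    weights = gaussian_kernel((x[:, None] - xdata[None, :]) / h)
    # Weighted average of the Y_i, one value per evaluation point.
    return weights @ ydata / weights.sum(axis=1)


estimate = nadaraya_watson([0.0], [-1.0, 0.0, 1.0], [1.0, 2.0, 3.0], h=0.5)
```

Because the estimate is a weighted average, the kernel's normalisation constant cancels between numerator and denominator, and constant data are reproduced exactly.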

Parameters:

- xdata (ndarray) – Explaining variables (at most 2D array).
- ydata (ndarray) – Explained variables (should be a 1D array).
- cov (ndarray or callable) – If an ndarray, it should be a 2D array giving the covariance matrix of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the covariance matrix.

q

Degree of the fitted polynomial.

correction()

The correction coefficient changes the width of the kernel depending on the point considered. It can be either a constant (to correct the kernel width globally) or a 1D array of the same size as the input.

set_density_correction()

Add a correction coefficient depending on the density of the input.

class pyqt_fit.npr_methods.LocalLinearKernel1D

Perform a local-linear regression using a Gaussian kernel.

The local-linear regression is the function that minimises, for each position:

$f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - a_1(x-X_i)\right)^2$

Where $$K(x)$$ is the kernel, which must have zero mean (written here as $$E(K(x)) = 0$$), and $$h$$ is the bandwidth of the method.

q

Degree of the fitted polynomial.

This class uses the following function:

pyqt_fit.py_local_linear.local_linear_1d(bw, xdata, ydata, points, kernel, out)

We are trying to find the fitted value at the points $$x$$, given a Gaussian kernel and the following definitions:

$x_0 = x - x_i$

$\begin{array}{rlc|rlc} w_i &=& \mathcal{K}\left(\frac{x_0}{h}\right) & W &=& \sum_i w_i \\ X &=& \sum_i w_i x_0 & X_2 &=& \sum_i w_i x_0^2 \\ Y &=& \sum_i w_i y_i & Y_2 &=& \sum_i w_i y_i x_0 \end{array}$

The fitted value is given by:

$f(x) = \frac{X_2 Y - X Y_2}{W X_2 - X^2}$
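The closed form translates almost line by line into NumPy. The sketch below is an illustration under stated assumptions, not the code of local_linear_1d: the name `local_linear_1d_sketch` is made up, the kernel is fixed to an unnormalised Gaussian, the `kernel` and `out` arguments of the real function are omitted, and the numerator is taken to be $$X_2 Y - X Y_2$$, using the sums $$Y$$ and $$Y_2$$ defined above.

```python
import numpy as np


def local_linear_1d_sketch(bw, xdata, ydata, points):
    """Evaluate the closed-form local-linear fit at each entry of points."""
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    points = np.atleast_1d(np.asarray(points, dtype=float))
    # x0[j, i] = x_j - x_i, one row per evaluation point
    x0 = points[:, None] - xdata[None, :]
    # Gaussian weights; the normalisation constant cancels in the ratio
    w = np.exp(-0.5 * (x0 / bw) ** 2)
    W = w.sum(axis=1)
    X = (w * x0).sum(axis=1)
    X2 = (w * x0**2).sum(axis=1)
    Y = (w * ydata).sum(axis=1)
    Y2 = (w * ydata * x0).sum(axis=1)
    return (X2 * Y - X * Y2) / (W * X2 - X**2)


xs = np.linspace(0.0, 1.0, 11)
pts = np.array([0.25, 0.5, 0.9])
fitted_values = local_linear_1d_sketch(0.2, xs, 2.0 + 3.0 * xs, pts)
```

A useful sanity check is that exactly linear data are reproduced: substituting $$y_i = a + b x_i$$ into the sums makes the numerator equal to $$(a + b x)(W X_2 - X^2)$$.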

class pyqt_fit.npr_methods.LocalPolynomialKernel1D(q=3)

Perform a local-polynomial regression using a user-provided kernel (Gaussian by default).

The local-polynomial regression is the function that minimises, for each position:

$f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - a_1(x-X_i) - \ldots - a_q \frac{(x-X_i)^q}{q!}\right)^2$

Where $$K(x)$$ is the kernel, which must have zero mean (written here as $$E(K(x)) = 0$$), $$q$$ is the order of the fitted polynomial and $$h$$ is the bandwidth of the method. It is also recommended to have $$\int_\mathbb{R} x^2K(x)dx = 1$$ (i.e. the variance of the kernel is 1); otherwise the effective bandwidth is scaled by the square root of this integral (i.e. the standard deviation of the kernel).
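The minimisation is a weighted least-squares problem, so it can be sketched with a generic solver. The example below is an assumption-laden illustration rather than the library's algorithm: `local_polynomial_1d_sketch` is a made-up name, the kernel is fixed to a Gaussian, `h` is a scalar bandwidth, and the problem is solved with `numpy.linalg.lstsq` at a single point `x`.

```python
import numpy as np
from math import factorial


def local_polynomial_1d_sketch(x, xdata, ydata, h, q=3):
    """Return a_0, the local-polynomial estimate at the scalar point x."""
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    dx = x - xdata
    # Kernel weights K((x - X_i) / h), Gaussian up to a constant factor
    w = np.exp(-0.5 * (dx / h) ** 2)
    # Design matrix columns: 1, dx, dx^2/2!, ..., dx^q/q!
    design = np.stack([dx**k / factorial(k) for k in range(q + 1)], axis=1)
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary lstsq
    sw = np.sqrt(w)
    coeffs, *_ = np.linalg.lstsq(design * sw[:, None], ydata * sw, rcond=None)
    return coeffs[0]


xs = np.linspace(-1.0, 1.0, 21)
ys = 1.0 - 2.0 * xs + 0.5 * xs**3
value = local_polynomial_1d_sketch(0.3, xs, ys, h=0.5, q=3)
```

Only the constant coefficient $$a_0$$ is kept, since it is the value of the local polynomial at $$x$$ itself. With $$q = 3$$, cubic data are reproduced exactly.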

Parameters:

- xdata (ndarray) – Explaining variables (at most 2D array).
- ydata (ndarray) – Explained variables (should be a 1D array).
- q (int) – Order of the polynomial to fit. Default: 3
- cov (float or callable) – If a float, it should be the variance of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the variance. Default: scotts_covariance

q

Degree of the fitted polynomials.

class pyqt_fit.npr_methods.LocalPolynomialKernel(q=3)

Perform a local-polynomial regression in N-D using a user-provided kernel (Gaussian by default).

The local-polynomial regression is the function that minimises, for each position:

$f_n(x) \triangleq \argmin_{a_0\in\mathbb{R}} \sum_i K\left(\frac{x-X_i}{h}\right) \left(Y_i - a_0 - \mathcal{P}_q(X_i-x)\right)^2$

Where $$K(x)$$ is the kernel, which must have zero mean (written here as $$E(K(x)) = 0$$), $$q$$ is the order of the fitted polynomial, $$\mathcal{P}_q(x)$$ is a polynomial of order $$q$$ in $$x$$ and $$h$$ is the bandwidth of the method.

The polynomial $$\mathcal{P}_q(x)$$ is of the form:

$\mathcal{F}_d(k) = \left\{ \mathbf{n} \in \mathbb{N}^d \middle| \sum_{i=1}^d n_i = k \right\}$

$\mathcal{P}_q(x_1,\ldots,x_d) = \sum_{k=1}^q \sum_{\mathbf{n}\in\mathcal{F}_d(k)} a_{k,\mathbf{n}} \prod_{i=1}^d x_i^{n_i}$

For example we have:

$\mathcal{P}_2(x,y) = a_{110} x + a_{101} y + a_{220} x^2 + a_{211} xy + a_{202} y^2$
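The index sets $$\mathcal{F}_d(k)$$ can be enumerated by brute force, which makes the term ordering of the example explicit. This is a small sketch with made-up helper names (`multi_indices`, `polynomial_terms`); it is not how the library builds its polynomials.

```python
from itertools import product


def multi_indices(d, k):
    """All exponent tuples n in N^d with sum(n) == k, i.e. the set F_d(k)."""
    candidates = product(range(k + 1), repeat=d)
    # Reverse-sorted so that powers of the first variable come first
    return sorted((n for n in candidates if sum(n) == k), reverse=True)


def polynomial_terms(d, q):
    """(k, n) pairs indexing every coefficient a_{k,n} of P_q, for k = 1..q."""
    return [(k, n) for k in range(1, q + 1) for n in multi_indices(d, k)]


terms = polynomial_terms(2, 2)
```

For `d = 2` and `q = 2` this yields the five terms of the example, in the order x, y, x², xy, y².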

Parameters:

- xdata (ndarray) – Explaining variables (at most 2D array). The shape should be (N,D), with D the dimension of the problem and N the number of points. A 1D array may have shape (N,), in which case it is converted to an (N,1) array.
- ydata (ndarray) – Explained variables (should be a 1D array). The shape must be (N,).
- q (int) – Order of the polynomial to fit. Default: 3
- kernel (callable) – Kernel to use for the weights. The call is kernel(points) and should return an array of values the same size as points. If None, the kernel will be normal_kernel(D).
- cov (float or callable) – If a float, it should be the variance of the Gaussian kernel. Otherwise, it should be a function cov(xdata, ydata) returning the variance. Default: scotts_covariance

q

Degree of the fitted polynomials.

pyqt_fit.npr_methods.default_method

Default non-parametric regression method. Default: LocalPolynomialKernel(q=1)

## Utility functions and classes

class pyqt_fit.npr_methods.PolynomialDesignMatrix1D(degree)

class pyqt_fit.npr_methods.PolynomialDesignMatrix(dim, deg)

Class used to create a design matrix for polynomial regression.

__call__(x, out=None)

Creates the design matrix for polynomial fitting using the points x.

Parameters:

- x (ndarray) – Points used to create the design matrix. The shape must be (D,N) or (N,), where D is the dimension of the problem (taken as 1 when the array is 1D).
- deg (int) – Degree of the fitting polynomial.
- factors (ndarray) – Scaling factor for the columns of the design matrix. The shape should be (M,) or (M,1), where M is the number of columns of out. This value can be obtained using the designMatrixSize() function.

Returns: The design matrix as an (M,N) matrix.
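To make the (M,N) layout concrete, here is a minimal sketch of a polynomial design matrix with one row per monomial and one column per point. It rests on several assumptions that are not confirmed by the text above: the function name `design_matrix_sketch` is made up, the constant term is included as the first row, monomials are ordered by total degree, and no scaling factors are applied.

```python
import numpy as np
from itertools import product


def design_matrix_sketch(x, deg):
    """Monomials of total degree 0..deg evaluated at the points in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[None, :]  # a (N,) input is treated as D = 1
    dim = x.shape[0]
    # Exponent tuples grouped by total degree, first variable first
    exponents = [
        n
        for k in range(deg + 1)
        for n in sorted(product(range(k + 1), repeat=dim), reverse=True)
        if sum(n) == k
    ]
    # One row per monomial, one column per point: shape (M, N)
    rows = [np.prod(x ** np.array(n)[:, None], axis=0) for n in exponents]
    return np.stack(rows)


matrix = design_matrix_sketch([[1.0, 2.0], [3.0, 4.0]], deg=2)
```

For two points in 2D and degree 2, the rows correspond to 1, x, y, x², xy and y², so M = 6.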