
Module pyqt_fit.bootstrap

Author: Pierre Barbier de Reuille <pierre.barbierdereuille@gmail.com>

This module provides functions for bootstrapping a regression method.

Bootstrap Shuffling Methods

pyqt_fit.bootstrap.bootstrap_residuals(fct, xdata, ydata, repeats=3000, residuals=None, add_residual=None, correct_bias=False, **kwrds)

This implements the residual bootstrapping method for non-linear regression.

Parameters:
  • fct (callable) – Callable that evaluates the regression function on xdata; it must support at least the call fct(xdata)
  • xdata (ndarray of shape (N,) or (k,N) for function with k predictors) – The independent variable where the data is measured
  • ydata (ndarray) – The dependent data
  • residuals (ndarray or callable or None) – Residuals of the estimation at each point of xdata. If callable, the call will be residuals(ydata, yopt).
  • repeats (int) – Number of repeats for the bootstrapping
  • add_residual (callable or None) – Function that adds a residual to a value. The call add_residual(yopt, residual) should return the new ydata, with the residuals ‘applied’. If None, the residuals are simply added.
  • correct_bias (boolean) – If true, the additive bias of the residuals is computed and restored
  • kwrds (dict) – Dictionary used to absorb unknown named parameters
Return type:

(ndarray, ndarray)

Returns:

1. xdata, with a new axis at position -2. This corresponds to the ‘shuffled’ xdata (they are not actually shuffled by this method).

2. The shuffled ydata. There is one line per repeat, and each line is shuffled independently.
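The idea behind residual bootstrapping can be illustrated with plain NumPy. The following is only a minimal sketch, not the library's implementation: the function name residual_shuffle_sketch and the rng parameter are hypothetical, and the residual, add_residual and correct_bias options are left out. It assumes the residuals are ydata - fct(xdata) and are simply added back.

```python
import numpy as np

def residual_shuffle_sketch(fct, xdata, ydata, repeats=3000, rng=None):
    """Hypothetical helper: keep x fixed and resample the residuals."""
    rng = np.random.default_rng() if rng is None else rng
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    y_est = fct(xdata)                    # estimate on the original points
    residuals = ydata - y_est             # assumption: additive residuals
    n = residuals.shape[0]
    # One row of residual indices per repeat, drawn with replacement
    idx = rng.integers(0, n, size=(repeats, n))
    shuffled_y = y_est + residuals[idx]   # shape (repeats, N)
    shuffled_x = xdata[..., np.newaxis, :]  # new axis at position -2
    return shuffled_x, shuffled_y

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)
slope, intercept = np.polyfit(x, y, 1)
fct = lambda pts: slope * pts + intercept
shuffled_x, shuffled_y = residual_shuffle_sketch(fct, x, y, repeats=50, rng=rng)
print(shuffled_x.shape, shuffled_y.shape)
```

Here shuffled_x has shape (1, 20) and broadcasts against the (50, 20) array of shuffled y values, which is why the x values do not need to be copied for each repeat.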

pyqt_fit.bootstrap.bootstrap_regression(fct, xdata, ydata, repeats=3000, **kwrds)

This implements the shuffling step of the standard bootstrapping method for non-linear regression.

Parameters:
  • fct (callable) – This is the function to optimize
  • xdata (ndarray of shape (N,) or (k,N) for function with k predictors) – The independent variable where the data is measured
  • ydata (ndarray) – The dependent data
  • repeats (int) – Number of repeats for the bootstrapping
  • kwrds (dict) – Dictionary used to absorb unknown named parameters
Return type:

(ndarray, ndarray)

Returns:

1. The shuffled xdata. The axis -2 has one element per repeat; the other axes are shuffled independently.

2. The shuffled ydata. There is a line per repeat, each line is shuffled independently.
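Standard (case) bootstrapping resamples the (x, y) pairs together. The sketch below shows one way to get the shapes described above with NumPy; it is an illustration under stated assumptions, not the library's code, and the name case_shuffle_sketch and the rng parameter are hypothetical.

```python
import numpy as np

def case_shuffle_sketch(xdata, ydata, repeats=3000, rng=None):
    """Hypothetical helper: resample (x, y) pairs with replacement."""
    rng = np.random.default_rng() if rng is None else rng
    xdata = np.asarray(xdata, dtype=float)
    ydata = np.asarray(ydata, dtype=float)
    n = ydata.shape[0]
    # The same indices are applied to x and y so that pairs stay together
    idx = rng.integers(0, n, size=(repeats, n))
    shuffled_x = xdata[..., idx]   # (N,) -> (repeats, N); (k, N) -> (k, repeats, N)
    shuffled_y = ydata[idx]        # (repeats, N)
    return shuffled_x, shuffled_y

x = np.arange(10.0)
y = 3.0 * x + 1.0
sx, sy = case_shuffle_sketch(x, y, repeats=25, rng=np.random.default_rng(0))

# With k = 2 predictors, the repeat axis sits at position -2
x2 = np.vstack([x, x ** 2])
sx2, sy2 = case_shuffle_sketch(x2, y, repeats=25, rng=np.random.default_rng(0))
print(sx.shape, sy.shape, sx2.shape)
```

Using one index array for both inputs is the design choice that matters here: each resampled y value keeps the x value it was measured at.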

Main Bootstrap Functions

pyqt_fit.bootstrap.bootstrap(fit, xdata, ydata, CI, shuffle_method=bootstrap_residuals, shuffle_args=(), shuffle_kwrds={}, repeats=3000, eval_points=None, full_results=False, nb_workers=None, extra_attrs=(), fit_args=(), fit_kwrds={})

This function implements the bootstrap algorithm for a regression method. It can spread the load across many workers using shared memory and the multiprocessing module.

Parameters:
  • fit (callable) –

    Method used to compute regression. The call is:

    f = fit(xdata, ydata, *fit_args, **fit_kwrds)
    

    Fit should return an object that would evaluate the regression on a set of points. The next call will be:

    f(eval_points)
    
  • xdata (ndarray of shape (N,) or (k,N) for function with k predictors) – The independent variable where the data is measured
  • ydata (ndarray) – The dependent data
  • CI (tuple of float) – List of percentiles to extract
  • shuffle_method (callable) –

    Create shuffled dataset. The call is:

    shuffle_method(xdata, ydata, y_est, repeat=repeats, *shuffle_args,
                   **shuffle_kwrds)
    

    where y_est is the estimated dependent variable on the xdata.

  • shuffle_args (tuple) – List of arguments for the shuffle method
  • shuffle_kwrds (dict) – Dictionary of arguments for the shuffle method
  • repeats (int) – Number of repeats for the bootstrapping
  • eval_points (ndarray or None) – List of points to evaluate. If None, eval_points is xdata.
  • full_results (bool) – If True, also output the whole set of evaluations
  • nb_workers – Number of worker threads. If None, the number of detected CPUs is used. If 1 or less, a single thread is used.
  • extra_attrs (tuple of str) – List of attributes of the fitting method to extract on top of the y values for confidence intervals
  • fit_args (tuple) – List of extra arguments for the fit callable
  • fit_kwrds (dict) – Dictionary of extra named arguments for the fit callable
Return type:

BootstrapResult

Returns:

Estimated y on the data, on the evaluation points, the requested confidence intervals and, if requested, the shuffled X, Y and the full estimated distributions.
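The calling convention for fit described above (build an estimator from the data, then call it on eval_points) can be sketched with a single-threaded loop. Everything here is an assumption made for illustration: LinearFit and bootstrap_sketch are hypothetical names, residual resampling is hard-coded, and none of this is the library's implementation.

```python
import numpy as np

class LinearFit:
    """Hypothetical estimator: polynomial least squares, callable on points."""
    def __init__(self, xdata, ydata, degree=1):
        self.coeffs = np.polyfit(xdata, ydata, degree)

    def __call__(self, points):
        return np.polyval(self.coeffs, points)

def bootstrap_sketch(fit, xdata, ydata, repeats=200, eval_points=None,
                     fit_args=(), fit_kwrds=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    fit_kwrds = {} if fit_kwrds is None else fit_kwrds
    eval_points = xdata if eval_points is None else eval_points
    f = fit(xdata, ydata, *fit_args, **fit_kwrds)   # fit on the original data
    y_est = f(xdata)
    residuals = ydata - y_est
    n = ydata.shape[0]
    evaluations = np.empty((repeats, np.shape(eval_points)[-1]))
    for i in range(repeats):
        # Build one resampled dataset, refit, and evaluate on eval_points
        new_y = y_est + residuals[rng.integers(0, n, size=n)]
        evaluations[i] = fit(xdata, new_y, *fit_args, **fit_kwrds)(eval_points)
    return f, f(eval_points), evaluations

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)
grid = np.linspace(0.0, 1.0, 5)
f, y_eval, evaluations = bootstrap_sketch(
    LinearFit, x, y, repeats=100, eval_points=grid,
    fit_kwrds={"degree": 1}, rng=rng)
print(y_eval.shape, evaluations.shape)
```

Each row of evaluations is one refitted curve on the evaluation grid; the confidence intervals are then taken column by column across those rows.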

class pyqt_fit.bootstrap.BootstrapResult(y_fit, y_est, y_eval, CIs, shuffled_xs, shuffled_ys, full_results)

Note

This is a class created with pyqt_fit.utils.namedtuple().

y_fit

Estimator object, fitted on the original data. Type: fun(xs) -> ys

y_est

Y estimated on xdata. Type: ndarray

eval_points

Points on which the confidence intervals are evaluated

y_eval

Y estimated on eval_points

CIs_val

Tuple containing the list of percentiles extracted (i.e. this is a copy of the CI argument of the bootstrap function).

CIs

List of confidence intervals. The first element is for the estimated values on eval_points. The others are for the extra attributes specified in extra_attrs. Each element is a 3-dimensional array of shape (Q,2,N), where Q is the number of confidence intervals (i.e. the length of CIs_val) and N is the number of data points. Values (x,0,y) give the lower bounds and (x,1,y) the upper bounds of the confidence intervals.

shuffled_xs

If full_results is True, the shuffled x’s used for the bootstrapping

shuffled_ys

If full_results is True, the shuffled y’s used for the bootstrapping

full_results

If full_results is True, the estimated y’s for each shuffled_ys
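The (Q,2,N) layout of the CIs arrays can be reproduced from a set of bootstrap evaluations with np.percentile. This is a sketch only: percentile_intervals is a hypothetical name, and it assumes each entry of CI is a confidence level in percent from which a central interval is taken, which the text above does not spell out.

```python
import numpy as np

def percentile_intervals(evaluations, CI):
    """Hypothetical helper: (repeats, N) evaluations -> (Q, 2, N) intervals."""
    evaluations = np.asarray(evaluations, dtype=float)
    cis = np.empty((len(CI), 2, evaluations.shape[1]))
    for q, level in enumerate(CI):
        tail = (100.0 - level) / 2.0   # assumption: central interval
        cis[q, 0] = np.percentile(evaluations, tail, axis=0)          # lower
        cis[q, 1] = np.percentile(evaluations, 100.0 - tail, axis=0)  # upper
    return cis

rng = np.random.default_rng(0)
evaluations = rng.normal(size=(1000, 4))   # stand-in for bootstrap estimates
cis = percentile_intervals(evaluations, (90, 99))

lower_90, upper_90 = cis[0, 0], cis[0, 1]  # bounds of the first interval
lower_99, upper_99 = cis[1, 0], cis[1, 1]  # bounds of the second interval
print(cis.shape)
```

Indexing the first axis selects the interval, the second axis selects lower (0) or upper (1) bound, and the last axis runs over the evaluation points.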
