Performance and Robustness Testing¶
Introduction¶
In developing statistical inference applications, it is essential to test the robustness of one's software to errors and noise of various kinds. QInfer therefore provides tools for doing so: estimation tasks can be run repeatedly to measure performance, and likelihood calculations can be corrupted in various realistic ways.
Testing Estimation Performance¶
Given an estimator, a common question of interest is what risk that estimator incurs; that is, what is the expected error of the estimates reported by the estimator? This is formalized by defining the loss \(L(\hat{\vec{x}}, \vec{x})\) of an estimate \(\hat{\vec{x}}\) given a true value \(\vec{x}\). In QInfer, we adopt the quadratic loss \(L_Q\) as a default, defined as

\[
    L_Q(\hat{\vec{x}}, \vec{x}) = (\hat{\vec{x}} - \vec{x})^{\mathrm{T}} \matr{Q} (\hat{\vec{x}} - \vec{x}),
\]

where \(\matr{Q}\) is a positive-semidefinite matrix that establishes the relative scale of each model parameter.
We can now define the risk for an estimator as

\[
    R(\hat{\vec{x}}(\cdot), \vec{x}) = \mathbb{E}_{D \mid \vec{x}} \left[ L_Q\left(\hat{\vec{x}}(D), \vec{x}\right) \right],
\]

where the expectation is taken over the data sets \(D\) generated by the true model \(\vec{x}\).
As the risk is defined by an expectation value, we can estimate it by again Monte Carlo sampling over trials. That is, for a given true model \(\vec{x}\), we can draw many different data sets and then find the corresponding estimate for each in order to determine the risk.
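Concretely, if \(D_1, \ldots, D_N\) are data sets simulated independently from the true model \(\vec{x}\), the sample mean of the losses approximates the risk,

\[
    R(\hat{\vec{x}}(\cdot), \vec{x}) \approx \frac{1}{N} \sum_{i=1}^{N} L_Q\left(\hat{\vec{x}}(D_i), \vec{x}\right).
\]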
Similarly, the Bayes risk \(r(\hat{\vec{x}}(\cdot), \pi)\) is defined by also taking the expectation over a prior distribution \(\pi\),

\[
    r(\hat{\vec{x}}(\cdot), \pi) = \mathbb{E}_{\vec{x} \sim \pi} \left[ R(\hat{\vec{x}}(\cdot), \vec{x}) \right].
\]

The Bayes risk can thus be estimated by drawing a new set of true model parameters with each trial.
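That is, drawing \(\vec{x}_i \sim \pi\) and simulating a data set \(D_i\) from each \(\vec{x}_i\),

\[
    r(\hat{\vec{x}}(\cdot), \pi) \approx \frac{1}{N} \sum_{i=1}^{N} L_Q\left(\hat{\vec{x}}(D_i), \vec{x}_i\right).
\]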
QInfer implements risk and Bayes risk estimation by providing a function perf_test() which simulates a single trial with a given model, an experiment design heuristic, and either a true model parameter vector or a prior distribution. The perf_test_multiple() function then collects the results of perf_test() for many trials, reporting an array of performance metrics that can be used to quickly compute the risk and Bayes risk.
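As a minimal sketch of a single trial (assuming that perf_test() is importable from the top-level qinfer namespace, that it accepts the same keyword arguments as perf_test_multiple() in the examples below minus n_trials, and that its return value exposes a loss field with one entry per experiment):

import matplotlib.pyplot as plt
from qinfer import (
    SimplePrecessionModel, UniformDistribution,
    ExpSparseHeuristic, perf_test
)

# One simulated estimation run: a true model is drawn from the prior,
# data are simulated under the heuristic, and the loss is recorded
# after each of the n_exp measurements.
perf = perf_test(
    model=SimplePrecessionModel(), n_particles=400,
    prior=UniformDistribution([0, 1]),
    n_exp=50, heuristic_class=ExpSparseHeuristic
)
plt.semilogy(perf['loss'])
plt.xlabel('# of Measurements')
plt.ylabel('Loss')
plt.show()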
For example, we can use performance testing to evaluate the Bayes risk as a function of the number of particles used, in order to determine quality parameters in experimental practice. Consider the simple precession model (SimplePrecessionModel) under an exponentially sparse sampling heuristic (ExpSparseHeuristic). Then, we can test the performance of estimating the precession frequency for several different values of n_particles:
import matplotlib.pyplot as plt
from qinfer import (
    SimplePrecessionModel, UniformDistribution,
    ExpSparseHeuristic, perf_test_multiple
)

model = SimplePrecessionModel()
prior = UniformDistribution([0, 1])
heuristic_class = ExpSparseHeuristic

for n_particles in [100, 200, 400]:
    perf = perf_test_multiple(
        n_trials=50,
        model=model, n_particles=n_particles, prior=prior,
        n_exp=50, heuristic_class=heuristic_class
    )
    # The array returned by perf_test_multiple has
    # shape (n_trials, n_exp), so to take the mean over
    # trials (Bayes risk), we need to average over the
    # zeroth axis.
    bayes_risk = perf['loss'].mean(axis=0)
    plt.semilogy(bayes_risk, label='{} particles'.format(n_particles))

plt.xlabel('# of Measurements')
plt.ylabel('Bayes Risk')
plt.legend()
plt.show()
We can also pass fixed model parameter vectors to evaluate the risk (that is, not the Bayes risk) as a function of the true model:
import numpy as np
import matplotlib.pyplot as plt
from qinfer import (
    SimplePrecessionModel, UniformDistribution,
    ExpSparseHeuristic, perf_test_multiple
)

model = SimplePrecessionModel()
prior = UniformDistribution([0, 1])
heuristic_class = ExpSparseHeuristic
n_exp = 50

omegas = np.linspace(0.1, 0.9, 6)
risks = np.empty_like(omegas)

for idx_omega, omega in enumerate(omegas[:, np.newaxis]):
    perf = perf_test_multiple(
        n_trials=100,
        model=model, n_particles=400, prior=prior,
        n_exp=n_exp, heuristic_class=heuristic_class,
        true_mps=omega
    )
    # We now only take the loss after the last
    # measurement (indexed by -1 along axis 1).
    risks[idx_omega] = perf['loss'][:, -1].mean(axis=0)

plt.semilogy(omegas, risks)
plt.xlabel(r'$\omega$')
plt.ylabel('Risk')
plt.show()
Robustness Testing¶
Incorrect Priors and Models¶
The perf_test() and perf_test_multiple() functions also allow for testing the effect of “bad” prior assumptions, and of using the “wrong” model for estimation. In particular, the true_prior and true_model arguments allow for testing the effect of using a different prior or model for performing estimation than for simulating data.
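For example, the following sketch uses the true_prior argument named above to simulate data from a broad prior while the estimator assumes an overly narrow one (the particular priors and trial counts are illustrative):

from qinfer import (
    SimplePrecessionModel, UniformDistribution,
    ExpSparseHeuristic, perf_test_multiple
)

model = SimplePrecessionModel()
narrow_prior = UniformDistribution([0.4, 0.6])  # what the estimator assumes
broad_prior = UniformDistribution([0, 1])       # what actually generates the data

perf = perf_test_multiple(
    n_trials=50,
    model=model, n_particles=400, prior=narrow_prior,
    true_prior=broad_prior,
    n_exp=50, heuristic_class=ExpSparseHeuristic
)
# Averaging the loss over trials gives the Bayes risk incurred
# under the mismatched prior assumption.
bayes_risk = perf['loss'].mean(axis=0)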
Modeling Faulty or Noisy Simulators¶
In addition, QInfer allows for testing robustness against errors in the model itself by using PoisonedModel. This derived model adds noise to a model's likelihood() method in such a way as to simulate sampling errors incurred in likelihood-free parameter estimation (LFPE) approaches [FG13]. The noise that PoisonedModel adds can be specified as the tolerance of an adaptive likelihood estimation (ALE) step [FG13], or as the number of samples and hedging used for a hedged maximum likelihood estimator of the likelihood [FB12]. In either case, the requested noise is added to the likelihood reported by the underlying model, such that

\[
    \widehat{\Pr}(d | \vec{x}; \vec{e}) = \Pr(d | \vec{x}; \vec{e}) + \epsilon,
\]

where \(\widehat{\Pr}\) is the reported estimate of the true likelihood \(\Pr\) and \(\epsilon\) is a random perturbation whose scale is set by the requested noise.
For example, to simulate using adaptive likelihood estimation to reach a threshold tolerance of 0.01:
>>> from qinfer import SimplePrecessionModel, PoisonedModel
>>> model = PoisonedModel(SimplePrecessionModel(), tol=0.01)
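Similarly, to specify the noise as the sampling error of a hedged maximum likelihood estimate of the likelihood (a sketch assuming the n_samples and hedge keyword arguments described above; the particular values are illustrative):

>>> model = PoisonedModel(SimplePrecessionModel(), n_samples=100, hedge=0.5)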
We can then use perf_test_multiple() as above to quickly test the effect of noise in the likelihood function on the Bayes risk.
import matplotlib.pyplot as plt
from qinfer import (
    SimplePrecessionModel, PoisonedModel, UniformDistribution,
    ExpSparseHeuristic, perf_test_multiple
)

models = [
    SimplePrecessionModel(),
    PoisonedModel(SimplePrecessionModel(), tol=0.25)
]
prior = UniformDistribution([0, 1])
heuristic_class = ExpSparseHeuristic

for model in models:
    perf = perf_test_multiple(
        n_trials=50,
        model=model, n_particles=400, prior=prior,
        true_model=models[0],
        n_exp=50, heuristic_class=heuristic_class
    )
    # Data are always simulated from the noiseless model (models[0]),
    # while estimation uses each model in turn.
    bayes_risk = perf['loss'].mean(axis=0)
    plt.semilogy(bayes_risk, label=type(model).__name__)

plt.xlabel('# of Measurements')
plt.ylabel('Bayes Risk')
plt.legend()
plt.show()