abcpy package
This reference gives details about the API of the modules, classes and functions included in ABCpy.
abcpy.approx_lhd module
-
class
abcpy.approx_lhd.
Approx_likelihood
(statistics_calc)[source]¶ Bases:
object
This abstract base class defines the approximate likelihood function. To approximate the likelihood at a parameter value given the observed data set, pass the observed data set together with a data set simulated from the model at that parameter value.
-
__init__
(statistics_calc)[source]¶ The constructor of a sub-class must accept a non-optional statistics calculator, which is stored in self.statistics_calc.
Parameters: statistics_calc (abcpy.statistics.Statistics) – Statistics extractor object that conforms to the Statistics class.
-
likelihood
(y_obs, y_sim)[source]¶ To be overwritten by any sub-class: should compute the approximate likelihood value given the observed data set y_obs and the data set y_sim simulated from the model at the parameter value.
Parameters: - y_obs (Python list) – Observed data set.
- y_sim (Python list) – Simulated data set from model at the parameter value.
Returns: Computed approximate likelihood.
Return type: float
-
-
class
abcpy.approx_lhd.
SynLiklihood
(statistics_calc)[source]¶ Bases:
abcpy.approx_lhd.Approx_likelihood
This class implements the approximate likelihood function, computing the approximate likelihood with the synthetic likelihood approach described in Wood [1]. For the synthetic likelihood approximation, the robust precision matrix is computed using the method of Ledoit and Wolf [2].
[1] S. N. Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102–1104, Aug. 2010.
[2] O. Ledoit and M. Wolf, A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices, Journal of Multivariate Analysis, Volume 88, Issue 2, pages 365-411, February 2004.
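As a rough illustration of the synthetic likelihood idea, the sketch below fits a Gaussian to the summary statistics of several simulated data sets and evaluates the observed statistics under it. It is not ABCpy's implementation: it uses only the standard library, it assumes a diagonal covariance instead of the Ledoit and Wolf estimator, and the names summary and synthetic_loglik are hypothetical.

```python
import math
import statistics


def summary(dataset):
    """Hypothetical summary statistics: sample mean and sample variance."""
    return [statistics.mean(dataset), statistics.variance(dataset)]


def synthetic_loglik(y_obs, y_sims):
    """Gaussian log-density of the observed statistics.

    The Gaussian is fitted to the statistics of the simulated data sets.
    Assumption: independent components (diagonal covariance), which is a
    simplification of the robust precision matrix described above.
    """
    s_obs = summary(y_obs)
    s_sims = [summary(y) for y in y_sims]
    loglik = 0.0
    for j, s in enumerate(s_obs):
        column = [stats[j] for stats in s_sims]
        mu = statistics.mean(column)
        var = statistics.variance(column)
        loglik += -0.5 * (math.log(2 * math.pi * var) + (s - mu) ** 2 / var)
    return loglik


sims = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [0.0, 2.0, 4.0]]
near = synthetic_loglik([0.0, 1.0, 2.0], sims)
far = synthetic_loglik([10.0, 11.0, 12.0], sims)
```

Observed data that resemble the simulations (near) receive a higher value than data far from them (far), which is what an inference scheme relies on when comparing parameter values.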
-
class
abcpy.approx_lhd.
PenLogReg
(statistics_calc, model, n_simulate, n_folds=10, max_iter=100000, seed=None)[source]¶ Bases:
abcpy.approx_lhd.Approx_likelihood
This class implements the approximate likelihood function, computing the approximate likelihood up to a constant using the penalized logistic regression described in Dutta et al. [1]. It takes one additional function handler defining the true model and two additional parameters, n_folds and n_simulate, which define respectively the number of folds used to estimate the prediction error by cross-validation and the number of simulated data sets sampled from each parameter to approximate the likelihood function. For lasso-penalized logistic regression we use glmnet of Friedman et al. [2].
[1] R. Dutta, J. Corander, S. Kaski, and M. U. Gutmann. Likelihood-free inference by penalised logistic regression. arXiv:1611.10242, Nov. 2016.
[2] Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
-
__init__
(statistics_calc, model, n_simulate, n_folds=10, max_iter=100000, seed=None)[source]¶ Parameters: - statistics_calc (abcpy.statistics.Statistics) – Statistics extractor object that conforms to the Statistics class.
- model (abcpy.models.Model) – Model object that conforms to the Model class.
- n_simulate (int) – Number of data points in the simulated data set.
- n_folds (int, optional) – Number of folds for cross-validation. The default value is 10.
- max_iter (int, optional) – Maximum passes over the data. The default is 100000.
- seed (int, optional) – Seed for the random number generator. The glmnet solver used is not deterministic; this seed determines the cross-validation folds. The default value is None.
-
abcpy.backends module
-
class
abcpy.backends.
Backend
[source]¶ Bases:
object
This is the base class for every parallelization backend. It essentially resembles the map/reduce API from Spark.
An idea for the future is to implement an MPI version of the backend, in the hope of being more compliant with standard HPC infrastructure and of gaining a potential speed-up.
-
parallelize
(list)[source]¶ This method distributes the list on the available workers and returns a reference object.
The list should be split into as many parts as there are workers. Each part should then be sent to a separate worker node.
Parameters: list (Python list) – The list that should be distributed across the worker nodes.
Returns: A reference object that represents the parallelized list.
Return type: PDS class (parallel data set)
-
broadcast
(object)[source]¶ Send object to all worker nodes without splitting it up.
Parameters: object (Python object) – An arbitrary object that should be available on all workers.
Returns: A reference to the broadcast object.
Return type: BDS class (broadcast data set)
-
map
(func, pds)[source]¶ A distributed implementation of map that works on parallel data sets (PDS).
On every element of pds the function func is called.
Parameters: - func (Python func) – A function that can be applied to every element of the pds
- pds (PDS class) – A parallel data set to which func should be applied
Returns: a new parallel data set that contains the result of the map
Return type: PDS class
-
-
class
abcpy.backends.
BackendDummy
[source]¶ Bases:
abcpy.backends.Backend
This is a dummy parallelization backend, meaning it doesn’t parallelize anything. It is mainly implemented for testing purposes.
-
parallelize
(python_list)[source]¶ This does no actual parallelization: it just wraps the Python list in a dummy PDS (PDSDummy).
Parameters: python_list (Python list) – The list to wrap.
Returns: The wrapped list.
Return type: PDSDummy (parallel data set)
-
broadcast
(object)[source]¶ This does no actual broadcasting: it just wraps the object in a BDSDummy.
Parameters: object (Python object) – The object to wrap.
Returns: The wrapped object.
Return type: BDSDummy class
-
map
(func, pds)[source]¶ This is a wrapper for the Python internal map function.
Parameters: - func (Python func) – A function that can be applied to every element of the pds
- pds (PDSDummy class) – A pseudo-parallel data set to which func should be applied
Returns: a new pseudo-parallel data set that contains the result of the map
Return type: PDSDummy class
-
-
class
abcpy.backends.
PDSDummy
(python_list)[source]¶ Bases:
abcpy.backends.PDS
This is a wrapper for a Python list to fake parallelization.
-
class
abcpy.backends.
BDSDummy
(object)[source]¶ Bases:
abcpy.backends.BDS
This is a wrapper for a Python object to fake parallelization.
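The parallelize, broadcast and map pattern of a non-parallel backend can be sketched in a few lines. The classes below are hypothetical stand-ins written for illustration only (they are not the ABCpy classes, and the value method on the broadcast wrapper is an assumption), but they show how a sequential backend can honour the same interface as a distributed one.

```python
class PDSSketch:
    """Hypothetical stand-in for a parallel data set: wraps a Python list."""

    def __init__(self, python_list):
        self.python_list = python_list


class BDSSketch:
    """Hypothetical stand-in for a broadcast data set: wraps one object."""

    def __init__(self, obj):
        self.obj = obj

    def value(self):
        # Assumed accessor name; returns the wrapped object unchanged.
        return self.obj


class BackendSketch:
    """Sequential backend following the parallelize/broadcast/map pattern."""

    def parallelize(self, python_list):
        return PDSSketch(python_list)

    def broadcast(self, obj):
        return BDSSketch(obj)

    def map(self, func, pds):
        # Apply func to every element with Python's built-in map.
        return PDSSketch(list(map(func, pds.python_list)))


backend = BackendSketch()
offset = backend.broadcast(10)
pds = backend.parallelize([1, 2, 3])
result = backend.map(lambda x: x + offset.value(), pds)
print(result.python_list)  # [11, 12, 13]
```

Because calling code only touches the three backend methods, swapping the sequential sketch for a distributed backend would not require changes to the mapped function.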
-
class
abcpy.backends.
BackendSpark
(sparkContext, parallelism=4)[source]¶ Bases:
abcpy.backends.Backend
A parallelization backend for Apache Spark. It is essentially a wrapper for the required Spark functionality.
-
__init__
(sparkContext, parallelism=4)[source]¶ Initialize the backend with an existing and configured SparkContext.
Parameters: - sparkContext (pyspark.SparkContext) – an existing and fully configured PySpark context
- parallelism (int) – defines on how many workers a distributed dataset can be distributed
-
parallelize
(python_list)[source]¶ This is a wrapper of pyspark.SparkContext.parallelize().
Parameters: python_list (Python list) – The list that is distributed on the workers.
Returns: A reference object that represents the parallelized list.
Return type: PDSSpark class (parallel data set)
-
broadcast
(object)[source]¶ This is a wrapper for pyspark.SparkContext.broadcast().
Parameters: object (Python object) – An arbitrary object that should be available on all workers.
Returns: A reference to the broadcast object.
Return type: BDSSpark class (broadcast data set)
-
map
(func, pds)[source]¶ This is a wrapper for pyspark.RDD.map().
Parameters: - func (Python func) – A function that can be applied to every element of the pds
- pds (PDSSpark class) – A parallel data set to which func should be applied
Returns: a new parallel data set that contains the result of the map
Return type: PDSSpark class
-
-
class
abcpy.backends.
PDSSpark
(rdd)[source]¶ Bases:
abcpy.backends.PDS
This is a wrapper for Apache Spark RDDs.
-
class
abcpy.backends.
BDSSpark
(bcv)[source]¶ Bases:
abcpy.backends.BDS
This is a wrapper for Apache Spark Broadcast variables.
abcpy.distances module
-
class
abcpy.distances.
Distance
(statistics_calc)[source]¶ Bases:
object
This abstract base class defines how the distance between the observed and simulated data should be implemented.
-
__init__
(statistics_calc)[source]¶ The constructor of a sub-class must accept a non-optional statistics calculator as a parameter. If stored to self.statistics_calc, the private helper method _calculate_summary_stat can be used.
Parameters: statistics_calc (abcpy.statistics.Statistics) – Statistics extractor object that conforms to the Statistics class.
-
distance
(d1, d2)[source]¶ To be overwritten by any sub-class: should calculate the distance between two sets of data d1 and d2 using their respective statistics.
Notes
The data sets d1 and d2 are array-like structures that contain n1 and n2 data points each. An implementation of the distance function should work along the following steps:
1. Transform both input sets dX = [ dX1, dX2, ..., dXn ] to sX = [sX1, sX2, ..., sXn] using the statistics object. See _calculate_summary_stat method.
2. Calculate the desired mutual distance, here denoted by -, between the statistics: dist = [s11 - s21, s12 - s22, ..., s1n - s2n].
Important: any sub-class must not calculate the distance between data sets d1 and d2 directly. This is the reason why any sub-class must be initialized with a statistics object.
Parameters: - d1 (Python list) – Contains n1 data points.
- d2 (Python list) – Contains n2 data points.
Returns: The distance between the two input data sets.
Return type: numpy.ndarray
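The two steps in the notes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ABCpy code: MeanStatistics and EuclideanSketch are hypothetical names, each data point is assumed to be a list of numbers, and both data sets are assumed to contain the same number of points.

```python
import math
from statistics import mean


class MeanStatistics:
    """Hypothetical statistics calculator: maps each data point to its mean."""

    def statistics(self, data):
        return [mean(point) for point in data]


class EuclideanSketch:
    """Distance computed on summary statistics, never on the raw data."""

    def __init__(self, statistics_calc):
        self.statistics_calc = statistics_calc

    def distance(self, d1, d2):
        # Step 1: transform both data sets into summary statistics.
        s1 = self.statistics_calc.statistics(d1)
        s2 = self.statistics_calc.statistics(d2)
        if len(s1) != len(s2):
            raise ValueError("this sketch assumes equally sized data sets")
        # Step 2: Euclidean distance between the two statistics vectors.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

    def dist_max(self):
        # The Euclidean distance is unbounded.
        return math.inf


dist = EuclideanSketch(MeanStatistics())
d1 = [[0.0, 2.0], [4.0, 6.0]]   # statistics: [1.0, 5.0]
d2 = [[1.0, 1.0], [1.0, 1.0]]   # statistics: [1.0, 1.0]
print(dist.distance(d1, d2))    # 4.0
```

Keeping the statistics calculator as a constructor argument is what lets the same distance be reused with different summary statistics.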
-
dist_max
()[source]¶ To be overwritten by sub-class: should return maximum possible value of the desired distance function.
Examples
If the desired distance maps to \(\mathbb{R}\), this method should return numpy.inf.
Returns: The maximal possible value of the desired distance function.
Return type: numpy.float
-
-
class
abcpy.distances.
Euclidean
(statistics)[source]¶ Bases:
abcpy.distances.Distance
This class implements the Euclidean distance between two vectors.
The maximum value of the distance is np.inf.
-
class
abcpy.distances.
PenLogReg
(statistics)[source]¶ Bases:
abcpy.distances.Distance
This class implements a distance measure based on the classification accuracy.
The classification accuracy between two data sets d1 and d2 is calculated using lasso-penalized logistic regression and returned as a distance. The lasso-penalized logistic regression is done using the glmnet package of Friedman et al. [2]. While computing the distance, the algorithm automatically chooses the most relevant summary statistics, as explained in Gutmann et al. [1]. The maximum value of the distance is 1.0.
[1] Gutmann, M., Dutta, R., Kaski, S., and Corander, J. (2014). Statistical inference of intractable generative models via classification. arXiv:1407.4981.
[2] Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
-
class
abcpy.distances.
LogReg
(statistics)[source]¶ Bases:
abcpy.distances.Distance
This class implements a distance measure based on the classification accuracy [1]. The classification accuracy between two data sets d1 and d2 is calculated using logistic regression and returned as a distance. The maximum value of the distance is 1.0.
[1] Gutmann, M., Dutta, R., Kaski, S., and Corander, J. (2014). Statistical inference of intractable generative models via classification. arXiv:1407.4981.
abcpy.distributions module
-
class
abcpy.distributions.
Distribution
[source]¶ Bases:
object
This abstract base class represents a distribution. It can be used e.g. as a prior for models.
-
set_parameters
(params)[source]¶ To be overwritten by any sub-class: should set the parameters of the distribution.
Parameters: params (list) – Contains all the distribution’s parameters.
-
reseed
(seed)[source]¶ To be overwritten by any sub-class: reseed the random number generator with provided seed.
Parameters: seed (integer) – New seed for the random number generator
-
-
class
abcpy.distributions.
MultiNormal
(mean, cov, seed=None)[source]¶ Bases:
abcpy.distributions.Distribution
This class implements a p-dimensional multivariate Normal distribution.
-
class
abcpy.distributions.
Uniform
(lb, ub, seed=None)[source]¶ Bases:
abcpy.distributions.Distribution
This class implements a p-dimensional uniform Prior distribution in a closed interval.
-
__init__
(lb, ub, seed=None)[source]¶ Defines the upper and lower bounds of a p-dimensional uniform Prior distribution in a closed interval.
Parameters: - lb (numpy.ndarray or a list) – Vector containing p lower bounds
- ub (numpy.ndarray or a list) – Vector containing p upper bounds
- seed (integer) – Initial seed for the random number generator
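To make the set_parameters and reseed contract concrete, here is a small standard-library sketch of a p-dimensional uniform distribution. UniformSketch and its sample method are hypothetical names chosen for illustration; the parameter layout [lb, ub] is also an assumption rather than something this reference specifies.

```python
import random


class UniformSketch:
    """p-dimensional uniform distribution on the box [lb, ub]."""

    def __init__(self, lb, ub, seed=None):
        self.set_parameters([lb, ub])
        self.rng = random.Random(seed)

    def set_parameters(self, params):
        # Assumed layout: params = [lower bounds, upper bounds].
        lb, ub = params
        if len(lb) != len(ub) or any(l > u for l, u in zip(lb, ub)):
            raise ValueError("bounds must have equal length and lb <= ub")
        self.lb, self.ub = list(lb), list(ub)

    def reseed(self, seed):
        # Restart the random number generator from the given seed.
        self.rng.seed(seed)

    def sample(self):
        # Hypothetical helper: draw one p-dimensional point.
        return [self.rng.uniform(l, u) for l, u in zip(self.lb, self.ub)]


prior = UniformSketch([0.0, 10.0], [1.0, 20.0], seed=1)
first = prior.sample()
prior.reseed(1)
second = prior.sample()  # same draw as `first` after reseeding
```

Reseeding with the same value reproduces the same draws, which is useful when each worker of a parallel backend needs its own reproducible stream.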
-
-
class
abcpy.distributions.
MultiStudentT
(mean, cov, df, seed=None)[source]¶ Bases:
abcpy.distributions.Distribution
This class implements a p-dimensional multivariate Student T distribution.
-
__init__
(mean, cov, df, seed=None)[source]¶ Defines the mean, covariance and degrees of freedom of a p-dimensional multivariate Student T distribution.
Parameters: - mean (numpy.ndarray) – Vector containing p means, one for every dimension
- cov (numpy.ndarray) – pxp matrix containing the co-variance matrix
- df (np.uint) – Degrees of freedom
-
-
class
abcpy.distributions.
Normal
(mean, var, seed=None)[source]¶ Bases:
abcpy.distributions.Distribution
This class implements a 1-dimensional Normal distribution.
abcpy.models module
-
class
abcpy.models.
Model
(prior, seed=None)[source]¶ Bases:
object
This abstract class represents the model and forces the implementation of certain methods required by the framework.
-
__init__
(prior, seed=None)[source]¶ The constructor must be overwritten by a sub-class to initialize the model with a given prior.
The standard behaviour is that concrete model parameters are sampled from the provided prior. However, it is also possible for the constructor to accept optional (!) model parameters. In that case, the model should be initialized with the provided parameters instead of sampling from the prior.
Parameters: - prior (abcpy.distributions.Distribution) – A prior distribution
- seed (int, optional) – Optional initial seed for the random number generator that can be used in the model. The default value is generated randomly.
-
set_parameters
(theta)[source]¶ This method properly sets the parameters of the model and must be overwritten by a sub-class.
Notes
Make sure to test whether the provided parameters are compatible with the model. Return true if the parameters are accepted by the model and false otherwise. This behavior is expected e.g. by the inference schemes.
Parameters: theta – An array-like structure containing the p parameters of the model, where theta[0] is the first and theta[p-1] is the last parameter.
Returns: True if the model accepts the provided parameters, False otherwise.
Return type: boolean
-
sample_from_prior
()[source]¶ To be overwritten by any sub-class: should resample the model parameters from the prior distribution.
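The constructor, set_parameters and sample_from_prior contract described above could look roughly like the sketch below. It is an assumption-laden illustration, not the ABCpy Gaussian model: PriorSketch, GaussianSketch and the simulate method are hypothetical, and the prior ranges are arbitrary.

```python
import random


class PriorSketch:
    """Hypothetical prior over (mu, sigma) with arbitrary uniform ranges."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample(self):
        return [self.rng.uniform(-5.0, 5.0), self.rng.uniform(0.1, 2.0)]


class GaussianSketch:
    def __init__(self, prior, mu=None, sigma=None, seed=None):
        self.prior = prior
        self.rng = random.Random(seed)
        if mu is None or sigma is None:
            # No explicit parameters: sample them from the prior.
            self.sample_from_prior()
        elif not self.set_parameters([mu, sigma]):
            raise ValueError("parameters rejected by the model")

    def set_parameters(self, theta):
        # Reject incompatible parameters and report it through the return value.
        if len(theta) != 2 or theta[1] <= 0:
            return False
        self.mu, self.sigma = theta
        return True

    def sample_from_prior(self):
        self.set_parameters(self.prior.sample())

    def simulate(self, n):
        # Hypothetical helper: draw n data points from the current model.
        return [self.rng.gauss(self.mu, self.sigma) for _ in range(n)]


model = GaussianSketch(PriorSketch(seed=1), seed=2)
accepted = model.set_parameters([0.0, 1.0])    # True
rejected = model.set_parameters([0.0, -1.0])   # False, sigma must be positive
```

A rejected parameter vector leaves the previous parameters in place, so an inference scheme can simply propose again when set_parameters returns False.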
-
-
class
abcpy.models.
Gaussian
(prior, mu=None, sigma=None, seed=None)[source]¶ Bases:
abcpy.models.Model
This class implements the Gaussian model with unknown mean \(\mu\) and unknown standard deviation \(\sigma\).
-
__init__
(prior, mu=None, sigma=None, seed=None)[source]¶ Parameters: - prior (abcpy.distributions.Distribution) – Prior distribution
- mu (float, optional) – Mean of the Gaussian distribution. If the parameter is omitted, it is sampled from the prior.
- sigma (float, optional) – Standard deviation of the Gaussian distribution. If the parameter is omitted, it is sampled from the prior.
- seed (int, optional) – Initial seed. The default value is generated randomly.
-
-
class
abcpy.models.
Student_t
(prior, mu=None, df=None, seed=None)[source]¶ Bases:
abcpy.models.Model
This class implements the Student_t distribution with unknown mean \(\mu\) and unknown degrees of freedom.
-
__init__
(prior, mu=None, df=None, seed=None)[source]¶ Parameters: - prior (abcpy.distributions.Distribution) – Prior distribution
- mu (float, optional) – Mean of the Student_t distribution. If the parameter is omitted, it is sampled from the prior.
- df (float, optional) – The degrees of freedom of the Student_t distribution. If the parameter is omitted, it is sampled from the prior.
- seed (int, optional) – Initial seed. The default value is generated randomly.
-
-
class
abcpy.models.
MixtureNormal
(prior, mu, seed=None)[source]¶ Bases:
abcpy.models.Model
This class implements the mixture of multivariate normal distributions with unknown mean \(\mu\), defined as \(x|\mu \sim 0.5\mathcal{N}(\mu,I_p)+0.5\mathcal{N}(\mu,0.01I_p)\), where \(x=(x_1,x_2,\ldots,x_p)\) is the data set simulated from the model and the mean is \(\mu=(\mu_1,\mu_2,\ldots,\mu_p)\).
-
__init__
(prior, mu, seed=None)[source]¶ Parameters: - prior (abcpy.distributions.Distribution) – Prior distribution
- mu (numpy.ndarray or list, optional) – Mean of the mixture normal. If the parameter is omitted, sampled from the prior.
- seed (int, optional) – Initial seed. The default value is generated randomly.
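Sampling from the mixture defined for this class is straightforward to sketch with the standard library: choose one of the two components with probability 0.5, then draw each coordinate independently. The function name simulate_mixture_normal is hypothetical, and this is an illustration of the formula rather than the library's simulation code.

```python
import random


def simulate_mixture_normal(mu, n, seed=None):
    """Draw n points from 0.5*N(mu, I_p) + 0.5*N(mu, 0.01*I_p)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Covariance I_p or 0.01*I_p corresponds to a per-coordinate
        # standard deviation of 1.0 or 0.1 respectively.
        sd = 1.0 if rng.random() < 0.5 else 0.1
        samples.append([rng.gauss(m, sd) for m in mu])
    return samples


data = simulate_mixture_normal([2.0, -1.0], n=1000, seed=0)
mean_0 = sum(x[0] for x in data) / len(data)
mean_1 = sum(x[1] for x in data) / len(data)
```

Both components share the mean, so the sample means stay close to mu while the narrow component produces a sharp peak around it.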
-
-
class
abcpy.models.
StochLorenz95
(prior, theta, initial_state=None, n_timestep=160, seed=None)[source]¶ Bases:
abcpy.models.Model
Generates time-dependent ‘slow’ weather variables following the forecast model of Wilks [1], a stochastic reparametrization of the original Lorenz model [2].
[1] Wilks, D. S. (2005). Effects of stochastic parametrizations in the Lorenz ’96 system. Quarterly Journal of the Royal Meteorological Society, 131(606), 389–407.
[2] Lorenz, E. (1995). Predictability: a problem partly solved. In Proceedings of the Seminar on Predictability, volume 1, pages 1–18. European Center on Medium Range Weather Forecasting, Europe.
-
__init__
(prior, theta, initial_state=None, n_timestep=160, seed=None)[source]¶ Parameters: - prior (abcpy.distributions.Distribution) – Prior distribution
- theta (list or numpy.ndarray, optional) – Closure parameters. If the parameter is omitted, sampled from the prior.
- initial_state (numpy.ndarray, optional) – Initial state value of the time series. The default value is None, in which case a previously computed value from a full Lorenz model is used as the initial value.
- n_timestep (int, optional) – Number of timesteps between [0,4], where 4 corresponds to 20 days. The default value is 160.
- seed (int, optional) – Initial seed. The default value is generated randomly.
-
-
class
abcpy.models.
Ricker
(prior, theta=None, n_timestep=100, seed=None)[source]¶ Bases:
abcpy.models.Model
Ecological model of the observed size of an animal population over time, as described in [1].
[1] S. N. Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102–1104, Aug. 2010.
-
__init__
(prior, theta=None, n_timestep=100, seed=None)[source]¶ Parameters: - prior (abcpy.distributions.Distribution) – Prior distribution
- theta (list or numpy.ndarray, optional) – The parameter is a vector consisting of three numbers: \(\log r\) (real number), \(\sigma\) (positive real number, > 0) and \(\phi\) (positive real number, > 0). If the parameter is omitted, it is sampled from the prior.
- n_timestep (int, optional) – Number of timesteps. The default value is 100.
- seed (int, optional) – Initial seed. The default value is generated randomly.
-
abcpy.output module
-
class
abcpy.output.
Journal
(type)[source]¶ Bases:
object
The journal holds information created by the run of inference schemes.
It can also be configured to hold intermediate results.
-
parameters
¶ numpy.array – a nxpxt matrix
-
weights
¶ numpy.array – a nxt matrix
-
opt_value
¶ numpy.array – nxp matrix containing for each parameter the evaluated objective function for every time step
-
configuration
¶ Python dictionary – dictionary containing the scheme’s configuration parameters
-
__init__
(type)[source]¶ Initializes a new output journal of given type.
Parameters: type (int (identifying type)) – type=0 only logs the final parameters and weights (production use); type=1 logs all generated information (reproducibility use).
-
classmethod
fromFile
(filename)[source]¶ This method reads a saved journal from disk and returns it as an object.
Notes
To store a journal use Journal.save(filename).
Parameters: filename (string) – The string representing the location of a file.
Returns: The journal object serialized in <filename>.
Return type: abcpy.output.Journal
Example
>>> jnl = Journal.fromFile('example_output.jnl')
-
add_parameters
(params)[source]¶ Saves provided parameters by appending them to the journal. If type==0, old parameters get overwritten.
Parameters: params (numpy.array) – nxp matrix containing n parameters of dimension p
-
get_parameters
(iteration=None)[source]¶ Returns the parameters from a sampling scheme.
For intermediate results, pass the iteration.
Parameters: iteration (int) – specify the iteration for which to return parameters
-
get_weights
(iteration=None)[source]¶ Returns the weights from a sampling scheme.
For intermediate results, pass the iteration.
Parameters: iteration (int) – specify the iteration for which to return weights
-
add_weights
(weights)[source]¶ Saves provided weights by appending them to the journal. If type==0, old weights get overwritten.
Parameters: weights (numpy.array) – vector containing n weights
-
add_opt_values
(opt_values)[source]¶ Saves provided values of the evaluation of the scheme’s objective function. If type==0, old values get overwritten.
Parameters: opt_values (numpy.array) – vector containing n evaluations of the scheme’s objective function
-
save
(filename)[source]¶ Stores the journal to disk.
Parameters: filename (string) – the location of the file to store the current object to.
-
posterior_mean
()[source]¶ Computes posterior mean from the samples drawn from posterior distribution
Returns: posterior mean
Return type: np.ndarray
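The difference between the two journal types can be illustrated with a small standard-library sketch. JournalSketch is hypothetical and stores plain lists instead of numpy arrays; in particular, computing the posterior mean as a weighted average of the last stored parameters is an assumption, since this reference does not spell out the formula.

```python
class JournalSketch:
    def __init__(self, type):
        # `type` mirrors the documented argument name (it shadows the builtin).
        self.type = type
        self.parameters = []
        self.weights = []

    def add_parameters(self, params):
        if self.type == 0:
            self.parameters = [params]      # keep only the latest result
        else:
            self.parameters.append(params)  # keep every intermediate result

    def add_weights(self, weights):
        if self.type == 0:
            self.weights = [weights]
        else:
            self.weights.append(weights)

    def get_parameters(self, iteration=None):
        index = -1 if iteration is None else iteration
        return self.parameters[index]

    def posterior_mean(self):
        # Assumption: weighted average over the final set of samples.
        params, weights = self.parameters[-1], self.weights[-1]
        total = sum(weights)
        dim = len(params[0])
        return [sum(w * row[j] for w, row in zip(weights, params)) / total
                for j in range(dim)]


final_only = JournalSketch(0)
full = JournalSketch(1)
for jnl in (final_only, full):
    jnl.add_parameters([[1.0], [3.0]])
    jnl.add_weights([1.0, 1.0])
    jnl.add_parameters([[2.0], [4.0]])
    jnl.add_weights([1.0, 3.0])
print(full.posterior_mean())  # [3.5]
```

With type 0 only the last generation survives, while type 1 keeps each generation addressable through the iteration argument.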
-
abcpy.inferences module
-
class
abcpy.inferences.
RejectionABC
(model, distance, backend, seed=None)[source]¶ Bases:
object
This base class implements the rejection algorithm based inference scheme [1] for Approximate Bayesian Computation.
[1] Tavaré, S., Balding, D., Griffith, R., Donnelly, P.: Inferring coalescence times from DNA sequence data. Genetics 145(2), 505–518 (1997).
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- distance (abcpy.distances.Distance) – Distance object that conforms to the Distance class.
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, n_samples, n_samples_per_param, epsilon, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (python list) – The observed data set.
- n_samples (integer) – Number of samples to generate.
- n_samples_per_param (integer) – Number of data points in each simulated dataset.
- epsilon (float) – Value of threshold.
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: a journal containing simulation results, metadata and optionally intermediate results.
Return type: abcpy.output.Journal
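The rejection scheme behind this class can be sketched in a self-contained way. The function below is not the ABCpy implementation: it hard-codes a hypothetical Gaussian model with known unit variance, a uniform prior on the mean, and the sample mean as the only summary statistic, and it returns the accepted parameters as a plain list instead of a journal.

```python
import random
import statistics


def rejection_abc(observations, n_samples, n_samples_per_param, epsilon, seed=None):
    """Accept prior draws whose simulated data lie within epsilon of the data."""
    rng = random.Random(seed)
    obs_stat = statistics.mean(observations)
    accepted = []
    while len(accepted) < n_samples:
        # Draw a candidate parameter from the (assumed) uniform prior.
        theta = rng.uniform(-5.0, 5.0)
        # Simulate a data set from the (assumed) Gaussian model.
        y_sim = [rng.gauss(theta, 1.0) for _ in range(n_samples_per_param)]
        # Keep the candidate if the statistics are closer than the threshold.
        if abs(statistics.mean(y_sim) - obs_stat) < epsilon:
            accepted.append(theta)
    return accepted


observations = [1.8, 2.1, 2.0, 2.2, 1.9]
posterior = rejection_abc(observations, n_samples=50,
                          n_samples_per_param=20, epsilon=0.5, seed=0)
```

A smaller epsilon gives a better approximation of the posterior at the price of more rejected simulations, which is the trade-off the sequential schemes below try to manage.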
-
class
abcpy.inferences.
PMCABC
(model, distance, kernel, backend, seed=None)[source]¶ Bases:
object
This base class implements a modified version of the Population Monte Carlo based inference scheme for Approximate Bayesian Computation of Beaumont et al. [1]. Here the threshold value at the t-th generation is chosen adaptively as the maximum of the epsilon_percentile-th percentile of the discrepancies of the accepted parameters at the (t-1)-th generation and the threshold value provided by the user for this generation. If epsilon_percentile is zero (the default), this method becomes the inference scheme described in [1], where the threshold values used at each generation are the ones provided by the user.
[1] M. A. Beaumont. Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41(1):379–406, Nov. 2010.
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- distance (abcpy.distances.Distance) – Distance object that conforms to the Distance class.
- kernel (abcpy.distributions.Distribution) – Distribution object defining the perturbation kernel needed for the sampling
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, steps, epsilon_init, n_samples=10000, n_samples_per_param=1, epsilon_percentile=0, covFactor=2, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (numpy.ndarray) – Observed data.
- steps (integer) – Number of iterations in the sequential algorithm (“generations”)
- epsilon_init (numpy.ndarray) – An array of proposed values of epsilon to be used at each step. Either a single value, used as the threshold in step 1, or a steps-dimensional array of values, used as the thresholds in every step.
- n_samples (integer, optional) – Number of samples to generate. The default value is 10000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
- epsilon_percentile (float, optional) – A value in the interval [0, 100]. The default value is 0, meaning that the threshold value provided by the user is used.
- covFactor (float, optional) – scaling parameter of the covariance matrix. The default value is 2 as considered in [1].
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: A journal containing simulation results, metadata and optionally intermediate results.
Return type: abcpy.output.Journal
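The adaptive threshold rule described for this class can be written down in a few lines. The sketch below is an interpretation of that description, not the library code: percentile and next_threshold are hypothetical helpers, the percentile uses linear interpolation, and the zero-percentile case is handled explicitly so that it falls back to the user-provided value as the text states.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def next_threshold(previous_distances, user_epsilon, epsilon_percentile=0):
    """Threshold for generation t from the distances accepted at t-1."""
    if epsilon_percentile == 0:
        # Default: use the threshold supplied by the user unchanged.
        return user_epsilon
    return max(percentile(previous_distances, epsilon_percentile), user_epsilon)


distances = [0.5, 0.1, 0.4, 0.2, 0.3]
print(next_threshold(distances, 0.25, epsilon_percentile=50))  # 0.3
print(next_threshold(distances, 0.40, epsilon_percentile=50))  # 0.4
print(next_threshold(distances, 0.05))                         # 0.05
```

Taking the maximum prevents the threshold from dropping below the user's schedule, so the percentile can only slow the tightening down, never accelerate it.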
-
class
abcpy.inferences.
PMC
(model, likfun, kernel, backend, seed=None)[source]¶ Bases:
object
Population Monte Carlo based inference scheme of Cappé et al. [1].
This algorithm assumes that a likelihood function is available and can be evaluated at any parameter value given the observed data set. In the absence of a likelihood function, or when it cannot be evaluated at a reasonable computational cost, we use the approximate likelihood functions in the abcpy.approx_lhd module, for which the consistency argument of the inference scheme is based on Andrieu and Roberts [2].
[1] Cappé, O., Guillin, A., Marin, J.-M., and Robert, C. P. (2004). Population Monte Carlo. Journal of Computational and Graphical Statistics, 13(4), 907–929.
[2] C. Andrieu and G. O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. Annals of Statistics, 37(2):697–725, 04 2009.
-
__init__
(model, likfun, kernel, backend, seed=None)[source]¶ Constructor of PMC inference schemes.
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class
- likfun (abcpy.approx_lhd.Approx_likelihood) – Approx_likelihood object that conforms to the Approx_likelihood class
- kernel (abcpy.distributions.Distribution) – Distribution object defining the perturbation kernel needed for the sampling
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, steps, n_samples=10000, n_samples_per_param=100, covFactor=None, iniPoints=None, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (python list) – Observed data
- steps (integer) – number of iterations in the sequential algorithm (“generations”)
- n_samples (integer, optional) – number of samples to generate. The default value is 10000.
- n_samples_per_param (integer, optional) – number of data points in each simulated data set. The default value is 100.
- covFactor (float, optional) – scaling parameter of the covariance matrix. The default is a p-dimensional array of ones, where p is the dimension of the parameter.
- iniPoints (numpy.ndarray, optional) – parameter values from which the sampling starts. By default they are sampled from the prior.
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: A journal containing simulation results, metadata and optionally intermediate results.
Return type: abcpy.output.Journal
-
-
class
abcpy.inferences.
SABC
(model, distance, kernel, backend, seed=None)[source]¶ Bases:
object
This base class implements a modified version of Simulated Annealing Approximate Bayesian Computation (SABC) of [1] when the prior is non-informative.
[1] C. Albert, H. R. Kuensch and A. Scheidegger. A Simulated Annealing Approach to Approximate Bayes Computations. Statistics and Computing, (2014).
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- distance (abcpy.distances.Distance) – Distance object that conforms to the Distance class.
- kernel (abcpy.distributions.Distribution) – Distribution object defining the perturbation kernel needed for the sampling
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, steps, epsilon, n_samples=10000, n_samples_per_param=1, beta=2, delta=0.2, v=0.3, ar_cutoff=0.5, resample=None, n_update=None, adaptcov=1, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (numpy.ndarray) – Observed data.
- steps (integer) – Maximum number of iterations in the sequential algorithm (“generations”)
- epsilon (numpy.float) – An array of proposed values of epsilon to be used at each step.
- n_samples (integer, optional) – Number of samples to generate. The default value is 10000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
- beta (numpy.float) – Tuning parameter of SABC. The default value is 2.
- delta (numpy.float) – Tuning parameter of SABC. The default value is 0.2.
- v (numpy.float, optional) – Tuning parameter of SABC. The default value is 0.3.
- ar_cutoff (numpy.float) – Acceptance ratio cutoff. The default value is 0.5.
- resample (int, optional) – Resample after this many acceptances. The default value is n_samples.
- n_update (int, optional) – Number of perturbed parameters at each step. The default value is n_samples.
- adaptcov (boolean, optional) – Whether to adapt the covariance matrix during the iterations. The default value is 1 (true).
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: A journal containing simulation results, metadata and optionally intermediate results.
Return type: abcpy.output.Journal
-
class
abcpy.inferences.
ABCsubsim
(model, distance, kernel, backend, seed=None)[source]¶ Bases:
object
This base class implements the Approximate Bayesian Computation by subset simulation (ABCsubsim) algorithm of [1].
[1] M. Chiachio, J. L. Beck, J. Chiachio, and G. Rus. Approximate Bayesian computation by subset simulation. SIAM J. Sci. Comput., 36(3):A1339–A1358, 2014.
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- distance (abcpy.distances.Distance) – Distance object that conforms to the Distance class.
- kernel (abcpy.distributions.Distribution) – Distribution object defining the perturbation kernel needed for the sampling
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, steps, n_samples=10000, n_samples_per_param=1, chain_length=10, ap_change_cutoff=10, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (numpy.ndarray) – Observed data.
- steps (integer) – Number of iterations in the sequential algorithm (“generations”).
- n_samples (integer, optional) – Number of samples to generate. The default value is 10000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
- chain_length (integer, optional) – Chain length of the MCMC. n_samples should be divisible by chain_length. The default value is 10.
- ap_change_cutoff (float, optional) – The cutoff value for the percentage change in the anneal parameter. If the change is less than ap_change_cutoff the iterations are stopped. The default value is 10.
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: A journal containing simulation results, metadata and optionally intermediate results.
Return type:
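The two constraints described for sample (n_samples being divisible by chain_length, and stopping once the anneal parameter changes by less than ap_change_cutoff percent) can be sketched in plain Python. This is an illustrative sketch only, not ABCpy's implementation: the helper names number_of_chains and should_stop are hypothetical, and the exact formula ABCpy uses for the percentage change is an assumption.

```python
def number_of_chains(n_samples, chain_length):
    """Return how many MCMC chains of length chain_length produce n_samples."""
    if chain_length <= 0 or n_samples % chain_length != 0:
        raise ValueError("n_samples should be divisible by chain_length")
    return n_samples // chain_length


def should_stop(previous_ap, current_ap, ap_change_cutoff=10):
    """Assumed stopping rule: stop when the relative change (in percent)
    of the anneal parameter falls below ap_change_cutoff."""
    change = abs(current_ap - previous_ap) / abs(previous_ap) * 100
    return change < ap_change_cutoff


chains = number_of_chains(10000, 10)   # default values from the signature
stop_early = should_stop(2.0, 1.9)     # roughly 5% change, below the cutoff
keep_going = should_stop(2.0, 1.0)     # 50% change, above the cutoff
```

With the default values, 10000 samples and a chain length of 10 correspond to 1000 chains under this reading.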
-
class
abcpy.inferences.
RSMCABC
(model, distance, kernel, backend, seed=None)[source]¶ Bases:
object
This base class implements Replenishment Sequential Monte Carlo Approximate Bayesian computation of Drovandi and Pettitt [1].
[1] C. C. Drovandi and A. N. Pettitt, Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics, 67(1):225–233, 2011.
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- distance (abcpy.distances.Distance) – Distance object that conforms to the Distance class.
- kernel (abcpy.distributions.Distribution) – Distribution object defining the perturbation kernel needed for the sampling
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, steps, n_samples=10000, n_samples_per_param=1, alpha=0.1, epsilon_init=100, epsilon_final=0.1, const=1, covFactor=2.0, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (numpy.ndarray) – Observed data.
- steps (integer) – Number of iterations in the sequential algorithm (“generations”).
- n_samples (integer, optional) – Number of samples to generate. The default value is 10000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
- alpha (float, optional) – A parameter taking values in [0,1]. The default value is 0.1.
- epsilon_init (float, optional) – Initial value of the threshold. The default value is 100.
- epsilon_final (float, optional) – Terminal value of the threshold. The default value is 0.1.
- const (float, optional) – A constant used to compute the acceptance probability. The default value is 1.
- covFactor (float, optional) – Scaling parameter of the covariance matrix. The default value is 2.
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: A journal containing simulation results, metadata and optionally intermediate results.
Return type:
-
class
abcpy.inferences.
APMCABC
(model, distance, kernel, backend, seed=None)[source]¶ Bases:
object
This base class implements Adaptive Population Monte Carlo Approximate Bayesian computation of M. Lenormand et al. [1].
[1] M. Lenormand, F. Jabot and G. Deffuant, Adaptive approximate Bayesian computation for complex models. Computational Statistics, 28:2777–2796, 2013.
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- distance (abcpy.distances.Distance) – Distance object that conforms to the Distance class.
- kernel (abcpy.distributions.Distribution) – Distribution object defining the perturbation kernel needed for the sampling
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, steps, n_samples=10000, n_samples_per_param=1, alpha=0.9, acceptance_cutoff=0.2, covFactor=2.0, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (numpy.ndarray) – Observed data.
- steps (integer) – Number of iterations in the sequential algorithm (“generations”).
- n_samples (integer, optional) – Number of samples to generate. The default value is 10000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
- alpha (float, optional) – A parameter taking values in [0,1]. The default value is 0.9.
- acceptance_cutoff (float, optional) – Acceptance ratio cutoff. The default value is 0.2.
- covFactor (float, optional) – Scaling parameter of the covariance matrix. The default value is 2.
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: A journal containing simulation results, metadata and optionally intermediate results.
Return type:
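In the algorithm of Lenormand et al. [1], alpha is the proportion of particles kept at each generation, and the next threshold is the largest distance among the kept particles. A minimal sketch of that selection step follows. The function name keep_best_fraction is hypothetical, and whether ABCpy rounds or selects in exactly this way is an assumption.

```python
def keep_best_fraction(distances, alpha=0.9):
    """Keep the alpha-fraction of particles with the smallest distances.

    Returns the indices of the kept particles and the resulting threshold
    (the largest distance among them). Rounding down is an assumption.
    """
    n_keep = max(1, int(alpha * len(distances)))
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    kept = sorted(order[:n_keep])
    epsilon = max(distances[i] for i in kept)
    return kept, epsilon


# Four particles, keep the better half.
kept, epsilon = keep_best_fraction([0.4, 0.1, 0.9, 0.2], alpha=0.5)
```

Here the particles at positions 1 and 3 are kept and the threshold becomes 0.2. In the paper, iterations end once the acceptance rate drops below a minimum value, which plays the role of acceptance_cutoff.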
-
class
abcpy.inferences.
SMCABC
(model, distance, kernel, backend, seed=None)[source]¶ Bases:
object
This base class implements adaptive Sequential Monte Carlo Approximate Bayesian computation of Del Moral et al. [1].
[1] P. Del Moral, A. Doucet, A. Jasra, An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 22(5):1009–1020, 2012.
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- distance (abcpy.distances.Distance) – Distance object that conforms to the Distance class.
- kernel (abcpy.distributions.Distribution) – Distribution object defining the perturbation kernel needed for the sampling
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
sample
(observations, steps, n_samples=10000, n_samples_per_param=1, epsilon_final=0.1, alpha=0.95, covFactor=2, resample=None, full_output=0)[source]¶ Samples from the posterior distribution of the model parameter given the observed data observations.
Parameters: - observations (numpy.ndarray) – Observed data.
- steps (integer) – Number of iterations in the sequential algorithm (“generations”).
- epsilon_final (float, optional) – The final threshold value of epsilon to be reached. The default value is 0.1.
- n_samples (integer, optional) – Number of samples to generate. The default value is 10000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
- alpha (float, optional) – A parameter taking values in [0,1], determining the rate of change of the threshold epsilon. The default value is 0.95.
- covFactor (float, optional) – Scaling parameter of the covariance matrix. The default value is 2.
- full_output (integer, optional) – If full_output==1, intermediate results are included in output journal. The default value is 0, meaning the intermediate results are not saved.
Returns: A journal containing simulation results, metadata and optionally intermediate results.
Return type:
abcpy.modelselections module¶
-
class
abcpy.modelselections.
ModelSelections
(model_array, statistics_calc, backend, seed=None)[source]¶ Bases:
object
This abstract base class defines a rule for choosing a model from a set of models given an observation.
-
__init__
(model_array, statistics_calc, backend, seed=None)[source]¶ Constructor that must be overwritten by the sub-class.
The constructor of a sub-class must accept an array of models to choose the model from, and two non-optional parameters, the statistics calculator and the backend, stored in self.statistics_calc and self.backend. They define how to calculate summary statistics from data and what kind of parallelization to use.
Parameters: - model_array (list) – A list of models which are of type abcpy.models.Model
- statistics_calc (abcpy.statistics.Statistics) – Statistics object that conforms to the Statistics class.
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
select_model
(observations, n_samples=1000, n_samples_per_param=100)[source]¶ To be overwritten by any sub-class: returns the model selected by the model selection procedure as most suitable for the observed data set observations. It is assumed that observations is a list of n elements of the same type (e.g., a list containing n time series, n graphs or n np.ndarray). The two optional integer arguments n_samples and n_samples_per_param denote the number of samples in the reference table and the number of data points in each simulated data set.
Parameters: - observations (python list) – The observed data set.
- n_samples (integer, optional) – Number of samples to generate for reference table.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set.
Returns: A model of type abcpy.models.Model
Return type:
-
posterior_probability
(observations)[source]¶ To be overwritten by any sub-class: returns the approximate posterior probability of the chosen model given the observed data set observations. It is assumed that observations is a list of n elements of the same type (e.g., a list containing n time series, n graphs or n np.ndarray).
Parameters: observations (python list) – The observed data set.
Returns: A vector containing the approximate posterior probability of the model chosen.
Return type: np.ndarray
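To make the roles of the reference table, select_model and posterior_probability concrete, here is a small self-contained sketch in plain Python. It does not use ABCpy and is not the library's implementation: it swaps in a simple nearest-neighbour vote on one summary statistic, and the names select_model_sketch, model_a, model_b and the argument k are hypothetical.

```python
import random
from collections import Counter


def select_model_sketch(models, summary, observed, n_samples=1000, k=50, seed=None):
    """Pick a model by a nearest-neighbour vote over a simulated reference table."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    table = []
    for _ in range(n_samples):
        index = rng.randrange(len(models))    # draw a model uniformly at random
        simulated = models[index](rng)        # simulate a data set from it
        table.append((abs(summary(simulated) - s_obs), index))
    table.sort()                              # closest simulations first
    votes = Counter(index for _, index in table[:k])
    probabilities = [votes[i] / k for i in range(len(models))]
    best = max(range(len(models)), key=lambda i: probabilities[i])
    return best, probabilities


def model_a(rng):
    """Hypothetical simulator: 20 Gaussian draws centred at 0."""
    return [rng.gauss(0.0, 1.0) for _ in range(20)]


def model_b(rng):
    """Hypothetical simulator: 20 Gaussian draws centred at 5."""
    return [rng.gauss(5.0, 1.0) for _ in range(20)]


def mean(data):
    return sum(data) / len(data)


best, probabilities = select_model_sketch(
    [model_a, model_b], mean, observed=[5.0] * 20, seed=0
)
```

The returned index plays the role of the selected model, and the vote shares are a crude stand-in for the approximate posterior probabilities.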
-
-
class
abcpy.modelselections.
RandomForest
(model_array, statistics_calc, backend, N_tree=100, n_try_fraction=0.5, seed=None)[source]¶ Bases:
abcpy.modelselections.ModelSelections
This class implements the model selection procedure based on the Random Forest ensemble learner as described in Pudlo et al. [1].
[1] Pudlo, P., Marin, J.-M., Estoup, A., Cornuet, J.-M., Gautier, M. and Robert, C. (2016). Reliable ABC model choice via random forests. Bioinformatics, 32:859–866.
-
__init__
(model_array, statistics_calc, backend, N_tree=100, n_try_fraction=0.5, seed=None)[source]¶ Parameters: - N_tree (integer, optional) – Number of trees in the random forest. The default value is 100.
- n_try_fraction (float, optional) – Fraction of the summary statistics used as the number of covariates randomly sampled at each node by the randomised CART. The default value is 0.5.
-
select_model
(observations, n_samples=1000, n_samples_per_param=1)[source]¶ Parameters: - observations (python list) – The observed data set.
- n_samples (integer, optional) – Number of samples to generate for reference table. The default value is 1000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
Returns: A model of type abcpy.models.Model
Return type:
-
posterior_probability
(observations, n_samples=1000, n_samples_per_param=1)[source]¶ Parameters: - observations (python list) – The observed data set.
- n_samples (integer, optional) – Number of samples to generate for reference table. The default value is 1000.
- n_samples_per_param (integer, optional) – Number of data points in each simulated data set. The default value is 1.
Returns: A vector containing the approximate posterior probability of the model chosen.
Return type:
-
abcpy.statistics module¶
-
class
abcpy.statistics.
Statistics
(degree=2, cross=True)[source]¶ Bases:
object
This abstract base class defines how to calculate statistics from a dataset.
The base class also implements a polynomial expansion with cross-product terms that can be used to get desired polynomial expansion of the calculated statistics.
-
__init__
(degree=2, cross=True)[source]¶ Constructor that must be overwritten by the sub-class.
The constructor of a sub-class must accept the arguments for the polynomial expansion applied after extraction of the summary statistics: degree, the degree of the polynomial expansion, and cross, indicating whether cross-product terms are included.
Parameters: - degree (integer, optional) – Degree of the polynomial expansion. The default value is 2, meaning second-order polynomial expansion.
- cross (boolean, optional) – Defines whether to include the cross-product terms. The default value is True, meaning the cross-product terms are included.
-
statistics
(data)[source]¶ To be overwritten by any sub-class: should extract statistics from the data set data. It is assumed that data is a list of n elements of the same type (e.g., a list containing n time series, n graphs or n np.ndarray).
Parameters: data (python list) – Contains n data sets.
Returns: nxp matrix where for each of the n data points p statistics are calculated.
Return type: numpy.ndarray
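The polynomial expansion controlled by degree and cross can be illustrated with a short sketch in plain Python. It is not ABCpy's code: the function name polynomial_expansion and the column ordering (original statistics, then higher powers, then pairwise products) are assumptions made for illustration.

```python
from itertools import combinations


def polynomial_expansion(statistics, degree=2, cross=True):
    """Expand each row of an n x p matrix of statistics.

    Appends the element-wise powers 2..degree and, if cross is True,
    the products of every pair of distinct statistics.
    """
    expanded = []
    for row in statistics:
        new_row = list(row)
        for power in range(2, degree + 1):
            new_row.extend(value ** power for value in row)
        if cross:
            new_row.extend(a * b for a, b in combinations(row, 2))
        expanded.append(new_row)
    return expanded


# Two data points (n = 2) with two statistics each (p = 2).
expanded = polynomial_expansion([[1.0, 2.0], [3.0, 4.0]])
```

For the first row this gives the two statistics, their squares and one cross-product, so each row grows from 2 to 5 columns. Setting cross=False drops the last column.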
-
-
class
abcpy.statistics.
Identity
(degree=2, cross=True)[source]¶ Bases:
abcpy.statistics.Statistics
This class implements identity statistics, returning an nxp matrix when the data set contains n numpy.ndarray of length p.
-
class
abcpy.statistics.
HakkarainenLorenzStatistics
(degree=2, cross=True)[source]¶ Bases:
abcpy.statistics.Statistics
This class implements the statistics function from the Statistics protocol. It extracts the statistics following Hakkarainen et al. [1] from the multivariate time series generated by solving the Lorenz 95 ODEs.
[1] J. Hakkarainen, A. Ilin, A. Solonen, M. Laine, H. Haario, J. Tamminen, E. Oja, and H. Järvinen. On closure parameter estimation in chaotic systems. Nonlinear Processes in Geophysics, 19(1):127–143, Feb. 2012.
abcpy.summaryselections module¶
-
class
abcpy.summaryselections.
Summaryselections
(model, statistics_calc, backend, n_samples=1000, seed=None)[source]¶ Bases:
object
This abstract base class defines a way to choose the summary statistics.
-
__init__
(model, statistics_calc, backend, n_samples=1000, seed=None)[source]¶ The constructor of a sub-class must accept a non-optional model, statistics calculator and backend, which are stored in self.model, self.statistics_calc and self.backend. It further accepts two optional parameters, n_samples and seed, defining the number of simulated datasets used in the pilot step to decide the summary statistics and the integer used to initialize the random number generator.
Parameters: - model (abcpy.models.Model) – Model object that conforms to the Model class.
- statistics_calc (abcpy.statistics.Statistics) – Statistics object that conforms to the Statistics class.
- backend (abcpy.backends.Backend) – Backend object that conforms to the Backend class.
- n_samples (int, optional) – The number of (parameter, simulated data) tuples generated to learn the summary statistics in the pilot step. The default value is 1000.
- seed (integer, optional) – Optional initial seed for the random number generator. The default value is generated randomly.
-
-
class
abcpy.summaryselections.
Semiautomatic
(model, statistics_calc, backend, n_samples=1000, seed=None)[source]¶ Bases:
abcpy.summaryselections.Summaryselections
This class implements the semi-automatic summary statistics choice described in Fearnhead and Prangle [1].
[1] Fearnhead P., Prangle D. 2012. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J. Roy. Stat. Soc. B 74:419–474.
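The idea in Fearnhead and Prangle [1] is to regress the parameter on statistics of pilot simulations and use the fitted value as the new summary statistic. The following sketch shows this for one parameter and one statistic, using ordinary least squares in plain Python. It is a simplified illustration and not ABCpy's implementation: the names fit_line and learn_summary and the toy pilot data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def learn_summary(pilot, statistic):
    """Learn a summary statistic from (parameter, simulated data) tuples."""
    xs = [statistic(data) for _, data in pilot]
    ys = [parameter for parameter, _ in pilot]
    intercept, slope = fit_line(xs, ys)
    # The fitted regression is the new summary statistic.
    return lambda data: intercept + slope * statistic(data)


def mean(data):
    return sum(data) / len(data)


# Toy pilot table in which the parameter equals 2 * mean(data) + 1 exactly.
pilot = [(2.0 * m + 1.0, [m - 1.0, m + 1.0]) for m in (0.0, 1.0, 2.0, 3.0)]
new_summary = learn_summary(pilot, mean)
estimate = new_summary([4.0, 6.0])
```

In practice the pilot table would hold n_samples tuples simulated from the model, and the regression would use several statistics per parameter.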