Installation
============

If you prefer to work on the source, clone the repository ::

    git clone https://github.com/eth-cscs/abcpy.git

Make sure all requirements are installed ::

    cd abcpy
    pip3 install -r requirements.txt

To create a package and install it, run ::

    make package
    pip3 install build/dist/abcpy-0.1-py3-none-any.whl

Note that ABCpy requires Python 3.

Getting Started
===============

Here we show how to use ABCpy to infer the parameters of a model, given some observed data. As a simple example we consider a Gaussian model, where we want to model the height of grown-up humans given the following set of measurements (the observation, or observed data).

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 5
    :dedent: 4

Now, we want to model the height of humans by a Gaussian model which has two parameters: the mean, denoted by :math:`\mu`, and the standard deviation, denoted by :math:`\sigma`. The goal is to use ABC to infer these yet unknown parameters from the information contained in the observed data.

A prerequisite for ABC is that we provide certain *prior* knowledge about the parameters we want to infer. In our case this is quite simple: we know from experience that the average height should be somewhere between 150cm and 200cm, and that the standard deviation is somewhere between 5cm and 25cm.

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 8-10, 12-13
    :dedent: 4

Further, we need a means to quantify how close our observation is to synthetic data (generated by the model). Often the real and synthetic observations cannot be compared directly in a reasonable or efficient way. Thus, *summary statistics* are used to extract relevant properties from the observations, with the idea that these statistics are then compared.

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 16-17
    :dedent: 4

As a distance we choose the LogReg distance here.
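
To make the role of summary statistics and distances more concrete, here is a minimal standalone sketch. It does not use ABCpy, and it deliberately swaps in a plain Euclidean distance for the LogReg distance; the function names and the height values are hypothetical and chosen only for illustration.

```python
import math
import statistics


def summary_statistics(observations):
    """Reduce a sample to the pair (mean, sample standard deviation)."""
    return (statistics.mean(observations), statistics.stdev(observations))


def distance(stats_a, stats_b):
    """Euclidean distance between two summary-statistics tuples.

    This is a stand-in for illustration, not the LogReg distance.
    """
    return math.dist(stats_a, stats_b)


# Made-up heights in cm: one "observed" sample and one "synthetic" sample.
observed = [172.0, 181.5, 165.2, 190.1, 158.7]
synthetic = [170.3, 179.9, 168.0, 187.4, 161.2]

# The distance is computed on the statistics, not on the raw samples.
d = distance(summary_statistics(observed), summary_statistics(synthetic))
print(d)
```

In this sketch a small distance means that the synthetic sample resembles the observation with respect to the chosen statistics.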
Note that in ABCpy distance functions operate not on the observations, but on summary statistics.

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 20-21
    :dedent: 4

We can now set up an inference scheme. Let us choose PMCABC as our inference algorithm. As a prerequisite it requires a perturbation kernel and a backend. We define both in the following:

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 24-27, 30-31
    :dedent: 4

We instantiate a PMCABC object and pass the kernel and backend objects to the constructor:

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 34-35
    :dedent: 4

Finally, we need to parametrize and start the actual sampling:

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 38-41
    :dedent: 4

With this the inference process is done and the probabilities of the inferred parameters are stored in the journal object. See `Post Analysis`_ for further information on extracting results.

The code currently uses the dummy backend `BackendDummy`, which does not parallelize the execution of the inference schemes but is very handy for quick prototyping and testing. To execute the code you only need to run ::

    python3 gaussian.py

The full source can be found in `examples/backends/dummy/pmcabc_gaussian.py`.

Post Analysis
=============

The output when sampling from an inference scheme is a Journal (:py:class:`abcpy.output.Journal`), which holds all the necessary results and provides convenient methods for the post analysis.

For example, one can easily access the sampled parameters and corresponding weights using:

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 48-49
    :dedent: 4

For the post analysis basic functions are provided:

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 52-53
    :dedent: 4

Also, to ensure reproducibility, every journal stores the parameters of the algorithm that created it:

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 57
    :dedent: 4

And certainly, a journal can easily be saved to and loaded from disk:

.. literalinclude:: ../../examples/backends/dummy/pmcabc_gaussian.py
    :language: python
    :lines: 60, 63
    :dedent: 4

Using the Spark Backend
=======================

To run ABCpy in parallel using Apache Spark, one only needs to use the provided Spark backend. Considering the example from above, the statements for the backend have to be changed to

.. literalinclude:: ../../examples/backends/apache_spark/pmcabc_gaussian.py
    :language: python
    :lines: 29-32
    :dedent: 4

In words, a Spark context has to be created and passed to the Spark backend. Additionally, the level of parallelism can be provided, which defines into how many blocks the work should be split. It corresponds to the parallelism of an RDD in Apache Spark terminology. A good value is usually a small multiple of the total number of available cores.

The standard way to run the script on Spark is via the spark-submit command: ::

    PYSPARK_PYTHON=python3 spark-submit gaussian.py

Often Spark installations use Python 2 by default. To make Spark use the required Python 3 interpreter, the `PYSPARK_PYTHON` environment variable can be set.

The adapted Python code can be found in `examples/backend/apache_spark/gaussian.py`.

Note that in order to run jobs in parallel you need to have Apache Spark installed on the system in question. Details on the installation can be found on the official Apache Spark homepage. Further, keep in mind that the ABCpy library has to be properly installed on the cluster, such that it is available to the Python interpreters on the master and the worker nodes.
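
The advice on the level of parallelism can be illustrated with a small standalone sketch. It uses neither Spark nor ABCpy: the helpers `suggest_parallelism` and `split_into_blocks`, as well as the default factor of 4, are hypothetical choices that only show how a small multiple of the core count turns into blocks of work.

```python
import os


def suggest_parallelism(total_cores, multiple=4):
    """Return a block count that is a small multiple of the core count.

    The factor 4 is an assumed example value, not a recommendation
    taken from Spark or ABCpy.
    """
    if total_cores < 1 or multiple < 1:
        raise ValueError("total_cores and multiple must be positive")
    return total_cores * multiple


def split_into_blocks(items, parallelism):
    """Split items into at most `parallelism` contiguous, near-equal blocks."""
    if parallelism < 1:
        raise ValueError("parallelism must be positive")
    items = list(items)
    base, extra = divmod(len(items), parallelism)
    blocks = []
    start = 0
    for i in range(parallelism):
        # The first `extra` blocks take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        blocks.append(items[start:start + size])
        start += size
    return blocks


cores = os.cpu_count() or 1
parallelism = suggest_parallelism(cores)
blocks = split_into_blocks(range(10000), parallelism)
print(cores, parallelism, len(blocks))
```

With more blocks than cores, each core processes several blocks in turn, which can help to even out blocks that take unequal time.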
Implementing a new Model
========================

Often one wants to use one of the provided inference schemes on a new model that is not part of ABCpy. We now go through the details of such a scenario, using the Gaussian model to exemplify the mechanics.

Every model has to conform to the API specified by the abstract base class :py:class:`abcpy.models.Model`. Thus, making a new model compatible with ABCpy essentially boils down to implementing the following methods:

.. literalinclude:: ../../abcpy/models.py
    :language: python
    :lines: 6, 13, 35, 61, 69, 88

In the following we go through a few of the required methods, explain what is expected, and show how they would be implemented for the Gaussian model.

As a general note, it is always a good idea to consult the reference for implementation details. For the constructor, the reference states the following:

.. automethod:: abcpy.models.Model.__init__
    :noindex:

Consequently, we would implement a simple version of a Gaussian model as follows:

.. literalinclude:: ../../examples/extensions/models/gaussian_python/pmcabc_gaussian_model_simple.py
    :language: python
    :lines: 5-9

Here we actually initialize the model parameters by calling :py:class:`abcpy.models.Model.sample_from_prior`, which is another function that must be implemented. Its requirements are quite simple:

.. automethod:: abcpy.models.Model.sample_from_prior
    :noindex:

.. literalinclude:: ../../examples/extensions/models/gaussian_python/pmcabc_gaussian_model_simple.py
    :language: python
    :lines: 24-26

Let us have a look at the details of implementing :py:class:`abcpy.models.Model.set_parameters`:

.. automethod:: abcpy.models.Model.set_parameters
    :noindex:

For a Gaussian model a simple implementation would look like the following:

.. literalinclude:: ../../examples/extensions/models/gaussian_python/pmcabc_gaussian_model_simple.py
    :language: python
    :lines: 11-19

Note that :py:class:`abcpy.models.Model.set_parameters` is expected to return a boolean that indicates whether the provided parameters are suitable for the model. Thus, we added a few tests to the method.

For the remaining methods that must be implemented, namely :py:class:`abcpy.models.Model.get_parameters` and :py:class:`abcpy.models.Model.simulate`, we proceed in exactly the same way. This leads to an implementation that might look like the following:

.. literalinclude:: ../../examples/extensions/models/gaussian_python/pmcabc_gaussian_model_simple.py
    :language: python
    :lines: 21-23, 27-29

Our model now conforms to ABCpy and we can start inferring parameters in the same way (see `Getting Started`_) as we would do with shipped models. The complete example code can be found in `examples/extensions/models/gaussian_python/pmcabc_gaussian_model_simple.py`.

..
  Extending: Add your Distance
  ============================

  TBD

  Extending: Add your Statistics
  ==============================

  TBD

  Extending: Add your approx_likelihood
  =====================================

  TBD

  Extending: Add your prior
  =========================

  TBD

  Extending: Add your own inference scheme
  ========================================

  TBD

  Use ABCpy with a C++ model
  ==========================

  TBD

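
To tie the steps of `Implementing a new Model`_ together, the following standalone sketch shows the discussed methods in one class. It is not the shipped example and does not import ABCpy: the class name `SimpleGaussian`, the constructor arguments, the `simulate(k)` signature, and the uniform prior over the ranges from `Getting Started`_ are assumptions made for illustration. A real model would subclass :py:class:`abcpy.models.Model` and follow the reference.

```python
import random


class SimpleGaussian:
    """Hypothetical stand-in for a Gaussian model (not an ABCpy class)."""

    def __init__(self, mu_bounds=(150.0, 200.0), sigma_bounds=(5.0, 25.0), seed=None):
        # Assumed uniform prior ranges for the mean and standard deviation.
        self.mu_bounds = mu_bounds
        self.sigma_bounds = sigma_bounds
        self.rng = random.Random(seed)
        # Initialize the parameters by drawing them from the prior.
        self.sample_from_prior()

    def set_parameters(self, theta):
        """Store theta = [mu, sigma]; return False if theta is unsuitable."""
        if len(theta) != 2:
            return False
        mu, sigma = theta
        if sigma <= 0:
            return False
        self.mu, self.sigma = float(mu), float(sigma)
        return True

    def get_parameters(self):
        """Return the current parameters as [mu, sigma]."""
        return [self.mu, self.sigma]

    def sample_from_prior(self):
        """Draw mu and sigma uniformly from their prior ranges."""
        mu = self.rng.uniform(*self.mu_bounds)
        sigma = self.rng.uniform(*self.sigma_bounds)
        self.set_parameters([mu, sigma])

    def simulate(self, k):
        """Return k synthetic observations drawn from N(mu, sigma)."""
        return [self.rng.gauss(self.mu, self.sigma) for _ in range(k)]


model = SimpleGaussian(seed=1)
print(model.get_parameters())
print(model.set_parameters([170, -1]))  # rejected: sigma must be positive
print(len(model.simulate(5)))
```

In this sketch rejected parameters leave the previous values untouched, so a caller can test the boolean returned by `set_parameters` and simply try another proposal.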