Keres Documentation
===================

*Keres* is the testing suite for a Bayes filter for analyzing
amyloid fibrillogenesis fluorimetry data. The keres homepage and
the most complete documentation are available online.

Fibrillogenesis
===============
Fibrillogenesis is the process of fiber formation by
amyloid-forming proteins, such as the Alzheimer's protein
A-beta. At the beginning of a fibrillogenesis experiment, the total
protein exists in a fiber-free form. After some time, the
protein starts to form fibers in a rapidly accelerating
reaction that eventually converts all of the protein to fibers.
The moment at which the first fibers are detected is the point of
transition from the "lag phase" of no fibers to the "signal
phase" where fibers are present. The "signal phase" gets its
name from the fact that the fibers produce a signal
measurable by a detector. The time from the start of the
experiment to the point of transition to the signal phase
is called the "lag time".

.. figure:: pictures/fibrillogenesis.png
   :alt: Picture of fluorometric fibrillogenesis data.
   :figclass: align-center

   Schematized representation of fluorometric fibrillogenesis data.

Data is generally noisy, although the figure above represents the
data as a smooth curve. The detector measures
the light emitted from dyes that fluoresce when bound to
amyloid fibers and irradiated at specific wavelengths. Thus, the
continuous curve represents a series of many individual
data points taken closely together.

Several problems with the data, such as noisiness, incompleteness,
or baseline drift, make it difficult to measure the lag time with
certainty. One way to address these challenges is to embrace
this uncertainty and convert the series of intensities to a
series of probabilities using a Bayes filter.

The Bayes Filter
================
The Bayes filter is an application of recursive
Bayesian estimation, a full description of which will be
published soon. Briefly, the principle of recursive
Bayesian estimation is to update a
posterior probability :math:`p_i(H|E)`, where :math:`i`
indexes the data point in a series. The probability
:math:`p_i(H|E)` describes how likely it is that the
experiment is still in the lag phase of fibrillogenesis.
If point :math:`i+1` has a higher signal than point :math:`i`, then
:math:`p_i(H|E)` gets lower (i.e., it becomes less likely that
the experiment is still in the lag phase). Conversely, if
point :math:`i+1` has a lower signal than point :math:`i`,
then :math:`p_i(H|E)` gets higher.

For each round, the estimator is updated according to Bayes's
equation:

.. math::
   :label: Bayes

   p_i(H|E) = \dfrac{p_i(E|H) \cdot p_i(H)}{p_i(E)}

The value :math:`p_i(H)` is equal to :math:`p_{i-1}(H|E)`. The value
:math:`p_i(E|H)` is the probability of seeing a point with intensity :math:`I_i`
given that the experiment is in the lag phase. :math:`p_i(E|H)` assumes
a normal distribution of intensities around the mean intensity of the
presumed lag phase (basically, a reasonable window of data points
prior to point :math:`i`).
The value :math:`p_i(E)` is the probability of seeing the intensity :math:`I_i`
in a reasonable window of points around point :math:`i`.
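
The update loop implied by these definitions can be sketched as follows.
This is an illustrative toy, not the keres implementation: the function
name, the window size, the normal-likelihood form of :math:`p_i(E|H)`, and
the local-window estimate of :math:`p_i(E)` are all simplifying assumptions.

.. code-block:: python

   import math

   def lag_phase_posteriors(intensities, window=5, cutoff=1e-10):
       # Illustrative sketch only -- not the keres implementation.
       # p_i(E|H) is a normal likelihood around the mean of the preceding
       # `window` points; p_i(E) is crudely estimated from a local window.
       p = 1.0  # prior: the experiment starts in the lag phase
       history = []

       def normal_pdf(x, mu, var):
           return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

       for i in range(window, len(intensities)):
           lag = intensities[i - window:i]
           mu = sum(lag) / window
           var = sum((x - mu) ** 2 for x in lag) / window or 1e-12
           like = normal_pdf(intensities[i], mu, var)  # p_i(E|H)
           local = intensities[max(0, i - window):i + 1]
           evidence = (sum(normal_pdf(x, mu, var) for x in local)
                       / len(local)) or 1e-300          # p_i(E)
           p = min(1.0, like * p / evidence)            # Bayes update
           history.append((i, p))
           if p < cutoff:                               # hard cutoff reached
               break
       return history

On a synthetic trace with a flat lag phase followed by a jump, the
posterior stays high through the lag phase and collapses below the
cutoff as soon as the signal appears.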
Once the probability :math:`p_i(H|E)` falls below a hard cutoff (:math:`10^{-10}`),
the experiment is confidently in the signal phase.
To find the exact transition from the lag
to signal phases, it is useful to "backtrack"
to a higher probability (:math:`10^{-4}`) and then apply an
empirical correction optimized from simulation data
with Gaussian noise:

.. math::
   :label: Pareto

   C = \dfrac{\alpha \cdot m^{\alpha}}
       {\left ( \dfrac {\nu_h}{\sigma_h} \right )^{1 + \alpha}} + k

Here, :math:`\nu_h` is the average intensity around the hard cutoff point :math:`h`,
and :math:`\sigma_h` is the square root of the variance of the lag phase at
data point :math:`h`. The rest of the values are empirical: :math:`k = 7`,
:math:`m = 362`, :math:`\alpha = 0.9`. Although this correction works well
for both simulated and experimental data, it is entirely empirical: we have
no rigorous theoretical rationale for its efficacy.

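
As a concrete illustration, the correction above can be computed directly
from :math:`\nu_h` and :math:`\sigma_h`. The function name and example
inputs below are hypothetical; only the constants come from the text.

.. code-block:: python

   def pareto_correction(nu_h, sigma_h, k=7.0, m=362.0, alpha=0.9):
       # Empirical correction C; k, m, and alpha are the constants above.
       return (alpha * m ** alpha) / (nu_h / sigma_h) ** (1.0 + alpha) + k

Note that as the ratio :math:`\nu_h / \sigma_h` grows (a strong signal
relative to noise), the first term vanishes and the correction approaches
:math:`k = 7`.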
Using the Bayes Filter Directly
===============================
Data can be passed to the default Bayes filter by calling the ``bayesian_pickup()``
function ("pickup" refers to when the signal "picks up"):

.. code-block:: python

   from keres import bayesian_pickup

   (time, value), history, signoise = bayesian_pickup(data)

Here, ``data`` is a :math:`2 \times N` array, where the first element is a vector
of times (:math:`t_0, t_1, t_2, \ldots, t_{N-1}`) and the second element is a vector
of intensities (:math:`I_0, I_1, I_2, \ldots, I_{N-1}`).

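
For example, a suitable ``data`` array could be assembled with NumPy. The
trace below is simulated, and the final call is shown commented out because
it requires keres to be installed:

.. code-block:: python

   import numpy as np

   # Simulated trace: a flat lag phase followed by a sigmoidal rise.
   t = np.linspace(0.0, 10.0, 200)
   intensity = 1.0 / (1.0 + np.exp(-3.0 * (t - 6.0)))

   data = np.array([t, intensity])  # shape (2, N): times, then intensities
   # (time, value), history, signoise = bayesian_pickup(data)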
Return Values
-------------
The return value of ``bayesian_pickup()`` is a tuple of three elements,
the first of which is a 2-tuple of the ``time`` at the end of the lag phase
("pickup") and the intensity (``value``) at the pickup.
The second element, ``history``, is a list of 2-tuples, each with a first
element of the data point number :math:`i` and a second element of
:math:`\log_{10} p_i(H|E)`:

.. math::
   :label: history

   \left [ \left (i-K, \log_{10} \{ p_{i-K}(H|E) \} \right ),
   \left (i-K+1, \log_{10} \{ p_{i-K+1}(H|E) \} \right ) \ldots
   \left (i, \log_{10} \{ p_i(H|E) \} \right ) \right ]

Here, :math:`K` is the number of data points in the lag phase.
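
For instance, ``history`` can be scanned for the point where the posterior
crosses the backtrack threshold of :math:`10^{-4}`. The entries below are
made up for illustration; only the tuple format comes from the description
above.

.. code-block:: python

   import math

   # Hypothetical history entries in the documented (i, log10 p_i(H|E)) format.
   history = [(10, -0.2), (11, -0.9), (12, -2.5), (13, -4.7), (14, -11.3)]

   soft_cutoff = math.log10(1e-4)  # backtrack threshold, on a log10 scale
   pickup_index = next(i for i, log_p in history if log_p < soft_cutoff)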
The third element, ``signoise``, is the ratio of the interpolated
value (:math:`I_H`) at the time :math:`t_H` where :math:`p(H|E) = 10^{-10}` to
the standard deviation (:math:`\sigma_{j