.. Created by phyles-quickstart.
   Add some items to the toctree.

Keres Documentation
===================

*Keres* is the testing suite for a Bayes filter useful for amyloid
fibrillogenesis fluorimetry data. The keres homepage is at and the most
complete documentation is available at .

Fibrillogenesis
===============

Fibrillogenesis is the process of fiber formation by amyloid-forming
proteins, such as the Alzheimer's protein A-beta. At the beginning of a
fibrillogenesis experiment, the total protein exists in a fiber-free
form. After some time, the protein starts to form fibers in a rapidly
accelerating reaction that eventually converts all of the protein to
fibers. The moment at which the first fibers are detected marks the
transition from the "lag phase" of no fibers to the "signal phase",
where fibers are present. The "signal phase" gets its name from the
fact that the fibers produce a signal measurable by a detector. The
time from the start of the experiment to the transition into the
signal phase is called the "lag time".

.. figure:: pictures/fibrillogenesis.png
   :alt: Picture of fluorometric fibrillogenesis data.
   :figclass: align-center

   Schematized representation of fluorometric fibrillogenesis data.

Data is generally noisy, although the figure above represents the data
as a smooth curve. The detector measures the light emitted from dyes
that fluoresce when bound to amyloid fibers and irradiated at specific
wavelengths. Thus, the continuous curve represents a series of many
individual data points taken closely together. Several problems with
the data, such as noisiness, incompleteness, or baseline drift, make it
difficult to measure the lag time with certainty. One way to address
these challenges is to embrace this uncertainty and convert the series
of intensities to a series of probabilities using a Bayes filter.
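A trace like the one in the figure can be sketched with a short
simulation: a flat lag phase followed by a sigmoidal rise to a plateau,
with Gaussian noise added throughout. This is only an illustration of
the data shape described above; the function name and parameters below
are hypothetical and not part of keres:

.. code-block:: python

   import numpy as np

   def simulate_fibrillogenesis(n=500, t_max=300.0, t_mid=150.0,
                                rate=0.2, plateau=1.0, noise=0.02,
                                seed=0):
       """Illustrative synthetic fibrillogenesis trace: a logistic
       curve that is nearly zero during the lag phase, then rises
       rapidly to a plateau, with Gaussian noise added."""
       rng = np.random.default_rng(seed)
       t = np.linspace(0.0, t_max, n)
       # Logistic signal centered at t_mid; during the lag phase
       # (t well below t_mid) the signal is effectively zero.
       signal = plateau / (1.0 + np.exp(-rate * (t - t_mid)))
       intensity = signal + rng.normal(0.0, noise, size=n)
       # Stack as a 2 x N array: row 0 is times, row 1 is intensities.
       return np.vstack([t, intensity])

   data = simulate_fibrillogenesis()
   print(data.shape)  # (2, 500)

Such simulated traces (with known lag times) are the kind of data one
can use to exercise a lag-time estimator against controlled noise.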
The Bayes Filter
================

The Bayes filter is an application of recursive Bayesian estimation, a
full description of which will be published soon. Briefly, the
principle of recursive Bayesian estimation is to update a posterior
probability :math:`p_i(H|E)`, where :math:`i` indexes the data point in
a series. The probability :math:`p_i(H|E)` describes how likely it is
that the experiment is still in the lag phase of fibrillogenesis. If
point :math:`i+1` has a higher signal than point :math:`i`, then
:math:`p_i(H|E)` gets lower (i.e. it is less likely that the experiment
is still in the lag phase). Conversely, if point :math:`i+1` has a
lower signal than point :math:`i`, then :math:`p_i(H|E)` gets higher.
For each round, the estimator is updated according to Bayes's equation:

.. math::
   :label: Bayes

   p_i(H|E) = \dfrac{p_i(E|H) \cdot p_i(H)}{p_i(E)}

The value :math:`p_i(H)` is equal to :math:`p_{i-1}(H|E)`. The value
:math:`p_i(E|H)` is the probability of seeing a point with intensity
:math:`I_i` given that the experiment is in the lag phase.
:math:`p_i(E|H)` assumes a normal distribution of intensities around
the mean intensity of the presumed lag phase (essentially a reasonable
window of data points prior to point :math:`i`). The value
:math:`p_i(E)` is the probability of seeing the intensity :math:`I_i`
in a reasonable window of points around point :math:`i`. Once the
probability :math:`p_i(H|E)` falls below a hard cutoff
(:math:`10^{-10}`), the experiment is confidently in the signal phase.
To find the exact transition from the lag phase to the signal phase, it
is useful to "backtrack" to a higher probability (:math:`10^{-4}`) and
then apply an empirical correction optimized from simulation data with
Gaussian noise:

.. math::
   :label: Pareto

   C = \dfrac{\alpha \cdot m^{\alpha}}
             {\left ( \dfrac{\nu_h}{\sigma_h} \right )^{1 + \alpha}} + k

Here, :math:`\nu_h` is the average intensity around the hard cutoff
point :math:`h`, and :math:`\sigma_h` is the square root of the
variance of the lag phase for data point :math:`h`. The rest of the
values are empirical: :math:`k = 7`, :math:`m = 362`, and
:math:`\alpha = 0.9`. Although this correction works well for both
simulated and experimental data, we have no rigorous theoretical
rationale for its efficacy; in other words, the correction is entirely
empirical.

Using the Bayes Filter Directly
===============================

Data can be passed to the default Bayes filter by calling the
``bayesian_pickup()`` function ("pickup" refers to when the signal
"picks up"):

.. code-block:: python

   from keres import bayesian_pickup

   (time, value), history, signoise = bayesian_pickup(data)

Here, ``data`` is a :math:`2 \times N` array, where the first element
is a vector of times (:math:`t_0, t_1, t_2 \ldots t_N`) and the second
element is a vector of intensities (:math:`I_0, I_1, I_2 \ldots I_N`).

Return Values
-------------

The return value of ``bayesian_pickup()`` is a tuple of three elements,
the first of which is a 2-tuple of the ``time`` at the end of the lag
phase (the "pickup") and the intensity (``value``) at the pickup. The
second element, ``history``, is a list of 2-tuples, each having as its
first element the data point number :math:`i` and as its second element
:math:`\log_{10} p_i(H|E)`:

.. math::
   :label: history

   \left [ \left (i-K, \log_{10} \{ p_{i-K}(H|E) \} \right ),
           \left (i-K+1, \log_{10} \{ p_{i-K+1}(H|E) \} \right )
           \ldots
           \left (i, \log_{10} \{ p_i(H|E) \} \right ) \right ]

Here, :math:`K` is the number of data points in the lag phase. The
third element, ``signoise``, is the ratio of the interpolated value
(:math:`I_H`) at the time :math:`t_H` where :math:`p(H|E) = 10^{-10}`
to the standard deviation (:math:`\sigma_{j