# dendropy.coalescent – Coalescent Calculations and Statistics¶

Methods for working with Kingman’s n-coalescent framework.

dendropy.coalescent.coalesce(nodes, pop_size=None, period=None, rng=None, use_expected_tmrca=False)

Returns a list of nodes that have not yet coalesced once period is exhausted.

nodes is a list of DendroPy Nodes representing a sample of neutral genes (some, all, or none of these nodes may have descendent nodes).

pop_size is the effective haploid population size; i.e., number of gene in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.

period is the time that the genes have to coalesce. If pop_size is 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.

This function will a draw a coalescence time, t, from EXP(1/num_genes). If period is given and if this time is less than period, or if period is not given, then two nodes are selected at random from nodes, and coalesced: a new node is created, and the two nodes are added as child_nodes to this node with an edge length such the the total length from tip to the ancestral node is equal to the depth of the deepest child + t. The two nodes are removed from the list of nodes, and the new node is added to it. t is then deducted from period, and the process repeats.

The function ends and returns the list of nodes once period is exhausted or if any draw of t exceeds period, if period is given or when there is only one node left.

As each coalescent event occurs, all nodes have their edges extended to the point of the coalescent event. In the case of constrained coalescence, all uncoalesced nodes have their edges extended to the end of the period (coalesced nodes have the edges fixed by the coalescent event in their ancestor). Thus multiple calls to this method with the same set of nodes will gradually ‘grow’ the edges, until all the the nodes coalesce. The edge lengths of the nodes passed to this method thus should not be modified or reset until the process is complete.

dendropy.coalescent.discrete_time_to_coalescence(n_genes, pop_size=None, rng=None)

A random draw from the “Kingman distribution” (discrete time version): Time to go from n genes to n-1 genes; i.e. waiting time until two lineages coalesce. pop_size is the effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. If pop_size is 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.

dendropy.coalescent.expected_tmrca(n_genes, pop_size=None)

Expected (mean) value for the Time to the Most Recent Common Ancestor. n_genes is the number of genes in the sample. pop_size is the effective haploid population size; i.e., number of gene in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. If pop_size is 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.

dendropy.coalescent.extract_coalescent_frames(tree, check_ultrametricity_prec=1e-07)

Returns dictionary, with key = number of alleles, and values = waiting time for coalescent for the given tree

dendropy.coalescent.kl_divergence_coalescent_trees(tree_list, haploid_pop_size)

Returns KL divergence for coalescent frames found in a collection of trees from the theoretical distribution given the specified haploid population size.

dendropy.coalescent.kl_divergence_coalescent_waiting_times(allele_waiting_time_dist, haploid_pop_size)
allele_branch_len_dist is a dictionary with number of alleles as keys

and a list of waiting times associated with that number of alleles as values. haploid_pop_size is the population size in terms of total numbers of genes. This returns a the KL-divergence between the distribution of waiting times and the Kingman coalescent distribution.

D_{mathrm{KL}}(P|Q) = sum_i P(i) log

rac{P(i)}{Q(i)}.

dendropy.coalescent.log_probability_of_coalescent_frames(coalescent_frames, haploid_pop_size)
Under the classical neutral coalescent citep{Kingman1982,
Kingman1982b}, the waiting times between coalescent events in a sample of \$k\$ alleles segregating in a population of (haploid) size \$N_e\$ is distributed exponentially with a rate parameter of \$

rac{{k choose 2}}{N_e}\$:

Pr(T) =

rac{{k choose 2}}{N_e} e{- rac{{k choose 2}}{N_e} T},

where \$T\$ is the length of (chronological) time in which there are \$k\$ alleles in the sample (i.e., for \$k\$ alleles to coalesce into \$k-1\$ alleles).
dendropy.coalescent.log_probability_of_coalescent_tree(tree, haploid_pop_size, check_ultrametricity_prec=1e-07)

Wraps up extraction of coalescent frames and reporting of probability.

dendropy.coalescent.node_waiting_time_pairs(tree, check_ultrametricity_prec=1e-07)

Returns list of tuples of (node, coalescent interval [= time between last coalescent event and current node age])

dendropy.coalescent.time_to_coalescence(n_genes, pop_size=None, haploid=True, rng=None)

A random draw from the “Kingman distribution” (continuous time version): Time to go from n genes to n-1 genes; i.e. waiting time until two lineages coalesce. This is a random number with an exponential distribution with a rate of (n choose 2). pop_size is the effective haploid population size; i.e., number of gene in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. If pop_size is 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.

dendropy.coalescent.update_allele_waiting_time_dist(coalescent_frames, allele_waiting_time_dist=None)

coalescent_frames is a dictionary with number of alleles as keys and a scalar representing the waiting time to a coalescence event given a particular number of alleles on a particular tree (as returned by extract_coalescent_frame. allele_branch_len_dist is a dictionary with number of alleles as keys and a list of waiting times associated with that number of alleles as values. This is simply a convenience function that adds the waiting times found in coalescent_frames to the collection of values tracked in allele_waiting_time_dist.

#### Previous topic

dendropy.dataobject.char – Character Data

#### Next topic

dendropy.reconcile – Tree Reconciliation and Fitting

### Announcements

Join the "DendroPy Announcements" group to receive announcements of new releases, updates, changes and other news of interest to DendroPy users and developers.

Enter your e-mail address in the box above and click the "subscribe" button to subscribe to the "dendropy-announce" group, or click here to visit this group page directly.

### Discussion

Join the "DendroPy Users" group to follow and participate in discussion, troubleshooting, help, information, suggestions, etc. on the usage and development of the DendroPy phylogenetic computing library.

Enter your e-mail address in the box above and click the "subscribe" button to subscribe to the "dendropy-users" group, or click here to visit this group page directly.