dendropy.treesim – Tree Simulation

Tree simulation and generation.

exception dendropy.treesim.TreeSimTotalExtinctionException(*args, **kwargs)

Exception to be raised when branching process results in all lineages going extinct.

dendropy.treesim.birth_death(birth_rate, death_rate, birth_rate_sd=0.0, death_rate_sd=0.0, **kwargs)

Returns a birth-death tree with birth rate specified by birth_rate, and death rate specified by death_rate, with edge lengths in continuous (real) units.

birth_rate_sd is the standard deviation of the normally-distributed mutation added to the birth rate as it is inherited by daughter nodes; if 0, birth rate does not evolve on the tree.

death_rate_sd is the standard deviation of the normally-distributed mutation added to the death rate as it is inherited by daughter nodes; if 0, death rate does not evolve on the tree.
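
As a rough illustration of the two standard-deviation parameters (not DendroPy's own code), a daughter lineage's rate can be modelled as the parent's rate plus a normally distributed perturbation. The helper name inherit_rate is hypothetical, and since the handling of negative rates is not stated above, this sketch leaves them untouched.

```python
import random

def inherit_rate(parent_rate, rate_sd, rng):
    """Return a daughter lineage's rate given its parent's rate (hypothetical helper)."""
    if rate_sd == 0:
        # A standard deviation of 0 means the rate does not evolve on the tree.
        return parent_rate
    # Add a normally distributed mutation with mean 0 and standard deviation rate_sd.
    return parent_rate + rng.gauss(0.0, rate_sd)

rng = random.Random(42)
constant = inherit_rate(1.0, 0.0, rng)                     # rate is inherited unchanged
evolved = [inherit_rate(1.0, 0.1, rng) for _ in range(3)]  # rates drift around 1.0
```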

Tree growth is controlled by one or more of the following arguments, of which at least one must be specified:

  • If ntax is given as a keyword argument, tree is grown until the number of tips == ntax.
  • If taxon_set is given as a keyword argument, tree is grown until the number of tips == len(taxon_set), and the taxa are assigned randomly to the tips.
  • If max_time is given as a keyword argument, tree is grown for a maximum of max_time.
  • If gsa_ntax is given, the tree is simulated up to this number of tips (or until 0 tips remain), and a tree is then randomly selected from the intervals that correspond to times at which the tree had exactly ntax leaves (or len(taxon_set) tips). This allows for simulations according to the “General Sampling Approach” of [citeHartmannWS2010].
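
The gsa_ntax option can be pictured with a sketch that tracks only the number of lineages rather than a full tree. It is an assumption-laden illustration, not the library's implementation: the function name gsa_sample_time is hypothetical, and weighting each interval by its duration before drawing a time point is an assumed reading of "randomly selected from the intervals".

```python
import random

def gsa_sample_time(birth_rate, death_rate, ntax, gsa_ntax, rng):
    """Return (sampled_time, intervals) for a lineage-count-only process.

    intervals lists the (start, end) periods during which exactly ntax
    lineages existed; sampled_time is None if no such period occurred.
    """
    n, t = 1, 0.0
    intervals = []
    total_rate = birth_rate + death_rate
    # Simulate until gsa_ntax lineages exist or every lineage is extinct.
    while 0 < n < gsa_ntax:
        wait = rng.expovariate(n * total_rate)
        if n == ntax:
            intervals.append((t, t + wait))
        t += wait
        if rng.random() < birth_rate / total_rate:
            n += 1  # speciation
        else:
            n -= 1  # extinction
    if not intervals:
        return None, intervals
    # Assumption: pick an interval in proportion to its duration,
    # then pick a time uniformly within it.
    weights = [end - start for start, end in intervals]
    start, end = rng.choices(intervals, weights=weights, k=1)[0]
    return rng.uniform(start, end), intervals

sampled_time, intervals = gsa_sample_time(1.0, 0.0, 5, 10, random.Random(1))
```

With a death rate of 0 the lineage count never decreases, so exactly one interval with ntax lineages exists; with a positive death rate the count can revisit ntax several times.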

If more than one of the above is given, then tree growth will terminate when any of the termination conditions (i.e., number of tips == ntax, or number of tips == len(taxon_set) or maximum time = max_time) are met.
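
The "whichever condition is met first" behaviour can be sketched with a process that tracks only the lineage count. This is a simplified stand-in under stated assumptions, not DendroPy's code: grow_lineages and TotalExtinction are hypothetical names, and no tree is built.

```python
import random

class TotalExtinction(Exception):
    """Stand-in for TreeSimTotalExtinctionException in this sketch."""

def grow_lineages(birth_rate, death_rate, rng, ntax=None, max_time=None):
    """Return (number_of_lineages, elapsed_time) when any condition is met."""
    if ntax is None and max_time is None:
        raise ValueError("at least one termination condition is required")
    n, t = 1, 0.0
    total_rate = birth_rate + death_rate
    while True:
        if ntax is not None and n >= ntax:
            return n, t  # tip-count condition met
        wait = rng.expovariate(n * total_rate)
        if max_time is not None and t + wait >= max_time:
            return n, max_time  # time condition met before the next event
        t += wait
        if rng.random() < birth_rate / total_rate:
            n += 1
        else:
            n -= 1
            if n == 0:
                raise TotalExtinction("all lineages went extinct")

n_by_tips, t_by_tips = grow_lineages(1.0, 0.0, random.Random(0), ntax=8)
n_by_time, t_by_time = grow_lineages(1.0, 0.0, random.Random(0),
                                     ntax=10**6, max_time=0.5)
```

In the second call both conditions are supplied, and the time limit is reached long before the tip-count target.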

Also accepts a Tree object (with valid branch lengths) as an argument passed using the keyword tree: if given, then this tree will be used; otherwise a new one will be created.

If assign_taxa is False, then taxa will not be assigned to the tips; otherwise (default), taxa will be assigned. If taxon_set is given (tree.taxon_set, if tree is given), and the final number of tips on the tree after the termination condition is reached is less than the number of taxa in taxon_set (as will be the case, for example, when ntax < len(taxon_set)), then a random subset of taxa in taxon_set will be assigned to the tips of tree. If the number of tips is more than the number of taxa in the taxon_set, new Taxon objects will be created and added to the taxon_set if the keyword argument create_required_taxa is not given as False.
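
The assignment rules above can be sketched with plain strings standing in for Taxon objects. The helper assign_labels, the "T<n>" naming of new labels, and raising ValueError when new taxa are disallowed are all assumptions made for illustration; the text above does not say what the library does in that last case.

```python
import random

def assign_labels(num_tips, labels, rng, create_required=True):
    """Return one label per tip (hypothetical helper; strings stand in for taxa)."""
    labels = list(labels)
    if num_tips <= len(labels):
        # Fewer tips than taxa: use a random subset of the taxa.
        return rng.sample(labels, num_tips)
    if not create_required:
        # Assumed behaviour for this sketch only.
        raise ValueError("more tips than taxa and new taxa are not allowed")
    # More tips than taxa: create the missing labels, then assign at random.
    extra = ["T%d" % (i + 1) for i in range(len(labels), num_tips)]
    assigned = labels + extra
    rng.shuffle(assigned)
    return assigned

rng = random.Random(7)
subset = assign_labels(2, ["A", "B", "C"], rng)    # random subset of two taxa
extended = assign_labels(5, ["A", "B", "C"], rng)  # two new labels are created
```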

Under some conditions, it is possible for all lineages on a tree to go extinct. In this case, if the keyword argument repeat_until_success is True (the default), then a new branching process is initiated. If False, then a TreeSimTotalExtinctionException is raised.
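
The retry behaviour amounts to a loop around the branching process. The sketch below uses a toy process and a stand-in exception class (both hypothetical) purely to show the control flow.

```python
import random

class TotalExtinction(Exception):
    """Stand-in for TreeSimTotalExtinctionException in this sketch."""

def toy_process(rng):
    """Hypothetical simulation that goes extinct about half the time."""
    if rng.random() < 0.5:
        raise TotalExtinction("all lineages went extinct")
    return "tree"

def simulate(rng, repeat_until_success=True):
    while True:
        try:
            return toy_process(rng)
        except TotalExtinction:
            if not repeat_until_success:
                raise  # the caller handles the extinction
            # Otherwise loop again and start a new branching process.

result = simulate(random.Random(3))
```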

A Random() object or equivalent can be passed using the rng keyword; otherwise GLOBAL_RNG is used.

[citeHartmannWS2010] Hartmann, Wong, and Stadler. “Sampling Trees from Evolutionary Models.” Systematic Biology 59(4): 465-476. 2010.

dendropy.treesim.constrained_kingman(pop_tree, gene_tree_list=None, rng=None, gene_node_label_func=None, num_genes_attr='num_genes', pop_size_attr='pop_size', decorate_original_tree=False)

Given a population tree, pop_tree, this will return a pair of trees: a gene tree simulated on this population tree based on Kingman’s n-coalescent, and the population tree with the additional attribute gene_nodes on each node, which is a list of uncoalesced nodes from the gene tree associated with the given node from the population tree.

pop_tree should be a DendroPy Tree object, or an object of a class derived from it, with the following attribute: num_genes, the number of gene samples from each population in the present. Each edge on the tree should also have a population-size attribute.

pop_size_attr is the name of the attribute on the edges of pop_tree that specifies the population size. By default it is pop_size. This should specify the effective haploid population size, i.e., the number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.

If pop_size is 1 or 0 or None, then the edge lengths of pop_tree are taken to be in haploid population units; i.e. where 1 unit equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise the edge lengths of pop_tree are taken to be in generations.
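
The unit conventions can be made concrete with a few small helpers. These functions are hypothetical and only restate the arithmetic described above.

```python
def haploid_pop_size(num_individuals, diploid=True):
    """Effective haploid population size: 2N for diploids, N for haploids."""
    return 2 * num_individuals if diploid else num_individuals

def to_generations(length_in_pop_units, num_individuals, diploid=True):
    """Convert an edge length from population units to generations."""
    return length_in_pop_units * haploid_pop_size(num_individuals, diploid)

def lengths_are_in_generations(pop_size):
    """Apply the rule above: a pop_size of 1, 0 or None means population units."""
    return pop_size not in (None, 0, 1)

diploid_generations = to_generations(0.5, 1000)                 # 0.5 * 2000
haploid_generations = to_generations(0.5, 1000, diploid=False)  # 0.5 * 1000
```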

If gene_tree_list is given, then the gene tree is added to the tree block, and the tree block’s taxa block will be used to manage the gene tree’s taxa.

gene_node_label_func is a function that takes two arguments (a string and an integer, respectively, where the string is the containing species taxon label and the integer is the gene index) and returns a label for the corresponding gene node.
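
A function with the required shape might look like the following; the name gene_node_label and the underscore-separated format are illustrative choices, and whether gene indices start at 0 or 1 is not stated above.

```python
def gene_node_label(species_label, gene_index):
    """Build a gene-node label from the species label and the gene index."""
    return "%s_%d" % (species_label, gene_index)

example_label = gene_node_label("Homo sapiens", 3)
```

Such a function would be passed as gene_node_label_func=gene_node_label.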

If decorate_original_tree is True, then the list of uncoalesced nodes at each node of the population tree is added to the original (input) population tree instead of a copy.

Note that this function does very much the same thing as contained_coalescent(), but provides a very different API.

dendropy.treesim.contained_coalescent(containing_tree, gene_to_containing_taxon_map, edge_pop_size_attr='pop_size', default_pop_size=1, rng=None)

Returns a gene tree simulated under the coalescent contained within a population or species tree.

containing_tree
The population or species tree. If edge_pop_size_attr is not None, and the population sizes given are non-trivial (i.e., >1), then edge lengths on this tree are in units of generations. Otherwise edge lengths are in population units; i.e. 2N generations for diploid populations of size N, or N generations for haploid populations of size N.
gene_to_containing_taxon_map
A TaxonSetMapping object mapping Taxon objects in the containing_tree TaxonSet to corresponding Taxon objects in the resulting gene tree.
edge_pop_size_attr
Name of the attribute of edges that specifies population size. By default this is “pop_size”. If this attribute does not exist, default_pop_size will be used. The value of this attribute should be the haploid population size or the number of genes; i.e. 2N for a diploid population of N individuals, or N for a haploid population of N individuals. This value determines how branch length units are interpreted in the input tree, containing_tree. If it is a biologically meaningful value, then branch lengths on containing_tree are read as generations. If not (e.g. 1 or 0), then they are in population units, i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. If this argument is None, then population sizes default to default_pop_size.
default_pop_size
Population size to use if edge_pop_size_attr is None or if an edge does not have the attribute. Defaults to 1.

The returned gene tree will have the following extra attribute:

pop_node_genes
A dictionary with nodes of containing_tree as keys and a list of gene tree nodes that are uncoalesced as values.
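
To show where the uncoalesced lineages come from, the sketch below runs the standard coalescent among the gene lineages inside a single branch of the containing tree. It is an illustration under assumptions, not the library's implementation: coalesce_in_branch is a hypothetical name, and only lineage counts and event times are tracked.

```python
import random

def coalesce_in_branch(num_lineages, branch_length, pop_size, rng):
    """Run the coalescent among lineages within one branch.

    branch_length is in generations when pop_size > 1 and in population
    units when pop_size is 1. Returns (remaining_lineages, event_times).
    """
    k, t, events = num_lineages, 0.0, []
    while k > 1:
        pairs = k * (k - 1) / 2.0
        # Waiting time to the next coalescence, scaled by population size.
        wait = rng.expovariate(pairs) * pop_size
        if t + wait > branch_length:
            break  # the remaining lineages enter the parent branch uncoalesced
        t += wait
        events.append(t)
        k -= 1
    return k, events

rng = random.Random(11)
short_branch = coalesce_in_branch(5, 0.0, 1, rng)  # no time to coalesce
long_branch = coalesce_in_branch(5, 1e9, 1, rng)   # everything coalesces
```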

Note that this function does very much the same thing as constrained_kingman(), but provides a very different API.

dendropy.treesim.discrete_birth_death(birth_rate, death_rate, birth_rate_sd=0.0, death_rate_sd=0.0, **kwargs)

Returns a birth-death tree with birth rate specified by birth_rate, and death rate specified by death_rate, with edge lengths in discrete (integer) units.

birth_rate_sd is the standard deviation of the normally-distributed mutation added to the birth rate as it is inherited by daughter nodes; if 0, birth rate does not evolve on the tree.

death_rate_sd is the standard deviation of the normally-distributed mutation added to the death rate as it is inherited by daughter nodes; if 0, death rate does not evolve on the tree.

Tree growth is controlled by one or more of the following arguments, of which at least one must be specified:

  • If ntax is given as a keyword argument, tree is grown until the number of tips == ntax.
  • If taxon_set is given as a keyword argument, tree is grown until the number of tips == len(taxon_set), and the taxa are assigned randomly to the tips.
  • If max_time is given as a keyword argument, tree is grown for max_time number of generations.

If more than one of the above is given, then tree growth will terminate when any of the termination conditions (i.e., number of tips == ntax, or number of tips == len(taxon_set) or number of generations = max_time) are met.
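
A discrete-time version of the process can be sketched by tracking the lineage count per generation. The assumption here (not confirmed by the text above) is that in each generation every lineage independently splits with probability birth_rate, goes extinct with probability death_rate, or persists otherwise; discrete_lineage_counts is a hypothetical name.

```python
import random

def discrete_lineage_counts(birth_rate, death_rate, rng, ntax=None, max_time=None):
    """Return the lineage count after each generation, starting from one lineage."""
    if ntax is None and max_time is None:
        raise ValueError("at least one termination condition is required")
    counts = [1]
    generation = 0
    while counts[-1] > 0:
        n = counts[-1]
        if ntax is not None and n >= ntax:
            break  # tip-count condition met
        if max_time is not None and generation >= max_time:
            break  # generation limit reached
        next_n = 0
        for _ in range(n):
            u = rng.random()
            if u < birth_rate:
                next_n += 2  # lineage splits into two
            elif u >= birth_rate + death_rate:
                next_n += 1  # lineage persists unchanged
            # otherwise the lineage goes extinct and contributes nothing
        counts.append(next_n)
        generation += 1
    return counts

doubling = discrete_lineage_counts(1.0, 0.0, random.Random(0), ntax=8)
static = discrete_lineage_counts(0.0, 0.0, random.Random(0), max_time=3)
extinct = discrete_lineage_counts(0.0, 1.0, random.Random(0), ntax=8)
```

Because several lineages can split in the same generation, the count in this sketch may overshoot ntax rather than hit it exactly.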

Also accepts a Tree object (with valid branch lengths) as an argument passed using the keyword tree: if given, then this tree will be used; otherwise a new one will be created.

If assign_taxa is False, then taxa will not be assigned to the tips; otherwise (default), taxa will be assigned. If taxon_set is given (tree.taxon_set, if tree is given), and the final number of tips on the tree after the termination condition is reached is less than the number of taxa in taxon_set (as will be the case, for example, when ntax < len(taxon_set)), then a random subset of taxa in taxon_set will be assigned to the tips of tree. If the number of tips is more than the number of taxa in the taxon_set, new Taxon objects will be created and added to the taxon_set if the keyword argument create_required_taxa is not given as False.

Under some conditions, it is possible for all lineages on a tree to go extinct. In this case, if the keyword argument repeat_until_success is True, then a new branching process is initiated. If False (default), then a TreeSimTotalExtinctionException is raised.

A Random() object or equivalent can be passed using the rng keyword; otherwise GLOBAL_RNG is used.

dendropy.treesim.mean_kingman(taxon_set, pop_size=1)

Returns a tree with coalescent intervals given by the expected times under Kingman’s neutral coalescent.
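
For reference, the expected interval lengths follow from a standard result: while k lineages remain, the expected wait to the next coalescence is 1 / C(k, 2) in population units. The sketch below computes these intervals; scaling them by pop_size is an assumption about how the argument is applied, and the function name is hypothetical.

```python
def expected_coalescent_intervals(num_genes, pop_size=1):
    """Expected waiting times between successive coalescences.

    With k lineages present, the expected wait is pop_size / C(k, 2).
    """
    intervals = []
    for k in range(num_genes, 1, -1):
        pairs = k * (k - 1) / 2.0
        intervals.append(pop_size / pairs)
    return intervals

intervals = expected_coalescent_intervals(4)  # waits with 4, 3 and 2 lineages
expected_tmrca = sum(intervals)               # equals 2 * (1 - 1/4) in population units
```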

dendropy.treesim.pop_gen_tree(tree=None, taxon_set=None, ages=None, num_genes=None, pop_sizes=None, num_genes_attr='num_genes', pop_size_attr='pop_size', rng=None)

This will simulate and return a tree with edges decorated with population sizes and leaf nodes decorated by the number of genes (samples or lineages) in each leaf.

If tree is given, then this is used as the tree to be decorated. Otherwise, a Yule tree is generated based on the given taxon_set. Either tree or taxon_set must be given.

The timing of the divergences can be controlled by specifying a vector of ages, ages. This should be a sequence of values specifying the ages of the first, second, third, etc. divergence events, in terms of time from the present, specified either in generations (if the pop_sizes vector is given) or population units (if the pop_sizes vector is not given). If an ages vector is given and it has fewer than num_pops-1 values, then an exception is raised.

The number of gene lineages per population can be specified through num_genes, which can be either a scalar integer or a list. If it is an integer, all the populations get the same number of genes. If it is a list, it must be at least as long as num_pops.

The population sizes of each edge can be specified using the pop_sizes vector, which should be a sequence of values specifying the population sizes of the edges in postorder. If the pop_sizes vector is given, then it must be at least as long as the number of branches on the tree, i.e. 2 * num_pops + 1; otherwise it is an error. The population size should be the effective haploid population size, i.e., the number of gene copies in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.

If pop_size is 1 or 0 or None, then edge lengths of the tree are in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise edge lengths of the tree are in generations.

This function first generates a tree using a pure-birth model with a uniform birth rate of 1.0. If an ages vector is given, it then sweeps through the internal nodes, assigning branch lengths such that the divergence events correspond to the ages in the vector. If a population sizes vector is given, it then visits all the edges in postorder, assigning population sizes to the attribute with the name specified in pop_size_attr (which is persisted as an annotation). During this, if an ages vector was not given, then the edge lengths are multiplied by the population size of the edge so the branch length units will be in generations. If an ages vector was given, then it is assumed that the ages are already in the proper scale/units.
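
The postorder decoration step can be sketched with a minimal tree structure. The Node class and the decorate function are hypothetical stand-ins (each node carries the length and population size of the edge leading to it); they are not DendroPy objects.

```python
class Node:
    """Minimal stand-in for a tree node together with the edge to its parent."""
    def __init__(self, label, edge_length=0.0, children=()):
        self.label = label
        self.edge_length = edge_length
        self.children = list(children)
        self.pop_size = None

def postorder(node):
    """Yield children before their parent."""
    for child in node.children:
        yield from postorder(child)
    yield node

def decorate(root, pop_sizes, ages_given=False):
    """Assign population sizes in postorder; rescale lengths if no ages were given."""
    for node, size in zip(postorder(root), pop_sizes):
        node.pop_size = size
        if not ages_given:
            # Convert the edge length from population units to generations.
            node.edge_length *= size

a = Node("A", 0.5)
b = Node("B", 0.5)
root = Node("root", 0.0, [a, b])
decorate(root, [1000, 2000, 1500])  # sizes listed in postorder: A, B, root
```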

dendropy.treesim.pure_kingman(taxon_set, pop_size=1, rng=None)

Generates a tree under the unconstrained Kingman’s coalescent process.
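
As a self-contained illustration of the process (not the library's implementation), the sketch below simulates a Kingman coalescent and returns a Newick string. The function name kingman_newick is hypothetical, labels are assumed to need no Newick quoting, and scaling waiting times by pop_size is an assumption.

```python
import random

def kingman_newick(labels, pop_size=1, rng=None):
    """Simulate a Kingman coalescent tree and return it as a Newick string."""
    rng = rng or random.Random()
    # Each active lineage is (newick_subtree, height_of_its_root).
    lineages = [(label, 0.0) for label in labels]
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages the coalescence rate is C(k, 2).
        t += rng.expovariate(k * (k - 1) / 2.0) * pop_size
        # Pick two distinct lineages uniformly at random and merge them.
        i, j = sorted(rng.sample(range(k), 2))
        right, right_height = lineages.pop(j)
        left, left_height = lineages.pop(i)
        merged = "(%s:%g,%s:%g)" % (left, t - left_height, right, t - right_height)
        lineages.append((merged, t))
    return lineages[0][0] + ";"

newick = kingman_newick(["A", "B", "C", "D"], rng=random.Random(7))
```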

dendropy.treesim.star_tree(taxon_set)

Builds and returns a star tree from the given taxa block.

dendropy.treesim.uniform_pure_birth(taxon_set, birth_rate=1.0, rng=None)

Generates a uniform-rate pure-birth process tree.
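
A uniform-rate pure-birth (Yule) process can be sketched as follows. This is an illustration under assumptions rather than the library's code: yule_newick is a hypothetical name, growth stops at the moment the final split occurs (so the two newest tips have zero length), and labels are assigned to tips at random.

```python
import random

def yule_newick(labels, birth_rate=1.0, rng=None):
    """Grow a pure-birth tree with one tip per label; return a Newick string."""
    labels = list(labels)
    if not labels:
        raise ValueError("at least one label is required")
    rng = rng or random.Random()
    root = {"children": [], "length": 0.0}
    tips = [root]
    while len(tips) < len(labels):
        # With k tips, the next split occurs after an Exp(k * birth_rate) wait.
        wait = rng.expovariate(len(tips) * birth_rate)
        for tip in tips:
            tip["length"] += wait
        # Every tip is equally likely to be the one that splits.
        parent = tips.pop(rng.randrange(len(tips)))
        parent["children"] = [{"children": [], "length": 0.0} for _ in range(2)]
        tips.extend(parent["children"])
    for tip, label in zip(tips, rng.sample(labels, len(labels))):
        tip["label"] = label

    def newick(node):
        if node["children"]:
            inner = ",".join(newick(child) for child in node["children"])
            return "(%s):%g" % (inner, node["length"])
        return "%s:%g" % (node["label"], node["length"])

    return newick(root) + ";"

yule = yule_newick(["A", "B", "C", "D", "E"], rng=random.Random(5))
```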
