3.5. Tree Simulation and Generation¶

The treesim module provides functions for the simulation of trees under a variety of theoretical models.

3.5.1. Birth-Death Process Trees¶

There are two different birth-death process tree simulation routines in DendroPy:

birth_death
Returns a tree generated under a continuous-time birth-death process, with branch lengths in arbitrary time units.
discrete_birth_death
Returns a tree generated under discrete-time birth-death process, with branch length in generation units.

Both of these functions have identical interfaces, and will grow a tree under a branching process with the specified birth-date and death-rate until the termination condition (pre-specified number of leaves or maximum amount of time) is met.

For example, to get a continuous-time tree with 10 leaves, generated under a birth rate of 1.0 and death rate of 0.5:

```>>> from dendropy import treesim
>>> t = treesim.birth_death(birth_rate=1.0, death_rate=0.5, ntax=10)
>>> t.print_plot()
/-------------------------------------------- T1
|
/-------------+                             /-------------- T2
|             |              /--------------+
|             \--------------+              \-------------- T3
|                            |
|                            \----------------------------- T4
+
|                            /----------------------------- T5
|             /--------------+
|             |              |              /-------------- T6
|             |              \--------------+
\-------------+                             \-------------- T7
|
|                             /-------------- T8
|              /--------------+
\--------------+              \-------------- T9
|
\----------------------------- T10
```

While to get a continuous time tree generated under the same rates after 6 time units:

```>>> t = treesim.birth_death(birth_rate=1.0, death_rate=0.5, max_time=6.0)
```

If both conditions are given simultaneously, then tree growth will terminate when any of the termination conditions (i.e., number of tips == ntax, or number of tips == len(taxon_set) or maximum time == max_time) are met.

3.5.1.1. Specifying a TaxonSet¶

By default, a new Taxon object will be created and associated with each leaf (labeled “T1”, “T2”, etc.), all belonging to a new TaxonSet object associated with the resulting tree.

You can pass in an explicit TaxonSet object using the “taxon_set” keyword:

```>>> import dendropy
>>> from dendropy import treesim
>>> taxa = dendropy.TaxonSet(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
>>> t = treesim.birth_death(0.4, 0.1, taxon_set=taxa)
>>> t.print_plot()
/-------------------------------------- h
|
/-----------+                         /------------ c
|           |            /------------+
|           \------------+            \------------ a
|                        |
+                        \------------------------- g
|
|                                     /------------ e
|                        /------------+
|                        |            \------------ f
\------------------------+
|            /------------ d
\------------+
\------------ b
```

In this case, the branching process underlying the tree generation will terminate when the number of leaves in the tree equals the number of taxa in the TaxonSettaxa”, and the Taxon objects in “taxa” will be randomly assigned to the leaves.

The “taxon_set” keyword can be combined with the “ntax” keyword. If the size of the TaxonSet object given by the taxon_set argument is greater than the specified target tree taxon number, then a random subset of Taxon object in the TaxonSet will be assigned to the leaves:

```>>> import dendropy
>>> from dendropy import treesim
>>> taxa = dendropy.TaxonSet(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
>>> t = treesim.birth_death(birth_rate=1.0, death_rate=0.5, ntax=5, taxon_set=taxa)
>>> t.print_plot()
/-------------------------------------------------- g
|
+                        /------------------------- a
|           /------------+
|           |            |            /------------ d
\-----------+            \------------+
|                         \------------ c
|
\-------------------------------------- f
```

If the size of the TaxonSet object is less than the target taxon number, then new Taxon objects will be created as needed and added to the TaxonSet object as well as associated with the leaves:

```>>> import dendropy
>>> from dendropy import treesim
>>> taxa = dendropy.TaxonSet(['a', 'b'])
>>> t = treesim.birth_death(birth_rate=1.0, death_rate=0.5, ntax=5, taxon_set=taxa)
>>> t.print_plot()
/---------------- a
/--------------------------------+
|                                \---------------- b
+
|               /--------------------------------- T3
\---------------+
|                /---------------- T4
\----------------+
\---------------- T5
```

3.5.1.2. Repeating Failed Branching Processes¶

With a non-zero death rate, it is possible for all lineages of a tree to go extinct before the termination conditions are reached. In this case, by default a TreeSimTotalExtinctionException will be raised:

```>>> t = treesim.birth_death(birth_rate=1.0, death_rate=0.9, ntax=10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jeet/Projects/DendroPy/dendropy/treesim.py", line 188, in birth_death
raise TreeSimTotalExtinctionException()
dendropy.treesim.TreeSimTotalExtinctionException```

If the keyword argument “repeat_until_success” is given, then instead of raising an exception the process starts again and repeats until the termination condition is met:

```>>> t = treesim.birth_death(birth_rate=1.0,
...                         death_rate=0.9,
...                         ntax=10,
...                         repeat_until_success=True)
>>> t.print_plot()
/------------------- T1
/--------------------------------------+
|                                      |         /--------- T2
|                                      \---------+
|                                                \--------- T3
+
|                                                /--------- T4
|        /---------------------------------------+
|        |                                       \--------- T5
|        |
\--------+                   /----------------------------- T6
|         /---------+
|         |         |         /------------------- T7
|         |         \---------+
\---------+                   |         /--------- T8
|                   \---------+
|                             \--------- T9
|
\--------------------------------------- T10
```

3.5.1.3. Suppressing Taxon Assignment¶

You can specify “assign_taxa” to be False to avoid taxa from being automatically assigned to a tree (for example, when you want to build a tree in stages – see below).

3.5.1.4. Extending an Existing Tree¶

Both these functions also accept a Tree object (with valid branch lengths) as an argument passed using the keyword tree. If given, then this tree will be used as the starting point; otherwise a new one will be created.

3.5.1.5. Evolving Birth and Death Rates¶

The same functions can also produce trees generated under variable birth and death rates. The “birth_rate_sd” keyword argument specifies the standard deviation of the normally-distributed error of birth rates as they evolve from parent to child node, while the “death_rate_sd” keyword argument specifies the same of the the death rates. For example, to get a 10-taxon tree generated under a birth- and death-rate that evolves with a standard deviation of 0.1:

```>>> t = treesim.birth_death(birth_rate=1.0,
death_rate=0.5,
birth_rate_sd=0.1,
death_rate_sd=0.1,
ntax=10)
```

3.5.1.6. Building a Tree in Multiple Stages under Different Conditions¶

You might want to generate a tree under different condition in different stages. To do this, you would start with an empty tree and passing it to the birth-death function as an argument using the “tree” keyword argument, and at the same time suppress the automatic taxon assignment using the “assign_taxa=False” keyword argument to avoid taxa being assigned to what will eventually become internal nodes. When the tree is ready, you will call the randomly_assign_taxa function to assign taxa at random to the leaves.

For example, the following generates a birth-death tree with equal birth and death rates, but both rates shifting for a short while to a temporarily higher (though equal) rates:

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22``` ```# /usr/bin/env python import random import dendropy from dendropy import treesim def generate(birth_rates, death_rates): assert len(birth_rates) == len(death_rates) tree = dendropy.Tree() for i, br in enumerate(birth_rates): tree = treesim.birth_death(birth_rates[i], death_rates[i], max_time=random.randint(1,8), tree=tree, assign_taxa=False, repeat_until_success=True) print(tree.as_string('newick')) tree.randomly_assign_taxa(create_required_taxa=True) return tree tree = generate([0.1, 0.6, 0.1], [0.1, 0.6, 0.1]) print(tree.as_string('newick')) ```

Another example draws birth and death rates from a normal distribution with the same mean and standard deviation in multiple stages:

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20``` ```#! /usr/bin/env python import random import dendropy from dendropy import treesim def generate(mean, sd, num_periods): tree = dendropy.Tree() for i in range(num_periods): tree = treesim.birth_death(birth_rate=random.gauss(mean, sd), death_rate=random.gauss(mean, sd), max_time=random.randint(1,5), tree=tree, assign_taxa=False, repeat_until_success=True) tree.randomly_assign_taxa(create_required_taxa=True) return tree tree = generate(0.1, 0.01, 100) print(tree.as_string('newick')) ```

3.5.2. Star Trees¶

The star_tree generates a simple polytomy tree, with a single node as the immediate ancestor to a set of leaves, with one leaf per Taxon in the TaxonSet object given by the taxon_set argument. For example:

```>>> from dendropy import treesim
>>> taxa = dendropy.TaxonSet(['a', 'b', 'c', 'd', 'e'])
>>> tree = treesim.star_tree(taxa)
>>> print(tree.as_ascii_plot())
/-------------------------------------- a
|
|-------------------------------------- b
|
+-------------------------------------- c
|
|-------------------------------------- d
|
\-------------------------------------- e
```

3.5.3. Population Genetic Trees¶

Coming soon: pop_gen_tree.

3.5.4. (Pure Neutral) Coalescent Trees¶

The pure_kingman function returns a tree generated under an unconstrained neutral coalescent model. The first argument to this function, taxon_set, is a TaxonSet object, where each member Taxon object represents a gene to be coalesced. The second argument, pop_size, specifies the population size in terms of the number of gene copies in the population. This means that for a diploid population of size N, pop_size should be N*2, while for a haploid population of size N, pop_size should be N. If pop_size is None, 1, or 0, then the edge lengths of the returned gene tree will be in population units (i.e., 1 unit of edge length == to 2N generations if a diploid population or 1N generations if a haploid population). Otherwise, the edge lengths will be in generation units. For example:

```#! /usr/bin/env python

import dendropy
from dendropy import treesim

taxa = dendropy.TaxonSet(["z1", "z2", "z3", "z4", "z5", "z6", "z7", "z8"])
tree = treesim.pure_kingman(taxon_set=taxa,
pop_size=10000)
print(tree.as_string("newick"))
print(tree.as_ascii_plot())
```

3.5.5. Contained Coalescent Trees¶

The contained_coalescent function returns a tree generated under a neutral coalescent model conditioned on population splitting times or events given by a containing species or population tree. Such a tree is often referred to as a contained, embedded, censored, truncated, or constrained genealogy/tree. At a minimum, this function takes two arguments: a Tree object representing the containing (species or population) tree, and a TaxonSetMapping object describing how the sampled gene taxa map or are associated with the species/population Tree taxa.

The Tree object representing the containing species or population tree should be rooted and ultrametric. If edge lengths are given in generations, then a meaningful population size needs to be communicated to the contained_coalescent function. In general, for coalescent operations in DendroPy, unless otherwise specified, population sizes are the haploid population size, i.e. the number of genes in the population. This is 2N for a diploid population with N individuals, or N for haploid population of N individuals. If edge lengths are given in population units (e.g., N), then the appropriate population size to use is 1.

If the population size is fixed throughout the containing species/population tree, then simply passing in the appropriate value using the default_pop_size argument to the contained_coalescent function is sufficient. If, on the other hand, the population size varies, the a special attribute must be added to each edge, “pop_size”, that specifies the population size for that edge. For example:

```tree = dendropy.Tree.get_from_path("sp.tre", "newick")
for edge in tree.postorder_edge_iter():
edge.pop_size = 100000
```

The easiest way to get a TaxonSetMapping object is to call the special factory function create_contained_taxon_mapping. This will create a new TaxonSet to manage the gene taxa, and create the associations between the gene taxa and the containing tree taxa for you. It takes two arguments: the TaxonSet of the containing tree, and the number of genes you want sampled from each species.

The following example shows how to create a TaxonSetMapping using create_contained_taxon_mapping, and then calls contained_coalescent to produce a contained coalescent tree:

```#! /usr/bin/env python

import dendropy
from dendropy import treesim

sp_tree_str = """\
[&R] (A:10,(B:6,(C:4,(D:2,E:2):2):2):4)
"""

sp_tree = dendropy.Tree.get_from_string(sp_tree_str, "newick")
gene_to_species_map = dendropy.TaxonSetMapping.create_contained_taxon_mapping(
containing_taxon_set=sp_tree.taxon_set,
num_contained=3)
gene_tree = treesim.contained_coalescent(containing_tree=sp_tree,
gene_to_containing_taxon_map=gene_to_species_map)
print(gene_tree.as_string('newick'))
print(gene_tree.as_ascii_plot())
```

In the above example, the branch lengths were in haploid population units, so we did not specify a population size. If the gene-species associations are more complex, e.g., different numbers of genes per species, we can pass in a list of values as the second argument to ~dendropy.dataobject.taxon.TaxonSetMapping.create_contained_taxon_mapping():

```#! /usr/bin/env python

import dendropy
from dendropy import treesim

sp_tree_str = """\
[&R] (A:10,(B:6,(C:4,(D:2,E:2):2):2):4)
"""

sp_tree = dendropy.Tree.get_from_string(sp_tree_str, "newick")
gene_to_species_map = dendropy.TaxonSetMapping.create_contained_taxon_mapping(
containing_taxon_set=sp_tree.taxon_set,
num_contained=[3, 4, 6, 2, 2,])
gene_tree = treesim.contained_coalescent(containing_tree=sp_tree,
gene_to_containing_taxon_map=gene_to_species_map)
print(gene_tree.as_string('newick'))
print(gene_tree.as_ascii_plot())
```

This approach should be used with caution if we cannot be certain of the order of taxa (as is the case with data read in Newick formats). In these case, and in more complex cases, we might need to directly instantiate the TaxonSetMapping object. The API to describe the associations when constructing this object is very similar to that of the TaxonSetPartition object: you can use a function, attribute or dictionary.

3.5.6. Simulating the Distribution of Number Deep Coalescences Under Different Phylogeographic History Scenarios¶

A typical application for simulating censored coalescent trees is to produce a distribution of trees under different hypotheses of demographic or phylogeographic histories.

For example, imagine we wanted to generate the distribution of the number of deep coalescences under two scenarios: one in which a population underwent sequential or step-wise vicariance, and another when there was simultaneous fragmentation. This can be achieved by generating trees under contained_coalescent, and then using a ContainingTree object to embed the trees and count the number of deep coalescences.

```#! /usr/bin/env python

import dendropy
from dendropy import treesim
from dendropy import reconcile

# simulation parameters and output
num_reps = 10

# population tree descriptions
stepwise_tree_str = "[&R](A:120000,(B:80000,(C:40000,D:40000):40000):40000):100000"
frag_tree_str = "[&R](A:120000,B:120000,C:120000,D:120000):100000"

# taxa and trees
containing_taxa = dendropy.TaxonSet()
stepwise_tree = dendropy.Tree.get_from_string(
stepwise_tree_str,
"newick",
taxon_set=containing_taxa)
frag_tree = dendropy.Tree.get_from_string(
frag_tree_str,
"newick",
taxon_set=containing_taxa)

# taxon set association
genes_to_species = dendropy.TaxonSetMapping.create_contained_taxon_mapping(
containing_taxon_set=containing_taxa,
num_contained=8)

# convert to containing tree
stepwise_tree = reconcile.ContainingTree(stepwise_tree,
contained_taxon_set=genes_to_species.domain_taxon_set,
contained_to_containing_taxon_map=genes_to_species)
frag_tree = reconcile.ContainingTree(frag_tree,
contained_taxon_set=genes_to_species.domain_taxon_set,
contained_to_containing_taxon_map=genes_to_species)

# for each rep
for rep in range(num_reps):
gene_tree1 = treesim.contained_coalescent(containing_tree=stepwise_tree,
gene_to_containing_taxon_map=genes_to_species,
default_pop_size=40000)
stepwise_tree.embed_tree(gene_tree1)
gene_tree2 = treesim.contained_coalescent(containing_tree=frag_tree,
gene_to_containing_taxon_map=genes_to_species,
default_pop_size=40000)
frag_tree.embed_tree(gene_tree2)

# write results

# returns dictionary with contained trees as keys
# and number of deep coalescences as values
stepwise_deep_coals = stepwise_tree.deep_coalescences()
stepwise_out = open("stepwise.txt", "w")
for tree in stepwise_deep_coals:
stepwise_out.write("%d\n" % stepwise_deep_coals[tree])

# returns dictionary with contained trees as keys
# and number of deep coalescences as values
frag_deep_coals = frag_tree.deep_coalescences()
frag_out = open("frag.txt", "w")
for tree in frag_deep_coals:
frag_out.write("%d\n" % frag_deep_coals[tree])
```

Actually, the ContainingTree class has its own native contained coalescent simulator, embed_contained_kingman, which simulates and embeds a contained coalescent tree at the same time. So a more practical approach might be:

```#! /usr/bin/env python

import dendropy
from dendropy import treesim
from dendropy import reconcile

# simulation parameters and output
num_reps = 10

# population tree descriptions
stepwise_tree_str = "[&R](A:120000,(B:80000,(C:40000,D:40000):40000):40000):100000"
frag_tree_str = "[&R](A:120000,B:120000,C:120000,D:120000):100000"

# taxa and trees
containing_taxa = dendropy.TaxonSet()
stepwise_tree = dendropy.Tree.get_from_string(
stepwise_tree_str,
"newick",
taxon_set=containing_taxa)
frag_tree = dendropy.Tree.get_from_string(
frag_tree_str,
"newick",
taxon_set=containing_taxa)

# taxon set association
genes_to_species = dendropy.TaxonSetMapping.create_contained_taxon_mapping(
containing_taxon_set=containing_taxa,
num_contained=8)

# convert to containing tree
stepwise_tree = reconcile.ContainingTree(stepwise_tree,
contained_taxon_set=genes_to_species.domain_taxon_set,
contained_to_containing_taxon_map=genes_to_species)
frag_tree = reconcile.ContainingTree(frag_tree,
contained_taxon_set=genes_to_species.domain_taxon_set,
contained_to_containing_taxon_map=genes_to_species)

# for each rep
for rep in range(num_reps):
stepwise_tree.embed_contained_kingman(default_pop_size=40000)
frag_tree.embed_contained_kingman(default_pop_size=40000)

# write results

# returns dictionary with contained trees as keys
# and number of deep coalescences as values
stepwise_deep_coals = stepwise_tree.deep_coalescences()
stepwise_out = open("stepwise.txt", "w")
for tree in stepwise_deep_coals:
stepwise_out.write("%d\n" % stepwise_deep_coals[tree])

# returns dictionary with contained trees as keys
# and number of deep coalescences as values
frag_deep_coals = frag_tree.deep_coalescences()
frag_out = open("frag.txt", "w")
for tree in frag_deep_coals:
frag_out.write("%d\n" % frag_deep_coals[tree])
```

Previous topic

3.4. Tree Manipulation and Restructuring

Next topic

4. Working with Characters

Announcements

Join the "DendroPy Announcements" group to receive announcements of new releases, updates, changes and other news of interest to DendroPy users and developers.

Enter your e-mail address in the box above and click the "subscribe" button to subscribe to the "dendropy-announce" group, or click here to visit this group page directly.

Discussion

Join the "DendroPy Users" group to follow and participate in discussion, troubleshooting, help, information, suggestions, etc. on the usage and development of the DendroPy phylogenetic computing library.

Enter your e-mail address in the box above and click the "subscribe" button to subscribe to the "dendropy-users" group, or click here to visit this group page directly.