Master Tree class¶
-
class
TreeNode
(newick=None, format=0, dist=None, support=None, name=None)¶ TreeNode (Tree) class is used to store a tree structure. A tree consists of a collection of TreeNode instances connected in a hierarchical way. Trees can be loaded from the New Hampshire Newick format (newick).
Parameters: - newick – Path to the file containing the tree or, alternatively, the text string containing the same information.
- format (0) – subnewick format:

  FORMAT  DESCRIPTION
  0       flexible with support values
  1       flexible with internal node names
  2       all branches + leaf names + internal supports
  3       all branches + all names
  4       leaf branches + leaf names
  5       internal and leaf branches + leaf names
  6       internal branches + leaf names
  7       leaf branches + all names
  8       all names
  9       leaf names
  100     topology only
Returns: a tree node object which represents the base of the tree.
Examples:
t1 = Tree() # creates an empty tree
t2 = Tree('(A:1,(B:1,(C:1,D:1):0.5):0.5);')
t3 = Tree('/home/user/myNewickFile.txt')
-
add_child
(child=None, name=None, dist=None, support=None)¶ Adds a new child to this node. If the child node is not supplied as an argument, a new node instance will be created.
Parameters: - child (None) – the node instance to be added as a child.
- name (None) – the name that will be given to the child.
- dist (None) – the distance from the node to the child.
- support (None) – the support value of child partition.
Returns: The child node instance
-
add_face
(face, column, position='branch-right')¶ Adds a fixed face to the node. Faces of this type are always attached to the node, independently of the layout function.
Parameters: - face – a Face or inherited instance
- column – An integer number starting from 0
- position (“branch-right”) – Possible values are: “branch-right”, “branch-top”, “branch-bottom”, “float”, “aligned”
-
add_feature
(pr_name, pr_value)¶ Add or update a node’s feature.
-
add_features
(**features)¶ Add or update several features.
-
add_sister
(sister=None, name=None, dist=None)¶ Adds a sister to this node. If sister node is not supplied as an argument, a new TreeNode instance will be created and returned.
-
check_monophyly
(values, target_attr, ignore_missing=False, unrooted=False)¶ Returns True if a given target attribute is monophyletic under this node for the provided set of values.
If not all values are represented in the current tree structure, a ValueError exception will be raised to warn that strict monophyly could never be reached (this behaviour can be avoided by enabling the ignore_missing flag).
Parameters: - values – a set of values for which monophyly is expected.
- target_attr – node attribute being used to check monophyly (e.g. species for species trees, names for gene family trees, or any custom feature present in the tree).
- ignore_missing (False) – Avoid raising an Exception when missing attributes are found.
- unrooted (False) – If True, the tree will be treated as unrooted, thus allowing monophyly to be found even when the current outgroup splits a monophyletic group.
Returns: the following tuple: IsMonophyletic (boolean), clade type (‘monophyletic’, ‘paraphyletic’ or ‘polyphyletic’), leaves breaking the monophyly (set)
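The monophyly test described above can be illustrated without the library. The following is a minimal sketch, not ETE's implementation: it assumes a tree written as nested tuples with string leaves, and the helper names (leaves, smallest_clade, check_monophyly_sketch) are hypothetical. It only reports whether the values are monophyletic and which leaves break the monophyly; it does not classify the clade type and does not handle the unrooted case.

```python
def leaves(tree):
    """Return the set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def smallest_clade(tree, values):
    """Return the smallest subtree whose leaves include every value."""
    if not isinstance(tree, str):
        for child in tree:
            if values <= leaves(child):
                return smallest_clade(child, values)
    return tree


def check_monophyly_sketch(tree, values):
    """Return (is_monophyletic, leaves_breaking_monophyly).

    Hypothetical helper: raises ValueError when a value is absent,
    mirroring the documented default (ignore_missing=False).
    """
    values = set(values)
    missing = values - leaves(tree)
    if missing:
        raise ValueError("values not found in tree: %s" % sorted(missing))
    clade = smallest_clade(tree, values)
    foreign = leaves(clade) - values
    return (not foreign, foreign)


example = ((("A", "B"), "C"), ("D", "E"))
print(check_monophyly_sketch(example, {"A", "B"}))
print(check_monophyly_sketch(example, {"A", "C"}))
```

In this sketch, A and B form a clade on their own, whereas A and C can only be grouped by also including B, so B is reported as the leaf breaking the monophyly.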
-
children
¶ A list of children nodes
-
compare
(ref_tree, use_collateral=False, min_support_source=0.0, min_support_ref=0.0, has_duplications=False, expand_polytomies=False, unrooted=False, max_treeko_splits_to_be_artifact=1000, ref_tree_attr='name', source_tree_attr='name')¶ Compares this tree with another using the Robinson-Foulds symmetric difference and the number of shared edges. Trees of different sizes and with duplicated items are allowed.
Returns: a Python dictionary with results
-
convert_to_ultrametric
(tree_length=None, strategy='balanced')¶ Converts a tree into ultrametric topology (all leaves must have the same distance to root). Note that, for visual inspection of ultrametric trees, node.img_style[“size”] should be set to 0.
-
copy
(method='cpickle')¶ Returns a copy of the current node.
Variables: method (cpickle) – Protocol used to copy the node structure. The following values are accepted:
- “newick”: Tree topology, node names, branch lengths and branch support values will be copied as represented in the newick string (copy by newick string serialisation).
- “newick-extended”: Tree topology and all node features will be copied based on the extended newick format representation. Only node features will be copied, thus excluding other node attributes. As this method is also based on newick serialisation, features will be converted into text strings when making the copy.
- “cpickle”: The whole node structure and its content is cloned based on cPickle object serialisation (slower, but recommended for full tree copying)
- “deepcopy”: The whole node structure and its content is copied based on the standard “copy” Python functionality (this is the slowest method, but it allows copying complex objects even if attributes point to lambda functions, etc.)
-
del_feature
(pr_name)¶ Permanently deletes a node’s feature.
-
delete
(prevent_nondicotomic=True, preserve_branch_length=False)¶ Deletes node from the tree structure. Notice that this method makes the node ‘disappear’ from the tree structure. This means that children of the deleted node are transferred to the next available parent.
Parameters: - prevent_nondicotomic (True) – When True (default), the delete function will be executed recursively to prevent single-child nodes.
- preserve_branch_length (False) – If True, branch lengths of the deleted nodes are transferred (summed up) to its parent’s branch, thus keeping original distances among nodes.
Example:
        / C
  root-|
       |        / B
        \--- H |
                \ A

  > H.delete() will produce this structure:

        / C
       |
  root-|--B
       |
        \ A
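The example above can be reproduced with a small stand-alone sketch. This is not ETE's code: SketchNode and build_example are hypothetical names, and the sketch covers only the topological part of the operation (children of the deleted node move to its parent). The prevent_nondicotomic and preserve_branch_length options are deliberately left out.

```python
class SketchNode:
    """Hypothetical minimal node with a parent pointer and a child list."""

    def __init__(self, name):
        self.name = name
        self.up = None
        self.children = []

    def add_child(self, child):
        child.up = self
        self.children.append(child)
        return child

    def delete(self):
        """Remove this node and hand its children over to its parent."""
        parent = self.up
        if parent is None:
            return  # the root has no parent to receive its children
        index = parent.children.index(self)
        for child in self.children:
            child.up = parent
        # Splice the children into the position the node occupied.
        parent.children[index:index + 1] = self.children
        self.up = None
        self.children = []


def build_example():
    """Build the root / C / H(B, A) tree drawn above."""
    root = SketchNode("root")
    root.add_child(SketchNode("C"))
    h = root.add_child(SketchNode("H"))
    h.add_child(SketchNode("B"))
    h.add_child(SketchNode("A"))
    return root, h


root, h = build_example()
h.delete()
print([child.name for child in root.children])  # ['C', 'B', 'A']
```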
-
describe
()¶ Prints general information about this node and its connections.
-
detach
()¶ Detaches this node (and all its descendants) from its parent and returns a reference to itself.
Detached node conserves all its structure of descendants, and can be attached to another node through the ‘add_child’ function. This mechanism can be seen as a cut and paste.
-
dist
¶ Branch length distance to parent node. Default = 0.0
-
expand_polytomies
(map_attr='name', polytomy_size_limit=5, skip_large_polytomies=False)¶ New in version 2.3.
Given a tree with one or more polytomies, this function returns the list of all trees (in newick format) resulting from the combination of all possible solutions of the multifurcated nodes.
-
get_ancestors
()¶ New in version 2.2.
Returns the list of all ancestor nodes from current node to the current tree root.
-
get_ascii
(show_internal=True, compact=False, attributes=None)¶ Returns a string containing an ascii drawing of the tree.
Parameters: - show_internal – includes internal edge names.
- compact – use exactly one line per tip.
- attributes – A list of node attributes to be shown in the ASCII representation.
-
get_cached_content
(store_attr=None, container_type=<type 'set'>, _store=None)¶ Returns a dictionary pointing to the preloaded content of each internal node under this tree. Such a dictionary is intended to work as a cache for operations that require many traversal operations.
Parameters: - store_attr (None) – Specifies the node attribute that should be cached (e.g. name, distance, etc.). When None, the whole node instance is cached.
- _store – (internal use)
-
get_children
()¶ Returns an independent list of node’s children.
-
get_closest_leaf
(topology_only=False, is_leaf_fn=None)¶ Returns node’s closest descendant leaf and the distance to it.
Parameters: topology_only (False) – If set to True, the distance between nodes will be measured as the number of nodes between them. In other words, topological distance will be used instead of branch length distances.
Returns: A tuple containing the closest leaf relative to the current node and the distance to it.
-
get_common_ancestor
(*target_nodes, **kargs)¶ Returns the first common ancestor between this node and a given list of ‘target_nodes’.
Examples:
t = tree.Tree("(((A:0.1, B:0.01):0.001, C:0.0001):1.0[&&NHX:name=common], (D:0.00001):0.000001):2.0[&&NHX:name=root];")
A = t.get_leaves_by_name("A")[0]
C = t.get_leaves_by_name("C")[0]
common = A.get_common_ancestor(C)
print(common.name)
-
get_descendants
(strategy='levelorder', is_leaf_fn=None)¶ Returns a list of all (leaves and internal) descendant nodes.
Parameters: is_leaf_fn (None) – See TreeNode.traverse() for documentation.
-
get_distance
(target, target2=None, topology_only=False)¶ Returns the distance between two nodes. If only one target is specified, it returns the distance between the target and the current node.
Parameters: - target – a node within the same tree structure.
- target2 – a node within the same tree structure. If not specified, current node is used as target2.
- topology_only (False) – If set to True, distance will refer to the number of nodes between target and target2.
Returns: branch length distance between target and target2. If topology_only flag is True, returns the number of nodes between target and target2.
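The branch length distance between two nodes is the sum of the branch lengths on the path that joins them through their first common ancestor. The snippet below is a minimal sketch of that idea, not ETE's implementation. It assumes a tree stored as two plain dictionaries (parent and dist, both hypothetical names) and covers only the default case, topology_only=False. The tree is the one used in the constructor example, '(A:1,(B:1,(C:1,D:1):0.5):0.5);', with the invented labels n1 and n2 for its internal nodes.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def branch_distance(a, b, parent, dist):
    """Sum branch lengths from a and b up to their first common ancestor."""
    ancestors_of_b = set(path_to_root(b, parent))
    # First node on a's path to the root that also lies on b's path.
    common = next(n for n in path_to_root(a, parent) if n in ancestors_of_b)
    total = 0.0
    for start in (a, b):
        node = start
        while node != common:
            total += dist[node]  # length of the branch above `node`
            node = parent[node]
    return total


# child -> parent, and child -> length of the branch leading to its parent
parent = {"A": "root", "n1": "root", "B": "n1", "n2": "n1", "C": "n2", "D": "n2"}
dist = {"A": 1.0, "n1": 0.5, "B": 1.0, "n2": 0.5, "C": 1.0, "D": 1.0}

print(branch_distance("A", "C", parent, dist))  # 3.0
print(branch_distance("C", "D", parent, dist))  # 2.0
```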
-
get_edges
(cached_content=None)¶ New in version 2.3.
Returns the list of edges of a tree. Each edge is represented as a tuple of two elements, each containing the list of nodes separated by the edge.
-
get_farthest_leaf
(topology_only=False, is_leaf_fn=None)¶ Returns node’s farthest descendant node (which is always a leaf), and the distance to it.
Parameters: topology_only (False) – If set to True, the distance between nodes will be measured as the number of nodes between them. In other words, topological distance will be used instead of branch length distances.
Returns: A tuple containing the farthest leaf relative to the current node and the distance to it.
-
get_farthest_node
(topology_only=False)¶ Returns the node’s farthest descendant or ancestor node, and the distance to it.
Parameters: topology_only (False) – If set to True, the distance between nodes will be measured as the number of nodes between them. In other words, topological distance will be used instead of branch length distances.
Returns: A tuple containing the farthest node relative to the current node and the distance to it.
-
get_leaf_names
(is_leaf_fn=None)¶ Returns the list of terminal node names under the current node.
Parameters: is_leaf_fn (None) – See TreeNode.traverse() for documentation.
-
get_leaves
(is_leaf_fn=None)¶ Returns the list of terminal nodes (leaves) under this node.
Parameters: is_leaf_fn (None) – See TreeNode.traverse() for documentation.
-
get_leaves_by_name
(name)¶ Returns a list of leaf nodes matching a given name.
-
get_midpoint_outgroup
()¶ Returns the node that divides the current tree into two distance-balanced partitions.
-
get_monophyletic
(values, target_attr)¶ New in version 2.2.
Returns a list of nodes matching the provided monophyly criteria. For a node to be considered a match, the target_attr values found within the node, and exclusively those, should be grouped.
Parameters: - values – a set of values for which monophyly is expected.
- target_attr – node attribute being used to check monophyly (e.g. species for species trees, names for gene family trees).
-
get_sisters
()¶ Returns an independent list of sister nodes.
-
get_topology_id
(attr='name')¶ New in version 2.3.
Returns the unique ID representing the topology of the current tree. Two trees with the same topology will produce the same id. If trees are unrooted, make sure that the root node is not binary or use the tree.unroot() function before generating the topology id.
This is useful to detect the number of unique topologies over a bunch of trees, without requiring full distance methods.
The id is, by default, calculated based on the terminal node’s names. Any other node attribute could be used instead.
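One way to picture a topology id is as a hash of a canonical text form of the tree, in which the order of children no longer matters. The sketch below is only an illustration under that assumption and does not reproduce how ETE computes its ids. It works on rooted trees written as nested tuples of leaf names, and the names canonical and topology_id_sketch are hypothetical.

```python
import hashlib


def canonical(tree):
    """Return a text form of the tree that ignores the order of children."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"


def topology_id_sketch(tree):
    """Hash the canonical form so equal topologies share one id."""
    return hashlib.sha256(canonical(tree).encode("utf-8")).hexdigest()


same_a = (("A", "B"), "C")
same_b = ("C", ("B", "A"))   # same topology, children listed differently
other = (("A", "C"), "B")    # different grouping of the leaves

print(topology_id_sketch(same_a) == topology_id_sketch(same_b))  # True
print(topology_id_sketch(same_a) == topology_id_sketch(other))   # False
```

Counting the distinct ids over a collection of trees then gives the number of unique topologies in this simplified, rooted setting.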
-
get_tree_root
()¶ Returns the absolute root node of current tree structure.
-
img_style
¶ Style properties used when drawing this node (see set_style()); for example, node.img_style[“size”].
-
is_leaf
()¶ Returns True if current node is a leaf.
-
is_root
()¶ Returns True if current node has no parent.
-
iter_ancestors
()¶ New in version 2.2.
Iterates over the list of all ancestor nodes from current node to the current tree root.
-
iter_descendants
(strategy='levelorder', is_leaf_fn=None)¶ Returns an iterator over all descendant nodes.
Parameters: is_leaf_fn (None) – See TreeNode.traverse() for documentation.
-
iter_edges
(cached_content=None)¶ New in version 2.3.
Iterates over the list of edges of a tree. Each edge is represented as a tuple of two elements, each containing the list of nodes separated by the edge.
-
iter_leaf_names
(is_leaf_fn=None)¶ Returns an iterator over the leaf names under this node.
Parameters: is_leaf_fn (None) – See TreeNode.traverse() for documentation.
-
iter_leaves
(is_leaf_fn=None)¶ Returns an iterator over the leaves under this node.
Parameters: is_leaf_fn (None) – See TreeNode.traverse() for documentation.
-
iter_prepostorder
(is_leaf_fn=None)¶ Iterate over all nodes in a tree yielding every node in both pre and post order. Each iteration returns a postorder flag (True if node is being visited in postorder) and a node instance.
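A combined pre/post-order walk is handy whenever something must happen both on entering and on leaving a node, such as opening and closing parentheses while serialising a tree. The following sketch is not ETE's code and makes two assumptions: trees are nested tuples of leaf names, and leaves are yielded only once, with the flag set to False. The function names are hypothetical.

```python
def iter_prepostorder_sketch(tree):
    """Yield (postorder_flag, node) pairs for a nested-tuple tree."""
    if isinstance(tree, str):
        yield (False, tree)  # leaves are visited once in this sketch
        return
    yield (False, tree)      # entering an internal node (preorder)
    for child in tree:
        yield from iter_prepostorder_sketch(child)
    yield (True, tree)       # leaving the internal node (postorder)


def to_newick_sketch(tree):
    """Serialise the topology using the pre/post-order events."""
    tokens = []
    for postorder, node in iter_prepostorder_sketch(tree):
        if isinstance(node, str):
            token = node
        else:
            token = ")" if postorder else "("
        # A comma separates siblings: never right after "(" or before ")".
        if tokens and tokens[-1] != "(" and token != ")":
            tokens.append(",")
        tokens.append(token)
    return "".join(tokens) + ";"


print(to_newick_sketch((("A", "B"), "C")))  # ((A,B),C);
```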
-
iter_search_nodes
(**conditions)¶ Searches nodes in an iterative way. Matches are yielded as they are found, which avoids scanning the full tree topology before returning the first matches. Useful when dealing with huge trees.
-
ladderize
(direction=0)¶ Sort the branches of a given tree (swapping children nodes) according to the size of each partition.
t = Tree("(f,((d, ((a,b),c)),e));")
print(t)
#
#      /-f
#     |
#     |          /-d
# ----|         |
#     |     /---|          /-a
#     |    |    |     /---|
#     |    |     \---|     \-b
#      \---|         |
#          |          \-c
#          |
#           \-e
t.ladderize()
print(t)
#      /-f
# ----|
#     |     /-e
#      \---|
#          |     /-d
#           \---|
#               |     /-c
#                \---|
#                    |     /-a
#                     \---|
#                          \-b
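The reordering shown above can be sketched in a few lines: at every internal node, children are sorted by the number of leaves they contain. This is an illustration rather than ETE's implementation. It assumes nested tuples of leaf names, returns a new structure instead of modifying the tree in place, and only covers the ascending order seen in the example; count_leaves and ladderize_sketch are hypothetical names.

```python
def count_leaves(tree):
    """Number of leaves under a nested-tuple node."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)


def ladderize_sketch(tree):
    """Return a copy whose children are sorted by partition size."""
    if isinstance(tree, str):
        return tree
    children = [ladderize_sketch(child) for child in tree]
    # sorted() is stable, so equally sized partitions keep their order.
    return tuple(sorted(children, key=count_leaves))


t = ("f", (("d", (("a", "b"), "c")), "e"))
print(ladderize_sketch(t))
# ('f', ('e', ('d', ('c', ('a', 'b')))))
```

The leaves come out in the order f, e, d, c, a, b, which matches the ladderized drawing above.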
-
populate
(size, names_library=None, reuse_names=False, random_branches=False, branch_range=(0, 1), support_range=(0, 1))¶ Generates a random topology by populating current node.
Parameters: - names_library (None) – If provided, names library (list, set, dict, etc.) will be used to name nodes.
- reuse_names (False) – If True, node names will not be necessarily unique, which makes the process a bit more efficient.
- random_branches (False) – If True, branch distances and support values will be randomized.
- branch_range ((0,1)) – If random_branches is True, this range of values will be used to generate random distances.
- support_range ((0,1)) – If random_branches is True, this range of values will be used to generate random branch support values.
-
prune
(nodes, preserve_branch_length=False)¶ Prunes the topology of a node to conserve only the selected list of leaf or internal nodes. The minimum number of nodes that conserve the topological relationships among the requested nodes will be retained. Root node is always conserved.
Variables: nodes – a list of node names or node objects that should be retained
Parameters: preserve_branch_length (False) – If True, branch lengths of the deleted nodes are transferred (summed up) to its parent’s branch, thus keeping original distances among nodes.
Examples:
t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B'])

#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E
#   |  |                        /-A
#-root  \-G                 -root
#   |                           \-B
#   |   /-I
#    \K|
#       \-J

t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B', 'C'])

#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E
#   |  |                              /-A
#-root  \-G                  -root- C|
#   |                                 \-B
#   |   /-I
#    \K|
#       \-J

t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B', 'I'])

#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E                    /-I
#   |  |                      -root
#-root  \-G                      |   /-A
#   |                             \C|
#   |   /-I                          \-B
#    \K|
#       \-J

t1 = Tree('(((((A,B)C)D,E)F,G)H,(I,J)K)root;', format=1)
t1.prune(['A', 'B', 'F', 'H'])

#                /-A
#          /D /C|
#       /F|      \-B
#      |  |
#    /H|   \-E
#   |  |                              /-A
#-root  \-G                -root-H /F|
#   |                                 \-B
#   |   /-I
#    \K|
#       \-J
-
remove_child
(child)¶ Removes a child from this node (parent and child nodes still exist but are no longer connected).
-
remove_sister
(sister=None)¶ Removes a sister node. It has the same effect as `TreeNode.up.remove_child(sister)`.
If a sister node is not supplied, the first sister will be deleted and returned.
Parameters: sister – A node instance
Returns: The node removed
-
render
(file_name, layout=None, w=None, h=None, tree_style=None, units='px', dpi=90)¶ Renders the node structure as an image.
Variables: - file_name – path to the output image file. Valid extensions are .SVG, .PDF, .PNG
- layout – a layout function or a valid layout function name
- tree_style – a TreeStyle instance containing the image properties
- units (px) – “px”: pixels, “mm”: millimeters, “in”: inches
- h (None) – height of the image in units
- w (None) – width of the image in units
- dpi (90) – dots per inch.
-
resolve_polytomy
(default_dist=0.0, default_support=0.0, recursive=True)¶ Resolves all polytomies under the current node by creating an arbitrary dichotomous structure among the affected nodes. This function randomly modifies the current tree topology and should only be used for compatibility reasons (e.g. programs rejecting multifurcated nodes in the newick representation).
Parameters: - default_dist (0.0) – artificial branch distance of new nodes.
- default_support (0.0) – artificial branch support of new nodes.
- recursive (True) – Resolve any polytomy under this node. When False, only current node will be checked and fixed.
-
robinson_foulds
(t2, attr_t1='name', attr_t2='name', unrooted_trees=False, expand_polytomies=False, polytomy_size_limit=5, skip_large_polytomies=False, correct_by_polytomy_size=False, min_support_t1=0.0, min_support_t2=0.0)¶ Returns the Robinson-Foulds symmetric distance between current tree and a different tree instance.
Parameters: - t2 – reference tree
- attr_t1 (name) – Compare trees using a custom node attribute as a node name.
- attr_t2 (name) – Compare trees using a custom node attribute as a node name in target tree.
- unrooted_trees (False) – If True, consider trees as unrooted.
- expand_polytomies (False) – If True, all polytomies in the reference and target tree will be expanded into all possible binary trees. Robinson-Foulds distance will be calculated between all tree combinations and the minimum value will be returned. See also TreeNode.expand_polytomies().
Returns: (rf, rf_max, common_attrs, names, edges_t1, edges_t2, discarded_edges_t1, discarded_edges_t2)
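The core of the Robinson-Foulds symmetric distance is a set comparison: each tree is reduced to the set of groupings it defines, and the distance is the number of groupings found in exactly one of the two trees. The sketch below illustrates this for the simplest case only, namely rooted trees over the same leaf names, written as nested tuples. It is not ETE's implementation, it ignores polytomy expansion, support thresholds and unrooted comparison, and it returns just two of the values listed above. The names collect_clades and rf_sketch are hypothetical.

```python
def collect_clades(tree, found):
    """Add the leaf set of every internal node to `found`; return own leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    members = frozenset().union(*(collect_clades(child, found) for child in tree))
    found.add(members)
    return members


def rf_sketch(t1, t2):
    """Return (rf, rf_max) for two rooted trees over the same leaves."""
    clades1, clades2 = set(), set()
    all1 = collect_clades(t1, clades1)
    all2 = collect_clades(t2, clades2)
    if all1 != all2:
        raise ValueError("this sketch requires identical leaf sets")
    # The root clade holds every leaf in both trees, so it carries no signal.
    clades1.discard(all1)
    clades2.discard(all2)
    rf = len(clades1 ^ clades2)            # groupings present in only one tree
    rf_max = len(clades1) + len(clades2)   # value reached when nothing is shared
    return rf, rf_max


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_sketch(t1, t2))  # (2, 6)
print(rf_sketch(t1, t1))  # (0, 6)
```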
-
search_nodes
(**conditions)¶ Returns the list of nodes matching a given set of conditions.
Example:
tree.search_nodes(dist=0.0, name="human")
-
set_outgroup
(outgroup)¶ Sets a descendant node as the outgroup of a tree. This function can be used to root a tree or even an internal node.
Parameters: outgroup – a node instance within the same tree structure that will be used as a basal node.
-
set_style
(node_style)¶ Set ‘node_style’ as the fixed style for the current node.
-
show
(layout=None, tree_style=None, name='ETE')¶ Starts an interactive session to visualize current node structure using provided layout and TreeStyle.
-
sort_descendants
(attr='name')¶ This function sorts the branches of a given tree by considering node names. After the tree is sorted, nodes are labeled using ascending numbers. This can be used to ensure that nodes in a tree with the same node names are always labeled in the same way. Note that if duplicated names are present, extra criteria should be added to sort nodes.
The unique id is stored as a node._nid attribute.
-
standardize
(delete_orphan=True, preserve_branch_length=True)¶ New in version 2.3.
Processes the current tree structure to produce a standardized topology: nodes with only one child are removed and multifurcations are automatically resolved.
-
support
¶ Branch support for current node
-
swap_children
()¶ Swaps current children order.
-
traverse
(strategy='levelorder', is_leaf_fn=None)¶ Returns an iterator to traverse the tree structure under this node.
Parameters: - strategy (“levelorder”) – sets the way in which the tree will be traversed. Possible values are: “preorder” (first parent and then children), “postorder” (first children and then the parent) and “levelorder” (nodes are visited in order from root to leaves)
- is_leaf_fn (None) – If supplied, the is_leaf_fn function will be used to ask nodes whether they are terminal or internal. The is_leaf_fn function should receive a node instance as first argument and return True or False. Use this argument to traverse a tree by dynamically collapsing internal nodes matching is_leaf_fn.
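The three strategies and the effect of is_leaf_fn can be compared side by side on a toy structure. The sketch below is not ETE's code: N is a hypothetical node class with only a name and a list of children, and traverse_sketch mirrors the documented parameters under that assumption.

```python
from collections import deque


class N:
    """Hypothetical minimal node: a name plus a list of children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def traverse_sketch(root, strategy="levelorder", is_leaf_fn=None):
    """Yield nodes under `root`; is_leaf_fn can collapse internal nodes."""

    def is_leaf(node):
        return is_leaf_fn(node) if is_leaf_fn else not node.children

    def postorder(node):
        if not is_leaf(node):
            for child in node.children:
                yield from postorder(child)
        yield node

    if strategy == "levelorder":
        queue = deque([root])
        while queue:
            node = queue.popleft()
            yield node
            if not is_leaf(node):
                queue.extend(node.children)
    elif strategy == "preorder":
        stack = [root]
        while stack:
            node = stack.pop()
            yield node
            if not is_leaf(node):
                stack.extend(reversed(node.children))
    elif strategy == "postorder":
        yield from postorder(root)
    else:
        raise ValueError("unknown strategy: %r" % strategy)


tree = N("root", N("AB", N("A"), N("B")), N("C"))
for strategy in ("levelorder", "preorder", "postorder"):
    print(strategy, [n.name for n in traverse_sketch(tree, strategy)])

# Treat the internal node "AB" as if it were a leaf: A and B are never visited.
collapsed = traverse_sketch(tree, "preorder", lambda n: n.name == "AB")
print([n.name for n in collapsed])  # ['root', 'AB', 'C']
```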
-
unroot
()¶ Unroots current node. This function is expected to be used on the absolute tree root node, but it can also be applied to any other internal node. It will convert a split into a multifurcation.
-
up
¶ Pointer to parent node
-
write
(features=None, outfile=None, format=0, is_leaf_fn=None, format_root_node=False, dist_formatter=None, support_formatter=None, name_formatter=None)¶ Returns the newick representation of current node. Several arguments control the way in which extra data is shown for every node:
Parameters: - features – a list of feature names to be exported using the Extended Newick Format (e.g. features=[“name”, “dist”]). Use an empty list to export all available features in each node (features=[])
- outfile – writes the output to a given file
- format – defines the newick standard used to encode the tree. See tutorial for details.
- format_root_node (False) – If True, it allows features and branch information from root node to be exported as a part of the newick text string. For newick compatibility reasons, this is False by default.
- is_leaf_fn – See TreeNode.traverse() for documentation.
Example:
t.write(features=["species","name"], format=1)