NCBITaxa class

class NCBITaxa(dbfile=None)

Bases: object

New in version 2.3.

Provides a local transparent connector to the NCBI taxonomy database.

annotate_tree(t, taxid_attr='name', tax2name=None, tax2track=None, tax2rank=None)

Annotates a tree containing taxids as leaf names by adding the additional attributes ‘taxid’, ‘sci_name’, ‘lineage’, ‘named_lineage’ and ‘rank’.

Parameters:
  • t – a Tree (or Tree-derived) instance.
  • taxid_attr (name) – sets a custom node attribute containing the taxid number associated with each node (e.g. species in PhyloTree instances).
  • tax2name, tax2track, tax2rank – use these arguments to provide pre-calculated dictionaries translating taxid numbers into names, lineage tracks and ranks.
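A minimal sketch of how the annotation might be consumed, assuming `ncbi` is an NCBITaxa instance and `tree` is a Tree whose leaf names are taxids. The helper name `annotated_leaf_summary` is hypothetical. It relies on `annotate_tree` as documented above and assumes the tree exposes a `traverse()` iterator and an `is_leaf()` method, as ETE's Tree class does in the ete3 series.

```python
def annotated_leaf_summary(ncbi, tree, taxid_attr="name"):
    """Annotate `tree` in place and return (leaf name, sci_name, rank) tuples.

    `ncbi` is assumed to be an NCBITaxa instance and `tree` a Tree whose
    `taxid_attr` attribute holds a taxid on every leaf.
    """
    ncbi.annotate_tree(tree, taxid_attr=taxid_attr)
    summary = []
    for node in tree.traverse():
        if node.is_leaf():
            # Fall back to None if a node did not receive an annotation.
            summary.append((node.name,
                            getattr(node, "sci_name", None),
                            getattr(node, "rank", None)))
    return summary
```

The annotation is applied to the tree object itself, so the same attributes remain available on the nodes after the helper returns.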

get_broken_branches(t, taxa_lineages, n2content=None)

Returns a list of NCBI lineage names that are not monophyletic in the provided tree, as well as the list of affected branches and their size.

CURRENTLY EXPERIMENTAL

get_common_names(taxids)
get_descendant_taxa(parent, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, return_tree=False)

Given a parent taxid or scientific species name, returns a list of all its descendant taxids. If intermediate_nodes is set to True, internal nodes are also included.
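As an illustration, the returned taxids can be paired with their scientific names through `translate_to_names`. This is only a sketch: `descendant_names` is a hypothetical helper, and `ncbi` is assumed to be an NCBITaxa instance.

```python
def descendant_names(ncbi, parent, include_internal=False):
    """Map every descendant taxid of `parent` to its scientific name.

    `parent` may be a taxid or a scientific name, as accepted by
    get_descendant_taxa. Internal nodes are included only on request.
    """
    taxids = ncbi.get_descendant_taxa(parent, intermediate_nodes=include_internal)
    names = ncbi.translate_to_names(taxids)
    # Both lists are in the same order, so they can be zipped directly.
    return dict(zip(taxids, names))
```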

get_fuzzy_name_translation(name, sim=0.9)

Given an inexact species name, returns the best match in the NCBI database of taxa names.

Parameters: sim (0.9) – minimum word similarity to report a match (from 0 to 1).
Returns: taxid, species-name-match, match-score
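The three returned values can be unpacked directly. The sketch below wraps the call in a hypothetical helper, `best_fuzzy_match`, and assumes that a failed lookup is signalled by a falsy taxid; that behaviour is not stated above and should be checked against the installed ETE version.

```python
def best_fuzzy_match(ncbi, name, sim=0.9):
    """Return (taxid, matched_name, score), or None when nothing matches.

    `ncbi` is assumed to be an NCBITaxa instance. A falsy taxid is
    treated as "no match", which is an assumption of this sketch.
    """
    taxid, matched_name, score = ncbi.get_fuzzy_name_translation(name, sim=sim)
    if not taxid:
        return None
    return taxid, matched_name, score
```

Lowering `sim` accepts looser matches, so the returned score is worth inspecting before the taxid is trusted.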
get_lineage(taxid)

Given a valid taxid number, return its corresponding lineage track as a hierarchically sorted list of parent taxids.
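A lineage of taxids is often easier to read as names. The following sketch combines `get_lineage` with `get_taxid_translator`; the helper name `lineage_names` is hypothetical and `ncbi` is assumed to be an NCBITaxa instance.

```python
def lineage_names(ncbi, taxid):
    """Return the lineage of `taxid` as a list of scientific names.

    The hierarchical order returned by get_lineage is preserved.
    """
    lineage = ncbi.get_lineage(taxid)
    names = ncbi.get_taxid_translator(lineage)
    # Keep the taxid (as text) for any entry that has no translation.
    return [names.get(t, str(t)) for t in lineage]
```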

get_name_translator(names)

Given a list of scientific names, returns a dictionary translating them into their corresponding taxids.

Exact name match is required for translation.
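Because only exact matches are translated, it can help to check which names were left out. The sketch below assumes that unmatched names are simply absent from the returned dictionary; `find_untranslated` is a hypothetical helper and `ncbi` an NCBITaxa instance.

```python
def find_untranslated(ncbi, names):
    """Return (translator, missing) for a list of scientific names.

    `missing` lists the names with no key in the translator, on the
    assumption that unmatched names are omitted from the result.
    """
    translator = ncbi.get_name_translator(names)
    missing = [name for name in names if name not in translator]
    return translator, missing
```

Names reported as missing are candidates for `get_fuzzy_name_translation`.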

get_rank(taxids)

Returns a dictionary mapping each taxid in the given list to its corresponding NCBI taxonomy rank.
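One possible use of the returned dictionary is to group taxids by rank. This is a sketch only: `group_by_rank` is a hypothetical helper, `ncbi` is assumed to be an NCBITaxa instance, and the "unknown" label for taxids missing from the result is an arbitrary choice.

```python
from collections import defaultdict

def group_by_rank(ncbi, taxids):
    """Group `taxids` by their NCBI rank, preserving input order."""
    ranks = ncbi.get_rank(taxids)
    groups = defaultdict(list)
    for taxid in taxids:
        # Taxids absent from the result are filed under "unknown".
        groups[ranks.get(taxid, "unknown")].append(taxid)
    return dict(groups)
```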

get_taxid_translator(taxids)

Given a list of taxids, returns a dictionary with their corresponding scientific names.

get_topology(taxids, intermediate_nodes=False, rank_limit=None, collapse_subspecies=False, annotate=True)

Given a list of taxid numbers, return the minimal pruned NCBI taxonomy tree containing all of them.

Parameters:
  • intermediate_nodes (False) – If True, single-child nodes representing the complete lineage of leaf nodes are kept. Otherwise, the tree is pruned to contain the first common ancestor of each group.
  • rank_limit (None) – If a valid NCBI rank name is provided, the tree is pruned at that level. For instance, use rank_limit="species" to get rid of sub-species or strain leaf nodes.
  • collapse_subspecies (False) – If True, any item under the species rank will be collapsed into its parent species node.
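To compare the effect of these options, the same taxid list can be passed through `get_topology` with different settings. The helper `build_topologies` and its dictionary keys are hypothetical; `ncbi` is assumed to be an NCBITaxa instance.

```python
def build_topologies(ncbi, taxids):
    """Return three variants of the NCBI tree spanning `taxids`."""
    return {
        # Default: only the first common ancestor of each group is kept.
        "compact": ncbi.get_topology(taxids),
        # Keep single-child nodes, i.e. the complete lineage of each leaf.
        "full_lineage": ncbi.get_topology(taxids, intermediate_nodes=True),
        # Prune at species level, dropping sub-species and strain leaves.
        "species_only": ncbi.get_topology(taxids, rank_limit="species"),
    }
```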

translate_to_names(taxids)

Given a list of taxid numbers, returns another list with their corresponding scientific names.

update_taxonomy_database(taxdump_file=None)

Updates the NCBI taxonomy database by downloading and parsing the latest taxdump.tar.gz file from the NCBI FTP site.

Parameters: taxdump_file (None) – an alternative location of the taxdump.tar.gz file.
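A small wrapper can guard against a mistyped local path before the update starts. This is a sketch under the assumption that `taxdump_file` refers to a file on the local filesystem; `refresh_taxonomy` is a hypothetical helper and `ncbi` an NCBITaxa instance.

```python
import os

def refresh_taxonomy(ncbi, taxdump_file=None):
    """Update the local taxonomy database, optionally from a local dump.

    With taxdump_file=None the update downloads the latest dump.
    A path that does not exist is rejected before anything is changed.
    """
    if taxdump_file is not None and not os.path.isfile(taxdump_file):
        raise FileNotFoundError(taxdump_file)
    ncbi.update_taxonomy_database(taxdump_file=taxdump_file)
```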