Taxonomylite

A simple one-file solution for those times when you want to check whether one organism is descended from another, but don’t need a full phylogenetic tree manipulation library.

The library is just a single file that depends only upon the standard library. You can easily embed it in another library by copying this script.

from taxonomylite import Taxonomy

# Create a new database from NCBI sources
# This process may take some time.
taxa_db = Taxonomy.from_source("taxonomy.db")

# Later... in a new session
from taxonomylite import Taxonomy

taxa_db = Taxonomy("taxonomy.db")

tid = taxa_db.name_to_tid("Felidae")

immediate_children = taxa_db.children(tid)
# [338151, 338152, 338153, 339610]

all_children = taxa_db.children(tid, deep=True)
# [9682, 9683, 9685, 9687, 9688, 9689, 9690, 9691, 9692, 9693, ...]

print(taxa_db.tid_to_rank(taxa_db.name_to_tid("Panthera leo leo")))
# subspecies

taxonomylite.SEP_TOKEN = 'zzz'

The separator used to tokenize the in-database lineage string
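
As a rough illustration of what a separator token is for, the sketch below packs a list of taxonomic ids into one string and tokenizes it again. The helper names pack_lineage and unpack_lineage, the toy ids, and the stored layout are assumptions made for illustration; the format the library actually stores may differ.

```python
SEP_TOKEN = 'zzz'  # same value as taxonomylite.SEP_TOKEN


def pack_lineage(tids, sep=SEP_TOKEN):
    """Join taxonomic ids into one string (hypothetical storage layout)."""
    return sep.join(str(tid) for tid in tids)


def unpack_lineage(packed, sep=SEP_TOKEN):
    """Split a packed lineage string back into a list of ints."""
    if not packed:
        return []
    return [int(token) for token in packed.split(sep)]


packed = pack_lineage([4, 2, 1])  # toy ids, not a real lineage
print(packed)                     # 4zzz2zzz1
print(unpack_lineage(packed))     # [4, 2, 1]
```

A non-numeric token such as 'zzz' cannot collide with the ids themselves, which are made of digits only.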

taxonomylite.SOURCE_URL = 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz'

The default location to download taxonomy information from

class taxonomylite.Taxonomy(store_path)

Bases: object

Operate on taxonomic hierarchies downloaded from the NCBI Taxonomy database using a compact SQLite database.

Parameters: store_path (str) – Path to the SQLite database containing the hierarchies

connection

sqlite3.Connection

The underlying connection to the sqlite database

children(tid, deep=False)

Retrieve all child taxonomic id numbers of tid. If deep is True, retrieve all descendants

Parameters:
  • tid (int) – Taxonomic id number of the parent taxon
  • deep (bool) – Retrieve all descendants, not just direct children
Return type: list of ints
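
The difference between direct children and all descendants can be sketched with a toy SQLite table. The table name toy_taxa, its columns, and the ids are hypothetical and say nothing about the schema taxonomylite actually uses. The recursive query is one common way to walk a parent/child table, not necessarily the library's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_taxa (tid INTEGER PRIMARY KEY, parent_tid INTEGER)")
conn.executemany(
    "INSERT INTO toy_taxa VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)],
)


def toy_children(conn, tid, deep=False):
    """Return child ids of tid; with deep=True, return every descendant."""
    if not deep:
        rows = conn.execute(
            "SELECT tid FROM toy_taxa WHERE parent_tid = ? ORDER BY tid", (tid,))
    else:
        rows = conn.execute(
            """
            WITH RECURSIVE descendants(tid) AS (
                SELECT tid FROM toy_taxa WHERE parent_tid = ?
                UNION ALL
                SELECT t.tid FROM toy_taxa AS t
                JOIN descendants AS d ON t.parent_tid = d.tid
            )
            SELECT tid FROM descendants ORDER BY tid
            """,
            (tid,))
    return [row[0] for row in rows]


print(toy_children(conn, 1))             # [2, 3]
print(toy_children(conn, 1, deep=True))  # [2, 3, 4, 5]
```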

close()

Close the underlying database connection.

See sqlite3.Connection.close()

commit()

Save pending changes to the underlying database

See sqlite3.Connection.commit()

execute(stmt, args='')

Execute raw SQL against the underlying database.

See sqlite3.Connection.execute()

executemany(stmt, args='')

See sqlite3.Connection.executemany()

classmethod from_source(store_path='taxonomy.db', url='ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz')

Construct a new Taxonomy instance and associated database file from source data downloaded from NCBI’s FTP servers.

If url is None, the source data is read from a file named “taxdump.tar.gz” in the current directory instead of being downloaded.

Parameters:
  • store_path (str) – Path at which to construct the database. Defaults to “taxonomy.db” in the current directory
  • url (str) – The URL to download the taxonomy information from. Defaults to SOURCE_URL
Return type: Taxonomy
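
For orientation, the NCBI taxdump archive holds pipe-delimited .dmp files, among them nodes.dmp, whose first three fields are the taxonomic id, the parent id, and the rank. The sketch below reads those fields from an archive. It builds a tiny stand-in archive in memory rather than downloading anything, the rows are made up, and read_nodes is a hypothetical helper that is not part of taxonomylite.

```python
import io
import tarfile


def read_nodes(archive):
    """Yield (tid, parent_tid, rank) from nodes.dmp in an open tarfile."""
    member = archive.extractfile("nodes.dmp")
    for raw in member:
        line = raw.decode("utf-8").rstrip("\n")
        # Fields are separated by "\t|\t"; each line ends with "\t|"
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        yield int(fields[0]), int(fields[1]), fields[2]


# Build a two-row stand-in for taxdump.tar.gz (made-up rows)
payload = b"1\t|\t1\t|\tno rank\t|\n2\t|\t1\t|\tsuperkingdom\t|\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as out:
    info = tarfile.TarInfo("nodes.dmp")
    info.size = len(payload)
    out.addfile(info, io.BytesIO(payload))
buffer.seek(0)

with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    nodes = list(read_nodes(archive))
print(nodes)  # [(1, 1, 'no rank'), (2, 1, 'superkingdom')]
```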

is_child(child_tid, parent_tid)

Test if child_tid is a child taxon of parent_tid

Parameters:
  • child_tid (int) – Taxonomic id number of the candidate child
  • parent_tid (int) – Taxonomic id number of the candidate parent
Return type: bool

is_parent(child_tid, parent_tid)

Test if parent_tid is a parent taxon of child_tid

Parameters:
  • child_tid (int) – Taxonomic id number of the candidate child
  • parent_tid (int) – Taxonomic id number of the candidate parent
Return type: bool
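
One way to picture a descent test is to follow parent links upward from the candidate child and see whether the candidate parent turns up. The sketch below does this over a plain dict. The toy_parents mapping and toy_is_descendant helper are hypothetical, and it is an assumption that the checks above consider ancestors at any depth rather than only the direct parent.

```python
# Hypothetical child -> parent mapping; the root (1) has no entry
toy_parents = {5: 4, 4: 2, 2: 1, 3: 1}


def toy_is_descendant(parents, child_tid, parent_tid):
    """True if parent_tid is reached by following parent links from child_tid."""
    current = child_tid
    while current in parents:
        current = parents[current]
        if current == parent_tid:
            return True
    return False


print(toy_is_descendant(toy_parents, 5, 2))  # True
print(toy_is_descendant(toy_parents, 3, 2))  # False
```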

lineage(db, tid)

Construct the taxonomic “path” from tid to the root of the phylogenetic hierarchy

Parameters: tid (int) – Taxonomic id number to start the path from
Return type: list of ints
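
Conceptually, such a path is what you get by following parent links until they run out. A minimal sketch over a plain dict follows. The toy_parents mapping and toy_lineage helper are hypothetical, and including tid itself as the first element is an assumption of this sketch.

```python
# Hypothetical child -> parent mapping; the root (1) has no entry
toy_parents = {5: 4, 4: 2, 2: 1, 3: 1}


def toy_lineage(parents, tid):
    """Walk parent links from tid up to the root, returning the ids visited."""
    path = [tid]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


print(toy_lineage(toy_parents, 5))  # [5, 4, 2, 1]
```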

name_to_tid(name)

Translates a scientific name string, name, into its equivalent taxonomic id number

Parameters: name (str) – A scientific name like “Homo sapiens”
Returns: tid
Return type: int

nearest_common_ancestor(a, b)
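
One common way to find a nearest common ancestor, given root-ward lineage lists for both taxa, is to scan one list for the first id that also appears in the other. The sketch below illustrates that idea with made-up lineages. It is not taken from the library's source, and toy_nearest_common_ancestor is a hypothetical name.

```python
def toy_nearest_common_ancestor(lineage_a, lineage_b):
    """Return the first id in lineage_a that also occurs in lineage_b.

    Both lineages are ordered from the taxon toward the root.
    Returns None when the lineages share no id.
    """
    seen = set(lineage_b)
    for tid in lineage_a:
        if tid in seen:
            return tid
    return None


# Made-up lineages ordered from taxon to root
print(toy_nearest_common_ancestor([5, 4, 2, 1], [3, 2, 1]))  # 2
```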

parent(tid)

Extract the taxonomic id number of the parent of tid

Parameters: tid (int) – Taxonomic id number of the child taxon
Return type: int

relatives(tid, degree=1)

Retrieve relatives of tid out to degree steps removed

Parameters:
  • tid (int) – Taxonomic id number whose relatives are retrieved
  • degree (int) – Maximum number of steps removed from tid
Return type: list of ints

siblings(tid)

Extract the taxonomic id numbers of the siblings (same parent) of tid

Parameters: tid (int) – Taxonomic id number whose siblings are retrieved
Return type: list of ints

tid_to_name(tid)

Translates a taxonomic id number tid into its equivalent scientific name

Parameters: tid (int) – A taxonomic id number like 9606
Returns: name – A scientific name like “Homo sapiens”
Return type: str

tid_to_rank(tid)
