KEGG - Kyoto Encyclopedia of Genes and Genomes (kegg)

kegg is a Python module for accessing KEGG (Kyoto Encyclopedia of Genes and Genomes) through its web services.

Note

This module requires the slumber and requests packages.

>>> from orangecontrib.bio.kegg import *
>>> # Create a KEGG Genome database interface
>>> genome = KEGGGenome()
>>> # List all available entry ids
>>> keys = genome.keys()
>>> print(keys[0])
T01001
>>> # Retrieve the entry for the key.
>>> entry = genome[keys[0]]
>>> print(entry.entry_key)
T01001
>>> print(entry.definition)
Homo sapiens (human)
>>> print(str(entry))
ENTRY       T01001            Complete  Genome
NAME        hsa, HUMAN, 9606
DEFINITION  Homo sapiens (human)
...

The Organism class is a convenient starting point for working with organism-specific databases.

>>> organism = Organism("Homo sapiens")  # searches for the organism by name
>>> print(organism.org_code)  # prints the KEGG organism code
hsa
>>> genes = organism.genes  # get the genes database for the organism
>>> gene_ids = genes.keys() # KEGG gene identifiers
>>> entry = genes["hsa:672"]
>>> print(entry.definition)
breast cancer 1, early onset
>>> # print the entry in DBGET database format.
>>> print(entry)
ENTRY       672               CDS       T01001
NAME        BRCA1, BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4, PPP1R53, PSCP, RNF53
DEFINITION  breast cancer 1, early onset
...

class orangecontrib.bio.kegg.Organism(org, genematcher=None)

A convenience class for retrieving information regarding an organism in the KEGG Genes database.

Parameters: org (str) – KEGG organism code (e.g. ‘hsa’, ‘sce’). Can also be a descriptive name (e.g. ‘yeast’, ‘homo sapiens’), in which case the organism code is looked up using the KEGG find API.

See also

organism_name_search()
Search KEGG for an organism code
org

KEGG organism code.

genes

A Genes database instance for this organism.

gene_aliases()

Return a list of sets of equal genes (synonyms) in KEGG for this organism.

Note

This only includes ‘ncbi-geneid’ and ‘ncbi-gi’ records from the KEGG Genes DBLINKS entries.

pathways(with_ids=None)

Return a list of all pathways for this organism.

list_pathways()

List all pathways for this organism.

get_enriched_pathways(genes, reference=None, prob=<orangecontrib.bio.utils.stats.Binomial object>, callback=None)

Return a dictionary with enriched pathway ids as keys and (list_of_genes, p_value, num_of_reference_genes) tuples as values.
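To make the shape of that return value concrete, here is a minimal sketch of a binomial enrichment test on hand-written data. It is not the module's implementation: the functions `binomial_tail` and `enriched_pathways`, the gene names and the pathway ids are all hypothetical, and the exact statistic computed by the default `prob` object is an assumption (an upper-tail binomial probability is used here).

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def enriched_pathways(genes, pathway_genes, reference):
    """Map pathway id -> (genes_in_pathway, p_value, n_reference_genes)."""
    reference = set(reference)
    genes = set(genes) & reference
    result = {}
    for pathway_id, members in pathway_genes.items():
        ref_members = set(members) & reference
        hits = sorted(genes & ref_members)
        if not hits:
            continue  # no query gene falls in this pathway
        # Chance that a randomly drawn reference gene is in the pathway.
        p = len(ref_members) / len(reference)
        p_value = binomial_tail(len(hits), len(genes), p)
        result[pathway_id] = (hits, p_value, len(ref_members))
    return result


# Hypothetical toy data: ten reference genes and two pathways.
reference = ["g%d" % i for i in range(10)]
pathway_genes = {"pw1": ["g0", "g1", "g2", "g3", "g4"], "pw2": ["g8", "g9"]}
enriched = enriched_pathways(["g0", "g1"], pathway_genes, reference)
print(enriched)
```

Only pathways containing at least one query gene appear in the result, so `pw2` is absent in this toy run.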

get_pathways_by_genes(gene_ids)

Pathways that include all genes in gene_ids.

get_unique_gene_ids(genes, case_sensitive=True)

Return a tuple with three elements. The first is a dictionary mapping unique gene ids to the gene names given in genes, the second is a list of conflicting gene names, and the third is a list of unknown genes.
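The three-part result can be illustrated with a small stand-alone resolver. This is only a sketch under stated assumptions: `resolve_gene_names` and the alias table are hypothetical, two of the gene ids and all names except BRCA1 and RNF53 are made up, and the real method matches names against KEGG rather than a local dictionary.

```python
def resolve_gene_names(names, alias_sets, case_sensitive=True):
    """Toy resolver: split names into unique, conflicting and unknown.

    alias_sets maps a gene id to the set of names known for it
    (hypothetical data, hand-written for this example).
    """
    def norm(name):
        return name if case_sensitive else name.lower()

    # Invert the alias table: normalized name -> set of gene ids.
    index = {}
    for gene_id, aliases in alias_sets.items():
        for alias in aliases:
            index.setdefault(norm(alias), set()).add(gene_id)

    unique, conflicting, unknown = {}, [], []
    for name in names:
        ids = index.get(norm(name), set())
        if len(ids) == 1:
            unique[next(iter(ids))] = name   # exactly one match
        elif ids:
            conflicting.append(name)         # name maps to several genes
        else:
            unknown.append(name)             # no match at all
    return unique, conflicting, unknown


aliases = {
    "hsa:672": {"BRCA1", "RNF53"},
    "hsa:0001": {"GENE_A", "SHARED"},   # made-up ids and names
    "hsa:0002": {"GENE_B", "SHARED"},
}
unique, conflicting, unknown = resolve_gene_names(
    ["brca1", "SHARED", "nope"], aliases, case_sensitive=False)
print(unique)
print(conflicting)
print(unknown)
```

With `case_sensitive=True` the lowercase `brca1` would land in the unknown list instead, which is the main reason to expose the flag.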

orangecontrib.bio.kegg.KEGGOrganism

alias of Organism

orangecontrib.bio.kegg.organism_name_search(name)

Search for an organism by name and return its KEGG organism code.

orangecontrib.bio.kegg.pathways(org)

Return a list of all KEGG pathways for a KEGG organism code org.

orangecontrib.bio.kegg.from_taxid(taxid)

Return a KEGG organism code for an NCBI Taxonomy id string taxid.

orangecontrib.bio.kegg.to_taxid(name)

Return an NCBI Taxonomy id for a given KEGG organism name.

DBEntry (entry)

The entry.DBEntry class represents a DBGET database entry. The individual KEGG database interfaces below provide their own specializations of this base class.

class orangecontrib.bio.kegg.entry.DBEntry(text=None)

Bases: object

A DBGET entry object.

entry_key

Primary entry key used for identifying the entry.

parse(text)

Parse a text string containing a formatted DBGET entry.

format(section_indent=12)

Return a DBGET formatted string representation.
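The fixed-width layout behind parse() and format() can be sketched without the library. The example below assumes, based on the sample entries shown earlier and the section_indent=12 default, that section titles occupy the first 12 columns and that lines with a blank title column continue the previous section. The helpers `parse_dbget` and `format_dbget` are hypothetical, and the sketch deliberately ignores subsections.

```python
def parse_dbget(text, section_indent=12):
    """Split DBGET flat text into an ordered list of (title, body) pairs."""
    sections = []
    for line in text.splitlines():
        if line.startswith("///") or not line.strip():
            continue  # skip the entry terminator and blank lines
        title = line[:section_indent].strip()
        body = line[section_indent:].rstrip()
        if title:
            sections.append([title, body])
        elif sections:
            # Blank title column: continuation of the previous section.
            sections[-1][1] += "\n" + body
    return [(title, body) for title, body in sections]


def format_dbget(sections, section_indent=12):
    """Inverse of parse_dbget: pad titles to section_indent columns."""
    lines = []
    for title, body in sections:
        first, *rest = body.split("\n")
        lines.append(title.ljust(section_indent) + first)
        lines.extend(" " * section_indent + part for part in rest)
    return "\n".join(lines)


# The genome entry excerpt shown at the top of this page.
text = "\n".join([
    "ENTRY".ljust(12) + "T01001            Complete  Genome",
    "NAME".ljust(12) + "hsa, HUMAN, 9606",
    "DEFINITION".ljust(12) + "Homo sapiens (human)",
])
sections = parse_dbget(text)
print(dict(sections)["DEFINITION"])
print(format_dbget(sections) == text)
```

Keeping the sections as an ordered list rather than a dictionary preserves field order, which is what makes the round trip back to text possible.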

KEGG Databases interface (databases)

class orangecontrib.bio.kegg.databases.DBDataBase(**kwargs)

Bases: object

Base class for a DBGET database interface.

ENTRY_TYPE

ENTRY_TYPE constructor (a DBEntry subclass). This should be redefined in subclasses.

alias of DBEntry

DB = None

A database name/abbreviation (e.g. ‘pathway’). It must be set in a subclass, or in an object instance’s constructor, before calling the base __init__.
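The ENTRY_TYPE and DB conventions amount to a familiar subclassing pattern. The sketch below uses stand-in classes (`ToyEntry`, `ToyDataBase`, `ToyPathways`, all hypothetical) backed by an in-memory dictionary, so it only mirrors the convention described here and makes no claim about how DBDataBase itself is written.

```python
class ToyEntry:
    """Stand-in for a DBEntry subclass: just keeps the raw text."""

    def __init__(self, text=None):
        self.text = text


class ToyDataBase:
    """Stand-in base class mirroring the ENTRY_TYPE / DB convention."""

    ENTRY_TYPE = ToyEntry   # redefined in subclasses
    DB = None               # must be set before the base __init__ runs

    def __init__(self, records):
        if self.DB is None:
            raise TypeError("DB must be set before calling the base __init__")
        self._records = dict(records)

    def keys(self):
        return sorted(self._records)

    def get_text(self, key):
        return self._records[key]

    def get_entry(self, key):
        # Wrap the plain text in whatever entry class the subclass chose.
        return self.ENTRY_TYPE(self.get_text(key))


class ToyPathwayEntry(ToyEntry):
    pass


class ToyPathways(ToyDataBase):
    ENTRY_TYPE = ToyPathwayEntry
    DB = "pathway"


db = ToyPathways({"k1": "ENTRY       k1"})
entry = db.get_entry("k1")
print(type(entry).__name__, db.keys())
```

Because get_entry() goes through ENTRY_TYPE, a subclass gets correctly typed entries by overriding two class attributes and nothing else.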

keys()

Return a list of database keys. These are unique KEGG identifiers that can be used to query the database.

iterkeys()

Return an iterator over the keys.

items()

Return a list of all (key, DBDataBase.ENTRY_TYPE instance) tuples.

iteritems()

Return an iterator over the items.

values()

Return a list of all DBDataBase.ENTRY_TYPE instances.

itervalues()

Return an iterator over all DBDataBase.ENTRY_TYPE instances.

get(key, default=None)

Return a DBDataBase.ENTRY_TYPE instance for the key. Raises KeyError if not found.

get_text(key)

Return the database entry for key as plain text.

get_entry(key)

Return the database entry for key as an instance of ENTRY_TYPE.

find(name)

Find name using the KEGG find API.

pre_cache(keys=None, batch_size=10, progress_callback=None)

Retrieve all the entries for keys and cache them locally for faster subsequent retrieval. If keys is None then all entries will be retrieved.

batch_get(keys)

Batch retrieve all entries for keys. This can be significantly faster than getting each entry separately, especially if the entries are not yet cached.
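The idea behind pre_cache() and batch_get() is to trade many single requests for a few grouped ones. The following sketch shows that batching logic in isolation. The `fetch_batch` callable stands in for a network request, the function bodies are hypothetical, and reporting progress as a percentage is an assumption about the callback convention.

```python
def iter_batches(keys, batch_size=10):
    """Yield consecutive slices of at most batch_size keys."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def pre_cache(keys, fetch_batch, cache, batch_size=10, progress_callback=None):
    """Fetch entries that are not cached yet, one batch per request."""
    missing = [key for key in keys if key not in cache]
    done = 0
    for batch in iter_batches(missing, batch_size):
        cache.update(fetch_batch(batch))   # one request per batch
        done += len(batch)
        if progress_callback is not None:
            # Assumption: progress is reported as a percentage.
            progress_callback(100.0 * done / len(missing))


calls = []


def fake_fetch(batch):
    """Stand-in for a network call; records the size of each request."""
    calls.append(len(batch))
    return {key: "text for " + key for key in batch}


cache = {}
progress = []
pre_cache(["k%d" % i for i in range(25)], fake_fetch, cache,
          batch_size=10, progress_callback=progress.append)
print(calls, progress)
```

Calling `pre_cache` a second time with the same keys issues no requests at all, since every key is already in the cache.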

class orangecontrib.bio.kegg.databases.GenomeEntry(text)

Bases: orangecontrib.bio.kegg.entry.DBEntry

Entry for a KEGG Genome database.

organism_code

A three or four letter KEGG organism code (e.g. ‘hsa’, ‘sce’, ...)

taxid

Organism NCBI taxonomy id.

annotation

ANNOTATION

chromosome

CHROMOSOME

comment

COMMENT

data_source

DATA_SOURCE

definition

DEFINITION

disease

DISEASE

entry

ENTRY

keywords

KEYWORDS

name

NAME

original_db

ORIGINAL_DB

plasmid

PLASMID

reference

REFERENCE

statistics

STATISTICS

taxonomy

TAXONOMY

class orangecontrib.bio.kegg.databases.Genome

Bases: orangecontrib.bio.kegg.databases.DBDataBase

An interface to the KEGG GENOME database.

ENTRY_TYPE

alias of GenomeEntry

org_code_to_entry_key(code)

Map an organism code (‘hsa’, ‘sce’, ...) to the corresponding KEGG identifier (‘T’ followed by a five-digit number).

search(string, relevance=False)

Search the genome database for string using bfind.
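The relationship between organism codes, entry keys and taxonomy ids can be read off the sample genome entry shown at the top of this page (key T01001, NAME "hsa, HUMAN, 9606"). The sketch below builds a code-to-key index from such pairs. It assumes the organism code is the first item of the NAME field and that a numeric last item is the taxonomy id, which matches that one sample but is not confirmed for every entry. Both helpers are hypothetical and do not reflect how Genome performs the lookup.

```python
import re

# Entry keys are 'T' followed by five digits, e.g. 'T01001'.
ENTRY_KEY = re.compile(r"^T\d{5}$")


def parse_genome_name(name_field):
    """Split a GENOME NAME field such as 'hsa, HUMAN, 9606'."""
    parts = [part.strip() for part in name_field.split(",")]
    org_code = parts[0]
    # Assumption: a purely numeric last item is the NCBI taxonomy id.
    taxid = parts[-1] if len(parts) > 1 and parts[-1].isdigit() else None
    return org_code, taxid


def build_code_index(entries):
    """Build an organism code -> entry key mapping from (key, NAME) pairs."""
    index = {}
    for entry_key, name_field in entries:
        if not ENTRY_KEY.match(entry_key):
            raise ValueError("not a genome entry key: %r" % (entry_key,))
        org_code, _ = parse_genome_name(name_field)
        index[org_code] = entry_key
    return index


index = build_code_index([("T01001", "hsa, HUMAN, 9606")])
print(index["hsa"])
print(parse_genome_name("hsa, HUMAN, 9606"))
```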

class orangecontrib.bio.kegg.databases.GeneEntry(text=None)

Bases: orangecontrib.bio.kegg.entry.DBEntry

aaseq

AASEQ

class_

CLASS

dblinks

DBLINKS

definition

DEFINITION

disease

DISEASE

drug_target

DRUG_TARGET

entry

ENTRY

module

MODULE

motif

MOTIF

name

NAME

ntseq

NTSEQ

organism

ORGANISM

orthology

ORTHOLOGY

pathway

PATHWAY

position

POSITION

structure

STRUCTURE

class orangecontrib.bio.kegg.databases.Genes(org_code)

Bases: orangecontrib.bio.kegg.databases.DBDataBase

Interface to the KEGG Genes database.

Parameters: org_code (str) – KEGG organism code (e.g. ‘hsa’).

class orangecontrib.bio.kegg.databases.CompoundEntry(text=None)

Bases: orangecontrib.bio.kegg.entry.DBEntry

atom

ATOM

bond

BOND

brite

BRITE

comment

COMMENT

dblinks

DBLINKS

entry

ENTRY

enzyme

ENZYME

exact_mass

EXACT_MASS

formula

FORMULA

mol_weight

MOL_WEIGHT

name

NAME

pathway

PATHWAY

reaction

REACTION

reference

REFERENCE

remark

REMARK

class orangecontrib.bio.kegg.databases.Compound

Bases: orangecontrib.bio.kegg.databases.DBDataBase

class orangecontrib.bio.kegg.databases.ReactionEntry(text=None)

Bases: orangecontrib.bio.kegg.entry.DBEntry

definition

DEFINITION

entry

ENTRY

enzyme

ENZYME

equation

EQUATION

name

NAME

class orangecontrib.bio.kegg.databases.Reaction

Bases: orangecontrib.bio.kegg.databases.DBDataBase

class orangecontrib.bio.kegg.databases.EnzymeEntry(text=None)

Bases: orangecontrib.bio.kegg.entry.DBEntry

all_reac

ALL_REAC

class_

CLASS

comment

COMMENT

dblinks

DBLINKS

entry

ENTRY

genes

GENES

name

NAME

orthology

ORTHOLOGY

pathway

PATHWAY

product

PRODUCT

reaction

REACTION

reference

REFERENCE

substrate

SUBSTRATE

sysname

SYSNAME

class orangecontrib.bio.kegg.databases.Enzyme

Bases: orangecontrib.bio.kegg.databases.DBDataBase

class orangecontrib.bio.kegg.databases.PathwayEntry(text=None)

Bases: orangecontrib.bio.kegg.entry.DBEntry

class_

CLASS

compound

COMPOUND

dblinks

DBLINKS

description

DESCRIPTION

disease

DISEASE

drug

DRUG

entry

ENTRY

enzyme

ENZYME

ko_pathway

KO_PATHWAY

module

MODULE

name

NAME

organism

ORGANISM

pathway_map

PATHWAY_MAP

reference

REFERENCE

rel_pathway

REL_PATHWAY

class orangecontrib.bio.kegg.databases.Pathway(prefix='map')

Bases: orangecontrib.bio.kegg.databases.DBDataBase

KEGG Pathway database

Parameters: prefix (str) – KEGG organism code (‘hsa’, ...) or one of ‘map’, ‘ko’, ‘ec’, ‘rn’.

KEGG Pathway (pathway)

class orangecontrib.bio.kegg.pathway.Pathway(pathway_id, local_cache=None, connection=None)

Bases: object

Class representing a KEGG Pathway (parsed from a “kgml” file)

Parameters: pathway_id (str) – A KEGG pathway id (e.g. ‘path:hsa05130’)

name

Pathway name/id (e.g. “path:hsa05130”)

org

Pathway organism code (e.g. ‘hsa’)

number

Pathway number as a string (e.g. ‘05130’)

title

Pathway title string.

image

URL of the pathway image.

URL to a pathway on the KEGG web site.

get_image()

Return a local filesystem path to an image of the pathway. The image will be downloaded if it is not already cached.

classmethod list(organism)

List all pathways for KEGG organism code organism.
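The name, org and number attributes described above are related in a simple way, judging by the examples given (‘path:hsa05130’, ‘hsa’, ‘05130’). As a sketch, a pathway id can be split with a regular expression. The helper `split_pathway_id` is hypothetical, and treating the ‘path:’ prefix as optional and the number as exactly five digits are assumptions drawn from those examples.

```python
import re

# Optional 'path:' prefix, a lowercase organism/prefix code, five digits.
_PATHWAY_ID = re.compile(r"^(?:path:)?([a-z]+)(\d{5})$")


def split_pathway_id(pathway_id):
    """Return (org, number) for an id such as 'path:hsa05130'."""
    match = _PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError("unrecognized pathway id: %r" % (pathway_id,))
    return match.group(1), match.group(2)


org, number = split_pathway_id("path:hsa05130")
print(org, number)
```

The number is kept as a string so that leading zeros survive.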

Utilities

class orangecontrib.bio.kegg.entry.parser.DBGETEntryParser

A DBGET entry parser (inspired by xml.dom.pulldom).

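>>> from io import StringIO  # Python 3
>>> from orangecontrib.bio.kegg.entry.parser import DBGETEntryParser
>>> stream = StringIO("ENTRY foo\n"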
...                   "NAME  foo's name\n"
...                   "  BAR A subsection of 'NAME'\n")
...
>>> parser = DBGETEntryParser()
>>> for event, title, contents_part in parser.parse(stream):
...    print(parser.EVENTS[event], title, repr(contents_part))
...
ENTRY_START None None
SECTION_START ENTRY 'foo\n'
SECTION_END ENTRY None
SECTION_START NAME "foo's name\n"
SUBSECTION_START BAR "A subsection of 'NAME'\n"
SUBSECTION_END BAR None
SECTION_END NAME None
ENTRY_END None None

ENTRY_END = 1

Entry end event

ENTRY_START = 0

Entry start event

SECTION_END = 3

Section end event

SECTION_START = 2

Section start event

SUBSECTION_END = 5

Subsection end event

SUBSECTION_START = 4

Subsection start event

TEXT = 6

Text element event
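A pull-style event stream like the one above is typically consumed with a small state machine. The sketch below folds the (event, title, contents_part) tuples into a nested dictionary. To stay runnable without the library it replays the events printed in the example rather than calling the parser. The event codes are copied from the constants listed above, while `collect_sections` is hypothetical and its handling of TEXT events (appending to the innermost open section) is an assumption.

```python
ENTRY_START, ENTRY_END = 0, 1
SECTION_START, SECTION_END = 2, 3
SUBSECTION_START, SUBSECTION_END = 4, 5
TEXT = 6


def collect_sections(events):
    """Fold a DBGET event stream into {section: {text, subsections}}."""
    sections = {}
    section = None   # record of the section currently open
    target = None    # list receiving text for the open (sub)section
    for event, title, text in events:
        if event == SECTION_START:
            section = {"text": [text], "subsections": {}}
            sections[title] = section
            target = section["text"]
        elif event == SUBSECTION_START:
            target = section["subsections"].setdefault(title, [])
            target.append(text)
        elif event == SUBSECTION_END:
            target = section["text"]   # back to the enclosing section
        elif event == SECTION_END:
            section = target = None
        elif event == TEXT and target is not None:
            target.append(text)
    return {
        title: {
            "text": "".join(record["text"]),
            "subsections": {name: "".join(parts)
                            for name, parts in record["subsections"].items()},
        }
        for title, record in sections.items()
    }


# The event sequence printed in the parser example above.
events = [
    (ENTRY_START, None, None),
    (SECTION_START, "ENTRY", "foo\n"),
    (SECTION_END, "ENTRY", None),
    (SECTION_START, "NAME", "foo's name\n"),
    (SUBSECTION_START, "BAR", "A subsection of 'NAME'\n"),
    (SUBSECTION_END, "BAR", None),
    (SECTION_END, "NAME", None),
    (ENTRY_END, None, None),
]
parsed = collect_sections(events)
print(parsed["NAME"]["subsections"]["BAR"])
```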