KEGG - Kyoto Encyclopedia of Genes and Genomes (kegg)
kegg is a Python module for accessing KEGG (Kyoto Encyclopedia
of Genes and Genomes) through its web services.
>>> from orangecontrib.bio.kegg import *
>>> # Create a KEGG Genome database interface
>>> genome = KEGGGenome()
>>> # List all available entry ids
>>> keys = genome.keys()
>>> print(keys[0])
T01001
>>> # Retrieve the entry for the key.
>>> entry = genome[keys[0]]
>>> print(entry.entry_key)
T01001
>>> print(entry.definition)
Homo sapiens (human)
>>> print(str(entry))
ENTRY T01001 Complete Genome
NAME hsa, HUMAN, 9606
DEFINITION Homo sapiens (human)
...
The Organism class can be a convenient starting point
for organism-specific databases.
>>> organism = Organism("Homo sapiens") # searches for the organism by name
>>> print(organism.org_code) # prints the KEGG organism code
hsa
>>> genes = organism.genes # get the genes database for the organism
>>> gene_ids = genes.keys() # KEGG gene identifiers
>>> entry = genes["hsa:672"]
>>> print(entry.definition)
breast cancer 1, early onset
>>> # Print the entry in the DBGET database format.
>>> print(entry)
ENTRY 672 CDS T01001
NAME BRCA1, BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4, PPP1R53, PSCP, RNF53
DEFINITION breast cancer 1, early onset
...
-
class orangecontrib.bio.kegg.Organism(org, genematcher=None)
A convenience class for retrieving information about an organism in the KEGG Genes database.
Parameters: org (str) – KEGG organism code (e.g. ‘hsa’, ‘sce’). Can also be a descriptive name (e.g. ‘yeast’, ‘homo sapiens’), in which case the organism code is looked up with the KEGG find API. See also organism_name_search(), which searches KEGG for an organism code.
-
org – KEGG organism code.
-
gene_aliases() – Return a list of sets of synonymous gene identifiers in KEGG for this organism.
Note
This only includes ‘ncbi-geneid’ and ‘ncbi-gi’ records from the DBLINKS sections of KEGG Genes entries.
-
pathways(with_ids=None) – Return a list of all pathways for this organism.
-
list_pathways() – List all pathways for this organism.
-
get_enriched_pathways(genes, reference=None, prob=<orangecontrib.bio.utils.stats.Binomial object>, callback=None) – Return a dictionary with enriched pathway ids as keys and (list_of_genes, p_value, num_of_reference_genes) tuples as values.
-
get_pathways_by_genes(gene_ids) – Return the pathways that include all genes in gene_ids.
-
get_unique_gene_ids(genes, case_sensitive=True) – Return a tuple with three elements: a dictionary mapping unique gene ids to the gene names in genes, a list of conflicting gene names, and a list of unknown genes.
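get_enriched_pathways defaults to a binomial test (prob=Binomial), but the exact statistic computed by orangecontrib.bio.utils.stats.Binomial is not documented here. The following is a minimal sketch assuming an upper-tail binomial test: for each pathway, the probability of drawing at least as many pathway genes as observed, given the pathway's share of the reference set. The helper names and the plain-dict inputs are hypothetical and not part of the module's API.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_pathways(genes, pathway_genes, reference):
    """Hypothetical enrichment sketch (not the orangecontrib API).

    genes: iterable of query gene ids.
    pathway_genes: dict mapping pathway id -> set of gene ids in that pathway.
    reference: set of reference (background) gene ids.
    Returns {pathway_id: (list_of_genes, p_value, num_of_reference_genes)}.
    """
    reference = set(reference)
    query = set(genes) & reference  # ignore genes outside the background
    n = len(query)
    result = {}
    for pathway_id, members in pathway_genes.items():
        ref_members = set(members) & reference
        hits = sorted(query & ref_members)
        if not hits:
            continue  # pathways without any query gene are not reported
        # Probability that a random reference gene belongs to this pathway.
        p = len(ref_members) / len(reference)
        p_value = binomial_upper_tail(len(hits), n, p)
        result[pathway_id] = (hits, p_value, len(ref_members))
    return result
```

For example, with a 10-gene reference, a 2-gene pathway and a 2-gene query that hits both pathway genes, the sketch reports a p-value of 0.2² = 0.04.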
-
orangecontrib.bio.kegg.organism_name_search(name) – Search for an organism by name and return its KEGG organism code.
-
orangecontrib.bio.kegg.pathways(org) – Return a list of all KEGG pathways for the KEGG organism code org.
-
orangecontrib.bio.kegg.from_taxid(taxid) – Return the KEGG organism code for an NCBI Taxonomy id string taxid.
-
orangecontrib.bio.kegg.to_taxid(name) – Return the NCBI Taxonomy id for the given KEGG organism name.
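The GENOME entry shown in the introduction carries both identifiers in its NAME line (hsa, HUMAN, 9606). A minimal sketch of how code-to-taxid lookups could be built from such lines, assuming the first comma-separated item is the organism code and the last, when numeric, is the NCBI Taxonomy id. The function names are hypothetical; this illustrates the mapping and is not how from_taxid and to_taxid are implemented.

```python
def parse_genome_name(name_field):
    """Split a GENOME NAME field such as 'hsa, HUMAN, 9606'.

    Assumption: the first item is the KEGG organism code and the last item,
    if it is all digits, is the NCBI Taxonomy id.
    """
    parts = [part.strip() for part in name_field.split(",")]
    org_code = parts[0]
    taxid = parts[-1] if parts[-1].isdigit() else None
    return org_code, taxid


def build_taxid_maps(name_fields):
    """Build (org_code -> taxid, taxid -> org_code) dicts from NAME fields."""
    org_to_taxid, taxid_to_org = {}, {}
    for field in name_fields:
        org_code, taxid = parse_genome_name(field)
        if taxid is not None:
            org_to_taxid[org_code] = taxid
            taxid_to_org[taxid] = org_code
    return org_to_taxid, taxid_to_org
```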
DBEntry (entry)
The entry.DBEntry class represents a DBGET database entry.
The individual KEGG database interfaces below provide their own
specializations of this base class.
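The entry classes below expose DBGET sections (ENTRY, NAME, DEFINITION, ...) as attributes. A minimal sketch of splitting a DBGET flat-file entry into sections, assuming the usual KEGG layout in which the section title occupies the first 12 columns, continuation lines are indented, and ‘///’ ends the entry. The function name is hypothetical; the module ships its own DBGETEntryParser (see Utilities).

```python
def split_sections(text):
    """Split a DBGET flat-file entry into a list of (title, content) pairs.

    Assumes a 12-column title field. Indented lines (continuations and
    subsections) are appended to the previous section's content for
    simplicity; parsing stops at the '///' terminator.
    """
    sections = []
    for line in text.splitlines():
        if line.startswith("///"):
            break  # end of entry
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            # A new top-level section: title in columns 1-12, content after.
            sections.append([line[:12].strip(), line[12:].strip()])
        elif sections:
            # Continuation or subsection line: fold it into the open section.
            sections[-1][1] += "\n" + line.strip()
    return [tuple(section) for section in sections]
```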
KEGG Database Interfaces (databases)
-
class orangecontrib.bio.kegg.databases.DBDataBase(**kwargs)
Bases: object
Base class for a DBGET database interface.
-
ENTRY_TYPE – ENTRY_TYPE constructor (a DBEntry subclass). This should be redefined in subclasses. Alias of DBEntry.
-
DB = None – A database name/abbreviation (e.g. ‘pathway’). Must be set in a subclass or in the object instance’s constructor before calling the base __init__.
-
keys() – Return a list of database keys. These are unique KEGG identifiers that can be used to query the database.
-
iterkeys() – Return an iterator over the keys.
-
items() – Return a list of all (key, DBDataBase.ENTRY_TYPE instance) tuples.
-
iteritems() – Return an iterator over the items.
-
values() – Return a list of all DBDataBase.ENTRY_TYPE instances.
-
itervalues() – Return an iterator over all DBDataBase.ENTRY_TYPE instances.
-
get(key, default=None) – Return a DBDataBase.ENTRY_TYPE instance for key. Raises KeyError if not found.
-
get_text(key) – Return the database entry for key as plain text.
-
get_entry(key) – Return the database entry for key as an instance of ENTRY_TYPE.
-
find(name) – Find name using the KEGG find API.
-
pre_cache(keys=None, batch_size=10, progress_callback=None) – Retrieve all the entries for keys and cache them locally for faster subsequent retrieval. If keys is None, all entries are retrieved.
-
batch_get(keys) – Batch retrieve all entries for keys. This can be significantly faster than getting each entry separately, especially if the entries are not yet cached.
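pre_cache and batch_get both rely on fetching several entries per request and keeping them in a local cache. A minimal, self-contained sketch of that pattern, assuming a hypothetical fetch_batch callable that takes a list of keys and returns a {key: entry_text} dict (a real client would issue a KEGG REST ‘get’ request there; the default batch size of 10 is assumed to match that operation's per-request entry limit). The class is illustrative and is not DBDataBase's implementation.

```python
class CachedDatabase:
    """Hypothetical dict-like sketch of a cached DBGET database interface."""

    def __init__(self, fetch_batch, keys, batch_size=10):
        self._fetch_batch = fetch_batch  # callable: list of keys -> {key: text}
        self._keys = list(keys)
        self._cache = {}
        self.batch_size = batch_size

    def keys(self):
        return list(self._keys)

    def batch_get(self, keys):
        """Return entries for keys, fetching only uncached ones in batches."""
        keys = list(keys)
        # Deduplicate while preserving order, and skip already cached keys.
        missing = [k for k in dict.fromkeys(keys) if k not in self._cache]
        for start in range(0, len(missing), self.batch_size):
            batch = missing[start:start + self.batch_size]
            self._cache.update(self._fetch_batch(batch))
        # Raises KeyError for keys the backend did not return.
        return [self._cache[k] for k in keys]

    def pre_cache(self, keys=None):
        """Fetch and cache keys (all keys if None) ahead of time."""
        self.batch_get(self._keys if keys is None else keys)

    def __getitem__(self, key):
        return self.batch_get([key])[0]
```

After pre_cache() over 25 keys the fake backend is called three times (10 + 10 + 5 keys), and later lookups are served from the cache.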
-
class orangecontrib.bio.kegg.databases.GenomeEntry(text)
Bases: orangecontrib.bio.kegg.entry.DBEntry
Entry for the KEGG GENOME database.
-
organism_code – A three- or four-letter KEGG organism code (e.g. ‘hsa’, ‘sce’, ...).
-
taxid – Organism NCBI Taxonomy id.
-
annotation – ANNOTATION
-
chromosome – CHROMOSOME
-
comment – COMMENT
-
data_source – DATA_SOURCE
-
definition – DEFINITION
-
disease – DISEASE
-
entry – ENTRY
-
keywords – KEYWORDS
-
name – NAME
-
original_db – ORIGINAL_DB
-
plasmid – PLASMID
-
reference – REFERENCE
-
statistics – STATISTICS
-
taxonomy – TAXONOMY
-
class orangecontrib.bio.kegg.databases.Genome
Bases: orangecontrib.bio.kegg.databases.DBDataBase
An interface to the KEGG GENOME database.
-
ENTRY_TYPE – alias of GenomeEntry
-
org_code_to_entry_key(code) – Map an organism code (‘hsa’, ‘sce’, ...) to the corresponding KEGG identifier (‘T’ followed by a 5-digit number, e.g. ‘T01001’ for ‘hsa’).
-
search(string, relevance=False) – Search the genome database for string using bfind.
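org_code_to_entry_key and search can be pictured as lookups over an index of genome entries. A minimal sketch, assuming the entries are already available as hypothetical (entry_key, org_code, definition) tuples; the search here is a plain case-insensitive substring match, a stand-in for bfind rather than a reimplementation of it.

```python
import re

# KEGG genome entry keys: 'T' followed by a 5-digit number (e.g. 'T01001').
ENTRY_KEY_RE = re.compile(r"^T\d{5}$")


class GenomeIndex:
    """Hypothetical in-memory index of (entry_key, org_code, definition) tuples."""

    def __init__(self, entries):
        self._by_code = {}
        self._definitions = {}
        for entry_key, org_code, definition in entries:
            if not ENTRY_KEY_RE.match(entry_key):
                raise ValueError("not a genome entry key: %r" % entry_key)
            self._by_code[org_code] = entry_key
            self._definitions[entry_key] = definition

    def org_code_to_entry_key(self, code):
        """Map an organism code such as 'hsa' to its 'T' entry key."""
        try:
            return self._by_code[code]
        except KeyError:
            raise ValueError("unknown organism code: %r" % code) from None

    def search(self, string):
        """Return entry keys whose definition contains string (case-insensitive)."""
        needle = string.lower()
        return [key for key, definition in self._definitions.items()
                if needle in definition.lower()]
```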
-
class orangecontrib.bio.kegg.databases.GeneEntry(text=None)
Bases: orangecontrib.bio.kegg.entry.DBEntry
-
aaseq – AASEQ
-
class_ – CLASS
-
dblinks – DBLINKS
-
definition – DEFINITION
-
disease – DISEASE
-
drug_target – DRUG_TARGET
-
entry – ENTRY
-
module – MODULE
-
motif – MOTIF
-
name – NAME
-
ntseq – NTSEQ
-
organism – ORGANISM
-
orthology – ORTHOLOGY
-
pathway – PATHWAY
-
position – POSITION
-
structure – STRUCTURE
-
class orangecontrib.bio.kegg.databases.Genes(org_code)
Bases: orangecontrib.bio.kegg.databases.DBDataBase
Interface to the KEGG Genes database.
Parameters: org_code (str) – KEGG organism code (e.g. ‘hsa’).
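Keys in an organism's Genes database have the form ‘org:gene’, as in genes["hsa:672"] in the introduction, while the printed entry shows only the gene part (ENTRY 672). A small sketch of splitting and rebuilding such keys; the helper names are hypothetical and not part of the module.

```python
def split_gene_id(gene_id):
    """Split a KEGG gene id such as 'hsa:672' into ('hsa', '672')."""
    org_code, sep, gene = gene_id.partition(":")
    if not (sep and org_code and gene):
        raise ValueError("expected 'org:gene', got %r" % gene_id)
    return org_code, gene


def join_gene_id(org_code, gene):
    """Inverse of split_gene_id: ('hsa', '672') -> 'hsa:672'."""
    return "%s:%s" % (org_code, gene)
```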
-
class orangecontrib.bio.kegg.databases.CompoundEntry(text=None)
Bases: orangecontrib.bio.kegg.entry.DBEntry
-
atom – ATOM
-
bond – BOND
-
brite – BRITE
-
comment – COMMENT
-
dblinks – DBLINKS
-
entry – ENTRY
-
enzyme – ENZYME
-
exact_mass – EXACT_MASS
-
formula – FORMULA
-
mol_weight – MOL_WEIGHT
-
name – NAME
-
pathway – PATHWAY
-
reaction – REACTION
-
reference – REFERENCE
-
remark – REMARK
-
class orangecontrib.bio.kegg.databases.Compound
-
class orangecontrib.bio.kegg.databases.ReactionEntry(text=None)
Bases: orangecontrib.bio.kegg.entry.DBEntry
-
definition – DEFINITION
-
entry – ENTRY
-
enzyme – ENZYME
-
equation – EQUATION
-
name – NAME
-
class orangecontrib.bio.kegg.databases.Reaction
-
class orangecontrib.bio.kegg.databases.EnzymeEntry(text=None)
Bases: orangecontrib.bio.kegg.entry.DBEntry
-
all_reac – ALL_REAC
-
class_ – CLASS
-
comment – COMMENT
-
dblinks – DBLINKS
-
entry – ENTRY
-
genes – GENES
-
name – NAME
-
orthology – ORTHOLOGY
-
pathway – PATHWAY
-
product – PRODUCT
-
reaction – REACTION
-
reference – REFERENCE
-
substrate – SUBSTRATE
-
sysname – SYSNAME
-
class orangecontrib.bio.kegg.databases.Enzyme
-
class orangecontrib.bio.kegg.databases.PathwayEntry(text=None)
Bases: orangecontrib.bio.kegg.entry.DBEntry
-
class_ – CLASS
-
compound – COMPOUND
-
dblinks – DBLINKS
-
description – DESCRIPTION
-
disease – DISEASE
-
drug – DRUG
-
entry – ENTRY
-
enzyme – ENZYME
-
ko_pathway – KO_PATHWAY
-
module – MODULE
-
name – NAME
-
organism – ORGANISM
-
pathway_map – PATHWAY_MAP
-
reference – REFERENCE
-
rel_pathway – REL_PATHWAY
-
class orangecontrib.bio.kegg.databases.Pathway(prefix='map')
Bases: orangecontrib.bio.kegg.databases.DBDataBase
Interface to the KEGG PATHWAY database.
Parameters: prefix (str) – KEGG organism code (‘hsa’, ...) or one of ‘map’, ‘ko’, ‘ec’, ‘rn’.
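The prefix is either an organism code or one of the reference prefixes ‘map’, ‘ko’, ‘ec’, ‘rn’, and KEGG pathway ids in this module look like ‘path:hsa05130’: a ‘path:’ tag, the prefix, then a 5-digit map number. A minimal sketch, assuming exactly that layout (lowercase letters followed by five digits), of splitting such an id and re-targeting it to another prefix; the helper names are hypothetical.

```python
import re

# Assumed layout: 'path:' + lowercase prefix + 5-digit map number.
PATHWAY_ID_RE = re.compile(r"^path:([a-z]+)(\d{5})$")


def parse_pathway_id(pathway_id):
    """Split 'path:hsa05130' into ('hsa', '05130')."""
    match = PATHWAY_ID_RE.match(pathway_id)
    if match is None:
        raise ValueError("unrecognized pathway id: %r" % pathway_id)
    return match.group(1), match.group(2)


def with_prefix(pathway_id, prefix):
    """Return the same map number under another prefix, e.g. 'map' or 'ko'."""
    _, number = parse_pathway_id(pathway_id)
    return "path:%s%s" % (prefix, number)
```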
KEGG Pathway (pathway)
-
class orangecontrib.bio.kegg.pathway.Pathway(pathway_id, local_cache=None, connection=None)
Bases: object
Class representing a KEGG pathway (parsed from a KGML file).
Parameters: pathway_id (str) – A KEGG pathway id (e.g. ‘path:hsa05130’).
-
name – Pathway name/id (e.g. ‘path:hsa05130’).
-
org – Pathway organism code (e.g. ‘hsa’).
-
number – Pathway number as a string (e.g. ‘05130’).
-
title – Pathway title string.
-
image – URL of the pathway image.
-
link – URL of the pathway on the KEGG website.
-
get_image() – Return a local filesystem path to an image of the pathway. The image is downloaded if it is not already cached.
-
classmethod list(organism) – List all pathways for the KEGG organism code organism.
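get_image() returns a local path and downloads the image only when it is not already cached. That download-if-missing pattern can be sketched as follows, assuming a hypothetical download callable that writes the file to a given path. Writing to a temporary name first means an interrupted download never leaves a partial file at the cached path. This is an illustration, not the module's caching code.

```python
import os


def get_cached_file(cache_dir, filename, download):
    """Return the local path of filename in cache_dir, downloading it if missing.

    download is a hypothetical callable taking a destination path.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        tmp_path = path + ".part"
        download(tmp_path)          # write to a temporary file first
        os.replace(tmp_path, path)  # then move it into place (atomic on one filesystem)
    return path
```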
Utilities
-
class orangecontrib.bio.kegg.entry.parser.DBGETEntryParser
A DBGET entry parser (inspired by xml.dom.pulldom).
>>> from io import StringIO
>>> from orangecontrib.bio.kegg.entry.parser import DBGETEntryParser
>>> stream = StringIO("ENTRY foo\n"
...                   "NAME foo's name\n"
...                   "  BAR A subsection of 'NAME'\n")
>>> parser = DBGETEntryParser()
>>> for event, title, contents_part in parser.parse(stream):
...     print(parser.EVENTS[event], title, repr(contents_part))
...
ENTRY_START None None
SECTION_START ENTRY 'foo\n'
SECTION_END ENTRY None
SECTION_START NAME "foo's name\n"
SUBSECTION_START BAR "A subsection of 'NAME'\n"
SUBSECTION_END BAR None
SECTION_END NAME None
ENTRY_END None None
-
ENTRY_END = 1 – Entry end event
-
ENTRY_START = 0 – Entry start event
-
SECTION_END = 3 – Section end event
-
SECTION_START = 2 – Section start event
-
SUBSECTION_END = 5 – Subsection end event
-
SUBSECTION_START = 4 – Subsection start event
-
TEXT = 6 – Text element event
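The event stream of the DBGETEntryParser example can be reproduced by a small pull parser. The following is a minimal sketch, not DBGETEntryParser's implementation. It assumes that unindented lines start a section, lines indented by fewer than 12 spaces start a subsection, more deeply indented lines are continuation text, and ‘///’ ends an entry. The event numbers match the constants listed above, and the class name is hypothetical.

```python
class MiniDBGETParser:
    """Hypothetical pull parser emitting (event, title, contents) tuples."""

    ENTRY_START, ENTRY_END, SECTION_START, SECTION_END = 0, 1, 2, 3
    SUBSECTION_START, SUBSECTION_END, TEXT = 4, 5, 6
    EVENTS = ["ENTRY_START", "ENTRY_END", "SECTION_START", "SECTION_END",
              "SUBSECTION_START", "SUBSECTION_END", "TEXT"]

    @staticmethod
    def _split_title(line):
        # 'NAME foo\n' -> ('NAME', 'foo\n'); leading whitespace is ignored.
        text = line.lstrip()
        title = text.split(None, 1)[0]
        return title, text[len(title):].lstrip(" \t")

    def _close(self, section, subsection):
        # Close the open subsection first, then its parent section.
        if subsection is not None:
            yield self.SUBSECTION_END, subsection, None
        if section is not None:
            yield self.SECTION_END, section, None

    def parse(self, lines):
        """Yield events for an iterable of lines (e.g. a file or StringIO)."""
        in_entry = False
        section = subsection = None
        for line in lines:
            if not line.strip():
                continue  # blank lines carry no events
            if line.startswith("///"):
                yield from self._close(section, subsection)
                if in_entry:
                    yield self.ENTRY_END, None, None
                in_entry, section, subsection = False, None, None
                continue
            if not in_entry:
                yield self.ENTRY_START, None, None
                in_entry = True
            indent = len(line) - len(line.lstrip(" "))
            if indent == 0:
                yield from self._close(section, subsection)
                section, contents = self._split_title(line)
                subsection = None
                yield self.SECTION_START, section, contents
            elif indent < 12:
                if subsection is not None:
                    yield self.SUBSECTION_END, subsection, None
                subsection, contents = self._split_title(line)
                yield self.SUBSECTION_START, subsection, contents
            else:
                yield self.TEXT, None, line.lstrip(" ")
        # Close anything still open when the input ends without '///'.
        yield from self._close(section, subsection)
        if in_entry:
            yield self.ENTRY_END, None, None
```

Fed the three lines from the DBGETEntryParser example, this sketch yields the same eight events in the same order.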
-