KEGG - Kyoto Encyclopedia of Genes and Genomes (kegg)
kegg is a Python module for accessing KEGG (Kyoto Encyclopedia of Genes and Genomes) through its web services.
>>> from orangecontrib.bio.kegg import *
>>> # Create a KEGG Genes database interface
>>> genome = KEGGGenome()
>>> # List all available entry ids
>>> keys = genome.keys()
>>> print(keys[0])
T01001
>>> # Retrieve the entry for the key.
>>> entry = genome[keys[0]]
>>> print(entry.entry_key)
T01001
>>> print(entry.definition)
Homo sapiens (human)
>>> print(str(entry))
ENTRY T01001 Complete Genome
NAME hsa, HUMAN, 9606
DEFINITION Homo sapiens (human)
...
The Organism class is a convenient starting point for organism-specific databases.
>>> organism = Organism("Homo sapiens") # searches for the organism by name
>>> print(organism.org_code) # prints the KEGG organism code
hsa
>>> genes = organism.genes # get the genes database for the organism
>>> gene_ids = genes.keys() # KEGG gene identifiers
>>> entry = genes["hsa:672"]
>>> print(entry.definition)
breast cancer 1, early onset
>>> # print the entry in DBGET database format.
>>> print(entry)
ENTRY 672 CDS T01001
NAME BRCA1, BRCAI, BRCC1, BROVCA1, FANCS, IRIS, PNCA4, PPP1R53, PSCP, RNF53
DEFINITION breast cancer 1, early onset
...
- class orangecontrib.bio.kegg.Organism(org, genematcher=None)
  A convenience class for retrieving information about an organism in the KEGG Genes database.
  Parameters: org (str) – KEGG organism code (e.g. ‘hsa’, ‘sce’). It can also be a descriptive name (e.g. ‘yeast’, ‘homo sapiens’), in which case the organism code is looked up with the KEGG find API. See also organism_name_search() (search KEGG for an organism code).
  - org: KEGG organism code.
  - gene_aliases(): Return a list of sets of equal genes (synonyms) in KEGG for this organism.
    Note: this only includes ‘ncbi-geneid’ and ‘ncbi-gi’ records from the KEGG Genes DBLINKS entries.
  - pathways(with_ids=None): Return a list of all pathways for this organism.
  - list_pathways(): List all pathways for this organism.
  - get_enriched_pathways(genes, reference=None, prob=<orangecontrib.bio.utils.stats.Binomial object>, callback=None): Return a dictionary with enriched pathway ids as keys and (list_of_genes, p_value, num_of_reference_genes) tuples as values.
  - get_pathways_by_genes(gene_ids): Return the pathways that include all genes in gene_ids.
  - get_unique_gene_ids(genes, case_sensitive=True): Return a tuple with three elements: a dictionary mapping unique gene ids to the gene names in genes, a list of conflicting gene names, and a list of unknown genes.
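Two of the Organism methods have composite return values, and their shapes can be shown offline. The sketch below is hypothetical and never calls KEGG. enriched_pathways assumes the p-value is the binomial upper tail P(X ≥ k), where k is the number of query genes in a pathway, n is the number of query genes, and p is the fraction of reference genes that belong to that pathway. This is one common reading of a binomial enrichment test, not necessarily what orangecontrib.bio.utils.stats.Binomial computes. unique_gene_ids replaces the organism's gene matcher with a plain alias table, which is also an assumption. Only the return shapes come from the method descriptions.

```python
from collections import defaultdict
from math import comb
from typing import Dict, Iterable, List, Mapping, Set, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_pathways(
    genes: Iterable[str],
    reference: Iterable[str],
    pathway_genes: Mapping[str, Iterable[str]],
) -> Dict[str, Tuple[List[str], float, int]]:
    """Hypothetical stand-in for get_enriched_pathways (binomial model)."""
    genes, reference = set(genes), set(reference)
    if not reference:
        raise ValueError("reference gene set is empty")
    result = {}
    for pathway_id, members in pathway_genes.items():
        members = set(members)
        hits = sorted(genes & members)
        if not hits:
            continue  # pathways without query genes are not reported
        num_ref = len(reference & members)
        p = num_ref / len(reference)  # background rate of this pathway
        result[pathway_id] = (hits, binomial_tail(len(hits), len(genes), p), num_ref)
    return result


def unique_gene_ids(
    genes: Iterable[str],
    alias_table: Mapping[str, Set[str]],
    case_sensitive: bool = True,
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Hypothetical stand-in for get_unique_gene_ids.

    alias_table maps a gene name to the set of KEGG gene ids it may denote.
    """
    def norm(name: str) -> str:
        return name if case_sensitive else name.lower()

    table: Dict[str, Set[str]] = defaultdict(set)
    for alias, ids in alias_table.items():
        table[norm(alias)] |= set(ids)

    unique: Dict[str, str] = {}
    conflicting: List[str] = []
    unknown: List[str] = []
    for name in genes:
        ids = table.get(norm(name), set())
        if len(ids) == 1:
            unique[next(iter(ids))] = name  # unambiguous: KEGG id -> input name
        elif ids:
            conflicting.append(name)  # the name matches several KEGG genes
        else:
            unknown.append(name)  # the name matches no KEGG gene
    return unique, conflicting, unknown
```

As an example, take the made-up alias table {"BRCA1": {"hsa:672"}, "AMBIG": {"hsa:X1", "hsa:X2"}}. With it, the names ["BRCA1", "AMBIG", "NOPE"] split into ({"hsa:672": "BRCA1"}, ["AMBIG"], ["NOPE"]).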
- orangecontrib.bio.kegg.organism_name_search(name)
  Search for an organism by name and return its KEGG organism code.
- orangecontrib.bio.kegg.pathways(org)
  Return a list of all KEGG pathways for the KEGG organism code org.
- orangecontrib.bio.kegg.from_taxid(taxid)
  Return the KEGG organism code for an NCBI Taxonomy id string taxid.
- orangecontrib.bio.kegg.to_taxid(name)
  Return the NCBI Taxonomy id for a given KEGG organism name.
DBEntry (entry)
The entry.DBEntry class represents a DBGET database entry. The individual KEGG database interfaces below provide their own specializations of this base class.
KEGG Databases interface (databases)
- class orangecontrib.bio.kegg.databases.DBDataBase(**kwargs)
  Bases: object
  Base class for a DBGET database interface.
  - ENTRY_TYPE: The entry type constructor (a DBEntry subclass). Subclasses should redefine it. Alias of DBEntry.
  - DB = None: A database name/abbreviation (e.g. ‘pathway’). It must be set in a subclass, or in an instance’s constructor before calling the base class __init__.
  - keys(): Return a list of database keys. These are unique KEGG identifiers that can be used to query the database.
  - iterkeys(): Return an iterator over the keys.
  - items(): Return a list of all (key, DBDataBase.ENTRY_TYPE instance) tuples.
  - iteritems(): Return an iterator over the items.
  - values(): Return a list of all DBDataBase.ENTRY_TYPE instances.
  - itervalues(): Return an iterator over all DBDataBase.ENTRY_TYPE instances.
  - get(key, default=None): Return a DBDataBase.ENTRY_TYPE instance for the key. Raises KeyError if not found.
  - get_text(key): Return the database entry for key as plain text.
  - get_entry(key): Return the database entry for key as an instance of ENTRY_TYPE.
  - find(name): Find name using the KEGG find API.
  - pre_cache(keys=None, batch_size=10, progress_callback=None): Retrieve all the entries for keys and cache them locally for faster subsequent retrieval. If keys is None, all entries are retrieved.
  - batch_get(keys): Retrieve all entries for keys in batches. This can be significantly faster than getting each entry separately, especially if the entries are not yet cached.
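The DBDataBase methods behave like a read-only mapping backed by a local cache. pre_cache and batch_get fetch entries in groups. The class below is a hypothetical in-memory stand-in that mirrors only that contract. Its fetch_batch argument is an assumed callable that returns {key: entry_text} for a list of keys. In the real interface the entries come from the KEGG web services and are ENTRY_TYPE instances rather than strings. The sketch also omits progress_callback.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional


class ToyDBDataBase:
    """Hypothetical in-memory stand-in for a DBGET database interface."""

    def __init__(self, all_keys: List[str],
                 fetch_batch: Callable[[List[str]], Dict[str, str]]):
        self._keys = list(all_keys)
        self._fetch_batch = fetch_batch  # assumed remote "get entries" call
        self._cache: Dict[str, str] = {}

    def keys(self) -> List[str]:
        return list(self._keys)

    def iterkeys(self) -> Iterator[str]:
        return iter(self._keys)

    def __getitem__(self, key: str) -> str:
        if key not in self._cache:
            self.pre_cache([key])  # fetch on first access
        try:
            return self._cache[key]
        except KeyError:
            raise KeyError(key) from None

    def pre_cache(self, keys: Optional[Iterable[str]] = None,
                  batch_size: int = 10) -> None:
        """Fetch uncached entries in batches of at most batch_size keys."""
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        keys = self._keys if keys is None else list(keys)
        missing = [k for k in keys if k not in self._cache]
        for start in range(0, len(missing), batch_size):
            self._cache.update(self._fetch_batch(missing[start:start + batch_size]))

    def batch_get(self, keys: Iterable[str]) -> List[str]:
        keys = list(keys)
        self.pre_cache(keys)  # one round trip per batch instead of per key
        return [self[k] for k in keys]
```

With 25 keys and the default batch_size of 10, pre_cache() issues three fetches of 10, 10 and 5 keys. Later lookups are then served from the cache.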
- class orangecontrib.bio.kegg.databases.GenomeEntry(text)
  Bases: orangecontrib.bio.kegg.entry.DBEntry
  Entry for a KEGG Genome database.
  - organism_code: A three or four letter KEGG organism code (e.g. ‘hsa’, ‘sce’, ...).
  - taxid: Organism NCBI taxonomy id.
  - annotation: ANNOTATION
  - chromosome: CHROMOSOME
  - comment: COMMENT
  - data_source: DATA_SOURCE
  - definition: DEFINITION
  - disease: DISEASE
  - entry: ENTRY
  - keywords: KEYWORDS
  - name: NAME
  - original_db: ORIGINAL_DB
  - plasmid: PLASMID
  - reference: REFERENCE
  - statistics: STATISTICS
  - taxonomy: TAXONOMY
- class orangecontrib.bio.kegg.databases.Genome
  Bases: orangecontrib.bio.kegg.databases.DBDataBase
  An interface to the KEGG GENOME database.
  - ENTRY_TYPE: Alias of GenomeEntry.
  - org_code_to_entry_key(code): Map an organism code (‘hsa’, ‘sce’, ...) to the corresponding KEGG identifier (T + 5 digit number).
  - search(string, relevance=False): Search the genome database for string using bfind.
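The GENOME entry printed at the top of this page has the KEGG identifier on its ENTRY line (T01001). The organism code and NCBI taxonomy id appear on its NAME line (hsa, HUMAN, 9606). The hypothetical helpers below sketch how an organism-code-to-entry-key table could be built from entry text. They assume every GENOME entry follows that two-line layout, and they are not the library's implementation of org_code_to_entry_key.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class GenomeInfo(NamedTuple):
    entry_key: str  # KEGG GENOME identifier, e.g. 'T01001'
    org_code: str   # KEGG organism code, e.g. 'hsa'
    taxid: str      # NCBI Taxonomy id, e.g. '9606'


def parse_genome_header(text: str) -> GenomeInfo:
    """Hypothetical helper: read the ENTRY and NAME lines of a GENOME entry.

    Assumes the layout of the hsa example: 'ENTRY <T-number> ...' and
    'NAME <org_code>, <alias>, <taxid>'.
    """
    entry_key: Optional[str] = None
    org_code: Optional[str] = None
    taxid: Optional[str] = None
    for line in text.splitlines():
        title, _, rest = line.partition(" ")
        if title == "ENTRY":
            parts = rest.split()
            if parts:
                entry_key = parts[0]
        elif title == "NAME":
            fields = [field.strip() for field in rest.split(",")]
            org_code, taxid = fields[0], fields[-1]
    if entry_key is None or org_code is None or taxid is None:
        raise ValueError("ENTRY or NAME line missing")
    return GenomeInfo(entry_key, org_code, taxid)


def org_code_index(entries: Iterable[str]) -> Dict[str, str]:
    """Map organism codes to GENOME entry keys, e.g. {'hsa': 'T01001'}."""
    return {info.org_code: info.entry_key
            for info in map(parse_genome_header, entries)}
```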
- class orangecontrib.bio.kegg.databases.GeneEntry(text=None)
  Bases: orangecontrib.bio.kegg.entry.DBEntry
  - aaseq: AASEQ
  - class_: CLASS
  - dblinks: DBLINKS
  - definition: DEFINITION
  - disease: DISEASE
  - drug_target: DRUG_TARGET
  - entry: ENTRY
  - module: MODULE
  - motif: MOTIF
  - name: NAME
  - ntseq: NTSEQ
  - organism: ORGANISM
  - orthology: ORTHOLOGY
  - pathway: PATHWAY
  - position: POSITION
  - structure: STRUCTURE
- class orangecontrib.bio.kegg.databases.Genes(org_code)
  Bases: orangecontrib.bio.kegg.databases.DBDataBase
  Interface to the KEGG Genes database.
  Parameters: org_code (str) – KEGG organism code (e.g. ‘hsa’).
- class orangecontrib.bio.kegg.databases.CompoundEntry(text=None)
  Bases: orangecontrib.bio.kegg.entry.DBEntry
  - atom: ATOM
  - bond: BOND
  - brite: BRITE
  - comment: COMMENT
  - dblinks: DBLINKS
  - entry: ENTRY
  - enzyme: ENZYME
  - exact_mass: EXACT_MASS
  - formula: FORMULA
  - mol_weight: MOL_WEIGHT
  - name: NAME
  - pathway: PATHWAY
  - reaction: REACTION
  - reference: REFERENCE
  - remark: REMARK
- class orangecontrib.bio.kegg.databases.Compound
- class orangecontrib.bio.kegg.databases.ReactionEntry(text=None)
  Bases: orangecontrib.bio.kegg.entry.DBEntry
  - definition: DEFINITION
  - entry: ENTRY
  - enzyme: ENZYME
  - equation: EQUATION
  - name: NAME
- class orangecontrib.bio.kegg.databases.Reaction
- class orangecontrib.bio.kegg.databases.EnzymeEntry(text=None)
  Bases: orangecontrib.bio.kegg.entry.DBEntry
  - all_reac: ALL_REAC
  - class_: CLASS
  - comment: COMMENT
  - dblinks: DBLINKS
  - entry: ENTRY
  - genes: GENES
  - name: NAME
  - orthology: ORTHOLOGY
  - pathway: PATHWAY
  - product: PRODUCT
  - reaction: REACTION
  - reference: REFERENCE
  - substrate: SUBSTRATE
  - sysname: SYSNAME
- class orangecontrib.bio.kegg.databases.Enzyme
- class orangecontrib.bio.kegg.databases.PathwayEntry(text=None)
  Bases: orangecontrib.bio.kegg.entry.DBEntry
  - class_: CLASS
  - compound: COMPOUND
  - dblinks: DBLINKS
  - description: DESCRIPTION
  - disease: DISEASE
  - drug: DRUG
  - entry: ENTRY
  - enzyme: ENZYME
  - ko_pathway: KO_PATHWAY
  - module: MODULE
  - name: NAME
  - organism: ORGANISM
  - pathway_map: PATHWAY_MAP
  - reference: REFERENCE
  - rel_pathway: REL_PATHWAY
- class orangecontrib.bio.kegg.databases.Pathway(prefix='map')
  Bases: orangecontrib.bio.kegg.databases.DBDataBase
  KEGG Pathway database.
  Parameters: prefix (str) – KEGG organism code (‘hsa’, ...) or ‘map’, ‘ko’, ‘ec’ or ‘rn’.
KEGG Pathway (pathway)
- class orangecontrib.bio.kegg.pathway.Pathway(pathway_id, local_cache=None, connection=None)
  Bases: object
  Class representing a KEGG pathway (parsed from a KGML file).
  Parameters: pathway_id (str) – A KEGG pathway id (e.g. ‘path:hsa05130’).
  - name: Pathway name/id (e.g. ‘path:hsa05130’).
  - org: Pathway organism code (e.g. ‘hsa’).
  - number: Pathway number as a string (e.g. ‘05130’).
  - title: Pathway title string.
  - image: URL of the pathway image.
  - link: URL of the pathway on the KEGG web site.
  - get_image(): Return a local filesystem path to an image of the pathway. The image is downloaded if it is not already cached.
  - classmethod list(organism): List all pathways for the KEGG organism code organism.
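In the examples above, the name, org and number attributes all derive from the pathway id. The sketch below shows one way to do that split. It assumes ids have the form ‘path:’ + prefix + five-digit map number, where the prefix is an organism code or one of ‘map’, ‘ko’, ‘ec’, ‘rn’ (the prefixes databases.Pathway accepts). split_pathway_id and join_pathway_id are hypothetical helpers, not part of the module. The number stays a string so that leading zeros such as ‘00010’ survive.

```python
import re
from typing import NamedTuple

# Assumed id layout: optional "path:" + lowercase prefix + 5-digit map number.
_PATHWAY_ID = re.compile(r"(?:path:)?([a-z]+)(\d{5})")


class PathwayId(NamedTuple):
    prefix: str  # organism code ('hsa') or 'map', 'ko', 'ec', 'rn'
    number: str  # map number kept as a string to preserve leading zeros


def split_pathway_id(pathway_id: str) -> PathwayId:
    """Split 'path:hsa05130' into PathwayId('hsa', '05130')."""
    match = _PATHWAY_ID.fullmatch(pathway_id)
    if match is None:
        raise ValueError("unrecognized pathway id: %r" % pathway_id)
    return PathwayId(*match.groups())


def join_pathway_id(prefix: str, number: str) -> str:
    """Inverse of split_pathway_id, e.g. ('ko', '05130') -> 'path:ko05130'."""
    return "path:" + prefix + number
```

Because the prefix and number are kept apart, the same map can be addressed across prefixes. For example, join_pathway_id("map", "05130") and join_pathway_id("hsa", "05130") name the reference map and the human pathway.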
Utilities
- class orangecontrib.bio.kegg.entry.parser.DBGETEntryParser
  A DBGET entry parser (inspired by xml.dom.pulldom).
  >>> from io import StringIO
  >>> from orangecontrib.bio.kegg.entry.parser import DBGETEntryParser
  >>> stream = StringIO("ENTRY foo\n"
  ...                   "NAME foo's name\n"
  ...                   "  BAR A subsection of 'NAME'\n")
  ...
  >>> parser = DBGETEntryParser()
  >>> for event, title, contents_part in parser.parse(stream):
  ...     print(parser.EVENTS[event], title, repr(contents_part))
  ...
  ENTRY_START None None
  SECTION_START ENTRY 'foo\n'
  SECTION_END ENTRY None
  SECTION_START NAME "foo's name\n"
  SUBSECTION_START BAR "A subsection of 'NAME'\n"
  SUBSECTION_END BAR None
  SECTION_END NAME None
  ENTRY_END None None
  - ENTRY_END = 1: Entry end event.
  - ENTRY_START = 0: Entry start event.
  - SECTION_END = 3: Section end event.
  - SECTION_START = 2: Section start event.
  - SUBSECTION_END = 5: Subsection end event.
  - SUBSECTION_START = 4: Subsection start event.
  - TEXT = 6: Text element event.
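A small state machine can reproduce the event protocol from the DBGETEntryParser example. ToyDBGETParser is a hypothetical sketch, not the module's parser. It assumes the following layout:

- Section titles start in column 0.
- Subsection titles are indented by exactly two spaces.
- Deeper indentation is continuation text, reported as TEXT events with a None title (an assumption).
- A line starting with ‘///’ ends an entry.

Anything still open is closed at the end of the stream. This matches the documented output, whose example input has no ‘///’ terminator.

```python
from io import StringIO
from typing import Iterator, Optional, TextIO, Tuple

Event = Tuple[int, Optional[str], Optional[str]]


class ToyDBGETParser:
    """Hypothetical pulldom-style parser for DBGET flat-file entries."""

    # Same event codes as documented for DBGETEntryParser.
    ENTRY_START, ENTRY_END, SECTION_START, SECTION_END = 0, 1, 2, 3
    SUBSECTION_START, SUBSECTION_END, TEXT = 4, 5, 6
    EVENTS = ["ENTRY_START", "ENTRY_END", "SECTION_START", "SECTION_END",
              "SUBSECTION_START", "SUBSECTION_END", "TEXT"]

    def parse(self, stream: TextIO) -> Iterator[Event]:
        self._entry = False
        self._section: Optional[str] = None
        self._subsection: Optional[str] = None
        for line in stream:
            if line.startswith("///"):
                yield from self._close(level=0)  # explicit end of entry
                continue
            if not line.strip():
                continue  # ignore blank lines
            if not self._entry:
                self._entry = True
                yield self.ENTRY_START, None, None
            if not line[0].isspace():
                # Column 0: a new section, e.g. "NAME        foo's name".
                yield from self._close(level=1)
                title, _, text = line.partition(" ")
                self._section = title.strip()
                yield self.SECTION_START, self._section, text.lstrip(" ")
            elif (line.startswith("  ") and not line[2].isspace()
                  and self._section is not None):
                # Two-space indent: a subsection of the current section.
                yield from self._close(level=2)
                title, _, text = line[2:].partition(" ")
                self._subsection = title.strip()
                yield self.SUBSECTION_START, self._subsection, text.lstrip(" ")
            else:
                # Deeper indentation: continuation text of the open element.
                yield self.TEXT, None, line.lstrip(" ")
        yield from self._close(level=0)  # close anything left open at EOF

    def _close(self, level: int) -> Iterator[Event]:
        """Close the open element at `level` (0 entry, 1 section,
        2 subsection) together with everything nested inside it."""
        if self._subsection is not None:
            yield self.SUBSECTION_END, self._subsection, None
            self._subsection = None
        if level <= 1 and self._section is not None:
            yield self.SECTION_END, self._section, None
            self._section = None
        if level == 0 and self._entry:
            yield self.ENTRY_END, None, None
            self._entry = False


if __name__ == "__main__":
    demo = StringIO("ENTRY       foo\n"
                    "NAME        foo's name\n"
                    "  BAR       A subsection of 'NAME'\n")
    parser = ToyDBGETParser()
    for event, title, text in parser.parse(demo):
        print(parser.EVENTS[event], title, repr(text))
```

Generating events lazily lets a caller build only the sections it needs, without holding a whole entry in memory. That is the main appeal of the pulldom-style design.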