Handling ontologies and annotation corpora

This module contains all the classes and scripts that fastSemSim uses to load and represent ontologies and annotation corpora.

Ontology.ontologies

Set of functions to parse and handle ontologies.

class fastsemsim.Ontology.ontologies.OboParser(ontology_class, parameters={})
__init__(ontology_class, parameters={})
__module__ = 'fastsemsim.Ontology.ontologies'
alt_id_tag = 'alt_id'
comment_tag = '!'
consider_tag = 'consider'
def_tag = 'def'
delimiter_tag = ':'
find_next_term()
has_tag_structure(st)
id_tag = 'id'
is_a_tag = 'is_a'
is_obsolete_tag = 'is_obsolete'
name_tag = 'name'
namespace_tag = 'namespace'
parse(_handle)
relationship_delimiter_tag = ' '
relationship_tag = 'relationship'
replaced_by_tag = 'replaced_by'
split_rel_tag(st)
split_tag(st)
term_tag = '[Term]'
typedef_tag = '[Typedef]'
class fastsemsim.Ontology.ontologies.OboXmlParser(ontology_class, parameters={})

Bases: xml.sax.handler.ContentHandler

__init__(ontology_class, parameters={})
__module__ = 'fastsemsim.Ontology.ontologies'
alt_id_tag = 'alt_id'
characters(ch)
comment_tag = '!'
consider_tag = 'consider'
def_tag = 'def'
delimiter_tag = ':'
endElement(name)
id_tag = 'id'
is_a_tag = 'is_a'
is_obsolete_tag = 'is_obsolete'
name_tag = 'name'
namespace_tag = 'namespace'
part_of_tag = 'part_of'
relationship_delimiter_tag = ' '
relationship_tag = 'relationship'
relationship_to_tag = 'to'
relationship_type_tag = 'type'
replaced_by_tag = 'replaced_by'
startElement(name, attrs)
term_tag = 'term'
fastsemsim.Ontology.ontologies.load(source=None, source_type='obo', ontology_type='GeneOntology', parameters={})
fastsemsim.Ontology.ontologies.parse(source=None, source_type='obo', ontology_type='GeneOntology', parameters={})
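
The tag constants listed for OboParser ('[Term]', ':', '!', 'is_a', 'relationship', and so on) hint at how an OBO flat file is tokenized. The following is a minimal standalone sketch of that idea using only the standard library. It is not fastSemSim's implementation: the function names split_line and parse_obo, the shape of the returned structures, and the sample text are assumptions made for illustration.

```python
import io

# Constants mirroring the OboParser attributes listed above.
TERM_TAG = "[Term]"
COMMENT_TAG = "!"
DELIMITER_TAG = ":"
RELATIONSHIP_DELIMITER_TAG = " "


def split_line(line):
    """Split 'tag: value ! comment' into (tag, value), or return None.

    Simplification: everything after the first '!' is treated as a comment,
    so values that legitimately contain '!' would be truncated.
    """
    line = line.split(COMMENT_TAG, 1)[0].strip()
    if DELIMITER_TAG not in line:
        return None
    tag, value = line.split(DELIMITER_TAG, 1)
    return tag.strip(), value.strip()


def parse_obo(handle):
    """Return (terms, edges) read from an OBO-like text stream."""
    stanzas = []
    current = None  # dict for the [Term] stanza being read, else None
    for raw in handle:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == TERM_TAG else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None:
            continue  # header lines or a stanza type we skip
        parsed = split_line(line)
        if parsed:
            tag, value = parsed
            current.setdefault(tag, []).append(value)

    terms, edges = {}, []
    for stanza in stanzas:
        if "id" not in stanza:
            continue
        term_id = stanza["id"][0]
        terms[term_id] = stanza
        for parent in stanza.get("is_a", []):
            edges.append((term_id, parent, "is_a"))
        for rel in stanza.get("relationship", []):
            rel_type, _, target = rel.partition(RELATIONSHIP_DELIMITER_TAG)
            edges.append((term_id, target.strip(), rel_type))
    return terms, edges


# Invented sample text, for illustration only.
SAMPLE = """format-version: 1.2

[Term]
id: GO:0000001
name: root term
namespace: biological_process

[Term]
id: GO:0000002
name: child term
is_a: GO:0000001 ! root term
relationship: part_of GO:0000001 ! root term

[Typedef]
id: part_of
name: part of
"""

terms, edges = parse_obo(io.StringIO(SAMPLE))
```

Here terms maps each term id to its tag/value lists, and edges holds (child, parent, relation type) triples, which matches the terms/edges split that the Ontology constructor expects in spirit, though the exact layout used by the library may differ.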

Ontology.Ontology

Supported ontologies are those that can be represented as multi-rooted DAGs. The DAGs are not required to be disconnected, but any ‘inter-DAG’ edges must be specified explicitly. The Ontology class provides a function, is_consistent, that checks whether this constraint is satisfied. Inconsistent DAGs are NOT currently usable.

Different data structures can be used to represent ontologies. The section Variables lists a set of alternatives. Currently Ontology is tuned for a parent-children and a node-edge representation.

Subclasses can extend the basic data structure with additional layers of information.
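
To make the two representations concrete, the sketch below derives a parent-children view from a node-edge list, finds the roots of a multi-rooted DAG, and checks for cycles. It is a standalone illustration, not the library's is_consistent check; the function names and the (child, parent) edge orientation are assumptions.

```python
def build_parent_children(terms, edges):
    """Derive parent and children maps from (child, parent) edge pairs."""
    parents = {t: set() for t in terms}
    children = {t: set() for t in terms}
    for child, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def find_roots(parents):
    """Roots are the terms that have no parent."""
    return sorted(t for t, ps in parents.items() if not ps)


def is_acyclic(parents, children):
    """Kahn's algorithm: peel off nodes whose parents were all visited."""
    remaining = {t: len(ps) for t, ps in parents.items()}
    queue = [t for t, n in remaining.items() if n == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    # Nodes on a cycle never reach zero remaining parents.
    return visited == len(parents)


# Two roots (A and X); the edge (Y, D) links the two sub-DAGs.
terms = ["A", "B", "C", "D", "X", "Y"]
edges = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("Y", "X"), ("Y", "D")]

parents, children = build_parent_children(terms, edges)
roots = find_roots(parents)
acyclic = is_acyclic(parents, children)

# Adding an edge that makes A a child of D introduces a cycle.
bad_parents, bad_children = build_parent_children(terms, edges + [("A", "D")])
bad_acyclic = is_acyclic(bad_parents, bad_children)
```

The node-edge list is convenient for loading, while the parent-children maps make ancestor and descendant traversals cheap, which is presumably why both are kept.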

class fastsemsim.Ontology.Ontology.Ontology(terms, edges, parameters=None)

Bases: object

Base class for the representation of an ontology. It currently supports any multi-rooted DAG (Directed Acyclic Graph).

__init__(terms, edges, parameters=None)

Initialization.

Parameters:
terms : dict

Data of the ontology terms

edges : list

List of ontological relationships

parameters: dict [Default = None]

parameters affecting the construction of the ontology

__weakref__

list of weak references to the object (if defined)

Ontology.GeneOntology

Gene Ontology class

class fastsemsim.Ontology.GeneOntology.GeneOntology(terms, edges, parameters)

Bases: fastsemsim.Ontology.Ontology.Ontology

Ontology.CellOntology

Contact: marco.mina.85@gmail.com. Version: 2.0. The CellOntology class handles the Cell Ontology.

Cell Ontology class

class fastsemsim.Ontology.CellOntology.CellOntology(terms, edges, parameters)

Bases: fastsemsim.Ontology.Ontology.Ontology

alt_ids = None
obsolete_ids = None

Ontology.DiseaseOntology

Contact: marco.mina.85@gmail.com. Version: 2.0. The DiseaseOntology class handles the Disease Ontology.

Disease Ontology class

class fastsemsim.Ontology.DiseaseOntology.DiseaseOntology(terms, edges, parameters)

Bases: fastsemsim.Ontology.Ontology.Ontology

alt_ids = None
obsolete_ids = None

Ontology.AnnotationCorpus

This class provides a unified interface to handle Annotation Corpora.

annotations: dict with protein ids as primary key. Each key is associated with a dictionary of the GO terms annotated for the protein. Detailed information, when available, is included as values within the latter dictionary.

reverse_annotations: dict with GO Terms as primary key. Each key is associated with a dict of proteins/gene products annotated with the GO term.

obj_set: set of proteins/gene products present in the annotation table, connected with the taxon id of the organism they belong to, when this information is available. This table is useful to filter out proteins from uninteresting species.

term_set: set of terms present in the annotation table.
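
The four tables described above can be pictured with a small standalone sketch. It only illustrates the shapes involved and is not how AnnotationCorpus fills them: the build_tables helper, the record layout, and the sample identifiers are invented, and term_set is modelled here as a Python set.

```python
def build_tables(records):
    """Build the four tables from (object_id, term_id, taxon, details) records.

    taxon and details may be None when the source file does not provide them.
    """
    annotations = {}          # object id -> {term id: details}
    reverse_annotations = {}  # term id -> {object id: details}
    obj_set = {}              # object id -> taxon id (or None)
    term_set = set()          # every term seen in the corpus
    for obj, term, taxon, details in records:
        annotations.setdefault(obj, {})[term] = details
        reverse_annotations.setdefault(term, {})[obj] = details
        obj_set[obj] = taxon
        term_set.add(term)
    return annotations, reverse_annotations, obj_set, term_set


# Invented sample records, for illustration only.
records = [
    ("P12345", "GO:0000001", "9606", {"evidence": "IDA"}),
    ("P12345", "GO:0000002", "9606", None),
    ("Q99999", "GO:0000001", None, None),
]

annotations, reverse_annotations, obj_set, term_set = build_tables(records)

# obj_set makes it easy to keep only the objects of one species.
human_objects = {obj for obj, taxon in obj_set.items() if taxon == "9606"}
```

Keeping both annotations and reverse_annotations trades memory for constant-time lookups in either direction, from object to terms and from term to objects.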

If a GO object is passed as input data, the annotation corpus is corrected by removing obsolete annotations and resolving alternative ids. This can also be done later by calling the sanitize method after supplying a valid GO object.
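
The effect of such a clean-up can be sketched as follows. This is a hypothetical standalone function, not the library's sanitize method: it assumes that alternative ids are available as a dict mapping each alternative id to its primary id and that obsolete ids are available as a set, which may not match how the ontology classes store alt_ids and obsolete_ids.

```python
def sanitize_annotations(annotations, alt_ids, obsolete_ids):
    """Return a cleaned copy of an object -> {term: details} table.

    alt_ids: dict mapping alternative term ids to primary ids (assumed shape).
    obsolete_ids: set of obsolete term ids (assumed shape).
    """
    cleaned = {}
    for obj, terms in annotations.items():
        kept = {}
        for term, details in terms.items():
            term = alt_ids.get(term, term)  # resolve alternative id
            if term in obsolete_ids:
                continue                    # drop obsolete annotation
            kept.setdefault(term, details)  # first annotation wins
        if kept:
            cleaned[obj] = kept
    return cleaned


# Invented sample data, for illustration only.
raw = {
    "P1": {"GO:0000009": None, "GO:0000002": "IDA"},
    "P2": {"GO:0000666": None},
}
alt_ids = {"GO:0000009": "GO:0000001"}
obsolete_ids = {"GO:0000666"}

cleaned = sanitize_annotations(raw, alt_ids, obsolete_ids)
```

In this sketch an object left with no valid annotation is dropped entirely; whether the library does the same is not stated in this documentation.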

general_parameters: filtering options and parameters that apply in general

specific_parameters: parameters that should be used to load a particular file format

Each type of file carries different types of information. To deal with this, every operation is rerouted to the original file parser, which takes care of it. This avoids duplicating data.

Constraint: an Ontology MUST be loaded and provided when an AnnotationCorpus object is instantiated.

class fastsemsim.Ontology.AnnotationCorpus.AnnotationCorpus(go)

Bases: object

class ECFilter(params)
EC = {}
filter(EC)
inclusive = False
name = 'EC'
class AnnotationCorpus.GOFilter(params=None)
GO = None
filter(GO)
inclusive = True
int_go = None
name = 'GO'
set(params)
class AnnotationCorpus.TaxonomyFilter(params)
filter(taxonomy)
inclusive = False
name = 'taxonomy'
taxonomy = {}
AnnotationCorpus.annotations = {}
AnnotationCorpus.annotations_field2pos = {}
AnnotationCorpus.annotations_fields = []
AnnotationCorpus.constrain()
AnnotationCorpus.initCommonFilter()
AnnotationCorpus.int_checkConsistency()
AnnotationCorpus.int_resetFields()
AnnotationCorpus.isConsistent()
AnnotationCorpus.isOk(field, value)
AnnotationCorpus.load(fname, ftype, params={})
AnnotationCorpus.obj_field2pos = {}
AnnotationCorpus.obj_fields = []
AnnotationCorpus.obj_set = {}
AnnotationCorpus.parse(fname, ftype, params={})
AnnotationCorpus.reset()
AnnotationCorpus.resetCommonfilter(i)
AnnotationCorpus.resetFilter(field)
AnnotationCorpus.reverse_annotations = {}
AnnotationCorpus.reverse_annotations_field2pos = {}
AnnotationCorpus.reverse_annotations_fields = []
AnnotationCorpus.sanitize()
AnnotationCorpus.setCommonfilters(inf)
AnnotationCorpus.setFilter(field, selector)
AnnotationCorpus.term_field2pos = {}
AnnotationCorpus.term_fields = []
AnnotationCorpus.term_set = {}

Ontology.GAF2AnnotationCorpus

Class to parse annotation corpus files in GAF-2.0 format (i.e. Gene Ontology Annotation files), a tab-separated file format defined at http://geneontology.org/page/go-annotation-file-gaf-format-20

class fastsemsim.Ontology.GAF2AnnotationCorpus.GAF2AnnotationCorpus(ac, parameters=None)

Bases: object

int_comment = '!'
int_interpretParameters()
int_separator = '\t'
isOk()
parse(fname)
setFields()
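
The int_comment and int_separator attributes above reflect the GAF-2.0 layout: lines starting with '!' are comments and each data line has 17 tab-separated columns. A minimal standalone reader might look like the sketch below. It does not use or reproduce GAF2AnnotationCorpus; the parse_gaf name, the field labels, and the sample row are assumptions made for illustration.

```python
import io

GAF_COMMENT = "!"
GAF_SEPARATOR = "\t"

# The 17 GAF-2.0 columns, in order (labels chosen for this sketch).
GAF_FIELDS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_Code", "With_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Date", "Assigned_By", "Annotation_Extension", "Gene_Product_Form_ID",
]


def parse_gaf(handle):
    """Yield one dict per annotation line, skipping comments and blanks."""
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(GAF_COMMENT):
            continue
        values = line.split(GAF_SEPARATOR)
        # Pad short lines so that every field name gets a value.
        values += [""] * (len(GAF_FIELDS) - len(values))
        yield dict(zip(GAF_FIELDS, values))


# Invented sample row, for illustration only.
sample_row = [
    "UniProtKB", "P12345", "GENE1", "", "GO:0005515", "PMID:1234567",
    "IPI", "UniProtKB:Q99999", "F", "Example protein", "", "protein",
    "taxon:9606", "20200101", "UniProt", "", "",
]
SAMPLE = "!gaf-version: 2.0\n" + GAF_SEPARATOR.join(sample_row) + "\n"

gaf_records = list(parse_gaf(io.StringIO(SAMPLE)))
```

Each record exposes the object id (column 2), the GO term (column 5) and the taxon (column 13), which are the pieces an annotation corpus needs for its annotations table and taxonomy filtering.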

Ontology.PlainAnnotationCorpus

Plain annotation corpus files parsing utility.
Plain format 1: object (e.g. gene) ID - Term ID
Plain format 2: Term ID - object (e.g. gene) ID

class fastsemsim.Ontology.PlainAnnotationCorpus.PlainAnnotationCorpus(ac, parameters=None)

Bases: object

int_interpretParameters()
isOk()
parse(fname)
setFields()
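
The two plain layouts differ only in column order, so a single reader with a switch can cover both. The sketch below is a standalone illustration, not the PlainAnnotationCorpus parser: the parse_plain name, the term_first flag, and the tab separator are assumptions, since this documentation does not state which separator or parameters the library uses.

```python
import io


def parse_plain(handle, term_first=False, separator="\t"):
    """Read two-column lines into an object id -> set of term ids table.

    term_first=False: object ID, then term ID.
    term_first=True:  term ID, then object ID.
    """
    annotations = {}
    for raw in handle:
        parts = raw.strip().split(separator)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        first, second = parts[0], parts[1]
        obj, term = (second, first) if term_first else (first, second)
        annotations.setdefault(obj, set()).add(term)
    return annotations


# Invented sample content, for illustration only.
OBJECT_FIRST = "G1\tT1\nG1\tT2\nG2\tT1\n"
TERM_FIRST = "T1\tG1\nT2\tG1\nT1\tG2\n"

from_object_first = parse_plain(io.StringIO(OBJECT_FIRST))
from_term_first = parse_plain(io.StringIO(TERM_FIRST), term_first=True)
```

Both calls produce the same table, which shows that the column order only affects parsing and not the resulting corpus.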