This module contains all the classes and scripts used by fastSemSim to load and represent ontologies and annotation corpora.
Set of functions to parse and handle ontologies.
Bases: xml.sax.handler.ContentHandler
Supported ontologies are those representable as multi-rooted DAGs. The DAGs are not required to be disconnected, but ‘inter-DAG’ edges must be specified explicitly. The Ontology class provides a method, is_consistent, that checks whether this constraint is satisfied. Inconsistent DAGs are currently NOT usable.
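The DAG requirement can be illustrated with a minimal sketch. It assumes the ontology is stored as a plain child -> parents dict; `find_roots` and `is_acyclic` are hypothetical helpers written for illustration, not fastSemSim's `is_consistent`, which may check additional conditions.

```python
from typing import Dict, List, Set


def find_roots(parents: Dict[str, Set[str]]) -> List[str]:
    """Return the terms with no parents, i.e. the roots of a multi-rooted DAG."""
    nodes = set(parents)
    for ps in parents.values():
        nodes |= ps  # include terms that only ever appear as parents
    return sorted(n for n in nodes if not parents.get(n))


def is_acyclic(parents: Dict[str, Set[str]]) -> bool:
    """Three-color depth-first search: reaching a GRAY node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY  # node is on the current DFS path
        for p in parents.get(node, ()):
            state = color.get(p, WHITE)
            if state == GRAY:
                return False  # back edge: the path loops onto itself
            if state == WHITE and not visit(p):
                return False
        color[node] = BLACK  # all ancestors explored, no cycle through node
        return True

    for node in parents:
        if color.get(node, WHITE) == WHITE and not visit(node):
            return False
    return True
```

For example, `{"B": {"A"}, "C": {"A", "X"}, "D": {"B", "C"}}` has two roots, `A` and `X`, and no cycles. The recursive search is kept short for readability; very deep ontologies may need an iterative version to stay under Python's recursion limit.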
Different data structures can be used to represent ontologies. The section Variables lists a set of alternatives. Currently, Ontology is tuned to use both a parent-children and a node-edge representation.
Subclasses can extend the basic data structure with additional layers of information.
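The two representations can be related with a short sketch. It assumes edges are `(child, parent, relation)` tuples and the parent-children view is a pair of plain dicts; the helper names are hypothetical and do not reflect fastSemSim's internal variables.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Node-edge representation: each edge is (child, parent, relation type).
Edge = Tuple[str, str, str]


def edges_to_parents(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Collapse an edge list into a child -> parents map (relation types dropped)."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for child, parent, _relation in edges:
        parents[child].add(parent)
    return dict(parents)


def parents_to_children(parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert a child -> parents map into a parent -> children map."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return dict(children)
```

The edge list keeps relation types such as `is_a` or `part_of`, while the parent-children maps give constant-time access to a term's neighbours in either direction; keeping both trades memory for faster traversals.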
Base class for the representation of an ontology. It currently supports any multi-rooted DAG (Directed Acyclic Graph).
Bases: object
Initialization.
__weakref__: list of weak references to the object (if defined).
Gene Ontology class
@mail marco.mina.85@gmail.com
@version 2.0
@desc CellOntology class handles the Cell Ontology.
Cell Ontology class
Bases: fastsemsim.Ontology.Ontology.Ontology
@mail marco.mina.85@gmail.com
@version 2.0
@desc DiseaseOntology class handles the Disease Ontology.
Disease Ontology class
Bases: fastsemsim.Ontology.Ontology.Ontology
This class provides a unified interface to handle Annotation Corpora.
annotations: dict with protein ids as primary keys. Each key is associated with a dictionary of the GO terms annotated for that protein. Detailed information, when available, is stored as values in the latter dictionary.
reverse_annotations: dict with GO Terms as primary key. Each key is associated with a dict of proteins/gene products annotated with the GO term.
obj_set: set of proteins/gene products present in the annotation table, mapped to the taxon id of the organism they belong to when this information is available. This table is useful for filtering out proteins from species of no interest.
term_set: set of terms present in the annotation table.
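The relationship between these attributes can be sketched with plain dicts. This assumes `obj_set` behaves like a protein -> taxon id mapping (None when unknown) and that details are stored as small dicts; `build_reverse` and `filter_by_taxon` are hypothetical helpers, not methods of the AnnotationCorpus class.

```python
from typing import Dict, Optional

# Hypothetical shape: protein id -> {GO term -> details}
Annotations = Dict[str, Dict[str, dict]]


def build_reverse(annotations: Annotations) -> Dict[str, Dict[str, dict]]:
    """Invert protein -> {term -> details} into term -> {protein -> details}."""
    reverse: Dict[str, Dict[str, dict]] = {}
    for protein, terms in annotations.items():
        for term, details in terms.items():
            reverse.setdefault(term, {})[protein] = details
    return reverse


def filter_by_taxon(annotations: Annotations,
                    obj_set: Dict[str, Optional[str]],
                    taxon: str) -> Annotations:
    """Keep only proteins whose recorded taxon id equals `taxon`.

    Proteins with an unknown taxon (None or missing) are dropped.
    """
    return {p: terms for p, terms in annotations.items()
            if obj_set.get(p) == taxon}
```

Under these assumptions, `term_set` is simply the key set of the reverse table, e.g. `set(build_reverse(annotations))`.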
If a GO object is passed as input, the annotation corpus is corrected by removing obsolete annotations and resolving alternative ids. This can also be done later by calling the sanitize method after supplying a valid GO object.
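The correction step can be sketched as follows. It assumes the ontology exposes a set of obsolete term ids and an alternative-id -> primary-id map; the function below is hypothetical and only approximates what `sanitize` does. In particular, dropping proteins left with no annotations is a choice made for this sketch.

```python
from typing import Dict, Set


def sanitize_sketch(annotations: Dict[str, Dict[str, dict]],
                    obsolete: Set[str],
                    alt_ids: Dict[str, str]) -> Dict[str, Dict[str, dict]]:
    """Resolve alternative ids to primary ids, then drop obsolete terms."""
    cleaned: Dict[str, Dict[str, dict]] = {}
    for protein, terms in annotations.items():
        kept: Dict[str, dict] = {}
        for term, details in terms.items():
            primary = alt_ids.get(term, term)  # map alt id to its primary id
            if primary in obsolete:
                continue  # annotation points to an obsolete term
            kept.setdefault(primary, details)  # first occurrence wins on merge
        if kept:  # sketch choice: drop proteins with nothing left
            cleaned[protein] = kept
    return cleaned
```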
general_parameters: filtering options and parameters that apply in general.
specific_parameters: parameters that should be used to load a particular file format.
Each file format carries different types of information. To handle this, every operation is rerouted to the parser of the original file, which takes care of it. This avoids duplicating data.
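The rerouting idea can be illustrated with a small delegation sketch. Both classes are hypothetical stand-ins: `GAFParserSketch` plays the role of a format-specific parser that owns the records, and `CorpusSketch` forwards any attribute it does not define itself, so format-specific data is never copied into the corpus object.

```python
from typing import Dict, List


class GAFParserSketch:
    """Hypothetical format-specific parser that owns the parsed records."""

    def __init__(self, records: List[Dict[str, str]]):
        self.records = records

    def evidence_codes(self, protein: str) -> List[str]:
        """Format-specific query: evidence codes recorded for one protein."""
        return [r["evidence"] for r in self.records if r["protein"] == protein]


class CorpusSketch:
    """Forwards unknown attributes to the wrapped parser instead of copying data."""

    def __init__(self, parser: GAFParserSketch):
        self.parser = parser

    def __getattr__(self, name: str):
        # Python calls __getattr__ only when normal attribute lookup fails.
        if name == "parser":
            # Avoid infinite recursion if `parser` was never assigned.
            raise AttributeError(name)
        return getattr(self.parser, name)
```

Because `CorpusSketch` only holds a reference, `corpus.records` is the very same list the parser owns.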
Constraint: an Ontology MUST be loaded and provided when an AnnotationCorpus object is instantiated.
Bases: object
@desc Class to parse annotation corpus files in GAF 2.0 format (i.e., Gene Ontology Annotation files): tab-separated files formatted as defined in http://geneontology.org/page/go-annotation-file-gaf-format-20
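A minimal line parser for the GAF 2.0 layout can look like the sketch below. It relies only on the published column order (2: DB Object ID, 4: Qualifier, 5: GO ID, 7: Evidence Code, 9: Aspect, 13: Taxon) and on `!` marking header and comment lines. `parse_gaf` is a hypothetical helper, not the parser class documented here, and it keeps only the first taxon when column 13 lists an interacting taxon as well.

```python
from typing import Dict, Iterable, Iterator


def parse_gaf(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per annotation line of a GAF 2.0 file."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header ("!gaf-version: 2.0") and comment lines
        cols = line.rstrip("\n").split("\t")
        # GAF 2.0 has 17 columns; this sketch only requires the first 15,
        # since columns 16-17 are optional and may be missing in practice.
        if len(cols) < 15:
            raise ValueError(f"expected at least 15 columns, got {len(cols)}")
        yield {
            "object_id": cols[1],   # column 2: DB Object ID
            "qualifier": cols[3],   # column 4: e.g. "NOT", may be empty
            "go_id": cols[4],       # column 5: GO ID
            "evidence": cols[6],    # column 7: evidence code
            "aspect": cols[8],      # column 9: P, F or C
            # column 13: "taxon:9606" or "taxon:9606|taxon:1234"
            "taxon": cols[12].split("|")[0].replace("taxon:", ""),
        }
```

A real loader would typically also decide what to do with `NOT` qualifiers and which evidence codes to accept; those policies are left out of this sketch.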