Semantic Similarity

This module contains the classes for evaluating semantic similarity. Refer to the individual classes for details on the implemented measures.

SemSim.TermSemSim

This class provides the prototype for Term Semantic Similarity (TSS) measures.

There are two types of term semantic similarity measures: groupwise measures (G_TSS), which can evaluate the semantic similarity between two sets of terms, and pairwise measures (P_TSS), which can only evaluate the similarity between pairs of GO terms. Each class extending TermSemSim should declare whether it is groupwise or pairwise.

TermSemSim relies on SemSimUtils to perform many tasks (e.g. evaluating term IC or common ancestors). A SemSimUtils object can be passed to the constructor; otherwise, a new instance is created. Sharing a single SemSimUtils instance helps reduce time and space requirements and is strongly recommended.

exception fastsemsim.SemSim.TermSemSim.MissingAcException(message)

Bases: exceptions.Exception

class fastsemsim.SemSim.TermSemSim.TermSemSim(ontology, ac=None, util=None, do_log=False)

Bases: object

G_TSS = 'Groupwise'
IC_based = None
P_TSS = 'Pairwise'
SS_type = None
SemSim(term1, term2, ontology=None)
format_and_check_data = True
setSanityCheck(en)
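
The prototype pattern described above can be illustrated with a minimal, self-contained sketch. It does not use fastsemsim itself: ToyUtils, ToyTermSemSim and ToySharedAncestors are hypothetical stand-ins, and the ontology is a plain dictionary. The sketch only shows a subclass declaring its type and two measures reusing one helper object.

```python
class ToyUtils(object):
    """Hypothetical shared helper that caches ancestor sets per term."""

    def __init__(self, parents):
        self.parents = parents  # term -> set of parent terms
        self.cache = {}

    def ancestors(self, term):
        """Return the term itself plus all of its ancestors."""
        if term not in self.cache:
            found = {term}
            for parent in self.parents[term]:
                found |= self.ancestors(parent)
            self.cache[term] = found
        return self.cache[term]


class ToyTermSemSim(object):
    """Toy prototype mirroring the attribute names listed above."""
    P_TSS = 'Pairwise'
    G_TSS = 'Groupwise'
    SS_type = None

    def __init__(self, parents, util=None):
        # Reuse the helper passed in, otherwise build a private one.
        self.util = util if util is not None else ToyUtils(parents)

    def SemSim(self, term1, term2):
        raise NotImplementedError


class ToySharedAncestors(ToyTermSemSim):
    """Pairwise toy measure: the number of shared ancestors."""
    SS_type = ToyTermSemSim.P_TSS

    def SemSim(self, term1, term2):
        return len(self.util.ancestors(term1) & self.util.ancestors(term2))


parents = {"root": set(), "A": {"root"}, "A1": {"A"}, "A2": {"A"}}
shared = ToyUtils(parents)
m1 = ToySharedAncestors(parents, util=shared)
m2 = ToySharedAncestors(parents, util=shared)
```

Because both measures hold the same helper, ancestor sets computed for one are already cached for the other.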

SemSim.ObjSemSim

This class provides the prototype for a generic Object Semantic Similarity measure.

class fastsemsim.SemSim.ObjSemSim.ObjSemSim(ontology, ac, TSS=None, MSS=None, util=None, do_log=False)

Bases: object

SemSim(obj1, obj2, root=None)
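
As a rough illustration of how an object-level score could combine a term measure (TSS) with a mixing strategy (MSS), the sketch below looks up the terms annotated to each object and hands them to a mixing function. All names and data here (ANNOTATIONS, exact_match, mix_max, obj_semsim) are hypothetical and are not taken from fastsemsim.

```python
# Hypothetical annotation corpus: object -> set of annotated terms.
ANNOTATIONS = {"P1": {"t1", "t2"}, "P2": {"t2", "t3"}, "P3": set()}


def exact_match(term1, term2):
    """Toy pairwise term measure: 1.0 for identical terms, else 0.0."""
    return 1.0 if term1 == term2 else 0.0


def mix_max(set1, set2, tss):
    """Toy mixing strategy: best score over all term pairs."""
    return max(tss(a, b) for a in set1 for b in set2)


def obj_semsim(obj1, obj2, tss=exact_match, mss=mix_max):
    """Score two objects by mixing the scores of their annotated terms."""
    terms1 = ANNOTATIONS.get(obj1, set())
    terms2 = ANNOTATIONS.get(obj2, set())
    if not terms1 or not terms2:
        return None  # assumption: unannotated objects get no score
    return mss(terms1, terms2, tss)
```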

SemSim.ObjSetSemSim

This class provides the prototype for a generic Object Set Semantic Similarity measure (PSS).

class fastsemsim.SemSim.ObjSetSemSim.ObjSetSemSim(ontology, ac, TSS=None, MSS=None, util=None, do_log=False)
SemSim(obj1, obj2, root=None)

SemSim.SetSemSim

This class provides the prototype for a generic Pairwise Object Semantic Similarity measure.

class fastsemsim.SemSim.SetSemSim.SetSemSim(ontology, ac=None, TSS=None, MSS=None, util=None, do_log=False)
SemSim(obj1, obj2, root=None)

Specific Semantic Similarity measures

SemSim.ResnikSemSim

Resnik Semantic Similarity Measure

Reference: Resnik, P. (1999). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11, 95-130.

class fastsemsim.SemSim.ResnikSemSim.ResnikSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Pairwise'
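
Resnik's measure scores two terms by the information content (IC) of their most informative common ancestor (MICA). The following sketch works on a toy ontology with made-up term names and annotation counts; it illustrates the definition and is not the fastsemsim implementation.

```python
import math

# Toy ontology: term -> set of parents (hypothetical terms, not real GO ids).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical annotation counts, already propagated to ancestors.
FREQ = {"root": 8, "A": 4, "B": 4, "A1": 2, "A2": 2}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def ic(term):
    """Information content: minus the log of the term's relative frequency."""
    return -math.log(FREQ[term] / FREQ["root"])


def resnik(term1, term2):
    """IC of the most informative common ancestor of the two terms."""
    common = ancestors(term1) & ancestors(term2)
    return max(ic(t) for t in common)
```

In this toy ontology, A1 and A2 share the ancestor A, so their score is the IC of A; A1 and B only share the root, whose IC is zero.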

SemSim.CosineSemSim

Cosine Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.CosineSemSim.CosineSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Groupwise'
dotprod(vector1, vector2)
extend_annotations = True
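
A generic cosine measure between two term sets can be sketched as follows. The sketch assumes that extending annotations means adding every ancestor of each term, and that each term set is encoded as a binary vector; both are assumptions for illustration, and the ancestor table is made up.

```python
import math

# Hypothetical ancestor sets; each set includes the term itself.
ANCESTORS = {
    "A1": {"A1", "A", "root"},
    "A2": {"A2", "A", "root"},
    "B": {"B", "root"},
}


def extend(terms):
    """Add every ancestor of every term in the set."""
    extended = set()
    for term in terms:
        extended |= ANCESTORS[term]
    return extended


def dotprod(vector1, vector2):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(vector1, vector2))


def cosine(terms1, terms2):
    """Cosine of the angle between the binary vectors of two term sets."""
    ext1, ext2 = extend(terms1), extend(terms2)
    if not ext1 or not ext2:
        return 0.0  # choice made here: empty sets score zero
    axes = sorted(ext1 | ext2)  # one vector component per term
    v1 = [1.0 if t in ext1 else 0.0 for t in axes]
    v2 = [1.0 if t in ext2 else 0.0 for t in axes]
    return dotprod(v1, v2) / math.sqrt(dotprod(v1, v1) * dotprod(v2, v2))
```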

SemSim.CzekanowskiDiceSemSim

Czekanowski and Dice Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.CzekanowskiDiceSemSim.CzekanowskiDiceSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Groupwise'
extend_annotations = True

SemSim.DiceSemSim

Dice Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.DiceSemSim.DiceSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Groupwise'
extend_annotations = True

SemSim.GSESAMESemSim

G-SESAME Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.GSESAMESemSim.GSESAMESemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Pairwise'
generic_score = 0.5
is_a_score = 0.8
neg_regulates_score = 0.6
part_of_score = 0.6
pos_regulates_score = 0.6
regulates_score = 0.6
score_ancestors(term)
score_edge(tp, t)
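
In the commonly described G-SESAME formulation, each ancestor of a term receives an S-value equal to the best product of edge weights along any path from the term to that ancestor, and two terms are compared through the S-values of their shared ancestors. The sketch below follows that formulation on a toy ontology, reusing the is_a and part_of weights listed above. The helper names and the ontology are hypothetical, and whether score_ancestors and score_edge work exactly this way is an assumption.

```python
# Edge weights, taken from the attribute values listed above.
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}

# Toy ontology: term -> list of (parent, relation) edges. Hypothetical terms.
EDGES = {
    "root": [],
    "A": [("root", "is_a")],
    "A1": [("A", "is_a")],
    "A2": [("A", "part_of")],
}


def s_values(term):
    """S-value of the term and of each of its ancestors."""
    scores = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in EDGES[child]:
            candidate = scores[child] * WEIGHTS[relation]
            # Keep the best score and revisit the parent when it improves.
            if candidate > scores.get(parent, 0.0):
                scores[parent] = candidate
                stack.append(parent)
    return scores


def gsesame(term1, term2):
    """Shared S-values divided by the total S-values of both terms."""
    s1, s2 = s_values(term1), s_values(term2)
    shared = set(s1) & set(s2)
    numerator = sum(s1[t] + s2[t] for t in shared)
    return numerator / (sum(s1.values()) + sum(s2.values()))
```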

SemSim.JaccardSemSim

Jaccard Index based Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.JaccardSemSim.JaccardSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Groupwise'
extend_annotations = True
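
The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch for term sets follows, assuming (as an illustration only) that annotations are first extended with all ancestors; the ancestor table is made up.

```python
# Hypothetical ancestor sets; each set includes the term itself.
ANCESTORS = {
    "A1": {"A1", "A", "root"},
    "A2": {"A2", "A", "root"},
    "B": {"B", "root"},
}


def extend(terms):
    """Add every ancestor of every term in the set."""
    extended = set()
    for term in terms:
        extended |= ANCESTORS[term]
    return extended


def jaccard(terms1, terms2):
    """Shared terms divided by all terms, after ancestor extension."""
    ext1, ext2 = extend(terms1), extend(terms2)
    union = ext1 | ext2
    if not union:
        return 0.0  # choice made here: two empty sets score zero
    return len(ext1 & ext2) / len(union)
```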

SemSim.JiangConrathSemSim

Jiang and Conrath Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.JiangConrathSemSim.JiangConrathSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Pairwise'
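
Jiang and Conrath define a distance between two terms as the sum of their ICs minus twice the IC of their most informative common ancestor. Several conversions from this distance to a similarity are in use; the sketch below picks 1 / (1 + distance), which is an assumption and may differ from what this class does. The IC values and ancestor sets are made up.

```python
# Hypothetical IC values and ancestor sets (each set includes the term itself).
IC = {"root": 0.0, "A": 1.0, "A1": 2.0, "A2": 3.0}
ANCESTORS = {
    "root": {"root"},
    "A": {"A", "root"},
    "A1": {"A1", "A", "root"},
    "A2": {"A2", "A", "root"},
}


def mica_ic(term1, term2):
    """IC of the most informative common ancestor."""
    return max(IC[t] for t in ANCESTORS[term1] & ANCESTORS[term2])


def jc_distance(term1, term2):
    """Jiang and Conrath distance between two terms."""
    return IC[term1] + IC[term2] - 2.0 * mica_ic(term1, term2)


def jc_similarity(term1, term2):
    """One possible distance-to-similarity conversion (an assumption)."""
    return 1.0 / (1.0 + jc_distance(term1, term2))
```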

SemSim.LinSemSim

Lin Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.LinSemSim.LinSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Pairwise'
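
Lin's measure normalizes the IC of the most informative common ancestor by the ICs of the two terms, which keeps the score between 0 and 1. A minimal sketch with made-up IC values and ancestor sets follows; the handling of a zero denominator is a choice made for this example.

```python
# Hypothetical IC values and ancestor sets (each set includes the term itself).
IC = {"root": 0.0, "A": 1.0, "A1": 2.0, "A2": 3.0}
ANCESTORS = {
    "root": {"root"},
    "A": {"A", "root"},
    "A1": {"A1", "A", "root"},
    "A2": {"A2", "A", "root"},
}


def mica_ic(term1, term2):
    """IC of the most informative common ancestor."""
    return max(IC[t] for t in ANCESTORS[term1] & ANCESTORS[term2])


def lin(term1, term2):
    """Twice the MICA's IC divided by the sum of the two terms' ICs."""
    denominator = IC[term1] + IC[term2]
    if denominator == 0:
        return 0.0  # choice made here: avoid dividing by zero at the root
    return 2.0 * mica_ic(term1, term2) / denominator
```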

SemSim.SimGICSemSim

SimGIC Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimGICSemSim.SimGICSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Groupwise'
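
SimGIC is usually described as a Jaccard index weighted by information content: the summed IC of the terms shared by two ancestor-extended sets, divided by the summed IC of all terms in either set. The sketch below illustrates this with made-up IC values and ancestor sets; it is not the fastsemsim code.

```python
# Hypothetical IC values and ancestor sets (each set includes the term itself).
IC = {"root": 0.0, "A": 1.0, "A1": 2.0, "A2": 3.0}
ANCESTORS = {
    "root": {"root"},
    "A": {"A", "root"},
    "A1": {"A1", "A", "root"},
    "A2": {"A2", "A", "root"},
}


def extend(terms):
    """Add every ancestor of every term in the set."""
    extended = set()
    for term in terms:
        extended |= ANCESTORS[term]
    return extended


def simgic(terms1, terms2):
    """IC-weighted Jaccard index of two ancestor-extended term sets."""
    ext1, ext2 = extend(terms1), extend(terms2)
    union_ic = sum(IC[t] for t in ext1 | ext2)
    if union_ic == 0:
        return 0.0  # choice made here: nothing informative to compare
    return sum(IC[t] for t in ext1 & ext2) / union_ic
```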

SemSim.SimICNDSemSim

ICND Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimICNDSemSim.ICNDSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Pairwise'
generic_score = 1.0
is_a_score = 1.0
neg_regulates_score = 1.0
part_of_score = 1.0
pos_regulates_score = 1.0
regulates_score = 1.0
score_ancestors(term)
score_edge(tp, t)

SemSim.SimICNPSemSim

ICNP Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimICNPSemSim.ICNPSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Pairwise'
generic_score = 1.0
is_a_score = 1.0
neg_regulates_score = 1.0
part_of_score = 1.0
pos_regulates_score = 1.0
regulates_score = 1.0
score_ancestors(term)
score_edge(tp, t)

SemSim.SimICSemSim

Information Content Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimICSemSim.SimICSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Pairwise'
use_Lin = True

SemSim.SimNTOSemSim

Normalized Term Overlap Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimNTOSemSim.SimNTOSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Groupwise'
extend_annotations = True

SemSim.SimRelSemSim

SimRel Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimRelSemSim.SimRelSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = True
SS_type = 'Pairwise'
use_Lin = True

SemSim.SimTOSemSim

SimTO Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimTOSemSim.SimTOSemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Groupwise'

SemSim.SimUISemSim

SimUI Semantic Similarity Measure

Reference:

class fastsemsim.SemSim.SimUISemSim.SimUISemSim(ontology, ac=None, util=None, do_log=False)

Bases: fastsemsim.SemSim.TermSemSim.TermSemSim

IC_based = False
SS_type = 'Groupwise'

SemSim.MixSemSim

This class provides the prototype for a generic mixing strategy for pairwise Term Semantic Similarity measures.

class fastsemsim.SemSim.MixSemSim.MixSemSim(ontology, ac, util=None, do_log=False)

Bases: object

SemSim(set1, set2, TSS)

SemSim.avgSemSim

Average (avg) mixing strategy for pairwise term Protein Semantic Similarity measures

class fastsemsim.SemSim.avgSemSim.avgSemSim(ontology, ac, util=None, do_log=False)

Bases: fastsemsim.SemSim.MixSemSim.MixSemSim

SemSim.maxSemSim

Max mixing strategy

class fastsemsim.SemSim.maxSemSim.maxSemSim(ontology, ac, util=None, do_log=False)

Bases: fastsemsim.SemSim.MixSemSim.MixSemSim

SemSim.BMASemSim

Best Match Average (BMA) mixing strategy for pairwise term Protein Semantic Similarity measures

class fastsemsim.SemSim.BMASemSim.BMASemSim(ontology, ac, util=None, do_log=False)

Bases: fastsemsim.SemSim.MixSemSim.MixSemSim

fair = True
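
A mixing strategy turns the pairwise scores between two term sets into a single number. The sketch below shows the three strategies in their textbook form: the average of all pairwise scores, the maximum, and the Best Match Average, which here pools the best match of every term in either set. The function names and scores are hypothetical, and the sketch ignores the fair attribute because its effect is not described here.

```python
def pairwise_matrix(set1, set2, tss):
    """Score every term of set1 against every term of set2."""
    return [[tss(a, b) for b in set2] for a in set1]


def mix_avg(set1, set2, tss):
    """Average of all pairwise scores."""
    scores = [s for row in pairwise_matrix(set1, set2, tss) for s in row]
    return sum(scores) / len(scores)


def mix_max(set1, set2, tss):
    """Best pairwise score."""
    return max(s for row in pairwise_matrix(set1, set2, tss) for s in row)


def mix_bma(set1, set2, tss):
    """Best Match Average: mean of each term's best match in the other set."""
    matrix = pairwise_matrix(set1, set2, tss)
    row_best = [max(row) for row in matrix]        # best match per set1 term
    col_best = [max(col) for col in zip(*matrix)]  # best match per set2 term
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


# Hypothetical pairwise term scores.
SCORES = {("a", "x"): 1.0, ("a", "y"): 0.25, ("b", "x"): 0.5, ("b", "y"): 0.75}


def toy_tss(term1, term2):
    """Look up a made-up score for a pair of terms."""
    return SCORES[(term1, term2)]
```

For the toy scores above, mix_avg gives 0.625, mix_max gives 1.0 and mix_bma gives 0.875 when called on ["a", "b"] and ["x", "y"].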

SemSim.SemSimUtils

This class provides routines to calculate basic properties used by different SS measures. In particular, it provides code for evaluating:

  • term ICs
  • term frequency within an annotation corpus
  • term’s ancestors
  • term’s offspring
  • term’s children
  • term’s parents
  • MICA/DCA/LCA
  • term’s distance

class fastsemsim.SemSim.SemSimUtils.SemSimUtils(ontology, ac=None)

Bases: object

det_IC(term)
det_IC_table()
det_MICA(term1, term2)
det_ancestors_union(term1, term2)
det_common_ancestors(term1, term2)
difference(set1, set2)
get_ancestors(term1)
int_det_IC(term_id)
int_det_IC_table()
int_det_ancestors(goid, temp_intra)
int_det_ancestors_table()
int_det_freq(term_id)
int_det_freq_table()
int_det_lineage()
int_det_offspring(goid, temp_intra)
int_det_offspring_table()
int_det_p(term_id)
int_det_p_table()
int_merge_sets(set1, set2)
intersection(set1, set2)
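
The method names above suggest a pipeline from term frequency to probability to information content. A minimal sketch of such a pipeline follows, assuming that an object annotated to a term also counts toward every ancestor of that term. The ontology, the corpus and the function names are hypothetical and do not reproduce the SemSimUtils internals.

```python
import math

# Toy ontology: term -> set of parents (hypothetical terms).
PARENTS = {"root": set(), "A": {"root"}, "B": {"root"}, "A1": {"A"}}

# Hypothetical annotation corpus: object -> directly annotated terms.
CORPUS = {"obj1": {"A1"}, "obj2": {"A"}, "obj3": {"B"}, "obj4": {"A1", "B"}}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    found = {term}
    for parent in PARENTS[term]:
        found |= ancestors(parent)
    return found


def frequency_table():
    """Count, for each term, the objects annotated to it or to a descendant."""
    freq = {term: 0 for term in PARENTS}
    for terms in CORPUS.values():
        extended = set()
        for term in terms:
            extended |= ancestors(term)
        for term in extended:
            freq[term] += 1
    return freq


def ic_table(root="root"):
    """IC per term: minus the log of its frequency relative to the root."""
    freq = frequency_table()
    return {
        term: (-math.log(count / freq[root]) if count else None)
        for term, count in freq.items()
    }
```

Terms that never occur in the corpus get None rather than an infinite IC, which is a choice made for this example.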