Relationships

This document contains important information about how relationships between objects are represented in the database, and how it affects the way queries about an object’s neighbors will behave.

Relationships are directed

An important concept in MetagenomeDB is that a relationship between two objects is always directed. I.e., in any relationship there is a source and a target. As such, linking for example a sequence A to another sequence B (using its Sequence.relate_to_sequence() method) doesn’t imply that B is linked to A (as tested by the has_relationships_with() method):

>>> # we retrieve two sequences, 'sequence1' and 'sequence2'
>>> s1 = mdb.Sequence.find_one({"name": "sequence1"})
>>> s2 = mdb.Sequence.find_one({"name": "sequence2"})

>>> # if we link s1 to s2,
>>> s1.relate_to_sequence(s2)

>>> # then a relationship from s1 to s2 do exists
>>> print s1.has_relationships_with(s2)
True

>>> # however the opposite is not true
>>> print s2.has_relationships_with(s1)
False

This concept is important when you search for objects that are related to a given object of interest (neighbors). Methods in the API allow you to distinguish between objects pointing to your object, and objects your object points to:

Note

All those methods accept filters for both neighbor objects and the relationship between the current object and its neighbors. See Queries for information about how to create filters.

Relationships can be annotated

All methods that create a relationship between objects (Sequence.add_to_collection(), Sequence.relate_to_sequence(), and Collection.add_to_collection()) accept as a second argument a description of this relationship. This description is represented the same way as for objects’ annotations; i.e., as a dictionary (see Annotations):

>>> s1.relate_to_sequence(s2, {"property_of_relationship": "foo"})

As for objects’ annotations, relationship descriptions can have nested properties, declared using either nested dictionaries or dot notation:

>>> s1.relate_to_sequence(s2, {"monty": {"python": "bar"}})

Describing a relationship between objects is useful to explain why the source should be connected to its target. For example, a convention enforced by the command-line tools (see Tools Documentation) is to annotate relationships between sequences with a property type, of which value is either part-of (when a sequence is part of another sequence; e.g., a read that is part of a contig) or similar-to (when a sequence is similar to another sequence; e.g., as shown by a BLAST run). Those same tools also add annotations about how the relationship was established, using which technique, any associated score, etc.

Note

Two objects can be linked by more than one relationship, as long as their annotations are distinct. In the following example,

>>> s1.relate_to_sequence(s2, {"relationship_property": "foo"})
>>> s1.relate_to_sequence(s2, {"relationship_property": "bar"})
>>> s1.relate_to_sequence(s2, {"relationship_property": "foo"})

Only the two first instructions will result in the addition of a relationship between s1 and s2. The third is a duplicate of the first, and will be ignored.

Relationships must be committed

Relationships between a source and a target object are stored as a property of the source object. As such, and as with any property an object is annotated with, this relationship will be lost if the source object is not committed before being discarded (see Annotations).

There is another important consequence of not committing a new relationship: this relationship will be invisible to most methods that list or count neighbor objects:

>>> # let's retrieve two collections, previously unrelated:
>>> c1 = mdb.Collection.find_one({"name": "collection1"})
>>> c2 = mdb.Collection.find_one({"name": "collection2"})

>>> # if we create a link from c1 (source) to c2 (target), but do not commit it,
>>> c1.add_to_collection(c2)
>>> print c1.is_committed()
False

>>> # this relationship is only visible from c1:
>>> print list(c1.list_related_collections(mdb.Direction.OUTGOING))
[<Collection id:... name:'collection2' state:'committed'>]
>>> print list(c1.list_super_collections())
[<Collection id:... name:'collection2' state:'committed'>]

>>> # but not from c2:
>>> print list(c2.list_related_collections(mdb.Direction.INGOING))
[]
>>> print list(c2.list_sub_collections())
[]

Knowing which method will or will not assess a new relationship can be confusing. As a general rule, you should always commit a source object after you connected it to a target object.

Note

When creating a relationship between a source and a target object the target must have been committed at least once in the database. If not an exception will be thrown. The reason for this constraint is the need for the API to have an internal identifier in the database when linking both objects. This identifier is only created the first time an object is committed.

Table Of Contents

Previous topic

Queries

Next topic

Sequence