Computing the dependency graph

Specifying the working set

When a graph is instantiated without any working set given, it will use the global working set of active distributions defined by pkg_resources:

>>> from tl.eggdeps.graph import Graph
>>> graph = Graph()
>>> sort_specs(graph.working_set)
[...setuptools ... tl.eggdeps ... zope.testing ...]

For testing the graph builder, we will use custom working sets and distributions. Using the convenience distribution factory defined by our test setup, we pass a working set of some mock distributions to the graph builder:

>>> anton_1 = make_dist("anton-1.egg", depends="berta")
>>> berta_2 = make_dist("berta-2.egg", depends="""charlie>1.5
...                                               [extra]
...                                               dora[test]""")
>>> ws = make_working_set(anton_1, berta_2)
>>> graph = Graph(working_set=ws)
>>> sort_specs(graph.working_set)
[anton 1 (.../anton-1.egg), berta 2 (.../berta-2.egg)]

Helper methods

Extracting project names from specifications

Graphs have a method that extracts project names from an iterable of distributions or requirements and returns them as a set:

>>> graph.names(ws)
set([...])
>>> sprint(graph.names(ws))
set(['anton', 'berta'])

A Graph instance has a filter function that determines by project name which distributions to include in the graph. This filter applies to the project names returned by the names method. As it allows any distribution by default, we have to specify something interesting to see an effect:

>>> graph = Graph(ws, show=lambda name: name < "b")
>>> graph.names(ws)
set(['anton'])

Another method actually named filter yields pairs of matching specifications and associated graph nodes for the distribution to be included by said rules. (For more on nodes, see below.)

>>> graph = Graph(working_set=ws)
>>> list(graph.filter(ws))
[(anton 1 (.../anton-1.egg), {}), (berta 2 (.../berta-2.egg), {})]
>>> graph = Graph(ws, show=lambda name: name < "b")
>>> list(graph.filter(ws))
[(anton 1 (.../anton-1.egg), {})]

Filtering distributions

The filter for distributions to be shown is stored on the graph instance:

>>> graph.show("anton")
True
>>> graph.show("berta")
False

Finding distributions

Working sets have a find method that returns a distribution matching a requirement if one can be found. It is wrapped by a convenience method of the Graph class that handles a special case.

If we ask for distributions active at a compatible version or not active at all, both find methods behave the same:

>>> import pkg_resources
>>> req = pkg_resources.Requirement.parse("anton")
>>> ws.find(req)
anton 1 (.../anton-1.egg)
>>> graph.find(req)
anton 1 (.../anton-1.egg)
>>> req = pkg_resources.Requirement.parse("charlie")
>>> ws.find(req)
>>> graph.find(req)

Unfortunately, the working set’s find method raises an exception if a distribution for the same project is found, but at an incompatible version. As we treat distributions active at the wrong version the same as distributions not active at all, a convenience method handles the exception for us:

>>> req = pkg_resources.Requirement.parse("anton>5")
>>> ws.find(req)
...
VersionConflict: (anton 1 (.../anton-1.egg), Requirement.parse('anton>5'))
>>> graph.find(req)

Nodes

The graph contains nodes which are instances of the Node class. They get bound to a graph upon instantiation and represent a distribution by its project name. The distribution is specified by an instance of either a Distribution or a Requirement:

>>> from tl.eggdeps.graph import Node
>>> node = Node(graph, anton_1)
>>> node.name
'anton'

The node has a require method that tries to find a distribution matching a specification in the graph’s working set. It returns a boolean indicating success or failure. If it succeeds, it stores the distribution in an attribute of the node. Another attribute keeps record of whether the distribution has been compatible to all specifications required so far. The require method has already been called once when the node was instantiated:

>>> node.dist
anton 1 (.../anton-1.egg)
>>> node.compatible
True

When an attempt to match a specification to the node fails, the distribution remains associated with the node, but the node’s compatibility flag is unset:

>>> req_anton_5 = pkg_resources.Requirement.parse("anton>5")
>>> node.require(req_anton_5)
False
>>> node.dist
anton 1 (.../anton-1.egg)
>>> node.compatible
False

On the other hand, if a distribution is not found upon instantiation, it may well be found by a later attempt at matching some specification:

>>> node = Node(graph, req_anton_5)
>>> print node.dist
None
>>> node.compatible
False
>>> node.require(anton_1)
True
>>> node.dist
anton 1 (.../anton-1.egg)
>>> node.compatible
False

If a node receives a requirement for a distribution of another project than its own, it will complain:

>>> node.require(berta_2)
...
ValueError: A 'anton' node cannot satisfy a 'berta' requirement.

Nodes have an API for having them store and list dependencies on other nodes, either with or without extras involved:

>>> node.depend('berta')
>>> list(node.iter_deps())
[('berta', set([]))]
>>> node.extra_depend('cool-extra', 'charlie')
>>> list(node.iter_deps())
[('charlie', set(['cool-extra'])), ('berta', set([]))]

Dependencies can target packages with one or more extras activated. In that case, an iterable of extras of the dependency can be specified. We will not see anything about this the way we have looked at the stored dependencies so far, but there is another method that reads the storage and returns more detailed information:

>>> node.depend('berta', ('hot-extra',))
>>> node.extra_depend('cool-feature', 'charlie', ('more-coolness',))
>>> list(node.iter_deps())
[('charlie', set(['cool-feature', 'cool-extra'])),
 ('berta', set([]))]
>>> list(node.iter_deps_with_extras())
[('charlie', None, set(['cool-extra'])),
 ('charlie', 'more-coolness', set(['cool-feature'])),
 ('berta', None, set([])),
 ('berta', 'hot-extra', set([]))]

If the same package is a dependency both via an extra and without one, the information on the extra is discarded:

>>> node.depend('charlie')
>>> list(node.iter_deps())
[('charlie', set([])), ('berta', set([]))]
>>> list(node.iter_deps_with_extras())
[('charlie', None, set([])),
 ('charlie', 'more-coolness', set(['cool-feature'])),
 ('berta', None, set([])),
 ('berta', 'hot-extra', set([]))]

A node remembers which of its extras have been used over time:

>>> sorted(node.extras_used)
['cool-extra', 'cool-feature']

Analysing the working set

A dependency graph may be built from the complete working set by finding all possible dependencies between any distributions. The graph will be a mapping from project names to node objects which describe each node’s dependencies. Node objects in turn are mappings from project names of each dependency to a set of dependency descriptions. The empty set signals a mandatory dependency, a set of names means that the dependency is by way of any of the named extras. Dependencies which are not active will be ignored.

Operating on the full working set

By default, all dependencies between any distributions in the working set will be reported, including mandatory as well as extra dependencies:

>>> dora_0_5 = make_dist("dora-0.5.egg")
>>> ws = make_working_set(anton_1, berta_2, dora_0_5)
>>> graph = Graph(ws)
>>> graph.from_working_set()
>>> sprint(graph)
{'anton': {'berta': {None: set([])}},
 'berta': {'dora': {'test': set(['extra'])}},
 'dora': {}}

The graph has a set of roots, which are the names of those distributions that are not a dependency of any other node:

>>> graph.roots
set(['anton'])

If a distribution depends on another one both mandatorily and by some extras (which is possible though not very useful), the dependency is considered a plain mandatory dependency:

>>> emil_1 = make_dist("emil-1.egg", """anton
...                                     [pointless-extra]
...                                     anton""")
>>> ws = make_working_set(anton_1, emil_1)
>>> graph = Graph(ws)
>>> graph.from_working_set()
>>> sprint(graph)
{'anton': {},
 'emil': {'anton': {None: set([])}}}
>>> graph.roots
set(['emil'])

If the graph contains a cycle none of whose constituents is a dependency of a distribution outside the cycle, one of those constituents will be considered a graph root:

>>> emil_1 = make_dist("emil-1.egg", "fritz")
>>> fritz_5 = make_dist("fritz-5.egg", depends="emil")
>>> ws = make_working_set(anton_1, emil_1, fritz_5)
>>> graph = Graph(ws)
>>> graph.from_working_set()
>>> sprint(graph)
{'anton': {},
 'emil': {'fritz': {None: set([])}},
 'fritz': {'emil': {None: set([])}}}
>>> graph.roots
set(['anton', 'emil'])

Dependencies from a working set analysis take versions into account. The following does not report a dependency of berta on charlie as berta requires at least charlie 1.5:

>>> charlie_1_4 = make_dist("charlie-1.4.egg")
>>> ws = make_working_set(berta_2, charlie_1_4)
>>> graph = Graph(ws)
>>> graph.from_working_set()
>>> sprint(graph)
{'berta': {},
 'charlie': {}}

Reducing the graph

Extra dependencies may be ignored completely to simplify a complex graph:

>>> ws = make_working_set(anton_1, berta_2, dora_0_5)
>>> graph = Graph(ws, extras=False)
>>> graph.from_working_set()
>>> sprint(graph)
{'anton': {'berta': {None: set([])}},
 'berta': {},
 'dora': {}}
>>> sorted(graph.roots)
['anton', 'dora']

Alternatively, specific distributions may be ignored. The graph will then not contain any node for those distributions, nor any edges for dependencies on them or their own dependencies. This is achieved by specifying a filter function that determines which distributions ought to be shown:

>>> graph = Graph(ws, show=lambda name: name != "berta")
>>> graph.from_working_set()
>>> sprint(graph)
{'anton': {},
 'dora': {}}
>>> sorted(graph.roots)
['anton', 'dora']

If certain distributions should themselves be included in the graph but their dependencies not be followed, they can be made “dead ends” by passing a filter function that determines which distributions to follow the dependencies of:

>>> graph = Graph(ws, follow=lambda name: name != "berta")
>>> graph.from_working_set()
>>> sprint(graph)
{'anton': {'berta': {None: set([])}},
 'berta': {},
 'dora': {}}
>>> sorted(graph.roots)
['anton', 'dora']
>>> graph["anton"].follow
True
>>> graph["berta"].follow
False
>>> graph["dora"].follow
True

Analysing specific distributions’ dependencies

The second way of building a dependency graph is by inspecting the dependencies of one or more specified distributions. In this scenario, unrelated active distributions are ignored.

Operating on the full working set

In this example, anton does not depend on berta and berta’s dependencies:

>>> charlie_1_6 = make_dist("charlie-1.6.egg")
>>> ws = make_working_set(anton_1, berta_2, charlie_1_6)
>>> graph = Graph(ws)
>>> graph.from_specifications("berta")
>>> sprint(graph)
{'berta': {'charlie': {None: set([])}},
 'charlie': {}}

The roots of the graph are the specified distributions now:

>>> graph.roots
set(['berta'])

On the other hand, required distributions which are not in the working set are included now. In the example, this applies to dora:

>>> graph = Graph(ws)
>>> graph.from_specifications("berta [extra]")
>>> sprint(graph)
{'berta': {'charlie': {None: set([])},
           'dora': {None: set(['extra'])}},
 'charlie': {},
 'dora': {}}
>>> graph.roots
set(['berta'])

Node objects store their associated distribution on an attribute. Since dora is inactive it doesn’t have one, in contrast to berta and charlie:

>>> graph["berta"].dist
berta 2 (.../berta-2.egg)
>>> graph["charlie"].dist
charlie 1.6 (.../charlie-1.6.egg)
>>> graph["dora"].dist

If a version of charlie incompatible with the requirement by berta is active, charlie is treated as if it wasn’t active at all:

>>> ws = make_working_set(berta_2, charlie_1_4)
>>> graph = Graph(ws)
>>> graph.from_specifications("berta")
>>> sprint(graph)
{'berta': {'charlie': {None: set([])}},
 'charlie': {}}
>>> graph["charlie"].dist

Reducing the graph

In contrast to analysing the whole working set, turning off extra dependencies will remove those packages from the graph which are dependencies of the root nodes only by way of extras. In our example, this applies to anton:

>>> ws = make_working_set(anton_1, berta_2, charlie_1_6)
>>> graph = Graph(ws, extras=False)
>>> graph.from_specifications("berta [extra]")
>>> sprint(graph)
{'berta': {'charlie': {None: set([])}},
 'charlie': {}}
>>> graph.roots
set(['berta'])

Ignoring specific distributions has different effects than in whole working set analysis as well. Whichever other distributions are connected to the roots only through a distribution which is to be ignored (charlie as a dependency of berta in this case), will be left out of the graph themselves:

>>> graph = Graph(ws, show=lambda name: name != "berta")
>>> graph.from_specifications("anton")
>>> sprint(graph)
{'anton': {}}
>>> graph.roots
set(['anton'])

Similarly, distributions depended on by dead ends only (charlie again) will be missing from the graph:

>>> graph = Graph(ws, follow=lambda name: name != "berta")
>>> graph.from_specifications("anton")
>>> sprint(graph)
{'anton': {'berta': {None: set([])}},
 'berta': {}}
>>> graph.roots
set(['anton'])
>>> graph["anton"].follow
True
>>> graph["berta"].follow
False

But of course, distributions depended upon by ignored distributions and dead ends are not ignored, they may just be missed because of dependencies not being followed. If there are other paths from the roots to them, those distributions will be included in the graph, but with some connections missing:

>>> fritz_5 = make_dist("fritz-5.egg", depends="""berta
...                                               charlie""")
>>> ws = make_working_set(berta_2, charlie_1_6, fritz_5)
>>> graph = Graph(ws, show=lambda name: name != "berta")
>>> graph.from_specifications("fritz")
>>> sprint(graph)
{'charlie': {},
 'fritz': {'charlie': {None: set([])}}}
>>> graph = Graph(ws, follow=lambda name: name != "berta")
>>> graph.from_specifications("fritz")
>>> sprint(graph)
{'berta': {},
 'charlie': {},
 'fritz': {'berta': {None: set([])},
           'charlie': {None: set([])}}}
>>> graph["berta"].follow
False
>>> graph["charlie"].follow
True
>>> graph["fritz"].follow
True