Note
This page can be skipped if all that is needed are efficient sorted drop-in replacemnt containers for Python’s sets and dicts.
Consider the following tree, drawn rotated 90 degrees counter-clockwise (the root of the tree is 40, and the leftmost leaf is 3).
/-99
/-96
| \-88
/-73
| | /-64
| \-57
| \-54
| \-52
40
| /-37
| /-34
| | \-29
| | \-28
\-27
| /-24
\-20
\-6
\-3
It is well known that some queries can be supported efficiently by augmenting each node with metadata having the property that the metadata at each node can be calculated from the metadata of its direct children. See the following lecture, or Chapter 14 of Introduction To Algorithms (3rd edition), for the specifics.
E.g., consider the same tree, but with each node augmented by the size of its subtree (shown in square brackets). Then this tree can answer efficiently queries such as what is the ordinal position of this key in the set, or which is the kth key. The size of a node’s subtree can indeed be calculated by the size of the subtrees sizes of the direct children.
/-99 [ 1 ]
/-96 [ 3 ]
| \-88 [ 1 ]
/-73 [ 8 ]
| | /-64 [ 1 ]
| \-57 [ 4 ]
| \-54 [ 2 ]
| \-52 [ 1 ]
40 [ 18 ]
| /-37 [ 1 ]
| /-34 [ 4 ]
| | \-29 [ 2 ]
| | \-28 [ 1 ]
\-27 [ 9 ]
| /-24 [ 1 ]
\-20 [ 4 ]
\-6 [ 2 ]
\-3 [ 1 ]
As another example, here is the same tree, but with each node augmented by the smallest and largest keys in its subtrees (shown as a pair in square brackets). Then, obviously, this tree can answer efficiently what are the smallest and largest keys in the entire tree. Here, too, the relevant information can be calculated directly from that of the direct children.
/-99 [ (99, 99) ]
/-96 [ (88, 99) ]
| \-88 [ (88, 88) ]
/-73 [ (52, 99) ]
| | /-64 [ (64, 64) ]
| \-57 [ (52, 64) ]
| \-54 [ (52, 54) ]
| \-52 [ (52, 52) ]
40 [ (3, 99) ]
| /-37 [ (37, 37) ]
| /-34 [ (28, 37) ]
| | \-29 [ (28, 29) ]
| | \-28 [ (28, 28) ]
\-27 [ (3, 37) ]
| /-24 [ (24, 24) ]
\-20 [ (3, 24) ]
\-6 [ (3, 6) ]
\-3 [ (3, 3) ]
The library comes with a number of predefined classes, as well as the ability to plug in new algorithms.
Using the predefined classes is easy. Simply specify the class as the updator parameter of a set or dict, and its methods become available as those of the set or dict. For example:
>>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], updator = RankUpdator)
>>> t
SortedSet['hao', 'jian', 'jiu', 'mei'])
>>>
>>> # 'hao' is key no. 0
>>> t.kth(0)
'hao'
>>> t.order('hao')
0
>>>
>>> # 'mei' is key no. 3
>>> t.kth(3)
'mei'
>>> t.order('mei')
3
The kth and order methods of the updator class become part of the set, in this case.
Writing a new updator is probably best understood by an example. Following is a SumUpdator updator class, which provides the additional functionality of asking for the sum of the keys of all keys stored in a set or dict. That is, after writing the class, we will be able to use it as follows:
>>> t = SortedSet(range(4), updator = SumUpdator)
>>> t
SortedSet([0, 1, 2, 3])
>>> t.sum_()
6
Of course, there are simpler ways to track the sum of keys in a container, and this example is just to illustrate the point of writing a new updator.
The entire code of the updator is as follows:
class SumUpdator(object):
class Metadata(object):
def update(self, key, key_fn, l, r):
self.sum = key
for c in [l, r]:
if c is not None:
self.sum += r.sum
def __repr__(self):
return str(self.sum)
def sum_(self):
return 0 if self.root is None else self.root.metadata.sum
It contains two types of entries:
The Metadata Class
The metadata class needs to support a single method, update. This method is called with the following parameters (besides self):
The code of update, in this case, is as follows:
def update(self, key, key_fn, l, r):
self.sum = key
for c in [l, r]:
if c is not None:
self.sum += r.sum
The first line sets the sum using the key parameter. The next three lines add the sums of the children (if any), to the sum.
The sum_ method
Recall that any method appearing in the updator will be adopted by the container. In this case, the code is:
def sum_(self):
return 0 if self.root is None else self.root.metadata.sum
and, as it becomes part of the code of the set or dict, it may access their methods and properties. In this case, the code accesses the root property, to obtain a banyan.Node object. It uses the metadata property of the node to return the sum.
Augmentation does not alter the running time of non-modifying operations (e.g., __contains__), and while it does not change the order of growth of modifying operations, it adds a multiplicative factor. For the predefined classes, the multiplicative factor is small. Due to the dynamic nature of Python, however, pure-Python augmentation incurrs a significant multiplicative factor, and is therefore primarily useful for very large trees. See the Updators Section in the Performance Page.