Augmenting Trees

Note

This page can be skipped if all that is needed are efficient sorted drop-in replacemnt containers for Python’s sets and dicts.

Introduction

Consider the following tree, drawn rotated 90 degrees counter-clockwise (the root of the tree is 40, and the leftmost leaf is 3).

    /-99
  /-96
  | \-88
/-73
| | /-64
| \-57
|   \-54
|     \-52
40
|   /-37
| /-34
| | \-29
| |   \-28
\-27
  | /-24
  \-20
    \-6
      \-3

It is well known that some queries can be supported efficiently by augmenting each node with metadata having the property that the metadata at each node can be calculated from the metadata of its direct children. See the following lecture, or Chapter 14 of Introduction To Algorithms (3rd edition), for the specifics.

E.g., consider the same tree, but with each node augmented by the size of its subtree (shown in square brackets). Then this tree can answer efficiently queries such as what is the ordinal position of this key in the set, or which is the kth key. The size of a node’s subtree can indeed be calculated by the size of the subtrees sizes of the direct children.

    /-99 [ 1 ]
  /-96 [ 3 ]
  | \-88 [ 1 ]
/-73 [ 8 ]
| | /-64 [ 1 ]
| \-57 [ 4 ]
|   \-54 [ 2 ]
|     \-52 [ 1 ]
40 [ 18 ]
|   /-37 [ 1 ]
| /-34 [ 4 ]
| | \-29 [ 2 ]
| |   \-28 [ 1 ]
\-27 [ 9 ]
  | /-24 [ 1 ]
  \-20 [ 4 ]
    \-6 [ 2 ]
      \-3 [ 1 ]

As another example, here is the same tree, but with each node augmented by the smallest and largest keys in its subtrees (shown as a pair in square brackets). Then, obviously, this tree can answer efficiently what are the smallest and largest keys in the entire tree. Here, too, the relevant information can be calculated directly from that of the direct children.

    /-99 [ (99, 99) ]
  /-96 [ (88, 99) ]
  | \-88 [ (88, 88) ]
/-73 [ (52, 99) ]
| | /-64 [ (64, 64) ]
| \-57 [ (52, 64) ]
|   \-54 [ (52, 54) ]
|     \-52 [ (52, 52) ]
40 [ (3, 99) ]
|   /-37 [ (37, 37) ]
| /-34 [ (28, 37) ]
| | \-29 [ (28, 29) ]
| |   \-28 [ (28, 28) ]
\-27 [ (3, 37) ]
  | /-24 [ (24, 24) ]
  \-20 [ (3, 24) ]
    \-6 [ (3, 6) ]
      \-3 [ (3, 3) ]

Implementation In Banyan

The library comes with a number of predefined classes, as well as the ability to plug in new algorithms.

Using The Predefined Classes

Using the predefined classes is easy. Simply specify the class as the updator parameter of a set or dict, and its methods become available as those of the set or dict. For example:

>>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], updator = RankUpdator)
>>> t
SortedSet['hao', 'jian', 'jiu', 'mei'])
>>>
>>> # 'hao' is key no. 0
>>> t.kth(0)
'hao'
>>> t.order('hao')
0
>>>
>>> # 'mei' is key no. 3
>>> t.kth(3)
'mei'
>>> t.order('mei')
3

The kth and order methods of the updator class become part of the set, in this case.

Plugging In New Algorithms

Writing a new updator is probably best understood by an example. Following is a SumUpdator updator class, which provides the additional functionality of asking for the sum of the keys of all keys stored in a set or dict. That is, after writing the class, we will be able to use it as follows:

>>> t = SortedSet(range(4), updator = SumUpdator)
>>> t
SortedSet([0, 1, 2, 3])
>>> t.sum_()
6

Of course, there are simpler ways to track the sum of keys in a container, and this example is just to illustrate the point of writing a new updator.

The entire code of the updator is as follows:

class SumUpdator(object):

    class Metadata(object):
        def update(self, key, key_fn, l, r):
            self.sum = key
            for c in [l, r]:
                if c is not None:
                    self.sum += r.sum

        def __repr__(self):
            return str(self.sum)

    def sum_(self):
        return 0 if self.root is None else self.root.metadata.sum

It contains two types of entries:

  • A Metadata class (which must be called by this name exactly), which describes what will be augmented by each node.
  • Any number of methods (in this case, a single one, sum_), which will be adopted by the container using this updator.

The Metadata Class

The metadata class needs to support a single method, update. This method is called with the following parameters (besides self):

  • key - The key in the node corresponding to this metadata.
  • key_fn - A key-comparison function; e.g., key_fn(a) < key_fn(b) can be used to check if, logically, a is less than b.
  • l - The metadata of the left child (None if there isn’t any).
  • r - The metadata of the right child (None if there isn’t any).

The code of update, in this case, is as follows:

def update(self, key, key_fn, l, r):
    self.sum = key
    for c in [l, r]:
        if c is not None:
            self.sum += r.sum

The first line sets the sum using the key parameter. The next three lines add the sums of the children (if any), to the sum.

The sum_ method

Recall that any method appearing in the updator will be adopted by the container. In this case, the code is:

def sum_(self):
    return 0 if self.root is None else self.root.metadata.sum

and, as it becomes part of the code of the set or dict, it may access their methods and properties. In this case, the code accesses the root property, to obtain a banyan.Node object. It uses the metadata property of the node to return the sum.

Performance

Augmentation does not alter the running time of non-modifying operations (e.g., __contains__), and while it does not change the order of growth of modifying operations, it adds a multiplicative factor. For the predefined classes, the multiplicative factor is small. Due to the dynamic nature of Python, however, pure-Python augmentation incurrs a significant multiplicative factor, and is therefore primarily useful for very large trees. See the Updators Section in the Performance Page.

Table Of Contents

Previous topic

Introduction

Next topic

Performance

This Page