================ Augmenting Trees ================ .. Note:: This page can be skipped if all that is needed are efficient sorted drop-in replacemnt containers for Python's sets and dicts. .. _augmenting_trees_chapter: ------------ Introduction ------------ Consider the following tree, drawn rotated 90 degrees counter-clockwise (the root of the tree is 40, and the leftmost leaf is 3). :: /-99 /-96 | \-88 /-73 | | /-64 | \-57 | \-54 | \-52 40 | /-37 | /-34 | | \-29 | | \-28 \-27 | /-24 \-20 \-6 \-3 It is well known that some queries can be supported efficiently by augmenting each node with metadata having the property that the metadata at each node can be calculated from the metadata of its direct children. See the following `lecture`_, or Chapter 14 of `Introduction To Algorithms`_ (3rd edition), for the specifics. .. _`lecture`: http://www.cs.cmu.edu/~avrim/451/lectures/lect0927.txt .. _`Introduction To Algorithms`: http://en.wikipedia.org/wiki/Introduction_to_Algorithms E.g., consider the same tree, but with each node augmented by the size of its subtree (shown in square brackets). Then this tree can answer efficiently queries such as what is the ordinal position of this key in the set, or which is the kth key. The size of a node's subtree can indeed be calculated by the size of the subtrees sizes of the direct children. :: /-99 [ 1 ] /-96 [ 3 ] | \-88 [ 1 ] /-73 [ 8 ] | | /-64 [ 1 ] | \-57 [ 4 ] | \-54 [ 2 ] | \-52 [ 1 ] 40 [ 18 ] | /-37 [ 1 ] | /-34 [ 4 ] | | \-29 [ 2 ] | | \-28 [ 1 ] \-27 [ 9 ] | /-24 [ 1 ] \-20 [ 4 ] \-6 [ 2 ] \-3 [ 1 ] As another example, here is the same tree, but with each node augmented by the smallest and largest keys in its subtrees (shown as a pair in square brackets). Then, obviously, this tree can answer efficiently what are the smallest and largest keys in the entire tree. Here, too, the relevant information can be calculated directly from that of the direct children. :: /-99 [ (99, 99) ] /-96 [ (88, 99) ] | \-88 [ (88, 88) ] /-73 [ (52, 99) ] | | /-64 [ (64, 64) ] | \-57 [ (52, 64) ] | \-54 [ (52, 54) ] | \-52 [ (52, 52) ] 40 [ (3, 99) ] | /-37 [ (37, 37) ] | /-34 [ (28, 37) ] | | \-29 [ (28, 29) ] | | \-28 [ (28, 28) ] \-27 [ (3, 37) ] | /-24 [ (24, 24) ] \-20 [ (3, 24) ] \-6 [ (3, 6) ] \-3 [ (3, 3) ] ------------------------ Implementation In Banyan ------------------------ The library comes with a number of :ref:`predefined classes `, as well as the ability to plug in new algorithms. Using The Predefined Classes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Using the :ref:`predefined classes ` is easy. Simply specify the class as the ``updator`` parameter of a set or dict, and its methods become available as those of the set or dict. For example: :: >>> t = SortedSet(['hao', 'jiu', 'mei', 'jian'], updator = RankUpdator) >>> t SortedSet['hao', 'jian', 'jiu', 'mei']) >>> >>> # 'hao' is key no. 0 >>> t.kth(0) 'hao' >>> t.order('hao') 0 >>> >>> # 'mei' is key no. 3 >>> t.kth(3) 'mei' >>> t.order('mei') 3 The ``kth`` and ``order`` methods of the updator class become part of the set, in this case. Plugging In New Algorithms ~~~~~~~~~~~~~~~~~~~~~~~~~~ Writing a new updator is probably best understood by an example. Following is a ``SumUpdator`` updator class, which provides the additional functionality of asking for the sum of the keys of all keys stored in a set or dict. That is, after writing the class, we will be able to use it as follows: :: >>> t = SortedSet(range(4), updator = SumUpdator) >>> t SortedSet([0, 1, 2, 3]) >>> t.sum_() 6 Of course, there are simpler ways to track the sum of keys in a container, and this example is just to illustrate the point of writing a new updator. The entire code of the updator is as follows: :: class SumUpdator(object): class Metadata(object): def update(self, key, key_fn, l, r): self.sum = key for c in [l, r]: if c is not None: self.sum += r.sum def __repr__(self): return str(self.sum) def sum_(self): return 0 if self.root is None else self.root.metadata.sum It contains two types of entries: * A ``Metadata`` class (which must be called by this name exactly), which describes what will be augmented by each node. * Any number of methods (in this case, a single one, ``sum_``), which will be adopted by the container using this updator. The ``Metadata`` Class The metadata class needs to support a single method, ``update``. This method is called with the following parameters (besides ``self``): * ``key`` - The key in the node corresponding to this metadata. * ``key_fn`` - A key-comparison function; e.g., ``key_fn(a) < key_fn(b)`` can be used to check if, logically, ``a`` is less than ``b``. * ``l`` - The metadata of the left child (``None`` if there isn't any). * ``r`` - The metadata of the right child (``None`` if there isn't any). The code of ``update``, in this case, is as follows: :: def update(self, key, key_fn, l, r): self.sum = key for c in [l, r]: if c is not None: self.sum += r.sum The first line sets the sum using the ``key`` parameter. The next three lines add the sums of the children (if any), to the sum. The ``sum_`` method Recall that any method appearing in the updator will be adopted by the container. In this case, the code is: :: def sum_(self): return 0 if self.root is None else self.root.metadata.sum and, as it becomes part of the code of the set or dict, it may access their methods and properties. In this case, the code accesses the ``root`` property, to obtain a :py:class:`banyan.Node` object. It uses the ``metadata`` property of the node to return the sum. ----------- Performance ----------- Augmentation does not alter the running time of non-modifying operations (e.g., ``__contains__``), and while it does not change the order of growth of modifying operations, it adds a multiplicative factor. For the :ref:`predefined classes `, the multiplicative factor is small. Due to the dynamic nature of Python, however, pure-Python augmentation incurrs a significant multiplicative factor, and is therefore primarily useful for very large trees. See the :ref:`Updators ` Section in the :doc:`performance` Page.