Module adt
source code
Algebraic Data Types for Python.
Algebraic Data Types is not an attempt to add a strict typing
discipline to Python, and the word ``type'' here has a much broader
meaning. Types represent models of reasoning about objects. This
models we, humans, employ everyday (at least those of us, who do the
thinking). These are just methods (among others) that we're using to
structure our knowledge. For example, we can say, that both
``Bananas`` and ``Apples`` are ``Fruits`` (biologists, please stop
reading at this point). With this phrase we constructively defined a
new type (concept, idea), that we named the ``Fruit``. To contrast
with abstraction, we didn't try to find anything common between these
two entities, and to remove the differences, we just stated, that the
Fruit is either Banana or Apple. No more, no less. We just used an
alteration to define something. Another example of the alteration
would be to say, that a human is either a man or woman.
If we will reason about types, as sets, then the alteration can be
viewed as a union. A disjoint union in our case, as we're not loosing
any information (we are not abstracting anything out). The union
operation is isomorphic to the summation in arithmetic, that's why we
call such types - sum types. A dual of the sum is a product. The
product models and idea of a composition, i.e., when an entity is
composed of other entities. For example, a ``Bicycle`` is a
combination of ``Wheels``, ``Frame`` and ``Handlebars``. And a ``Car``
is a combination of ``Wheels``, ``Body``, ``Doors``, and
``Engine``. Again, we described concepts constructively, we didn't try
to make any abstractions. (In fact, we employed an abstraction, when
we made a choice how to represent the compound object, by omitting
parts that are not relevant, with respect to our task. But this is a
completely different modus of reasoning, that is in fact orthogonal to
ADT).
Finally, we can mix both concepts together to model even more complex
ideas. For example, we can define that a ``Vehicle`` is either a
``Car`` or ``Bicycle``. Suppose, that we're trying to model road
traffic. In that case we can tell that we have two kinds of road
users, either a ``Motorist`` that is a combination of a ``Car``,
``Driver``, ``Passengers`` and ``Luggage``, and a ``Bicyclist`` that
is a composition of ``Bicycle`` and the ``Driver``. You may see, that
we apply the sum and product recursively, that's why the ADT types are
also called recursive types. The same way as you can build complex
algebraic expressions using sum and product, we can build complex data
using a combination of sum and product. The whole set of algebraic
data types is a closure of sum and product operations.
We can define such complex concepts as lists, tables, trees and, even,
natural numbers, using only ADT. For example, a list is either Empty,
or it is a Pair of an element and the rest of a List (note that since
the type is recursive, we can use the type in its own definition). For
example, ``[1,2,3]`` can be represented as
``Pair(1,Pair(2,Pair(3,Empty())))``. A Natural number is either Zero
or a Successor of a Natural number, so that we can represent 3 as
``Successor(Successor(Successor(Zero())))``. So, we don't even need
numerals, to represent the list [1,2,3]:
```
Pair(Successor(Zero()),
Pair(Successor(Successor(Zero())),
Pair(Successor(Successor(Successor(Zero()))),
Empty())))
```
You may notice, that these examples are actually syntactically valid
Python code. So we're now close to the point, where we can define,
how we will represent ADT in Python. It is believed, that Python
doesn't support ADT (at least it is not listed in wikipedia as one of
such languages), but as examples above show, this is not true.
We will use inheritance to represent sum types. For example to say, that
Fruit is Banana or Apple, we do the following:
class Fruit(ADT): pass
class Banana(Fruit): pass
class Apple(Fruit): pass
The product types, aka tuples, are already in the language, so we're
done. We will use the following syntax, to say that a Bicycle is a
product of Wheels, Frame and Handlebars:
class Bicycle(ADT) : pass
class Wheels(ADT) : pass
class Frame(ADT) : pass
class Handlebars(ADT) : pass
Bicycle(Wheels(), Frame(), Handlebars())
We're not trying to enforce the type discipline here, by guaranteeing,
that it is only possible to construct a Bicycle only from this three
things. This is Python anyway.
So, it looks like that we didn't introduce anything at all, other than
extra verbose syntax, hidden by some type theoretic mumbo jumbo. Well
yes, but this is only on a surface. The idea behind this library is
that ADT is a great generalization, which we can employ to write code,
that will work for any ADT.
The first generalization, is that we can easily print any ADT in a
unified syntax, and this syntax can be chosen to be a valid subset of
Python syntax. In fact it is also a valid subset of many other
programming languages, such as Ruby, JavaScript, Java, C, OCaml,
Haskell, etc. That also mean, that we can easily parse them back,
especially if the language provides an access to the parser (like
Python). Thus, ADT is a nice data representation format (like json,
xml, S-expressions), that is very suitable for storing hierarchical data.
The second generalization, is that we can employ the same method of
processing ADT. A usual way of processing lists and other iterable
objects, is to apply some operation over every consecutive element of
the list. ADT are more general, than lists (in fact lists a special
case of ADT). ADT are hierarchical, so the elements have also
ancestor/descendant relationships in addition to the
successor/predecessor. Also, every element of an ADT value, is tagged
by a name. And theses names also forms a separate type hierarchy, so
that we have both object and type hierarchies. Given such a general
structure, we need to find a general way of iteration over it. We will
call it visiting. So visiting is a generalization of an iteration,
where the computation is represented by an object called Visitor, that
applies itself to each structural element of the ADT object. The
visitor object has a method for each type of structural component, and
thanks to a unified representation of the ADT type, it knows how to
deconstruct any instance of ADT. So, we generalized a way of
traversing data structure, so that a user of it needs only to specify
the computation, that needs to be applied for each, or some
elements.
We can compare visiting with a regular iteration over some
hierarchical data structures, like compounds of lists and
maps. Suppose, that we're modeling a library, and started with the
following representation:
Library -> Shelf -> Book -> (Author, Title)
And we wrote a function that will count a total number of distinct authors:
def count_authors(library):
authors = set()
for shelf in library:
for book in shelf:
authors.add(book.author)
return len(authors)
The code looks fine, but it has one problem, it hardcodes the
structure of our library. If at some point of time we decide, that we
chose a wrong representation and it is much better to represent it as:
Author -> Title -> Library -> Shelf
Then we need to rewrite our ``count_authors`` function. On the other
hand, with the visitor approach the following code will work with both
representations.
class AuthorCounter(Visitor):
def __init__(self):
self.authors = set()
def visit_Author(self, author):
self.authors.add(author)
def count_authors(library):
counter = AuthorCounter()
counter.run(library)
return len(counter.authors)
This variant is slightly more verbose, but is easier to implement, as
we don't need to know the hierarchical structure of the data, and
anything about the data representation. Moreover, it is easier to
support, as it will not break, when something is added or removed from
the library structure.
The visitor pattern really starts to shine, when the hierarchy is much
more complex, than in the example, that we provided above. For
example, Abstract Syntax Trees (AST) tend to be very complex even for
toy languages, and writing the traversing code for them is very
tedious. Moreover, the code needed to be repeated over and over again,
leading to fragile and hard to support programs.