1
2 """Algebraic Data Types for Python.
3
4 Algebraic Data Types is not an attempt to add a strict typing
5 discipline to Python, and the word ``type'' here has a much broader
6 meaning. Types represent models of reasoning about objects. This
7 models we, humans, employ everyday (at least those of us, who do the
8 thinking). These are just methods (among others) that we're using to
9 structure our knowledge. For example, we can say, that both
10 ``Bananas`` and ``Apples`` are ``Fruits`` (biologists, please stop
11 reading at this point). With this phrase we constructively defined a
12 new type (concept, idea), that we named the ``Fruit``. To contrast
13 with abstraction, we didn't try to find anything common between these
14 two entities, and to remove the differences, we just stated, that the
15 Fruit is either Banana or Apple. No more, no less. We just used an
16 alteration to define something. Another example of the alteration
17 would be to say, that a human is either a man or woman.
18
19 If we will reason about types, as sets, then the alteration can be
20 viewed as a union. A disjoint union in our case, as we're not loosing
21 any information (we are not abstracting anything out). The union
22 operation is isomorphic to the summation in arithmetic, that's why we
23 call such types - sum types. A dual of the sum is a product. The
24 product models and idea of a composition, i.e., when an entity is
25 composed of other entities. For example, a ``Bicycle`` is a
26 combination of ``Wheels``, ``Frame`` and ``Handlebars``. And a ``Car``
27 is a combination of ``Wheels``, ``Body``, ``Doors``, and
28 ``Engine``. Again, we described concepts constructively, we didn't try
29 to make any abstractions. (In fact, we employed an abstraction, when
30 we made a choice how to represent the compound object, by omitting
31 parts that are not relevant, with respect to our task. But this is a
32 completely different modus of reasoning, that is in fact orthogonal to
33 ADT).
34
35 Finally, we can mix both concepts together to model even more complex
36 ideas. For example, we can define that a ``Vehicle`` is either a
37 ``Car`` or ``Bicycle``. Suppose, that we're trying to model road
38 traffic. In that case we can tell that we have two kinds of road
39 users, either a ``Motorist`` that is a combination of a ``Car``,
40 ``Driver``, ``Passengers`` and ``Luggage``, and a ``Bicyclist`` that
41 is a composition of ``Bicycle`` and the ``Driver``. You may see, that
42 we apply the sum and product recursively, that's why the ADT types are
43 also called recursive types. The same way as you can build complex
44 algebraic expressions using sum and product, we can build complex data
45 using a combination of sum and product. The whole set of algebraic
46 data types is a closure of sum and product operations.
47
48 We can define such complex concepts as lists, tables, trees and, even,
49 natural numbers, using only ADT. For example, a list is either Empty,
50 or it is a Pair of an element and the rest of a List (note that since
51 the type is recursive, we can use the type in its own definition). For
52 example, ``[1,2,3]`` can be represented as
53 ``Pair(1,Pair(2,Pair(3,Empty())))``. A Natural number is either Zero
54 or a Successor of a Natural number, so that we can represent 3 as
55 ``Successor(Successor(Successor(Zero())))``. So, we don't even need
56 numerals, to represent the list [1,2,3]:
57
58 ```
59 Pair(Successor(Zero()),
60 Pair(Successor(Successor(Zero())),
61 Pair(Successor(Successor(Successor(Zero()))),
62 Empty())))
63 ```
64
65 You may notice, that these examples are actually syntactically valid
66 Python code. So we're now close to the point, where we can define,
67 how we will represent ADT in Python. It is believed, that Python
68 doesn't support ADT (at least it is not listed in wikipedia as one of
69 such languages), but as examples above show, this is not true.
70
71 We will use inheritance to represent sum types. For example to say, that
72 Fruit is Banana or Apple, we do the following:
73
74 class Fruit(ADT): pass
75 class Banana(Fruit): pass
76 class Apple(Fruit): pass
77
78
79 The product types, aka tuples, are already in the language, so we're
80 done. We will use the following syntax, to say that a Bicycle is a
81 product of Wheels, Frame and Handlebars:
82
83 class Bicycle(ADT) : pass
84 class Wheels(ADT) : pass
85 class Frame(ADT) : pass
86 class Handlebars(ADT) : pass
87
88 Bicycle(Wheels(), Frame(), Handlebars())
89
90 We're not trying to enforce the type discipline here, by guaranteeing,
91 that it is only possible to construct a Bicycle only from this three
92 things. This is Python anyway.
93
94 So, it looks like that we didn't introduce anything at all, other than
95 extra verbose syntax, hidden by some type theoretic mumbo jumbo. Well
96 yes, but this is only on a surface. The idea behind this library is
97 that ADT is a great generalization, which we can employ to write code,
98 that will work for any ADT.
99
100 The first generalization, is that we can easily print any ADT in a
101 unified syntax, and this syntax can be chosen to be a valid subset of
102 Python syntax. In fact it is also a valid subset of many other
103 programming languages, such as Ruby, JavaScript, Java, C, OCaml,
104 Haskell, etc. That also mean, that we can easily parse them back,
105 especially if the language provides an access to the parser (like
106 Python). Thus, ADT is a nice data representation format (like json,
107 xml, S-expressions), that is very suitable for storing hierarchical data.
108
109 The second generalization, is that we can employ the same method of
110 processing ADT. A usual way of processing lists and other iterable
111 objects, is to apply some operation over every consecutive element of
112 the list. ADT are more general, than lists (in fact lists a special
113 case of ADT). ADT are hierarchical, so the elements have also
114 ancestor/descendant relationships in addition to the
115 successor/predecessor. Also, every element of an ADT value, is tagged
116 by a name. And theses names also forms a separate type hierarchy, so
117 that we have both object and type hierarchies. Given such a general
118 structure, we need to find a general way of iteration over it. We will
119 call it visiting. So visiting is a generalization of an iteration,
120 where the computation is represented by an object called Visitor, that
121 applies itself to each structural element of the ADT object. The
122 visitor object has a method for each type of structural component, and
123 thanks to a unified representation of the ADT type, it knows how to
124 deconstruct any instance of ADT. So, we generalized a way of
125 traversing data structure, so that a user of it needs only to specify
126 the computation, that needs to be applied for each, or some
127 elements.
128
129 We can compare visiting with a regular iteration over some
130 hierarchical data structures, like compounds of lists and
131 maps. Suppose, that we're modeling a library, and started with the
132 following representation:
133
134
135 Library -> Shelf -> Book -> (Author, Title)
136
137 And we wrote a function that will count a total number of distinct authors:
138
139 def count_authors(library):
140 authors = set()
141 for shelf in library:
142 for book in shelf:
143 authors.add(book.author)
144 return len(authors)
145
146 The code looks fine, but it has one problem, it hardcodes the
147 structure of our library. If at some point of time we decide, that we
148 chose a wrong representation and it is much better to represent it as:
149
150 Author -> Title -> Library -> Shelf
151
152 Then we need to rewrite our ``count_authors`` function. On the other
153 hand, with the visitor approach the following code will work with both
154 representations.
155
156
157 class AuthorCounter(Visitor):
158 def __init__(self):
159 self.authors = set()
160 def visit_Author(self, author):
161 self.authors.add(author)
162
163 def count_authors(library):
164 counter = AuthorCounter()
165 counter.run(library)
166 return len(counter.authors)
167
168
169 This variant is slightly more verbose, but is easier to implement, as
170 we don't need to know the hierarchical structure of the data, and
171 anything about the data representation. Moreover, it is easier to
172 support, as it will not break, when something is added or removed from
173 the library structure.
174
175 The visitor pattern really starts to shine, when the hierarchy is much
176 more complex, than in the example, that we provided above. For
177 example, Abstract Syntax Trees (AST) tend to be very complex even for
178 toy languages, and writing the traversing code for them is very
179 tedious. Moreover, the code needed to be repeated over and over again,
180 leading to fragile and hard to support programs.
181
182
183 """
184
185 from collections import Iterable,Sequence,Mapping
186
188 """Algebraic Data Type.
189
190 This is a base class for all ADTs. ADT represented by a tuple of
191 arguments, stored in a `arg` field. Arguments should be instances
192 of ADT class, numbers, strings or lists. Empty set of arguments is
193 permitted. A one-tuple is automatically untupled, i.e., `Int(12)`
194 has value `12`, not `(12,)`. A name of the constructor is stored
195 in the `constr` field
196
197 A structural comparison is provided.
198
199 """
201 self.constr = self.__class__.__name__
202 self.arg = args if len(args) != 1 else args[0]
203
205 return self.__dict__.__cmp__(other.__dict__)
206
208 def qstr(x):
209 if isinstance(x, (int)):
210 return '0x{0:x}'.format(x)
211 elif isinstance(x, ADT):
212 return str(x)
213 elif isinstance(x, tuple):
214 return "(" + ", ".join(qstr(i) for i in x) + ")"
215 else:
216 return '"{0}"'.format(x)
217 def args():
218 if isinstance(self.arg, tuple):
219 return ", ".join(qstr(x) for x in self.arg)
220 else:
221 return qstr(self.arg)
222
223 return "{0}({1})".format(self.constr, args())
224
225
227 """ADT Visitor.
228 This class helps to perform iterations over arbitrary ADTs.
229
230
231 When visitor runs, it will visit each constituent of an ADT.
232 When an ADT instance is visited, the visitor will first look
233 for method named `enter_C` for each class `C` in the MRO of
234 the ADT instance. All found methods will be invoked.
235
236 Then it will look for a method called `enter_C` for each class `C`
237 in the MRO sequence of the ADT class. If one is found,
238 then it will be called, other classes in the MRO sequence will not
239 be considered.
240
241 Finally, the visitor will look for a method called `leave_C` using
242 the same algorithm as described for the `enter_C` method.
243
244 The algorithm, described above, actually implements the
245 depth-first traversal. Methods starting with the prefix `enter`
246 are called right before the corresponding subtree is visited
247 (preorder). Methods starting with the `leave` are called just
248 after the subtree is visited. Methods starting with `visit`
249 actually perform the visiting. If it is not overridden, then
250 `visit_ADT` method is invoked, that will continue traversal to the
251 subtree. If `visit_C` method is overridden (where `C` is name of
252 class in the MRO of the ADT instance), then it is responsibility
253 of the `visit_C` method to call `run` method to continue
254 traversal. If `run` is not called, then the traversal will not
255 continue. It is possible to change the order of traversal, by
256 overriding `visit` methods. Usually, it is better to keep away
257 from the `visit` methods, and use `enter` (the preorder traversal)
258 if possible. However, if it is needed to inject some code between
259 the traversal of two subtrees of a tree, or if an order should be
260 changed, then the visit method is a way to go.
261
262 By default, every element of an ADT is traversed. It is possible
263 to terminate the traversal abnormally (to short-circuit) by
264 returning not-a-None value from any of the methods. The returned
265 value will be a result of the `run` method.
266
267
268 Example
269 -------
270
271 Suppose we have a small expression language with defined as
272 follows:
273
274 >>> class Exp(ADT) : pass
275 >>> class Binop(Exp) : pass
276 >>> class Unop(Exp) : pass
277 >>> class Value(Exp) : pass
278 >>> class Add(Binop) : pass
279 >>> class Mul(Binop) : pass
280 >>> class Neg(Unop) : pass
281 >>> class Var(Value) : pass
282 >>> class Int(Value) : pass
283
284
285 We will write an abstract interpreter that will calculate a sign
286 of expression. In our abstraction, we now a sign of constants,
287 signs of variables are unknown. The negation operation negates the
288 sign of expression, and any binary operation preserves the sign,
289 if both operands have the same sign, otherwise the sign is
290 undefined. We will use the following lattice to represent our
291 abstraction:
292
293
294 True False
295 | |
296 +--+--+
297 |
298 None
299
300 The same expressed in Python:
301
302
303 >>> class Sign(Visitor) :
304 def __init__(self):
305 self.neg = None
306
307 def visit_Binop(self,exp):
308 self.run(exp.arg[0])
309 lhs = self.neg
310 self.run(exp.arg[1])
311 rhs = self.neg
312 if lhs != rhs:
313 self.neg = None
314
315 def leave_Neg(self,exp):
316 if self.neg is not None:
317 self.neg = not self.neg
318
319 def enter_Var(self,var):
320 self.neg = None
321
322 def enter_Int(self,n):
323 self.neg = n < Int(0)
324
325 We overrode method ``visit_Binop`` that will be invoked for both,
326 addition and subtraction, since in our abstraction they behave the
327 same. We chose to override the ``visit`` stage instead of the
328 ``enter`` or leave, because we wanted to inject our code between
329 visiting left and right branch of the expression. We overrode
330 `leave_Neg` to switch the sign _after_ the enclosed expression is
331 visited. Since variable can have arbitrary sign, we're must stop
332 the sign analysis as soon as we have a variable. Finally, for constants
333 we just look at their sign.
334
335
336 To test our sign analysis let's write a simple expression,
337
338 >>> exp = Add((Neg(Neg(Int(1)))), Mul(Int(2), Neg(Neg(Int(3)))))
339
340 It is easy to see that it is positive (in fact it is not). In the
341 infix notation, the expression corresponds to
342
343
344 >>> -(-1) + 2 * -(-3)
345 7
346
347 So, let's run the analysis:
348
349 >>> exp = Add((Neg(Neg(Int(1)))), Mul(Int(2), Neg(Neg(Int(3)))))
350 >>> ai = Sign()
351 >>> ai.run(exp)
352 >>> print("exp {0} is {1}".format(exp,
353 "negative" if ai.neg else
354 "unknown" if ai.neg is None else
355 "positive"))
356
357 For an ADT of type C the method `visit_C` is looked up in the
358 visitors methods dictionary. If it doesn't exist, then `visit_B` is
359 looked up, where `B` is the base class of `C`. The process continues,
360 until the method is found. This is guaranteed to terminate,
361 since visit_ADT method is defined.
362
363 Note: Non ADTs will be silently ignored.
364
365 Once the method is found it is called. It is the method's responsiblity
366 to recurse into sub-elements, e.g., call run method.
367
368 For example, suppose that we want to count negative values in
369 some BIL expression:
370
371 class CountNegatives(Visitor):
372 def __init__(self):
373 self.neg = False
374 self.count = 0
375
376 def visit_Int(self, int):
377 if int.arg < 0 and not self.neg \
378 or int.arg > 0 and self.neg:
379 self.count += 1
380
381 def visit_NEG(self, op):
382 was = self.neg
383 self.neg = not was
384 self.run(op.arg)
385 self.neg = was
386
387 We need to keep track on the unary negation operator, and, of
388 course, we need to look for immediates, so we override two methods:
389 visit_Int for Int constructor and visit_NEG for counting unary minuses.
390 (Actually we should count for bitwise NOT operation also, since it will
391 change the sign bit also, but lets forget about it for the matter of the
392 exercise (and it can be easily fixed just by matching visit_UnOp)).
393
394 When we hit visit_NEG we toggle current sign, storing its previous value
395 and recurse into the operand. After we return from the recursion, we restore
396 the sign.
397
398 """
399
401 """Default visitor.
402
403 This method will be called for those data types that has
404 no specific visitors. It will recursively descent into all
405 ADT values.
406 """
407 if isinstance(adt.arg, tuple):
408 return self.__induct(adt.arg)
409 elif isinstance(adt.arg, ADT):
410 return self.run(adt.arg)
411
413 return next((r for r in (self.run(x) for x in xs) if r), None)
414
416 """Deconstructs sequences"""
417 return self.__induct(adt.arg[0])
418
420 """Deconstructs maps"""
421 return self.__induct(adt.arg[0])
422
423
424 - def run(self, adt):
425 """visitor.run(adt) -> result
426
427 """
428 if isinstance(adt, ADT):
429
430 for meth in ("enter", "visit", "leave"):
431 for cls in adt.__class__.mro():
432 name = "{0}_{1}".format(meth, cls.__name__)
433 fn = getattr(self, name, None)
434 if fn is not None:
435 r = fn(adt)
436 if r is not None:
437 return r
438 if meth == "visit":
439 break
440
441 -class Seq(ADT,Sequence) :
445
448
451
452 - def find(self,key, d=None) :
453 """find(key[, d=None]) -> t
454
455 Looks up for a term that matches with a given key.
456
457 If the key is a string, starting with `@' or `%', then a term
458 with the given identifier name is returned. Otherwise a term
459 with a matching `name' attribute is returned (useful to find
460 subroutines).
461
462 If a key is an instance of Tid class, then a term with
463 corresponding tid is returned.
464
465 If a key is a number, or an instance of `bil.Int' class, then
466 a term with a matching address is returned.
467
468 Example
469 -------
470
471 In the following example, all searches return the
472 same object
473
474
475 >>> main = proj.program.subs.find('main')
476 >>> main = proj.program.subs.find(main.id)
477 >>> main = proj.program.subs.find(main.id.name)
478 """
479 def by_id(t,key) : return t.id == key
480 def by_name(t,key) :
481 if key.startswith(('@','%')):
482 return t.id.name == key
483 else:
484 return hasattr(t,'name') and t.name == key
485 def by_addr(t,key) :
486 value = t.attrs.get('address', None)
487 if value is not None:
488 return parse_addr(value) == key
489
490 test = by_addr
491 if isinstance(key,str):
492 test = by_name
493 elif isinstance(key,Tid):
494 test = by_id
495 elif isinstance(key,Int):
496 key = key.value
497 test = by_addr
498
499 for t in self :
500 if test(t,key) : return t
501 return d
502
503
504 -class Map(ADT,Mapping) :
506 super(Map,self).__init__(args)
507 self.elements = dict((x.arg[0],x.arg[1]) for x in args[0])
508
511
514
517
518
520
521 if isinstance(adt, Iterable):
522 for x in adt:
523 visitor.run(x)
524 else:
525 visitor.run(adt)
526 return visitor
527
528
529
530
531 if __name__ == "__main__":
534 - class Apple(Fruit) : pass
535
536 assert(Bannana() == Bannana())
537 assert(Bannana() != Apple())
538 assert( Apple() < Bannana())
539