[frames] | no frames]

# Source Code for Module bap.adt

```  1  #!/usr/bin/env python
2  """Algebraic Data Types for Python.
3
4  Algebraic Data Types is not an attempt to add a strict typing
5  discipline to Python, and the word ``type'' here has a much broader
6  meaning. Types represent models of reasoning about objects. This
7  models we, humans, employ everyday (at least those of us, who do the
8  thinking). These are just methods (among others) that we're using to
9  structure our knowledge. For example, we can say, that both
10  ``Bananas`` and ``Apples`` are ``Fruits`` (biologists, please stop
11  reading at this point). With this phrase we constructively defined a
12  new type (concept, idea), that we named the ``Fruit``. To contrast
13  with abstraction, we didn't try to find anything common between these
14  two entities, and to remove the differences, we just stated, that the
15  Fruit is either Banana or Apple. No more, no less. We just used an
16  alteration to define something. Another example of the alteration
17  would be to say, that a human is either a man or woman.
18
19  If we will reason about types, as sets, then the alteration can be
20  viewed as a union. A disjoint union in our case, as we're not loosing
21  any information (we are not abstracting anything out). The union
22  operation is isomorphic to the summation in arithmetic, that's why we
23  call such types - sum types. A dual of the sum is a product. The
24  product models and idea of a composition, i.e., when an entity is
25  composed of other entities. For example, a ``Bicycle`` is a
26  combination of ``Wheels``, ``Frame`` and ``Handlebars``. And a ``Car``
27  is a combination of ``Wheels``, ``Body``, ``Doors``, and
28  ``Engine``. Again, we described concepts constructively, we didn't try
29  to make any abstractions. (In fact, we employed an abstraction, when
30  we made a choice how to represent the compound object, by omitting
31  parts that are not relevant, with respect to our task. But this is a
32  completely different modus of reasoning, that is in fact orthogonal to
34
35  Finally, we can mix both concepts together to model even more complex
36  ideas. For example, we can define that a ``Vehicle`` is either a
37  ``Car`` or ``Bicycle``. Suppose, that we're trying to model road
38  traffic. In that case we can tell that we have two kinds of road
39  users, either a ``Motorist`` that is a combination of a ``Car``,
40  ``Driver``, ``Passengers`` and ``Luggage``, and a ``Bicyclist`` that
41  is a composition of ``Bicycle`` and the ``Driver``. You may see, that
42  we apply the sum and product recursively, that's why the ADT types are
43  also called recursive types. The same way as you can build complex
44  algebraic expressions using sum and product, we can build complex data
45  using a combination of sum and product. The whole set of algebraic
46  data types is a closure of sum and product operations.
47
48  We can define such complex concepts as lists, tables, trees and, even,
49  natural numbers, using only ADT. For example, a list is either Empty,
50  or it is a Pair of an element and the rest of a List (note that since
51  the type is recursive, we can use the type in its own definition). For
52  example, ``[1,2,3]`` can be represented as
53  ``Pair(1,Pair(2,Pair(3,Empty())))``. A Natural number is either Zero
54  or a Successor of a Natural number, so that we can represent 3 as
55  ``Successor(Successor(Successor(Zero())))``. So, we don't even need
56  numerals, to represent the list [1,2,3]:
57
58  ```
59  Pair(Successor(Zero()),
60    Pair(Successor(Successor(Zero())),
61      Pair(Successor(Successor(Successor(Zero()))),
62        Empty())))
63  ```
64
65  You may notice, that these examples are actually syntactically valid
66  Python code.  So we're now close to the point, where we can define,
67  how we will represent ADT in Python. It is believed, that Python
68  doesn't support ADT (at least it is not listed in wikipedia as one of
69  such languages), but as examples above show, this is not true.
70
71  We will use inheritance to represent sum types. For example to say, that
72  Fruit is Banana or Apple, we do the following:
73
75      class Banana(Fruit): pass
76      class Apple(Fruit): pass
77
78
79  The product types, aka tuples, are already in the language, so we're
80  done. We will use the following syntax, to say that a Bicycle is a
81  product of Wheels, Frame and Handlebars:
82
87
88      Bicycle(Wheels(), Frame(), Handlebars())
89
90  We're not trying to enforce the type discipline here, by guaranteeing,
91  that it is only possible to construct a Bicycle only from this three
92  things. This is Python anyway.
93
94  So, it looks like that we didn't introduce anything at all, other than
95  extra verbose syntax, hidden by some type theoretic mumbo jumbo. Well
96  yes, but this is only on a surface. The idea behind this library is
97  that ADT is a great generalization, which we can employ to write code,
98  that will work for any ADT.
99
100  The first generalization, is that we can easily print any ADT in a
101  unified syntax, and this syntax can be chosen to be a valid subset of
102  Python syntax. In fact it is also a valid subset of many other
103  programming languages, such as Ruby, JavaScript, Java, C, OCaml,
104  Haskell, etc. That also mean, that we can easily parse them back,
105  especially if the language provides an access to the parser (like
106  Python). Thus, ADT is a nice data representation format (like json,
107  xml, S-expressions), that is very suitable for storing hierarchical data.
108
109  The second generalization, is that we can employ the same method of
110  processing ADT. A usual way of processing lists and other iterable
111  objects, is to apply some operation over every consecutive element of
112  the list. ADT are more general, than lists (in fact lists a special
113  case of ADT). ADT are hierarchical, so the elements have also
114  ancestor/descendant relationships in addition to the
115  successor/predecessor. Also, every element of an ADT value, is tagged
116  by a name. And theses names also forms a separate type hierarchy, so
117  that we have both object and type hierarchies. Given such a general
118  structure, we need to find a general way of iteration over it. We will
119  call it visiting. So visiting is a generalization of an iteration,
120  where the computation is represented by an object called Visitor, that
121  applies itself to each structural element of the ADT object. The
122  visitor object has a method for each type of structural component, and
123  thanks to a unified representation of the ADT type, it knows how to
124  deconstruct any instance of ADT. So, we generalized a way of
125  traversing data structure, so that a user of it needs only to specify
126  the computation, that needs to be applied for each, or some
127  elements.
128
129  We can compare visiting with a regular iteration over some
130  hierarchical data structures, like compounds of lists and
131  maps. Suppose, that we're modeling a library, and started with the
132  following representation:
133
134
135       Library -> Shelf -> Book -> (Author, Title)
136
137  And we wrote a function that will count a total number of distinct authors:
138
139       def count_authors(library):
140           authors = set()
141           for shelf in library:
142               for book in shelf:
144           return len(authors)
145
146  The code looks fine, but it has one problem, it hardcodes the
147  structure of our library. If at some point of time we decide, that we
148  chose a wrong representation and it is much better to represent it as:
149
150      Author -> Title -> Library -> Shelf
151
152  Then we need to rewrite our ``count_authors`` function. On the other
153  hand, with the visitor approach the following code will work with both
154  representations.
155
156
157      class AuthorCounter(Visitor):
158          def __init__(self):
159              self.authors = set()
160          def visit_Author(self, author):
162
163       def count_authors(library):
164          counter = AuthorCounter()
165          counter.run(library)
166          return len(counter.authors)
167
168
169  This variant is slightly more verbose, but is easier to implement, as
170  we don't need to know the hierarchical structure of the data, and
171  anything about the data representation. Moreover, it is easier to
172  support, as it will not break, when something is added or removed from
173  the library structure.
174
175  The visitor pattern really starts to shine, when the hierarchy is much
176  more complex, than in the example, that we provided above. For
177  example, Abstract Syntax Trees (AST) tend to be very complex even for
178  toy languages, and writing the traversing code for them is very
179  tedious. Moreover, the code needed to be repeated over and over again,
180  leading to fragile and hard to support programs.
181
182
183  """
184
185  from collections import Iterable,Sequence,Mapping
186
188      """Algebraic Data Type.
189
190      This is a base class for all ADTs. ADT represented by a tuple of
191      arguments, stored in a `arg` field. Arguments should be instances
192      of ADT class, numbers, strings or lists. Empty set of arguments is
193      permitted.  A one-tuple is automatically untupled, i.e., `Int(12)`
194      has value `12`, not `(12,)`.  A name of the constructor is stored
195      in the `constr` field
196
197      A structural comparison is provided.
198
199      """
200 -    def __init__(self, *args):
201          self.constr = self.__class__.__name__
202          self.arg = args if len(args) != 1 else args[0]
203
204 -    def __cmp__(self,other):
205          return self.__dict__.__cmp__(other.__dict__)
206
207 -    def __repr__(self):
208          def qstr(x):
209              if isinstance(x, (int)):
210                  return '0x{0:x}'.format(x)
212                  return str(x)
213              elif isinstance(x, tuple):
214                  return "(" + ", ".join(qstr(i) for i in x) + ")"
215              else:
216                  return '"{0}"'.format(x)
217          def args():
218              if isinstance(self.arg, tuple):
219                  return ", ".join(qstr(x) for x in self.arg)
220              else:
221                  return qstr(self.arg)
222
223          return "{0}({1})".format(self.constr, args())
224
225
226 -class Visitor(object):
228      This class helps to perform iterations over arbitrary ADTs.
229
230
231      When visitor runs, it will visit each constituent of an ADT.
232      When an ADT instance is visited, the visitor will first look
233      for method named `enter_C` for each class `C` in the MRO of
234      the ADT instance. All found methods will be invoked.
235
236      Then it will look for a method called `enter_C` for each class `C`
237      in the MRO sequence of the ADT class. If one is found,
238      then it will be called, other classes in the MRO sequence will not
239      be considered.
240
241      Finally, the visitor will look for a method called `leave_C` using
242      the same algorithm as described for the `enter_C` method.
243
244      The algorithm, described above, actually implements the
245      depth-first traversal. Methods starting with the prefix `enter`
246      are called right before the corresponding subtree is visited
247      (preorder).  Methods starting with the `leave` are called just
248      after the subtree is visited. Methods starting with `visit`
249      actually perform the visiting. If it is not overridden, then
250      `visit_ADT` method is invoked, that will continue traversal to the
251      subtree. If `visit_C` method is overridden (where `C` is name of
252      class in the MRO of the ADT instance), then it is responsibility
253      of the `visit_C` method to call `run` method to continue
254      traversal. If `run` is not called, then the traversal will not
255      continue. It is possible to change the order of traversal, by
256      overriding `visit` methods. Usually, it is better to keep away
257      from the `visit` methods, and use `enter` (the preorder traversal)
258      if possible. However, if it is needed to inject some code between
259      the traversal of two subtrees of a tree, or if an order should be
260      changed, then the visit method is a way to go.
261
262      By default, every element of an ADT is traversed. It is possible
263      to terminate the traversal abnormally (to short-circuit) by
264      returning not-a-None value from any of the methods. The returned
265      value will be a result of the `run` method.
266
267
268      Example
269      -------
270
271      Suppose we have a small expression language with defined as
272      follows:
273
274      >>> class Exp(ADT)   : pass
275      >>> class Binop(Exp) : pass
276      >>> class Unop(Exp)  : pass
277      >>> class Value(Exp) : pass
278      >>> class Add(Binop) : pass
279      >>> class Mul(Binop) : pass
280      >>> class Neg(Unop)  : pass
281      >>> class Var(Value) : pass
282      >>> class Int(Value) : pass
283
284
285      We will write an abstract interpreter that will calculate a sign
286      of expression. In our abstraction, we now a sign of constants,
287      signs of variables are unknown. The negation operation negates the
288      sign of expression, and any binary operation preserves the sign,
289      if both operands have the same sign, otherwise the sign is
290      undefined. We will use the following lattice to represent our
291      abstraction:
292
293
294                            True  False
295                              |     |
296                              +--+--+
297                                 |
298                                None
299
300      The same expressed in Python:
301
302
303      >>> class Sign(Visitor) :
304              def __init__(self):
305                  self.neg = None
306
307              def visit_Binop(self,exp):
308                  self.run(exp.arg[0])
309                  lhs = self.neg
310                  self.run(exp.arg[1])
311                  rhs = self.neg
312                  if lhs != rhs:
313                      self.neg = None
314
315              def leave_Neg(self,exp):
316                  if self.neg is not None:
317                      self.neg = not self.neg
318
319              def enter_Var(self,var):
320                  self.neg = None
321
322              def enter_Int(self,n):
323                  self.neg = n < Int(0)
324
325      We overrode method ``visit_Binop`` that will be invoked for both,
326      addition and subtraction, since in our abstraction they behave the
327      same. We chose to override the ``visit`` stage instead of the
328      ``enter`` or leave, because we wanted to inject our code between
329      visiting left and right branch of the expression. We overrode
330      `leave_Neg` to switch the sign _after_ the enclosed expression is
331      visited. Since variable can have arbitrary sign, we're must stop
332      the sign analysis as soon as we have a variable. Finally, for constants
333      we just look at their sign.
334
335
336      To test our sign analysis let's write a simple expression,
337
338      >>> exp = Add((Neg(Neg(Int(1)))), Mul(Int(2), Neg(Neg(Int(3)))))
339
340      It is easy to see that it is positive (in fact it is not).  In the
341      infix notation, the expression corresponds to
342
343
344      >>> -(-1) + 2 * -(-3)
345      7
346
347      So, let's run the analysis:
348
349      >>> exp = Add((Neg(Neg(Int(1)))), Mul(Int(2), Neg(Neg(Int(3)))))
350      >>> ai = Sign()
351      >>> ai.run(exp)
352      >>> print("exp {0} is {1}".format(exp,
353                                        "negative" if ai.neg else
354                                        "unknown"  if ai.neg is None else
355                                        "positive"))
356
357      For an ADT of type C the method `visit_C` is looked up in the
358      visitors methods dictionary. If it doesn't exist, then `visit_B` is
359      looked up, where `B` is the base class of `C`. The process continues,
360      until the method is found. This is guaranteed to terminate,
361      since visit_ADT method is defined.
362
363      Note: Non ADTs will be silently ignored.
364
365      Once the method is found it is called. It is the method's responsiblity
366      to recurse into sub-elements, e.g., call run method.
367
368      For example, suppose that we want to count negative values in
369      some BIL expression:
370
371      class CountNegatives(Visitor):
372          def __init__(self):
373              self.neg = False
374              self.count = 0
375
376          def visit_Int(self, int):
377              if int.arg < 0 and not self.neg \
378                or int.arg > 0 and self.neg:
379                  self.count += 1
380
381          def visit_NEG(self, op):
382              was = self.neg
383              self.neg = not was
384              self.run(op.arg)
385              self.neg = was
386
387      We need to keep track on the unary negation operator, and, of
388      course, we need to look for immediates, so we override two methods:
389      visit_Int for Int constructor and visit_NEG for counting unary minuses.
390      (Actually we should count for bitwise NOT operation also, since it will
391      change the sign bit also, but lets forget about it for the matter of the
392      exercise (and it can be easily fixed just by matching visit_UnOp)).
393
394      When we hit visit_NEG we toggle current sign, storing its previous value
395      and recurse into the operand. After we return from the recursion, we restore
396      the sign.
397
398      """
399
401          """Default visitor.
402
403          This method will be called for those data types that has
404          no specific visitors. It will recursively descent into all
406          """
411
412 -    def __induct(self, xs):
413          return next((r for r in (self.run(x) for x in xs) if r), None)
414
416          """Deconstructs sequences"""
418
420          """Deconstructs maps"""
422
423
426
427          """
429
430              for meth in ("enter", "visit", "leave"):
432                      name = "{0}_{1}".format(meth, cls.__name__)
433                      fn = getattr(self, name, None)
434                      if fn is not None:
436                          if r is not None:
437                              return r
438                          if meth == "visit":
439                              break
440
442 -    def __init__(self, *args) :
443          super(Seq,self).__init__(args)
444          self.elements = args[0]
445
446 -    def __getitem__(self,i) :
447          return self.elements.__getitem__(i)
448
449 -    def __len__(self) :
450          return self.elements.__len__()
451
452 -    def find(self,key, d=None) :
453          """find(key[, d=None]) -> t
454
455          Looks up for a term that matches with a given key.
456
457          If the key is a string, starting with `@' or `%', then a term
458          with the given identifier name is returned.  Otherwise a term
459          with a matching `name' attribute is returned (useful to find
460          subroutines).
461
462          If a key is an instance of Tid class, then a term with
463          corresponding tid is returned.
464
465          If a key is a number, or an instance of `bil.Int' class, then
466          a term with a matching address is returned.
467
468          Example
469          -------
470
471          In the following example, all searches return the
472          same object
473
474
475          >>> main = proj.program.subs.find('main')
476          >>> main = proj.program.subs.find(main.id)
477          >>> main = proj.program.subs.find(main.id.name)
478          """
479          def by_id(t,key) : return t.id == key
480          def by_name(t,key) :
481              if key.startswith(('@','%')):
482                  return t.id.name == key
483              else:
484                  return hasattr(t,'name') and t.name == key
487              if value is not None:
489
491          if isinstance(key,str):
492              test = by_name
493          elif isinstance(key,Tid):
494              test = by_id
495          elif isinstance(key,Int):
496              key = key.value
498
499          for t in self :
500              if test(t,key) : return t
501          return d
502
503
505 -    def __init__(self, *args) :
506          super(Map,self).__init__(args)
507          self.elements = dict((x.arg[0],x.arg[1]) for x in args[0])
508
509 -    def __getitem__(self,i) :
510          return self.elements.__getitem__(i)
511
512 -    def __len__(self) :
513          return self.elements.__len__()
514
515 -    def __iter__(self) :
516          return self.elements.__iter__()
517
518
520
523              visitor.run(x)
524      else:
526      return visitor
527
528
529
530
531  if __name__ == "__main__":
532 -    class Fruit(ADT) : pass
533 -    class Bannana(Fruit) : pass
534 -    class Apple(Fruit) : pass
535
536      assert(Bannana() == Bannana())
537      assert(Bannana() != Apple())
538      assert(  Apple() <  Bannana())
539
<!--
expandto(location.href);
// -->

```

 Generated by Epydoc 3.0.1 on Thu Sep 22 18:58:40 2016 http://epydoc.sourceforge.net