Concepts

This section describes some of the theory used in arybo/petanque, the choices that have been made and the various limitations that exist.

Representation of boolean expressions and bit-vectors

Boolean expressions are represented in the petanque library using the Algebric Normal Form (ANF). The advantages are that it is complete (that is any boolean expression can be represented using only XOR and AND operations) and it has an algebraic structure (see Application).

The ANF form is a series of AND operations xored together, with a final potential binary XOR with 1. So, the expression tree has an addition as root, and its children are only AND operators, and optionally one 1 in the end. For instance, \((x \land a) \oplus (x \land b) \oplus 1\) is an ANF, whereas \(x \land (a \oplus b) \oplus 1\) is not. On the last one, the AND operation must be expand to get the ANF version. That being said, the petanque library is able to consider any boolean expression with OR, AND, XOR and ESF operators (with any depth), and can compute the ANF form of these expressions.

The petanque library considers expressions modulo 2, and is then using additions and multiplications to respectively represent binary XOR and AND. Here are some examples using the iarybo shell:

$ iarybo 1
In [1]: x^y
Out[1]:
Vec([
(x0 + y0)
])

In [2]: x&y
Out[2]:
Vec([
(x0 * y0)
])

In [3]: x|y
Out[3]:
Vec([
((x0 * y0) + x0 + y0)
])

What is called in the various API “simplify” is the process of expanding any supported boolean expression into an ANF. For instance, one could create this expression:

from arybo.lib import MBA
mba = MBA(1)
x = mba.var('x')
y = mba.var('y')
z = mba.var('z')
# (x XOR y) AND z
e = (x[0]+y[0])*z[0]
>>> print(e)
((x0 + y0) * a0)

Using arybo.lib.MBAVariable.simplify() will transform this expression into its ANF form:

from arybo.lib import simplify
e_s = simplify(e)
>>> print(e_s)
((x0 * a0) + (y0 * a0))

Elementary symmetric functions

Elementary symmetric functions (ESF) are functions whose outputs do not depend on the order of their input boolean variables, which means their output values only depend on the Hamming weight of the input vector. An elementary symmetric function of degree \(d\) with \(k\) input variables is a boolean function defined as:

\[\begin{split}esf_d:~ \mathbb{F}_2^k & \longrightarrow\mathbb{F}_2^k\\ (x_1, \dots, x_k) & \longmapsto \bigoplus_{1\le j_1 < j_2 < \ldots < j_d \le k} x_{j_1} \dotsm x_{j_d}\end{split}\]

Some examples:

\[\begin{split}esf_1(a,b,c) &= a \oplus \oplus b \oplus c\\ esf_2(a,b) &= ab \\ esf_2(a,b,c) &= ab \oplus ac \oplus bc\end{split}\]

They are interesting because they occur “naturally” in many parts of arithmetic operators expressed at their boolean levels (see an example with the addition below).

Moreover, we can prove that:

\[x_1 \lor \dotsb \lor x_n = \bigoplus\limits_{d=1}^{n} esf_d(x_1,\dotsc,x_n).\]

This can be useful to identify OR operations within an ANF expression, as we can see in this example (OR identification is still experimental, and thus needs to be explicitly imported from pytanque):

$ iarybo 1
In [1]: e=a|b|c|d

In [2]: e
Out[2]:
Vec([
((a0 * b0) + (a0 * c0) + (a0 * d0) + (b0 * c0) + (b0 * d0) + (c0 * d0) + (a0 * b0 * c0) + (a0 * b0 * d0) + (a0 * c0 * d0) + (b0 * c0 * d0) + (a0 * b0 * c0 * d0) + a0 + b0 + c0 + d0)
])

In [3]: find_esfs(e[0])
Out[3]: [ESF(3, a0, b0, c0, d0), ESF(2, a0, b0, c0, d0)]

In [4]: e
Out[4]:
Vec([
(ESF(2, a0, b0, c0, d0) + ESF(3, a0, b0, c0, d0) + (a0 * b0 * c0 * d0) + a0 + b0 + c0 + d0)
])

In [5]: import pytanque

In [6]: pytanque.identify_ors_inplace(e[0])
Out[6]: True

In [7]: e
Out[7]:
Vec([
(a0 | b0 | c0 | d0)
])

Boolean expression solver

A naive boolean expression solver has been implemented. It basically takes as input a boolean expression containing a given number of symbolic values, and produces (potentially symbolic) bit-vectors that make the boolean expression true or false (according to the user’s demand).

A usage example is described here.

Integer arithmetic operations

Addition/substraction

Addition is computed symbolically using the algorithm behind a 1-bit logical adder. Basically, for an n-bit addition, \(n-1\) carry bits are computed one after the other, according to the previous results. More formally, \(R = x+y\) is computed like this (with \(R\), \(x\) and \(y\) n-bit variables):

\begin{eqnarray} &R_i = x_i \oplus y_i \oplus c_i\\ &\text{with } \begin{cases} c_0 = 0\\ c_{i+1} = x_i \cdot y_i \oplus c_i \cdot (x_i \oplus y_i)\\ \end{cases} \label{eq:carry} \end{eqnarray}

Using the ESF described above, \(c_i\) can be rewritten as this:

\[c_{i+1} = esf_2(x_i, y_i, c_i)\]

An optimization can be done if \(y\) is a constant known at runtime. It uses the fact that:

\[x+y = (x \oplus y) + ((x \land y) \ll 1)\]

By applying recursively this formula and because \(x+0 = x\), we can write the following recursive algorithm:

def add(x,y):
  if (y == 0): return x
  return add(x^y, (x&y)<<1)

For instance, if

\[y = (0 \dots 0~1)^\intercal\]

then the addition will be reduced to only one XOR in one loop iteration, while the original algorithm would have gone through the computation of every carry bit.

Multiplication

The multiplication is using the fact that:

\[\begin{split}x \times y &= x \times (\sum\limits_{i=0}^n 2^{i}y_{i}) \\ &= \sum\limits_{i=0}^n x\times 2^{i}y_{i} \\ &= \sum\limits_{i=0}^n (x \ll i) \times y_{i}\end{split}\]

An n-bit multiplication is thus performed using \(n\) multiplication.

Division by a known constant

Only a division by a known constant at runtime is supported in Arybo for the moment. The main idea is to transform a division by a \(n\)-bit constant into a multiplication by a \(2n\)-bit constant and a right logical shift.

The details of the complete algorithm are in the Hacker’s Delight book. It also can be found in some optimization libraries, for instance in libdivide.

Application

Applications are functions that take a \(m\)-bit vector as input and produce an n-bit vector. They are represented within petanque in two parts:

  • a non-linear part called NL
  • an affine part composed of a \(m*n\) matrix M and a constant vector V

This construction is possible because of the ANF form.

In petanque, a process called “vectorial decomposition” allows the creation of such application from a bit-vector and list of symbolic inputs to consider. Here is an example that creates the application associated with the operation \(x+1\), for a 4-bit input \(x\):

from mba.lib import MBA
mba = MBA(4)
x = mba.var('x')
r = x+1
F = x.vectorial_decomp([x])
>>> print(F)
App NL = Vec([
0,
0,
(_0 * _1),
(_0 * _1 * _2)
])
AffApp matrix = Mat([
[1, 0, 0, 0]
[1, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
])
AffApp cst = Vec([
1,
0,
0,
0
])

Inverse of an application

Arybo is able to inverse a subset of the invertible applications, without computing the whole truth table and inverting it (which can be really memory and computation intensive for application over 32-bits input for instance).

The two kinds of invertible application Arybo is able to invert are:

  • affine/linear application with an invertible M matrix
  • application with a non-linear part which is a T function. Every arithmetic operation supported by Arybo falls into that category. The main idea is to resolve the non-linear system using a classical substitution technic.

The example above is a good candidate:

from mba.lib import MBA, app_inverse, simplify
mba = MBA(4)
x = mba.var('x')
r = x+1
F = x.vectorial_decomp([x])
Finv = app_inverse(F)
>>> print(simplify(F(Finv(x.vec))))
Vec([
x0,
x1,
x2,
x3
])

A random permutation is a good example of an application Arybo can’t invert (yet). Indeed, chances are very low to fall into one of the two categories mentioned above:

import random
from mba.lib import MBA, app_inverse, simplify
mba = MBA(4)
P = list(range(16))
random.shuffle(P)
E,X = mba.permut2expr(P)
F = E.vectorial_decomp([X])
>>> print(F)
App NL = Vec([
((_0 * _1) + (_0 * _2) + (_1 * _2)),
((_0 * _1) + (_0 * _3) + (_2 * _3) + (_0 * _1 * _2) + (_0 * _1 * _3) + (_0 * _2 * _3)),
((_0 * _1) + (_0 * _2) + (_1 * _2) + (_1 * _3) + (_0 * _2 * _3)),
((_1 * _3) + (_2 * _3) + (_1 * _2 * _3))
])
AffApp matrix = Mat([
[0, 1, 1, 0]
[0, 1, 1, 1]
[1, 1, 0, 1]
[1, 1, 1, 0]
])
AffApp cst = Vec([
0,
0,
1,
1
])
>>> print(app_inverse(F))
None

What could be improved

  • test different ways of storing boolean expressions within petanque. We are currently using a “sorted vector” (that is a vector whose elements are always sorted), which has the advantage of consuming less memory than a tree but has a more important cost when inserting and removing elements (as we need to move the other elements each time).
  • implement a C++ version of the Arybo library (while keeping the Python version for testing/fallback purposes), so that it could be used natively in other libraries, or in other languages through various bindings

Interesting idea/algorithms to implement

  • the algorithm described by Alex Biryukov, Christophe De Cannière, An Braeken and Bart Preneel in this paper that allows to find, for two arbitrary permutations \(S_1\) and \(S_2\), two invertible linear functions \(L_1\) and \(L_2\) such as \(S_2 = L_1 \circ S_1 \circ S_2\).
  • find interesting equalities involving ESF that would make the canonicalisation of some MBA much faster and less memory-consuming (as we would simplify ESFs directly, without expanding them)