Getting Started

# AlgoPy, Algorithmic Differentiation in Python¶

## What is AlgoPy?¶

The purpose of AlgoPy is the evaluation of higher-order derivatives in the forward and reverse mode of Algorithmic Differentiation (AD) of functions that are implemented as Python programs. Particular focus are functions that contain numerical linear algebra functions as they often appear in statistically motivated functions. The intended use of AlgoPy is for easy prototyping at reasonable execution speeds. More precisely, for a typical program a directional derivative takes order 10 times as much time as time as the function evaluation. This is approximately also true for the gradient.

## What can AlgoPy do for you?¶

• evaluation of derivatives useful for nonlinear continuous optimization
• Jacobian
• Hessian
• Jacobian vector product
• vector Jacobian product
• Hessian vector product
• vector Hessian vector product
• higher-order tensors
• Taylor series evaluation
• for modeling higher-order processes
• could be used to compute Taylor series expansions for ODE/DAE integration.

## Getting Started:¶

For the impatient, we show a minimalistic example how AlgoPy can be used to compute derivatives.

import numpy, algopy
from algopy import UTPM, exp

def eval_f(x):
""" some function """
return x[0]*x[1]*x[2] + exp(x[0])*x[1]

# forward mode without building the computational graph
# -----------------------------------------------------
x = UTPM.init_jacobian([3,5,7])
y = eval_f(x)
algopy_jacobian = UTPM.extract_jacobian(y)
print('jacobian = ',algopy_jacobian)

# reverse mode using a computational graph
# ----------------------------------------

# STEP 1: trace the function evaluation
cg = algopy.CGraph()
x = algopy.Function([1,2,3])
y = eval_f(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]

# STEP 2: use the computational graph to evaluate derivatives
print('Jacobian =', cg.jacobian([3.,5,7]))
print('Hessian =', cg.hessian([3.,5.,7.]))
print('Hessian vector product =', cg.hess_vec([3.,5.,7.],[4,5,6]))


If one executes that code one obtains as output:

$python getting_started.py jacobian = [ 135.42768462 41.08553692 15. ] gradient = [array([ 135.42768462, 41.08553692, 15. ])] Jacobian = [array([[ 135.42768462, 41.08553692, 15. ]])] Hessian = [[[ 100.42768462 27.08553692 5. ] [ 27.08553692 0. 3. ] [ 5. 3. 0. ]]] Hessian vector product = [ 567.13842308 126.34214769 35. ]  ## Help improve AlgoPy¶ If you have any questions, suggestions or bug reports please use the mailing list http://groups.google.com/group/algopy?msg=new&lnk=gcis or alternatively write me an email(sebastian.walter@gmail.com). This will make it much easier for me to provide code/documentation that is easier to understand. Of course, you are also welcome to contribute code, bugfixes, examples, success stories ;), ... ## Issues and common pitfalls¶ Warning Do not use numpy.ndarray with algopy.UTPM or algopy.Function as elements! Example 1: import numpy, algopy x = algopy.Function(1) y = algopy.Function(2) # wrong z = numpy.array([x,y]) # right z = algopy.zeros(2, dtype=x) z[0] = x z[1] = y  Example 2: x = algopy.UTPM(numpy.ones((2,1))) y = algopy.UTPM(2*numpy.ones((2,1))) # wrong z = numpy.array([x,y]) # right z = algopy.zeros(2, dtype=x) z[0] = x z[1] = y  Note If you use NumPy version > 1.5 the reverse mode could be very slow if you use broadcasting. Reason: Numpy version 1.5 and above have a bug in binary inplace operations (imul, iadd, ...) when array elements point to overlapping memory regions, e.g., when strides = (0,8). There is a workaround in AlgoPy for this case, but it is probably rather slow for large matrices since a Python loops needs to access all elements in the matrix. See discussion on http://thread.gmane.org/gmane.comp.python.numeric.general/51379 and the issue on https://github.com/numpy/numpy/issues/2705 This bug is only relevant in the reverse mode and you use broadcasting, e.g., when you run the following example: import numpy, algopy def f(x): N, = x.shape A = algopy.zeros((N,N), dtype=x) A[0,0] = x[2] A[1,2] = x[1] return algopy.sum(A*x) cg = algopy.CGraph() x = algopy.Function(numpy.ones(3)) y = f(x) cg.independentFunctionList = [x] cg.dependentFunctionList = [y] print cg.gradient(numpy.array([1.,2.,3.]))  ## Potential Improvements¶ • better memory management to speed things up • direct support for nested derivatives (requires the algorithms to be generic) • better complex number support, in particular also the reverse mode • support for sparse Jacobian and sparse Hessian computations using graph coloring as explained in http://portal.acm.org/citation.cfm?id=1096222 ## Installation and Upgrade:¶ AlgoPy works on Python 2.6+ and Python 3.0+. Current version is 0.5.1 Official releases: • available at: http://pypi.python.org/pypi/algopy • with pip •$ pip install algopy
• with easy_install
• $easy_install algopy for installation •$ easy_install –upgrade algopy to upgrade to the newest version
• NetBSD package

Bleeding edge:

• includes additional documentation, e.g. talks and additional examples and the sphinx *.rst documents

Dependencies:

ALGOPY Core:
• numpy (known to work with numpy >= 1.6.2)
• scipy (known to work with scipy >= 0.11.0)
ALGOPY Examples:
Run tests:
• Nose
Documentation:
• sphinx
• matplotlib, mayavi2, yapgvb

You can install these dependencies with pip:

pip install nose
pip install numpy
pip install scipy


Or on Debian/Ubuntu:

sudo apt-get install -qq python-sphinx python-nose python$-numpy python-scipy  or: sudo apt-get install -qq python3-sphinx python3-nose python3$-numpy python3-scipy


Note

echo 'yes' | sudo add-apt-repository ppa:pylab/stable
sudo apt-get update


## How does it work?:¶

AlgoPy offers the forward mode and the reverse mode of AD.

The basic idea is the computation on (univariate) Taylor polynomials with with matrix coefficients. More precisely, AlgoPy supports univariate Taylor polynomial (UTP) arithmetic where the coefficients of the polynomial are numpy.ndarrays. To distinguish Taylor polynomials from real vectors resp. matrices they are written with enclosing brackets:

$[x]_D = [x_0, \dots, x_{D-1}] = \sum_{d=0}^{D-1} x_d T^d \;,$

where each $$x_0, x_1, \dots$$ are arrays, e.g. a (5,7) array. This mathematical object is described by numpy.ndarray with shape (D,P, 5,7). The $$T$$ is an indeterminate, i.e. a formal/dummy variable. Roughly speaking, this is the UTP equivalent to the imaginary number $$i$$ in complex arithmetic. The P can be used to compute several Taylor expansions at once. I.e., a vectorization to avoid the recomputation of the same functions with different inputs.

If the input UTPs are correctly initialized one can interpret the coefficients of the resulting polynomial as higher-order derivatives. Have a look at the Taylor series expansion example for a more detailed discussion.

To allow the use of the reverse mode of AD a simple code tracer has been implemented in algopy.tracer. The idea is to record the computation procedure in a datastructure s.t. the control flow sequence can walked in reverse order. There is no complete documentation for the reverse mode yet.

Information on the web:

There has been a winterschool for Algorithmic Differentiation. Some tutorials explain Taylor polynomial arithmetic. http://www.sintef.no/Projectweb/eVITA/Winter-Schools/2010/

Talks:

Simple Examples:

Application Examples:

How is AlgoPy organized:

## Current Issues:¶

• some algorithms require vectors to be columns of a matrix. I.e. if x is a vector it should be initialized as x = UTPM(numpy.random.rand(D,P,N,1) and not as UTPM(numpy.random.rand(D,P,N)) as one would typically do it using numpy.
• there is no vectorized reverse mode yet. That means that one can compute columns of a Jacobian of dimension (M,N) by propagating N directional derivatives at once. In the reverse mode one would like to propagate M adjoint directions at once. However, this is not implemented yet, i.e. one has to repeat the procedure M times.

## Version Changelog¶

• Version 0.2.2
• fixed some broadcasting bugs with UTPM instances
• fixed a bug in algopy.zeros
• Version 0.2.3
• added UTPM.__len__, i.e. len(x) works now for x a UTPM instance
• fixed a bug in algopy.dot(x,y) in the case when x is a numpy.ndarray and y is a UTPM instance
• Version 0.3.0:
• renamed push_forward to pushforward, this is more consistent w.r.t. the pullback
• UTPM.__repr__ now returns a string of the form UTPM(...)
• refactored the tracer: it should now be possible to trace the function evaluation with normal numpy.ndarrays. After that, one can use cg.pushforward with UTPM instances or call cg.gradient, etc.
• UTPM.reshape is now a method, not a class method
• added preliminary complex numbers support for arithmetic with UTPM instances (reverse mode using the tracer is not supported yet)
• UTPM.reshape now can also take integers as input, not only tuples of integers
• added UTPM.tan, UTPM.arcsin, UTPM.arccos, UTPM.arctan, UTPM.sinh, UTPM.cosh, UTPM.tanh
• made init_hessian and extract_hessian generic (to make it useful for complex valued functions)
• added comparison operators <,>,<=,>=,== to UTPM
• Version 0.3.1
• replaced algopy.sum by a faster implementation
• fixed a bug in getitem of the UTPM instance: now works also with numpy.int64 as index
• added dedicated algopy.sum and algopy.prod
• fixed bug in tracing operations involving neg(x)
• changed API of CGraph.hessian, CGraph.jac_vec etc. One has now to write CGraph.jacobian(x) instead of CGraph.jacobian([x]).
• Version 0.3.2
• improved error reporting in the reverse mode: when “something goes wrong” in cg.gradient([1.,2.,3.]) one now gets a much more detailed traceback
• added A.reshape(...) support to the reverse mode
• improved support for broadcasting for UTPM instances
• Version 0.4.0
• added support for a variety of new functions, mostly contributed by Alex Griffing, NCSU: expm, hyp1f1, hyperu, hyp2f0, polygamma, psi, erf, erfi, dawsn, logit, expit
• Version 0.5.0

\$ python -c "import algopy; algopy.test()"