The purpose of AlgoPy is the evaluation of higher-order derivatives in the forward and reverse mode of Algorithmic Differentiation (AD) of functions that are implemented as Python programs. A particular focus are functions that contain numerical linear algebra operations, as they often appear in statistically motivated computations. The intended use of AlgoPy is easy prototyping at reasonable execution speeds: for a typical program, a directional derivative takes on the order of 10 times as long as the function evaluation itself, and approximately the same holds for the gradient.
- evaluation of derivatives useful for nonlinear continuous optimization
  - gradient
  - Jacobian
  - Hessian
  - Jacobian vector product
  - vector Jacobian product
  - Hessian vector product
  - vector Hessian vector product
  - higher-order tensors
- Taylor series evaluation
  - for modeling higher-order processes
  - can be used to compute Taylor series expansions for ODE/DAE integration
For the impatient, here is a minimalistic example of how AlgoPy can be used to compute derivatives.
import numpy, algopy
from algopy import UTPM, exp

def eval_f(x):
    """ some function """
    return x[0]*x[1]*x[2] + exp(x[0])*x[1]

# forward mode without building the computational graph
# -----------------------------------------------------
x = UTPM.init_jacobian([3, 5, 7])
y = eval_f(x)
algopy_jacobian = UTPM.extract_jacobian(y)
print('jacobian = ', algopy_jacobian)

# reverse mode using a computational graph
# ----------------------------------------
# STEP 1: trace the function evaluation
cg = algopy.CGraph()
x = algopy.Function([1, 2, 3])
y = eval_f(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]

# STEP 2: use the computational graph to evaluate derivatives
print('gradient =', cg.gradient([3., 5, 7]))
print('Jacobian =', cg.jacobian([3., 5, 7]))
print('Hessian =', cg.hessian([3., 5., 7.]))
print('Hessian vector product =', cg.hess_vec([3., 5., 7.], [4, 5, 6]))
Executing this code yields the following output:
$ python getting_started.py
jacobian = [ 135.42768462 41.08553692 15. ]
gradient = [array([ 135.42768462, 41.08553692, 15. ])]
Jacobian = [array([[ 135.42768462, 41.08553692, 15. ]])]
Hessian = [[[ 100.42768462 27.08553692 5. ]
[ 27.08553692 0. 3. ]
[ 5. 3. 0. ]]]
Hessian vector product = [ 567.13842308 126.34214769 35. ]
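Since a particular focus of AlgoPy is numerical linear algebra, the same pattern extends to functions containing matrix operations. A small sketch (eval_g is made up for illustration; algopy.inv and algopy.trace are AlgoPy's differentiable matrix inverse and trace):

import numpy, algopy
from algopy import UTPM

def eval_g(x):
    """ a made-up function containing linear algebra operations """
    A = algopy.zeros((2, 2), dtype=x)
    A[0, 0] = x[0]
    A[0, 1] = x[1]
    A[1, 0] = x[1]
    A[1, 1] = x[2]
    return algopy.trace(algopy.inv(A))

x = UTPM.init_jacobian([3., 5., 7.])
y = eval_g(x)
print('jacobian =', UTPM.extract_jacobian(y))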
If you have any questions, suggestions, or bug reports, please use the mailing list http://groups.google.com/group/algopy?msg=new&lnk=gcis or write me an email (sebastian.walter@gmail.com). This makes it much easier for me to provide code and documentation that are easy to understand. Of course, you are also welcome to contribute code, bugfixes, examples, success stories ;), ...
Warning
Do not create numpy.ndarray instances with algopy.UTPM or algopy.Function objects as elements! numpy.array builds a plain object array that AlgoPy cannot handle reliably; use algopy.zeros(2, dtype=x) instead, which returns a container of the same type as x, as shown in the following examples.
Example 1:
import numpy, algopy
x = algopy.Function(1)
y = algopy.Function(2)
# wrong
z = numpy.array([x,y])
# right
z = algopy.zeros(2, dtype=x)
z[0] = x
z[1] = y
Example 2:
x = algopy.UTPM(numpy.ones((2,1)))
y = algopy.UTPM(2*numpy.ones((2,1)))
# wrong
z = numpy.array([x,y])
# right
z = algopy.zeros(2, dtype=x)
z[0] = x
z[1] = y
Note
If you use a NumPy version > 1.5, the reverse mode can be very slow if you use broadcasting.
Reason:
NumPy versions 1.5 and above have a bug in binary in-place operations (imul, iadd, ...) when array elements point to overlapping memory regions, e.g., when strides = (0, 8).
There is a workaround in AlgoPy for this case, but it is probably rather slow for large matrices, since a Python loop needs to access all elements of the matrix.
See discussion on http://thread.gmane.org/gmane.comp.python.numeric.general/51379
and the issue on https://github.com/numpy/numpy/issues/2705
This bug is relevant only when the reverse mode is used together with broadcasting, e.g., when you run the following example:
import numpy, algopy

def f(x):
    N, = x.shape
    A = algopy.zeros((N, N), dtype=x)
    A[0, 0] = x[2]
    A[1, 2] = x[1]
    return algopy.sum(A*x)

cg = algopy.CGraph()
x = algopy.Function(numpy.ones(3))
y = f(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]
print(cg.gradient(numpy.array([1., 2., 3.])))
Potential improvements:
- better memory management to speed things up
- direct support for nested derivatives (requires the algorithms to be generic)
- better complex number support, in particular in the reverse mode
- support for sparse Jacobian and sparse Hessian computations using graph coloring, as explained in http://portal.acm.org/citation.cfm?id=1096222
AlgoPy works on Python 2.6+ and Python 3.0+.
The current version is 0.5.1, available at: http://pypi.python.org/pypi/algopy
Bleeding edge:
- the most recent version is available at https://github.com/b45ch1/algopy
- includes additional documentation, e.g., talks, further examples, and the Sphinx *.rst documents
Dependencies:
- ALGOPY Core:
- numpy (known to work with numpy >= 1.6.2)
- scipy (known to work with scipy >= 0.11.0)
- ALGOPY Examples:
- pyadolc
- Run tests:
- Nose
- Documentation:
- sphinx
- matplotlib, mayavi2, yapgvb
You can install these dependencies with pip:
pip install nose
pip install numpy
pip install scipy
Or on Debian/Ubuntu:
sudo apt-get install -qq python-sphinx python-nose python-numpy python-scipy
or:
sudo apt-get install -qq python3-sphinx python3-nose python3-numpy python3-scipy
Note
If your SciPy version is too outdated, you can install a more recent version using:
echo 'yes' | sudo add-apt-repository ppa:pylab/stable
sudo apt-get update
Please note that this adds an unofficial package repository to your system.
AlgoPy offers the forward mode and the reverse mode of AD.
Forward Mode of AD:
The basic idea is computation on (univariate) Taylor polynomials with matrix coefficients. More precisely, AlgoPy supports univariate Taylor polynomial (UTP) arithmetic where the coefficients of the polynomial are numpy.ndarrays. To distinguish Taylor polynomials from real vectors and matrices, they are written with enclosing brackets:
\[[x]_D = [x_0, \dots, x_{D-1}] = \sum_{d=0}^{D-1} x_d T^d \;,\]where each of \(x_0, x_1, \dots\) is an array, e.g., of shape (5,7). This mathematical object is represented by a numpy.ndarray with shape (D,P,5,7). The \(T\) is an indeterminate, i.e., a formal/dummy variable; roughly speaking, it is the UTP equivalent of the imaginary unit \(i\) in complex arithmetic. The dimension P can be used to compute several Taylor expansions at once, i.e., a vectorization that avoids recomputing the same function with different inputs.
If the input UTPs are correctly initialized one can interpret the coefficients of the resulting polynomial as higher-order derivatives. Have a look at the Taylor series expansion example for a more detailed discussion.
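For instance, a degree-D UTP can be propagated through algopy.sin; the data attribute of the result then holds the Taylor coefficients of sin around the expansion point. A minimal sketch with D = 4 and P = 1:

import numpy, algopy
from algopy import UTPM

D, P = 4, 1                    # degree D, number of directions P
x = UTPM(numpy.zeros((D, P)))
x.data[0, 0] = 1.0             # expansion point x_0 = 1
x.data[1, 0] = 1.0             # seed coefficient of T
y = algopy.sin(x)
# y.data[:, 0] contains the Taylor coefficients of sin(1 + T):
# [sin(1), cos(1), -sin(1)/2!, -cos(1)/3!]
print(y.data[:, 0])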
Reverse Mode of AD:
To allow the use of the reverse mode of AD, a simple code tracer has been implemented in algopy.tracer. The idea is to record the computational procedure in a data structure such that the control flow sequence can be walked in reverse order. There is no complete documentation for the reverse mode yet.
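As a small illustration (the function eval_g is made up; the tracing pattern is the same as in the minimal example above), the graph is recorded once and can then be reused to evaluate derivatives at many points:

import numpy, algopy

def eval_g(x):
    """ a made-up example function """
    return algopy.sum(x*x*x)

# STEP 1: record the computational graph once
cg = algopy.CGraph()
x = algopy.Function(numpy.ones(3))
y = eval_g(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]

# STEP 2: reuse the graph for several inputs
for p in ([1., 2., 3.], [4., 5., 6.]):
    print(cg.gradient(p))      # gradient of sum(x**3) is 3*x**2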
Information on the web:
There has been a winter school on Algorithmic Differentiation; some of its tutorials explain Taylor polynomial arithmetic: http://www.sintef.no/Projectweb/eVITA/Winter-Schools/2010/
Simple, advanced, and application examples, as well as additional information, can be found in the corresponding sections of the full documentation.
Current issues:
- some algorithms require vectors to be columns of a matrix, i.e., if x is a vector it should be initialized as x = UTPM(numpy.random.rand(D, P, N, 1)) and not as x = UTPM(numpy.random.rand(D, P, N)), as one would typically do using numpy (see the sketch after this list)
- there is no vectorized reverse mode yet. In the forward mode one can compute the full Jacobian of dimension (M, N) by propagating N directional derivatives at once; in the reverse mode one would like to propagate M adjoint directions at once, but this is not implemented yet, i.e., one has to repeat the procedure M times
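A minimal sketch of the column-vector convention from the first item (assuming algopy.solve, AlgoPy's UTPM-aware linear solve):

import numpy, algopy
from algopy import UTPM

D, P, N = 2, 1, 3
A = UTPM(numpy.random.rand(D, P, N, N))
# note the trailing 1: x is an (N, 1) column matrix, not an (N,) vector
x = UTPM(numpy.random.rand(D, P, N, 1))
y = algopy.solve(A, x)
print(y.data.shape)            # (2, 1, 3, 1)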
Version 0.5.0
- add Python 3 compatibility
- add Travis CI
AlgoPy uses the same testing facilities as NumPy, i.e., one can run the complete unit test suite with:
$ python -c "import algopy; algopy.test()"