
Polarization Identities for Mixed Partial Derivatives

This is an advanced tutorial. It explains how functions containing derivatives of other functions can be evaluated using univariate Taylor polynomial arithmetic by means of polarization identities. The technique described here could be provided as convenience functions, just like UTPM.init_jacobian, UTPM.init_hessian, etc.

Consider the function

\[
F: \mathbb{R}^N \to \mathbb{R}^M, \qquad
x \mapsto y = F(x) = \begin{pmatrix} x_1 x_2 \\ x_2 x_3 \\ x_1 - x_2 \end{pmatrix},
\]

where $(M,N) = (3,3)$, and we want to compute

\[
\nabla_x \Phi(J(x)), \qquad \Phi(J(x)) = \operatorname{tr}\left(J(x)^T J(x)\right),
\]

where $J(x) = \frac{\partial F}{\partial x}(x)$. In univariate Taylor polynomial arithmetic it is possible to compute the $i$-th column of $J(x)$ as $\left.\frac{\partial}{\partial T} F(x + e_i T)\right|_{T=0}$. Similarly, it is possible to compute the $j$-th element of the gradient $\nabla_x \Phi(J(x))$ as $\left.\frac{\partial}{\partial T} \Phi(J(x + e_j T))\right|_{T=0}$. That means, in total it is necessary to compute

\[
\frac{\partial \Phi}{\partial x_j}(J(x))
= \left.\frac{\partial}{\partial T_2}
  \Phi\!\left(
    \left( \frac{\partial}{\partial T_1} F(x + e_1 T_1 + e_j T_2),
           \dots,
           \frac{\partial}{\partial T_1} F(x + e_N T_1 + e_j T_2) \right)
  \right)\right|_{T_1 = T_2 = 0}
= \sum_{m=1}^M \sum_{n=1}^N
  \left.\frac{\partial \Phi}{\partial J_{mn}}\right|_{J = J(x)}
  \frac{\partial^2}{\partial T_2 \partial T_1} F_m(x + e_n T_1 + e_j T_2)
\]

In this form it seems to be necessary to propagate the multivariate Taylor polynomial $x + e_i T_1 + e_j T_2$ through $F$. However, one can use a polarization identity like

\[
\left.\frac{\partial^2}{\partial T_1 \partial T_2} F(x + e_i T_1 + e_j T_2)\right|_{T_1 = T_2 = 0}
= \frac{1}{4} \left.\frac{\partial^2}{\partial T^2}
  \Big( F\big(x + (e_i + e_j) T\big) - F\big(x + (e_i - e_j) T\big) \Big)\right|_{T=0}
\]

to cast the problem back to one where twice the number of univariate Taylor polynomials have to be propagated. In total we need to cycle over $e_i$ for $i = 1, 2, \dots, N$ and $e_j$ for $j = 1, 2, \dots, N$, both of which are columns of the identity matrix. Stacking $e_i + e_j$ and $e_i - e_j$ for all possible choices, one obtains the univariate directions:

dirs =
[[ 2.  0.  0.]
 [ 0.  0.  0.]
 [ 1.  1.  0.]
 [ 1. -1.  0.]
 [ 1.  0.  1.]
 [ 1.  0. -1.]
 [ 1.  1.  0.]
 [-1.  1.  0.]
 [ 0.  2.  0.]
 [ 0.  0.  0.]
 [ 0.  1.  1.]
 [ 0.  1. -1.]
 [ 1.  0.  1.]
 [-1.  0.  1.]
 [ 0.  1.  1.]
 [ 0. -1.  1.]
 [ 0.  0.  2.]
 [ 0.  0.  0.]]

As one can see, some directions are zero and could be avoided. Without this obvious improvement, there are $P = 2N^2 = 18$ directions. Let us now look at how the above example can be computed in AlgoPy.
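As a quick aside, the polarization identity itself can be verified with plain NumPy on a quadratic test function, for which all second directional derivatives are exact. (This check is not part of the original example; the matrix A below is made up for illustration.)

```python
import numpy

# Quadratic test function f(x) = x^T A x, whose Hessian is H = A + A^T.
# A is an arbitrary made-up matrix; any choice works.
A = numpy.array([[1., 2., 0.],
                 [0., 3., 1.],
                 [2., 0., 1.]])
H = A + A.T

def second_dir_deriv(d):
    """Exact second directional derivative d^2/dT^2 f(x + d T) = d^T H d."""
    return d @ H @ d

I = numpy.eye(3)
for i in range(3):
    for j in range(3):
        # polarization identity:
        # d^2/(dT1 dT2) f = 1/4 * ( f'' along (e_i + e_j)  -  f'' along (e_i - e_j) )
        mixed = 0.25 * (second_dir_deriv(I[i] + I[j]) - second_dir_deriv(I[i] - I[j]))
        assert numpy.isclose(mixed, H[i, j])

print('polarization identity verified')
```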

import algopy, numpy

def eval_F(x):
    retval = algopy.zeros(3, dtype=x)
    retval[0] = x[0]*x[1]
    retval[1] = x[1]*x[2]
    retval[2] = x[0] - x[1]
    return retval

D,M,N = 3,3,3
P = 2*N*N   # number of univariate directions, P = 2 N^2

# STEP 1: generate all necessary directions
Id = numpy.eye(N)
dirs = numpy.zeros((P, N))

count = 0
for n1 in range(N):
    for n2 in range(N): 
        dirs[count] += Id[n1]
        dirs[count] += Id[n2]
        
        dirs[count + 1] += Id[n1]
        dirs[count + 1] -= Id[n2]
        
        count += 2

print('dirs =')
print(dirs)

# STEP 2: use these directions to initialize the UTPM instance
xdata = numpy.zeros((D, P, N))
xdata[0] = [1,2,3]
xdata[1] = dirs

# STEP 3: compute function F in UTP arithmetic
x = algopy.UTPM(xdata)
y = eval_F(x)

# STEP 4: use polarization identity to build univariate Taylor polynomial
#         of the Jacobian J
Jdata = numpy.zeros((D-1, N, N, N))

# build J_0: for the pair (n1, n2) = (n, n) the direction is e_n + e_n = 2 e_n,
# so the first-order coefficient is 2 J(x) e_n and must be halved
for n in range(N):
    Jdata[0,:,:,n] = y.data[1, 2*n*(N+1), :]/2.
    
# build J_1 via the polarization identity; y.data[2] stores the Taylor
# coefficient, i.e. (1/2) d^2/dT^2, so the prefactor is 2 * 1/4 = 0.5
count = 0
for n1 in range(N):
    for n2 in range(N):
        Jdata[1,n2,:,n1] = 0.5*( y.data[2, count] - y.data[2, count+1])
        count += 2

# initialize UTPM instance
J = algopy.UTPM(Jdata)

# STEP 5: evaluate Phi in UTP arithmetic
Phi = algopy.trace(algopy.dot(J.T, J))
print('Phi=',Phi)
print('gradient of Phi =', Phi.data[1,:])

As output we obtain:

$ python polarization.py
dirs =
[[ 2.  0.  0.]
 [ 0.  0.  0.]
 [ 1.  1.  0.]
 [ 1. -1.  0.]
 [ 1.  0.  1.]
 [ 1.  0. -1.]
 [ 1.  1.  0.]
 [-1.  1.  0.]
 [ 0.  2.  0.]
 [ 0.  0.  0.]
 [ 0.  1.  1.]
 [ 0.  1. -1.]
 [ 1.  0.  1.]
 [-1.  0.  1.]
 [ 0.  1.  1.]
 [ 0. -1.  1.]
 [ 0.  0.  2.]
 [ 0.  0.  0.]]
Phi= [[ 20.  20.  20.]
 [  2.   8.   6.]]
gradient of Phi = [ 2.  8.  6.]

which is the correct value.
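The result can also be cross-checked by hand. For this example, $\Phi(x) = x_1^2 + 2 x_2^2 + x_3^2 + 2$, so $\nabla_x \Phi = (2 x_1, 4 x_2, 2 x_3)$, which at $x = (1, 2, 3)$ gives $(2, 8, 6)$. A plain NumPy check (the Jacobian and the closed form of $\Phi$ are derived by hand from $F$ above):

```python
import numpy

x = numpy.array([1., 2., 3.])

# Jacobian of F(x) = (x1*x2, x2*x3, x1 - x2), derived by hand
J = numpy.array([[x[1],  x[0], 0.  ],
                 [0.,    x[2], x[1]],
                 [1.,   -1.,   0.  ]])

Phi  = numpy.trace(J.T @ J)                    # sum of squared Jacobian entries
grad = numpy.array([2*x[0], 4*x[1], 2*x[2]])   # gradient of x1^2 + 2 x2^2 + x3^2 + 2

print(Phi)    # 20.0
print(grad)   # [2. 8. 6.]
```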

One should note that in this special case, when all elements $\frac{\partial \Phi}{\partial x_j}$ are required, it is advantageous to use another polarization identity that requires propagating only as many directions as the Hessian has distinct elements. However, the polarization identity shown above is more flexible, since it allows mixed partial derivatives to be computed more directly.
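To illustrate the count, a sketch of generating such a reduced direction set: since second-order mixed partials are symmetric, directions of the form $e_i + e_j$ with $i \le j$ cover each distinct Hessian element once, giving $N(N+1)/2$ directions instead of $2N^2$. (This sketch only generates the directions; recovering the individual mixed partials from the propagated coefficients additionally requires interpolation formulas, presumably related to the algopy.exact_interpolation module, which are not shown here.)

```python
import numpy

N = 3
I = numpy.eye(N)

# one direction e_i + e_j per distinct Hessian entry (i <= j)
dirs_reduced = numpy.array([I[i] + I[j]
                            for i in range(N)
                            for j in range(i, N)])

print(dirs_reduced.shape[0])   # N*(N+1)//2 = 6 directions, versus 2*N**2 = 18
```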