Previous topic

The Code Tracer

Next topic

Symbolic Differentiation

This Page

Polarization Identities for Mixed Partial DerivativesΒΆ

This is an advanced tutorial. It explains how functions containing derivatives of other functions can be evaluated using univariate Taylor polynomial arithmetic by use of polarization identities. The described technique could be implemented as convenience functions just as UTPM.init_jacobian, UTPM.init_hessian, etc.

Consider the function

\[\begin{split}F: \Bbb{R}^N \to & \Bbb{R}^M \\ x \mapsto& y = F(x) = \begin{pmatrix} x_1 x_2 \\ x_2 x_3 \\ x_1 - x_2 \end{pmatrix}\end{split}\]

where \((N,N)=(3,3)\) and we want to compute

\[\begin{split}\nabla_x \Phi(J(x)) =& \mathrm{tr} ( J(x)^T J(x)) \;,\end{split}\]

where \(J(x) = {\partial F \over \partial x}(x)\). In univariate Taylor polynomial arithmetic it is possible to compute the \(i\)-th column of \(J(x)\) using \({\partial \over \partial T} F(x + e_i)|_{T=0}\). Also, it is possible to compute the \(j\)-th element of the gradient \(\nabla_x \Phi(J(x))\) using \({\partial \over \partial T} \Phi( J(x + e_j)) |_{T=0}\). That means, in total it is necessary to compute

\[\begin{split}{\partial \Phi \over \partial x_j} (J(x)) =& \left. {\partial \over \partial T_2} \Phi \left( \begin{pmatrix} {\partial \over \partial T_1} F(x + e_1 T_1 + e_jT_2), \dots, {\partial \over \partial T_1} F(x + e_N T_1 + e_jT_2) \\ \end{pmatrix} \right) \right|_{T_1 = T_2 =0} \\ =& \left. \sum_{m=1}^M \sum_{n=1}^N {\partial \Phi \over \partial J_{mn}} \right|_{J = J(x)} {\partial^2 \over \partial T_2 \partial T_1} F_m(x + e_n T_1 + e_jT_2)\end{split}\]

In this form it seems to be necessary to propagate a multivariate Taylor polynomial \(x + e_1 T_1 + e_jT_2\) through \(F\). However, one can use a polarization identity like

\[\left. {\partial^2 \over \partial T_1 \partial T_2} F(x + e_i T_1 + e_jT_2) \right|_{T_1 = T_2 = 0} = \left. {1 \over 4} {\partial^2 \over \partial T^2} \left( F(x + (e_i + e_j) T) - F(x + (e_i - e_j)T) \right) \right|_{T=0}\]

to cast the problem back to a problem where twice the number of univariate Taylor polynomials have to be propagated. In total we need to cycle over \(e_i\), for \(i =1,2,\dots,N\) and \(e_j\), for \(j = 1,2,\dots,N\), which can both be written as columns/rows of the identity matrix. Stacking \(e_i + e_j\) and \(e_i - e_j\) for all possible choices one obtains the univariate directions:

dirs =
[[ 2.  0.  0.]
 [ 0.  0.  0.]
 [ 1.  1.  0.]
 [ 1. -1.  0.]
 [ 1.  0.  1.]
 [ 1.  0. -1.]
 [ 1.  1.  0.]
 [-1.  1.  0.]
 [ 0.  2.  0.]
 [ 0.  0.  0.]
 [ 0.  1.  1.]
 [ 0.  1. -1.]
 [ 1.  0.  1.]
 [-1.  0.  1.]
 [ 0.  1.  1.]
 [ 0. -1.  1.]
 [ 0.  0.  2.]
 [ 0.  0.  0.]]

As one can see, some directions are zero and could be avoided. Without this obvious improvement, there are \(P = 2N^2=18\) directions. Now to how the above example can be computed in AlgoPy.

import algopy, numpy
import algopy.exact_interpolation as ei

def eval_F(x):
    retval = algopy.zeros(3, dtype=x)
    retval[0] = x[0]*x[1]
    retval[1] = x[1]*x[2]
    retval[2] = x[0] - x[1]
    return retval

D,M,N = 3,3,3
P = 2*M*N

# STEP 1: generate all necessary directions
Id = numpy.eye(N)
dirs = numpy.zeros((P, N))

count = 0
for n1 in range(N):
    for n2 in range(N): 
        dirs[count] += Id[n1]
        dirs[count] += Id[n2]
        
        dirs[count + 1] += Id[n1]
        dirs[count + 1] -= Id[n2]
        
        count += 2

print('dirs =')
print(dirs)

# STEP 2: use these directions to initialize the UTPM instance
xdata = numpy.zeros((D,2*N*N,N))
xdata[0] = [1,2,3]
xdata[1] = dirs

# STEP 3: compute function F in UTP arithmetic
x = algopy.UTPM(xdata)
y = eval_F(x)

# STEP 4: use polarization identity to build univariate Taylor polynomial
#         of the Jacobian J
Jdata = numpy.zeros((D-1, N, N, N))
count, count2 = 0,0

# build J_0
for n in range(N):
    Jdata[0,:,:,n] = y.data[1, 2*n*(N+1), :]/2.
    
# build J_1
count = 0
for n1 in range(N):
    for n2 in range(N):
        Jdata[1,n2,:,n1] = 0.5*( y.data[2, count] - y.data[2, count+1])
        count += 2

# initialize UTPM instance
J = algopy.UTPM(Jdata)

# STEP 5: evaluate Phi in UTP arithmetic
Phi = algopy.trace(algopy.dot(J.T, J))
print('Phi=',Phi)
print('gradient of Phi =', Phi.data[1,:])

As output we obtain:

$ python polarization.py
dirs =
[[ 2.  0.  0.]
 [ 0.  0.  0.]
 [ 1.  1.  0.]
 [ 1. -1.  0.]
 [ 1.  0.  1.]
 [ 1.  0. -1.]
 [ 1.  1.  0.]
 [-1.  1.  0.]
 [ 0.  2.  0.]
 [ 0.  0.  0.]
 [ 0.  1.  1.]
 [ 0.  1. -1.]
 [ 1.  0.  1.]
 [-1.  0.  1.]
 [ 0.  1.  1.]
 [ 0. -1.  1.]
 [ 0.  0.  2.]
 [ 0.  0.  0.]]
Phi= [[ 20.  20.  20.]
 [  2.   8.   6.]]
gradient of Phi = [ 2.  8.  6.]

which is the correct value.

One should note that actually in this special case, when all elements \({\partial \Phi \over \partial x_j}\) have to be propagated it is advantageous to use another polarization identity which requires to propagate only as many directions as the Hessian has distinct elements. However, the polarization identity from above is more flexible since it allows to compute mixed partial derivatives more directly.