Covariance Matrix Computation

In this example the goal is to compute the gradient of one element of the covariance matrix \(C\) of a constrained parameter estimation problem, i.e.,

\[\nabla_{J_1, J_2} y\]

where \(y = C_{11}\). The covariance matrix satisfies the equation

\[\begin{split}C = \begin{pmatrix} I & 0 \end{pmatrix} \ \begin{pmatrix} J_1^T J_1 & J_2^T \\ J_2 & 0 \end{pmatrix}^{-1} \ \begin{pmatrix} I \\ 0 \end{pmatrix}\end{split}\]

where \(J_1\) and \(J_2\) are the Jacobians of the model functions and of the constraints, respectively.

Two possibilities are compared:
  1. filling a big matrix with the blocks, then inverting it and returning a view of the upper left part of the matrix

  2. computing the nullspace of \(J_2\) with a QR decomposition. The formula is \(C = Q_2^T( Q_2 J_1^T J_1 Q_2^T)^{-1} Q_2\).

    Potentially it is numerically more stable to use the QR decomposition twice, i.e., once to compute \(Q_2\) and once more for \(J_1 Q_2^T\), thereby avoiding the explicit product \(J_1^T J_1\), which would square the condition number. This has not been tested yet, though. A plain NumPy sketch comparing both formulas is given below.
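
Since both formulas compute the same matrix, they can be checked against each other with plain NumPy before any differentiation is involved. The following is a minimal, standalone sketch; the variable names and the random test data are only for illustration:

import numpy

M, N, K = 5, 3, 1
J1 = numpy.random.rand(M, N)
J2 = numpy.random.rand(K, N)

# possibility 1: assemble the block matrix, invert it, take the upper left N x N block
tmp = numpy.zeros((N + K, N + K))
tmp[:N, :N] = numpy.dot(J1.T, J1)
tmp[:N, N:] = J2.T
tmp[N:, :N] = J2
C1 = numpy.linalg.inv(tmp)[:N, :N]

# possibility 2: nullspace of J2 from a full QR decomposition of J2.T
Q, R = numpy.linalg.qr(J2.T, mode='complete')
Q2 = Q[:, K:].T                     # rows of Q2 span the nullspace of J2
A = numpy.dot(J1.T, J1)
C2 = numpy.dot(Q2.T, numpy.linalg.solve(numpy.dot(numpy.dot(Q2, A), Q2.T), Q2))

print(numpy.allclose(C1, C2))       # expected output: True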

The setup for both ways of computing the covariance matrix and its derivatives is the same:

import numpy
from algopy import CGraph, Function, UTPM, dot, qr, qr_full, eigh, inv, solve, zeros

def eval_covariance_matrix_naive(J1, J2):
    # method 1: assemble the full block matrix, invert it and return the upper left N x N block
    M,N = J1.shape
    K,N = J2.shape
    tmp = zeros((N+K, N+K), dtype=J1)
    tmp[:N,:N] = dot(J1.T,J1)
    tmp[:N,N:] = J2.T
    tmp[N:,:N] = J2
    return inv(tmp)[:N,:N]

def eval_covariance_matrix_qr(J1, J2):
    # method 2: nullspace method, C = Q2^T (Q2 J1^T J1 Q2^T)^{-1} Q2
    M,N = J1.shape
    K,N = J2.shape
    Q,R = qr_full(J2.T)
    Q2 = Q[:,K:].T              # rows of Q2 span the nullspace of J2
    J1_tilde = dot(J1,Q2.T)
    Q,R = qr(J1_tilde)
    V = solve(R.T, Q2)
    return dot(V.T,V)           # = Q2^T (R^T R)^{-1} Q2
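
The last two lines of eval_covariance_matrix_qr indeed implement the nullspace formula from above: with \(\tilde J_1 = J_1 Q_2^T = QR\) and \(V = R^{-T} Q_2\) one obtains

\[V^T V = Q_2^T R^{-1} R^{-T} Q_2 = Q_2^T (R^T R)^{-1} Q_2 = Q_2^T (\tilde J_1^T \tilde J_1)^{-1} Q_2 = Q_2^T ( Q_2 J_1^T J_1 Q_2^T)^{-1} Q_2 \;.\]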


# dimensions of the involved matrices
D,P,M,N,K,Nx = 2,1,5,3,1,1

where:
  • D - 1 is the degree of the Taylor polynomial
  • P is the number of directional derivatives propagated at once (see the small illustration below)
  • M is the number of rows of J1
  • N is the number of columns of J1
  • K is the number of rows of J2 (must be smaller than N)
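
To illustrate what D and P mean in practice, here is a small sketch; it assumes, as elsewhere in ALGOPY, that a UTPM instance stores its Taylor coefficients in the array attribute .data of shape (D, P) followed by the matrix shape:

x = UTPM(numpy.random.rand(D, P, M, N))
print(x.data.shape)              # (2, 1, 5, 3): x.data[0,0] is the matrix value,
                                 # x.data[1,0] the first-order Taylor coefficients
y = dot(x.T, x)                  # forward propagation through a matrix product
print(y.data.shape)              # (2, 1, 3, 3): value and first-order coefficients of x^T x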

First, the function evaluation of the nullspace (QR) method is traced.

# trace the function evaluation of the nullspace method
cg1 = CGraph()
J1 = Function(UTPM(numpy.random.rand(*(D,P,M,N))))
J2 = Function(UTPM(numpy.random.rand(*(D,P,K,N))))
C = eval_covariance_matrix_qr(J1, J2)
y = C[0,0]
cg1.trace_off()
cg1.independentFunctionList = [J1, J2]
cg1.dependentFunctionList = [y]
print('covariance matrix: C =\n',C)

then the naive method is traced

# trace the function evaluation of the naive method (potentially numerically unstable)
cg2 = CGraph()
J1 = Function(J1.x)
J2 = Function(J2.x)
C2 = eval_covariance_matrix_naive(J1, J2)
y = C2[0,0]
cg2.trace_off()
cg2.independentFunctionList = [J1, J2]
cg2.dependentFunctionList = [y]
print('covariance matrix: C =\n',C2)

After that, each function evaluation is stored in its CGraph instance and can be used to compute the gradient.

# compute the gradient for another value of J1 and J2
J1 = numpy.random.rand(*(M,N))
J2 = numpy.random.rand(*(K,N))

g1 = cg1.gradient([J1,J2])
g2 = cg2.gradient([J1,J2])

print('nullspace approach: dy/dJ1 = ', g1[0])
print('nullspace approach: dy/dJ2 = ', g1[1])

print('naive approach: dy/dJ1 = ', g2[0])
print('naive approach: dy/dJ2 = ', g2[1])

One obtains the output:

nullspace approach: dy/dJ1 =  [[ 1.09167163 -0.37815832 -0.45414733]
[ 0.57524052 -0.19926504 -0.23930634]
[-1.6063055   0.55642903  0.66824064]
[-0.1674705   0.05801228  0.06966956]
[-1.23363017  0.42733318  0.51320363]]
nullspace approach: dy/dJ2 =  [[ 0.1039174  -0.0359973  -0.04323077]]
naive approach: dy/dJ1 =  [[ 1.09167163 -0.37815832 -0.45414733]
[ 0.57524052 -0.19926504 -0.23930634]
[-1.6063055   0.55642903  0.66824064]
[-0.1674705   0.05801228  0.06966956]
[-1.23363017  0.42733318  0.51320363]]
naive approach: dy/dJ2 =  [[ 0.1039174  -0.0359973  -0.04323077]]

As one can see, both methods yield the same result.
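
The agreement can additionally be corroborated against a finite-difference approximation. The following sketch compares the algorithmic gradient with central differences of the naive formula; the helper function eval_y and the step size h are illustrative and not part of the original example:

# finite-difference check of dy/dJ1 (illustrative only)
def eval_y(J1, J2):
    # y = C[0,0] evaluated with plain numpy, using the same block matrix as the naive method
    N = J1.shape[1]
    K = J2.shape[0]
    tmp = numpy.zeros((N + K, N + K))
    tmp[:N, :N] = numpy.dot(J1.T, J1)
    tmp[:N, N:] = J2.T
    tmp[N:, :N] = J2
    return numpy.linalg.inv(tmp)[0, 0]

h = 1e-7
g_fd = numpy.zeros((M, N))
for i in range(M):
    for j in range(N):
        E = numpy.zeros((M, N))
        E[i, j] = h
        g_fd[i, j] = (eval_y(J1 + E, J2) - eval_y(J1 - E, J2)) / (2.0 * h)

print(numpy.allclose(g_fd, g1[0], atol=1e-5))    # expected output: True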