In this example, the goal is to compute the gradient of one element of the covariance matrix \(C\) of a constrained parameter estimation problem, i.e.,
\[ \nabla_{J_1, J_2}\, y \;, \]
where \(y = C_{11}\). The covariance matrix satisfies the equation
\[ C = \begin{pmatrix} I_N & 0 \end{pmatrix} \begin{pmatrix} J_1^T J_1 & J_2^T \\ J_2 & 0 \end{pmatrix}^{-1} \begin{pmatrix} I_N \\ 0 \end{pmatrix} \;, \]
where \(J_1 \in \mathbb{R}^{M \times N}\) and \(J_2 \in \mathbb{R}^{K \times N}\).
Method 1 (naive method): fill a big \((N+K) \times (N+K)\) matrix with the blocks, invert it, and return a view of the upper-left \(N \times N\) part of the inverse.
Method 2 (nullspace method): compute the nullspace of \(J_2\) with a QR decomposition; the formula is \(C = Q_2^T( Q_2 J_1^T J_1 Q_2^T)^{-1} Q_2\), where the rows of \(Q_2\) span the nullspace of \(J_2\). Using a second QR decomposition, i.e. one to compute \(Q_2\) and then one to factor \(J_1 Q_2^T\), avoids forming the product \(Q_2 J_1^T J_1 Q_2^T\) explicitly, which would square the condition number. This may be numerically more stable, but has not been tested yet.
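The condition-number claim can be illustrated with a small plain-NumPy sketch (not part of the AlgoPy example; the matrix sizes and singular values below are chosen for illustration): forming \(A^T A\) squares the condition number of \(A\).

```python
import numpy as np

np.random.seed(1)
M, N = 5, 3
# build a J1 with prescribed singular values 1, 1e-3, 1e-6
U, _ = np.linalg.qr(np.random.rand(M, M))
V, _ = np.linalg.qr(np.random.rand(N, N))
J1 = U[:, :N] @ np.diag([1.0, 1e-3, 1e-6]) @ V.T

c1 = np.linalg.cond(J1)         # ~1e6
c2 = np.linalg.cond(J1.T @ J1)  # ~1e12, roughly c1**2
print(c1, c2)
```

Working with a QR factor of \(J_1 Q_2^T\) instead of the explicit product keeps the conditioning at \(c_1\) rather than \(c_1^2\).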
The setup for both methods of computing the covariance matrix, and for their derivatives, is the same:
import numpy
from algopy import CGraph, Function, UTPM, dot, qr, qr_full, eigh, inv, solve, zeros
def eval_covariance_matrix_naive(J1, J2):
    M, N = J1.shape
    K, N = J2.shape
    # assemble the (N+K) x (N+K) block matrix; the lower-right block stays zero
    tmp = zeros((N+K, N+K), dtype=J1)
    tmp[:N, :N] = dot(J1.T, J1)
    tmp[:N, N:] = J2.T
    tmp[N:, :N] = J2
    # return a view of the upper-left N x N block of the inverse
    return inv(tmp)[:N, :N]
def eval_covariance_matrix_qr(J1, J2):
    M, N = J1.shape
    K, N = J2.shape
    Q, R = qr_full(J2.T)
    Q2 = Q[:, K:].T               # rows of Q2 span the nullspace of J2
    J1_tilde = dot(J1, Q2.T)
    Q, R = qr(J1_tilde)
    V = solve(R.T, Q2)            # V = R^{-T} Q2, so V^T V = Q2^T (R^T R)^{-1} Q2
    return dot(V.T, V)
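As a sanity check, both formulas can be compared on a random instance with plain NumPy (a sketch separate from the AlgoPy code above; the variable names are chosen here):

```python
import numpy as np

np.random.seed(0)
M, N, K = 5, 3, 1
J1 = np.random.rand(M, N)
J2 = np.random.rand(K, N)

# Method 1: invert the block matrix, keep the upper-left N x N block
KKT = np.zeros((N + K, N + K))
KKT[:N, :N] = J1.T @ J1
KKT[:N, N:] = J2.T
KKT[N:, :N] = J2
C_naive = np.linalg.inv(KKT)[:N, :N]

# Method 2: nullspace of J2 via the full QR decomposition of J2^T
Q, R = np.linalg.qr(J2.T, mode='complete')
Q2 = Q[:, K:].T                   # rows span the nullspace of J2
C_null = Q2.T @ np.linalg.inv(Q2 @ J1.T @ J1 @ Q2.T) @ Q2

print(np.allclose(C_naive, C_null))  # True
```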
# dimensions of the involved matrices
D,P,M,N,K,Nx = 2,1,5,3,1,1
First, the nullspace method is traced:
# trace the function evaluation of METHOD 1: nullspace method
cg1 = CGraph()
J1 = Function(UTPM(numpy.random.rand(*(D,P,M,N))))
J2 = Function(UTPM(numpy.random.rand(*(D,P,K,N))))
C = eval_covariance_matrix_qr(J1, J2)
y = C[0,0]
cg1.trace_off()
cg1.independentFunctionList = [J1, J2]
cg1.dependentFunctionList = [y]
print('covariance matrix: C =\n',C)
Then the naive method is traced:
# trace the function evaluation of METHOD 2: naive method (potentially numerically unstable)
cg2 = CGraph()
J1 = Function(J1.x)
J2 = Function(J2.x)
C2 = eval_covariance_matrix_naive(J1, J2)
y = C2[0,0]
cg2.trace_off()
cg2.independentFunctionList = [J1, J2]
cg2.dependentFunctionList = [y]
print('covariance matrix: C =\n',C2)
After that, the traced function evaluations are stored in the CGraph instances and can be used to compute the gradient at new values of \(J_1\) and \(J_2\).
# compute the gradient for another value of J1 and J2
J1 = numpy.random.rand(*(M,N))
J2 = numpy.random.rand(*(K,N))
g1 = cg1.gradient([J1,J2])
g2 = cg2.gradient([J1,J2])
print('nullspace approach: dy/dJ1 = ', g1[0])
print('nullspace approach: dy/dJ2 = ', g1[1])
print('naive approach: dy/dJ1 = ', g2[0])
print('naive approach: dy/dJ2 = ', g2[1])
One obtains the output:
nullspace approach: dy/dJ1 = [[ 1.09167163 -0.37815832 -0.45414733]
[ 0.57524052 -0.19926504 -0.23930634]
[-1.6063055 0.55642903 0.66824064]
[-0.1674705 0.05801228 0.06966956]
[-1.23363017 0.42733318 0.51320363]]
nullspace approach: dy/dJ2 = [[ 0.1039174 -0.0359973 -0.04323077]]
naive approach: dy/dJ1 = [[ 1.09167163 -0.37815832 -0.45414733]
[ 0.57524052 -0.19926504 -0.23930634]
[-1.6063055 0.55642903 0.66824064]
[-0.1674705 0.05801228 0.06966956]
[-1.23363017 0.42733318 0.51320363]]
naive approach: dy/dJ2 = [[ 0.1039174 -0.0359973 -0.04323077]]
As one can see, both methods yield the same result.
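The gradient can also be cross-checked without AlgoPy, e.g. by central finite differences of the naive formula and by the analytic expression \(\partial y / \partial J_1 = -2\,(J_1 u_1)\,u_1^T\), which follows from differentiating \(y = e_1^T A^{-1} e_1\) for the symmetric block matrix \(A\) with \(u = A^{-1} e_1\) and \(u_1 = u_{1:N}\) (a plain-NumPy sketch; `kkt`, `y_of` and the seed are chosen here, so the numbers differ from the output above):

```python
import numpy as np

def kkt(J1, J2):
    # assemble the (N+K) x (N+K) block matrix [[J1^T J1, J2^T], [J2, 0]]
    N, K = J1.shape[1], J2.shape[0]
    A = np.zeros((N + K, N + K))
    A[:N, :N] = J1.T @ J1
    A[:N, N:] = J2.T
    A[N:, :N] = J2
    return A

def y_of(J1, J2):
    # y = C[0, 0], with C the upper-left block of the inverted block matrix
    return np.linalg.inv(kkt(J1, J2))[0, 0]

np.random.seed(0)
M, N, K = 5, 3, 1
J1 = np.random.rand(M, N)
J2 = np.random.rand(K, N)

# central finite differences, entry by entry
h = 1e-6
fd = np.zeros_like(J1)
for i in range(M):
    for j in range(N):
        E = np.zeros_like(J1)
        E[i, j] = h
        fd[i, j] = (y_of(J1 + E, J2) - y_of(J1 - E, J2)) / (2 * h)

# analytic gradient: dy = -u^T dA u, and only the J1^T J1 block depends on J1
u1 = np.linalg.inv(kkt(J1, J2))[:N, 0]
grad = -2.0 * np.outer(J1 @ u1, u1)

print(np.allclose(fd, grad, atol=1e-6))  # True
```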