Mathematical formulation
Formulation
Given a sample of \(n\) independent observations \(\mathbf{x}_1,\dots,\mathbf{x}_n\), each a vector of length \(m\), the sample covariance matrix of the data matrix \(X\in \mathbb{R}^{n\times m}\), whose rows are the observations, is given by
\[Q = \frac{1}{n-1} \sum_{i=1}^n(\mathbf{x}_i - \mathbf{\hat{x}})(\mathbf{x}_i - \mathbf{\hat{x}})^T\]
where \(\mathbf{x}_i\) denotes the \(i\)-th observation and
\[\mathbf{\hat{x}} = \frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\]
is the sample mean. Expanding the product in the first formula, we get
\[Q = \frac{1}{n-1}\left[\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T - \mathbf{\hat{x}}\left(\sum_{i=1}^n\mathbf{x}_i\right)^T - \left(\sum_{i=1}^n\mathbf{x}_i\right)\mathbf{\hat{x}}^T + n\mathbf{\hat{x}}\mathbf{\hat{x}}^T\right]\]
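which is essentially the matrix analogue of \((a-b)^2 = a^2 - 2ab + b^2\). Defining \(A = \sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^T\) and \(\mathbf{b} = \sum_{i=1}^n\mathbf{x}_i\) and substituting these sums, we get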
\[Q = \frac{1}{n-1}\left[A - \mathbf{\hat{x}}\mathbf{b}^T - \mathbf{b}\mathbf{\hat{x}}^T + n\mathbf{\hat{x}}\mathbf{\hat{x}}^T\right]\]
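As a quick sanity check, the rearranged form can be compared numerically against the definition. The following is a minimal sketch using only the Python standard library; the function names `covariance_direct` and `covariance_from_sums` and the sample data are illustrative assumptions, with \(A\) taken to be the sum of outer products \(\sum_i \mathbf{x}_i\mathbf{x}_i^T\) and \(\mathbf{b}\) the sum of observations \(\sum_i \mathbf{x}_i\).

```python
def covariance_direct(xs):
    """Sample covariance from the definition: sum of centered outer products."""
    n, m = len(xs), len(xs[0])
    mean = [sum(x[j] for x in xs) / n for j in range(m)]
    Q = [[0.0] * m for _ in range(m)]
    for x in xs:
        d = [x[j] - mean[j] for j in range(m)]  # x_i - x_hat
        for r in range(m):
            for c in range(m):
                Q[r][c] += d[r] * d[c]
    return [[q / (n - 1) for q in row] for row in Q]


def covariance_from_sums(xs):
    """Sample covariance from the rearranged form using A and b."""
    n, m = len(xs), len(xs[0])
    # A: sum of outer products x_i x_i^T; b: sum of the observations.
    A = [[sum(x[r] * x[c] for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(x[j] for x in xs) for j in range(m)]
    mean = [bj / n for bj in b]
    return [
        [
            (A[r][c] - mean[r] * b[c] - b[r] * mean[c] + n * mean[r] * mean[c])
            / (n - 1)
            for c in range(m)
        ]
        for r in range(m)
    ]


# Hypothetical data: n = 4 observations of length m = 2.
samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0], [7.0, 5.0]]
Q_direct = covariance_direct(samples)
Q_sums = covariance_from_sums(samples)
```

For this data both functions should agree up to floating-point rounding, which is what makes it possible to keep only \(A\), \(\mathbf{b}\) and \(n\) instead of the individual observations.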
Updating
From this form, we can derive an efficient update that yields \(Q_{n+1}\) when a new sample \(\mathbf{x}_{n+1}\) becomes available. First we update \(A\) and \(\mathbf{b}\), writing \(A_n\) and \(\mathbf{b}_n\) for their values after \(n\) samples:
\[A_{n+1} = A_n + \mathbf{x}_{n+1}\mathbf{x}_{n+1}^T\]
\[\mathbf{b}_{n+1} = \mathbf{b}_n + \mathbf{x}_{n+1}\]
then calculate the new sample mean
\[\mathbf{\hat{x}}_{n+1} = \frac{1}{n+1}\mathbf{b}_{n+1}\]
and finally compute the updated covariance matrix
\[Q_{n+1} = \frac{1}{n}\left[A_{n+1} - \mathbf{\hat{x}}_{n+1}\mathbf{b}_{n+1}^T - \mathbf{b}_{n+1}\mathbf{\hat{x}}_{n+1}^T + (n+1)\mathbf{\hat{x}}_{n+1}\mathbf{\hat{x}}_{n+1}^T\right]\]
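The update steps above can be sketched as a small accumulator that stores only \(A\), \(\mathbf{b}\) and the sample count. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `RunningCovariance` and its methods are hypothetical, and only the Python standard library is used.

```python
class RunningCovariance:
    """Incrementally maintained sample covariance (illustrative sketch)."""

    def __init__(self, m):
        self.n = 0                                # number of samples seen so far
        self.A = [[0.0] * m for _ in range(m)]    # running sum of x x^T
        self.b = [0.0] * m                        # running sum of x

    def update(self, x):
        """Fold one new observation x (length m) into A and b."""
        m = len(self.b)
        for r in range(m):
            self.b[r] += x[r]
            for c in range(m):
                self.A[r][c] += x[r] * x[c]
        self.n += 1

    def covariance(self):
        """Return Q for the samples seen so far (needs at least two)."""
        if self.n < 2:
            raise ValueError("at least two samples are required")
        n, m = self.n, len(self.b)
        mean = [bj / n for bj in self.b]  # sample mean: b / n
        return [
            [
                (self.A[r][c] - mean[r] * self.b[c] - self.b[r] * mean[c]
                 + n * mean[r] * mean[c]) / (n - 1)
                for c in range(m)
            ]
            for r in range(m)
        ]


# Hypothetical usage: feed observations one at a time.
rc = RunningCovariance(2)
for x in [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0], [7.0, 5.0]]:
    rc.update(x)
Q = rc.covariance()
```

Each call to `update` costs \(O(m^2)\) regardless of how many samples have been seen. One caveat worth keeping in mind: because this form subtracts sums of products, it can lose precision in floating point when the mean is large relative to the spread of the data.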