covar.cov_shrink_ss

covar.cov_shrink_ss()

Compute a shrinkage estimate of the covariance matrix using the Schafer and Strimmer (2005) method.

Parameters:

X : array, shape=(n, p)

Data matrix. Each row represents a data point, and each column represents a feature.

shrinkage : float, optional

The covariance shrinkage intensity (range 0-1). If shrinkage is not specified (the default) it is estimated using an analytic formula from Schafer and Strimmer (2005). For shrinkage=0 the empirical correlations are recovered.

Returns:

cov : array, shape=(p, p)

Estimated covariance matrix of the data.

shrinkage : float

The applied covariance shrinkage intensity.

See also

cov_shrink_rblw
similar method, using a different shrinkage target, T.
sklearn.covariance.ledoit_wolf
very similar approach, but uses a different shrinkage target, T.

Notes

This shrinkage estimator corresponds to “Target D” (diagonal, unequal variance) as described in [1]. The estimator takes the form

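$$\hat{\Sigma} = (1 - \gamma)\,\Sigma^{\text{sample}} + \gamma T,$$

where $\Sigma^{\text{sample}}$ is the (noisy but unbiased) empirical covariance matrix,

$$\Sigma^{\text{sample}}_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j),$$

the matrix $T$ is the shrinkage target, a less noisy but biased estimator for the covariance, and the scalar $\gamma \in [0, 1]$ is the shrinkage intensity (regularization strength). This approach uses a diagonal shrinkage target, $T$:

$$T_{ij} = \begin{cases} \Sigma^{\text{sample}}_{ii} & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}$$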
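As an illustration of the convex combination above, here is a minimal NumPy sketch for a fixed, user-supplied shrinkage intensity. The helper name `shrink_to_diagonal` is hypothetical and is not part of the covar API. It only mirrors the formulas in these notes.

```python
import numpy as np

def shrink_to_diagonal(X, gamma):
    """Hypothetical helper: (1 - gamma) * sample covariance + gamma * diagonal target."""
    X = np.asarray(X, dtype=float)
    # Empirical covariance with 1/(n-1) normalisation; rows are data points.
    sample = np.cov(X, rowvar=False)
    # Diagonal target: keep the sample variances, zero the off-diagonal entries.
    target = np.diag(np.diag(sample))
    return (1.0 - gamma) * sample + gamma * target

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
cov_half = shrink_to_diagonal(X, 0.5)
```

With this target the diagonal of the result always equals the sample variances; only the off-diagonal entries are scaled, by a factor of (1 - gamma).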

The idea is that by taking a weighted average of these two estimators, we can get a combined estimator which is more accurate than either is individually, especially when the number of features p is large relative to the number of data points n. The optimal weighting, γ, is determined automatically by minimizing the mean squared error. See [1] for details on how this can be done. The formula for γ is

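$$\gamma = \frac{\sum_{i \neq j} \widehat{\mathrm{Var}}(r_{ij})}{\sum_{i \neq j} r_{ij}^2},$$

where $r$ is the sample correlation matrix,

$$r_{ij} = \frac{\Sigma^{\text{sample}}_{ij}}{\sigma_i \sigma_j},$$

and $\widehat{\mathrm{Var}}(r_{ij})$ is given by

$$\widehat{\mathrm{Var}}(r_{ij}) = \frac{n}{(n-1)^3 \sigma_i^2 \sigma_j^2} \sum_{k=1}^{n} (w_{kij} - \bar{w}_{ij})^2,$$

with $w_{kij} = (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$ and $\bar{w}_{ij} = \frac{1}{n} \sum_{k=1}^{n} w_{kij}$.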
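The formula for γ can be sketched directly in NumPy. This is an illustrative transcription of the expressions above, not the library's implementation. The function name `estimate_gamma` is hypothetical. Taking σ_i as the sample standard deviation with 1/(n-1) normalisation, and clipping the result to [0, 1], are both assumptions made for this sketch.

```python
import numpy as np

def estimate_gamma(X):
    """Hypothetical helper: analytic shrinkage intensity from the formulas above."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                  # x_ki - xbar_i
    sample = Xc.T @ Xc / (n - 1)             # empirical covariance
    var = np.diag(sample)                    # sigma_i^2 (assumed: 1/(n-1) variance)
    std = np.sqrt(var)
    r = sample / np.outer(std, std)          # sample correlation matrix
    # w[k, i, j] = (x_ki - xbar_i) * (x_kj - xbar_j); uses O(n * p^2) memory.
    w = Xc[:, :, None] * Xc[:, None, :]
    w_bar = w.mean(axis=0)
    ss = ((w - w_bar) ** 2).sum(axis=0)
    var_r = n / ((n - 1) ** 3 * np.outer(var, var)) * ss
    off = ~np.eye(p, dtype=bool)             # restrict both sums to i != j
    gamma = var_r[off].sum() / (r[off] ** 2).sum()
    # Assumption: clip so the intensity stays in the documented range [0, 1].
    return float(np.clip(gamma, 0.0, 1.0))

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X_corr = z + 0.1 * rng.normal(size=(500, 5))  # strongly correlated features
gamma_corr = estimate_gamma(X_corr)
```

For strongly correlated features and many data points, as in `X_corr`, the off-diagonal correlations are well determined, so the estimated intensity is small and little shrinkage is applied.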

This method is equivalent to the cov.shrink method in the R package corpcor, if the argument lambda.var is set to 0. See https://cran.r-project.org/web/packages/corpcor/ for details.

References

[1] Schafer, J., and K. Strimmer. 2005. A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32. http://doi.org/10.2202/1544-6115.1175