covar.cov_shrink_ss
- covar.cov_shrink_ss()
Compute a shrinkage estimate of the covariance matrix using the Schafer and Strimmer (2005) method.
Parameters: X : array, shape=(n, p)
Data matrix. Each row represents a data point, and each column represents a feature.
shrinkage : float, optional
The covariance shrinkage intensity (range 0-1). If shrinkage is not specified (the default) it is estimated using an analytic formula from Schafer and Strimmer (2005). For shrinkage=0 the empirical correlations are recovered.
Returns: cov : array, shape=(p, p)
Estimated covariance matrix of the data.
shrinkage : float
The applied covariance shrinkage intensity.
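To illustrate what the shrinkage parameter does, here is a minimal NumPy sketch. It assumes the diagonal target described in the Notes; `shrink_to_diagonal` is a hypothetical helper written for this example, not part of covar, and the data are random placeholders.

```python
import numpy as np

def shrink_to_diagonal(S, shrinkage):
    """Hypothetical helper: blend S with its own diagonal (the target T in the Notes)."""
    S = np.asarray(S, dtype=float)
    T = np.diag(np.diag(S))              # keep the variances, zero the covariances
    return (1.0 - shrinkage) * S + shrinkage * T

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))            # n=10 data points, p=30 features
S = np.cov(X, rowvar=False)              # empirical covariance, shape (30, 30)

# shrinkage=0 leaves the empirical estimate untouched.
assert np.allclose(shrink_to_diagonal(S, 0.0), S)

# With p > n the empirical covariance is rank deficient (rank at most n - 1) ...
print(np.linalg.matrix_rank(S))
# ... while a shrinkage > 0 gives a positive definite matrix here,
# because every sample variance on the diagonal is positive.
print(np.linalg.eigvalsh(shrink_to_diagonal(S, 0.2)).min() > 0)
```

The diagonal of the result always equals the diagonal of `S`; only the off-diagonal entries are scaled by `1 - shrinkage`.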
See also
- cov_shrink_rblw: similar method, using a different shrinkage target, T.
- sklearn.covariance.ledoit_wolf: very similar approach, but uses a different shrinkage target, T.
Notes
This shrinkage estimator corresponds to "Target D" (diagonal, unequal variance) as described in [1]. The estimator takes the form
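$$\hat{\Sigma} = (1 - \gamma)\,\Sigma^{\text{sample}} + \gamma T,$$

where $\Sigma^{\text{sample}}$ is the (noisy but unbiased) empirical covariance matrix,

$$\Sigma^{\text{sample}}_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j),$$

the matrix $T$ is the shrinkage target, a less noisy but biased estimator for the covariance, and the scalar $\gamma \in [0, 1]$ is the shrinkage intensity (regularization strength). This approach uses a diagonal shrinkage target, $T$:

$$T_{ij} = \begin{cases} \Sigma^{\text{sample}}_{ii} & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

The idea is that by taking a weighted average of these two estimators, we can get a combined estimator which is more accurate than either one alone, especially when $p$ is large. The optimal weighting, $\gamma$, is determined automatically by minimizing the mean squared error. See [1] for details on how this can be done. The formula for $\gamma$ is

$$\gamma = \frac{\sum_{i \neq j} \widehat{\mathrm{Var}}(r_{ij})}{\sum_{i \neq j} r_{ij}^{2}},$$

where $r$ is the sample correlation matrix,

$$r_{ij} = \frac{\Sigma^{\text{sample}}_{ij}}{\sigma_i \sigma_j},$$

with $\sigma_i = \sqrt{\Sigma^{\text{sample}}_{ii}}$ the sample standard deviation of feature $i$, and $\widehat{\mathrm{Var}}(r_{ij})$ is given by

$$\widehat{\mathrm{Var}}(r_{ij}) = \frac{n}{(n-1)^{3}\,\sigma_i^{2}\sigma_j^{2}} \sum_{k=1}^{n} (w_{kij} - \bar{w}_{ij})^{2},$$

with $w_{kij} = (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$ and $\bar{w}_{ij} = \frac{1}{n} \sum_{k=1}^{n} w_{kij}$.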
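The formula for γ above can be sketched directly in NumPy. This is an illustrative transcription of the equations in these Notes, not the library's implementation. The name `estimate_gamma` is hypothetical, and the final clipping to [0, 1] is an assumption made here so that the result respects the stated range of γ.

```python
import numpy as np

def estimate_gamma(X):
    """Illustrative estimate of the shrinkage intensity from the formulas above."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                        # x_ki - xbar_i
    S = Xc.T @ Xc / (n - 1)                        # empirical covariance
    var_outer = np.outer(np.diag(S), np.diag(S))   # sigma_i^2 * sigma_j^2
    r = S / np.sqrt(var_outer)                     # sample correlation matrix

    # w[k, i, j] = (x_ki - xbar_i) * (x_kj - xbar_j); needs O(n * p^2) memory
    w = Xc[:, :, None] * Xc[:, None, :]
    ss = ((w - w.mean(axis=0)) ** 2).sum(axis=0)   # sum_k (w_kij - wbar_ij)^2
    var_r = n / (n - 1) ** 3 * ss / var_outer      # estimated Var(r_ij)

    off = ~np.eye(p, dtype=bool)                   # mask selecting i != j
    gamma = var_r[off].sum() / (r[off] ** 2).sum()
    # Assumption: clip so the intensity stays inside [0, 1].
    return float(np.clip(gamma, 0.0, 1.0))
```

The sketch does not guard against features with zero variance, which would cause a division by zero. The dense `(n, p, p)` array keeps the code close to the equations but is only practical for small `p`.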
This method is equivalent to the cov.shrink function in the R package corpcor when the argument lambda.var is set to 0. See https://cran.r-project.org/web/packages/corpcor/ for details.
References
[1] Schafer, J., and K. Strimmer. 2005. A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32. http://doi.org/10.2202/1544-6115.1175