LSE Theses Online

Estimation of covariance, correlation and precision matrices for high-dimensional data

Huang, Na (2016) Estimation of covariance, correlation and precision matrices for high-dimensional data. PhD thesis, London School of Economics and Political Science.

Text - Submitted Version


The thesis concerns the estimation of large correlation and covariance matrices and their inverses. Two new methods are proposed. First, tilting-based methods are proposed to estimate the precision matrix of a p-dimensional random variable X when p may be much larger than the sample size n. Each 2 × 2 block of the precision matrix indexed by (i, j) can be estimated by inverting the pairwise sample conditional covariance matrix of X_i and X_j, controlling for all the other variables. However, in the high-dimensional setting, including too many or irrelevant controlling variables may distort the results. To determine the controlling subsets, the tilting technique is applied to measure the contribution of each remaining variable to the covariance matrix of X_i and X_j, and only the (hopefully) highly relevant remaining variables are placed in the controlling subsets. Four types of tilting-based methods are introduced and their properties are demonstrated. Simulation results are presented under different scenarios for the underlying precision matrix.

The second method, NOVEL Integration of the Sample and Thresholded covariance estimators (NOVELIST), performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when p and n satisfy (log p)/n → 0. In empirical comparisons with several popular estimators, the NOVELIST estimator performs well in estimating covariance and precision matrices over a wide range of models.
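The pairwise idea behind the tilting-based methods rests on a standard identity: if A = {i, j} and B is the set of all other variables, the (A, A) block of the precision matrix Ω = Σ⁻¹ equals the inverse of the conditional covariance Σ_AA − Σ_AB Σ_BB⁻¹ Σ_BA. The sketch below is illustrative only, not the thesis's code: the function name `pairwise_precision_block`, the tridiagonal Ω and the hand-picked controlling subsets are all assumptions, and no tilting step is implemented. It shows that a small controlling subset can recover the block exactly, while dropping a relevant variable cannot.

```python
import numpy as np


def pairwise_precision_block(S, i, j, control):
    """Invert the conditional covariance of (X_i, X_j) given X_control.

    S is a p x p covariance matrix (population or sample); control is a list
    of indices of the controlling variables, excluding i and j.
    Hypothetical helper for illustration only.
    """
    a = [i, j]
    c = list(control)
    S_aa = S[np.ix_(a, a)]
    if not c:
        # No controlling variables: invert the marginal 2 x 2 covariance.
        return np.linalg.inv(S_aa)
    S_ac = S[np.ix_(a, c)]
    S_cc = S[np.ix_(c, c)]
    # Schur complement: residual covariance of (X_i, X_j) after
    # linear regression on X_control.
    cond_cov = S_aa - S_ac @ np.linalg.solve(S_cc, S_ac.T)
    return np.linalg.inv(cond_cov)


# Illustrative tridiagonal precision matrix (p = 6): each X_k is
# conditionally related only to X_{k-1} and X_{k+1}.
p = 6
Omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Omega)
i, j = 1, 2
target = Omega[np.ix_([i, j], [i, j])]

all_others = [k for k in range(p) if k not in (i, j)]
print(np.allclose(pairwise_precision_block(Sigma, i, j, all_others), target))  # True
print(np.allclose(pairwise_precision_block(Sigma, i, j, [0, 3]), target))      # True
print(np.allclose(pairwise_precision_block(Sigma, i, j, [0]), target))         # False

# Sample version: plug in the sample covariance with a hand-picked subset.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=200)
S_hat = np.cov(X, rowvar=False)
print(pairwise_precision_block(S_hat, i, j, [0, 3]))
```

In the population, controlling only for X_0 and X_3 is enough because Ω has no entries linking X_1 or X_2 to X_4 or X_5; omitting X_3 breaks this and the block is no longer recovered. Choosing such subsets from data, rather than by hand as here, is the role the thesis assigns to tilting.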
An automatic algorithm for NOVELIST is developed, and extensive applications to real data are presented.
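The NOVELIST idea of shrinking the sample correlation towards its thresholded version can be sketched as a convex combination (1 − δ)R + δT(R, λ). This is a minimal sketch under stated assumptions, not the thesis's implementation: the convex-combination form, the use of soft thresholding on off-diagonal entries, the function names `soft_threshold`, `novelist_correlation` and `novelist_covariance`, and the fixed values λ = 0.5 and δ = 0.5 are all assumptions for illustration. The automatic algorithm for choosing λ and δ is not reproduced.

```python
import numpy as np


def soft_threshold(R, lam):
    """Soft-threshold off-diagonal entries of R at level lam; keep the diagonal.

    Assumption: soft thresholding; other thresholding rules are possible.
    """
    T = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    np.fill_diagonal(T, np.diag(R))
    return T


def novelist_correlation(X, lam, delta):
    """Shrink the sample correlation towards its thresholded version.

    Assumed form: (1 - delta) * R + delta * T(R, lam). Only elementwise
    operations are needed; no eigendecomposition is performed.
    """
    R = np.corrcoef(X, rowvar=False)
    return (1.0 - delta) * R + delta * soft_threshold(R, lam)


def novelist_covariance(X, lam, delta):
    """Rescale the correlation estimate by sample standard deviations (assumption)."""
    sd = np.std(X, axis=0, ddof=1)
    return novelist_correlation(X, lam, delta) * np.outer(sd, sd)


rng = np.random.default_rng(1)
n, p = 30, 50  # p > n: the sample correlation matrix is rank-deficient
X = rng.standard_normal((n, p))
R = np.corrcoef(X, rowvar=False)
print(np.linalg.matrix_rank(R))  # at most n - 1 = 29, so R is singular

R_nov = novelist_correlation(X, lam=0.5, delta=0.5)
# Diagnostic only: smallest eigenvalue of the shrunk estimate.
print(np.linalg.eigvalsh(R_nov).min())
```

The eigenvalue call is only a check; the estimator itself is computed entrywise, consistent with the claim that NOVELIST avoids eigenanalysis. Setting δ = 0 returns the sample correlation and δ = 1 returns its thresholded version, so δ controls the balance between the non-sparse and sparse components.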

Item Type: Thesis (PhD)
Additional Information: © 2016 Na Huang
Library of Congress subject classification: H Social Sciences > HA Statistics
Sets: Departments > Statistics
Supervisor: Fryzlewicz, Piotr
