Hao, Sixing (2025) Latent structure estimation for high-dimensional dependent data via eigenanalysis. PhD thesis, London School of Economics and Political Science.
Abstract
With advancements in technology, the collection and storage of high-dimensional data have become increasingly common, necessitating tools to analyze such data effectively. Recovering the latent structure has gained popularity as an approach to dimensionality reduction, as the latent processes typically have lower dimensionality, making them easier to analyze and interpret. This thesis explores methods for uncovering the latent structure in high-dimensional data across three domains. Chapter 2 proposes a novel estimation method for the blind source separation model introduced in Bachoc et al. (2020). The new method leverages eigenanalysis of a positive definite matrix constructed from multiple normalized spatial local covariance matrices, enabling the handling of moderately high-dimensional random fields. The consistency of the estimated mixing matrix is established with explicit error rates, even under slowly decaying eigen-gaps. Chapter 3 examines the factor model framework for time series (Lam and Yao, 2012a), with a focus on estimating the number of latent factors. Traditional methods struggle with varying factor strengths, limiting their applicability. To address this, a non-parametric hypothesis testing procedure is proposed, capable of identifying the correct number of factors even when factor strengths differ. A proof of the significance level of the test is provided, and its effectiveness is demonstrated through comparisons with existing methods on both simulated and real-world datasets. Chapter 4 addresses the challenge of electricity load forecasting, starting with Generalized Additive Models (GAM) provided by Électricité de France. The residuals from the GAM forecasts are analyzed and modeled to uncover their latent structure, which not only simplifies the modeling process but also enhances the GAM estimations. Two approaches are explored: latent segmentation using TS-PCA (Chang, Guo, and Yao, 2018a) and Matrix Time Series Decorrelation (Han et al., 2023), and dimensionality reduction with factor models using the procedure developed in Chapter 3. Applied to national and regional electricity load data in France, both methods improve forecast accuracy, as measured by Root Mean Squared Error (RMSE).
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | © 2025 Sixing Hao |
Library of Congress subject classification: | Q Science > QA Mathematics |
Sets: | Departments > Statistics |
Supervisor: | Yao, Qiwei |
URI: | http://etheses.lse.ac.uk/id/eprint/4859 |