Yuen, Lok Ting (2021) High-dimensional variable selection and time series classification and forecasting with potential change-points. PhD thesis, London School of Economics and Political Science.
Abstract
This thesis studies high-dimensional variable selection and time series with potential changepoints.

In Chapter 1 we propose the Combined Selection and Uncertainty Visualiser (CSUV), which estimates the set of true covariates in high-dimensional linear regression and visualises selection uncertainties by exploiting the (dis)agreement among different base selectors. Our proposed method selects the covariates that are selected most frequently by the different variable selection methods on subsampled data. The method is generic and can be used with different existing variable selection methods. We demonstrate its variable selection performance using real and simulated data. The variable selection method and its uncertainty illustration tool are publicly available as the R package CSUV (https://github.com/christineyuen/CSUV). The graphical tool is also available online via https://csuv.shinyapps.io/csuv.

In Chapter 2 we explore the potential and shortcomings of the “estimation-simulation-classification” approach for time series model identification. Assume that only one realisation of a time series is available and we would like to find its true model specification. With the success of deep learning in classification in recent years, we explore the possibility of using classifiers for model identification. Applying classifiers to model identification is not straightforward, as classifiers require a sufficient number of observations to train but we only have one time series at hand. One possible solution is to generate pseudo training data that is similar to the observed time series, and use it to fit the classifiers. We call this the “estimation-simulation-classification” (ESC) approach. We find that if model complexity is not taken into account, more flexible models are favoured by this approach. The advantage of using a good classifier can be offset by ignoring model complexity, and some simple methods that take model complexity into account (e.g. information criteria) may outperform classifiers with the ESC approach. Based on our observations on the ESC approach, we propose using BIC and consider ResNet via the ESC approach for time series model identification. The newly proposed methods are implemented in R and will be available online via https://github.com/christineyuen/ESC.

In Chapter 3 we propose different procedures to extend the use of Narrowest-Over-Threshold (NOT) to time series with dependent noise, with the objective of providing better forecasting performance. The new method takes into account the potential dependence structure of the noise. We also explore using cross-validation to select the set of changepoints from the NOT solution path. We demonstrate the prediction performance of the proposed procedures using real and simulated data, and compare the performance with some other methods in different settings. The newly proposed methods are implemented in R and will be available online via https://github.com/christineyuen/NOT-ARMA.
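
The selection-frequency idea described for Chapter 1 can be illustrated with a minimal Python sketch. It is not the CSUV package's implementation, which is written in R. The function names, the example selection sets, and the 0.5 frequency threshold are all hypothetical, chosen only to show how agreement among base selectors across subsamples might be turned into a selected set.

```python
from collections import Counter


def selection_frequencies(selected_sets, covariates):
    """Return, for each covariate, the fraction of fits that selected it.

    selected_sets: one collection of covariate names per fit, where a fit is
    one base selector applied to one subsample (hypothetical input format).
    """
    if not selected_sets:
        raise ValueError("need at least one fit")
    # set(s) guards against a covariate being listed twice within one fit.
    counts = Counter(name for s in selected_sets for name in set(s))
    n_fits = len(selected_sets)
    return {name: counts[name] / n_fits for name in covariates}


def select_by_frequency(selected_sets, covariates, threshold=0.5):
    """Keep covariates whose selection frequency is at least `threshold`.

    The threshold rule is an assumption made for this illustration.
    """
    freqs = selection_frequencies(selected_sets, covariates)
    return [name for name in covariates if freqs[name] >= threshold]


# Made-up results: three base selectors, each fitted on two subsamples.
fits = [
    {"x1", "x2"}, {"x1", "x3"},        # selector A
    {"x1", "x2", "x4"}, {"x1"},        # selector B
    {"x1", "x2"}, {"x2"},              # selector C
]
covariates = ["x1", "x2", "x3", "x4"]

freqs = selection_frequencies(fits, covariates)
chosen = select_by_frequency(fits, covariates, threshold=0.5)
print(freqs)
print(chosen)
```

The per-covariate frequencies also give a simple view of selection uncertainty: values close to 0 or 1 indicate agreement among the fits, while intermediate values indicate disagreement.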
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | © 2021 Lok Ting Yuen |
Library of Congress subject classification: | Q Science > QA Mathematics |
Sets: | Departments > Statistics |
Supervisor: | Fryzlewicz, Piotr |
URI: | http://etheses.lse.ac.uk/id/eprint/4237 |