LSE Theses Online, London School of Economics

On variable selection in high dimensions, segmentation and multiscale time series

Baranowski, Rafal (2016) On variable selection in high dimensions, segmentation and multiscale time series. PhD thesis, London School of Economics and Political Science.

Text - Submitted Version


In this dissertation, we study the following three statistical problems. First, we consider a high-dimensional data framework, where the number of covariates potentially affecting the response is large relative to the sample size. In this setting, some of the covariates appear to have an impact on the response only spuriously. To address this issue, we rank the covariates according to their impact on the response and use a subsampling scheme to identify the covariates which non-spuriously appear at the top of the ranking. We study the conditions under which such a set is unique and show that, with high probability, it can be recovered from the data by our procedure, for rankings based on measures commonly used in statistics. We illustrate its good practical performance in an extensive comparative simulation study and on microarray data. Second, we propose a generic approach to the problem of detecting an unknown number of features in a time series of interest, such as changes in trend or jumps in the mean, occurring at unknown locations in time. Those locations naturally imply a decomposition of the data into segments of homogeneity, the knowledge of which is useful, e.g., in estimating the mean of the series. We provide a precise description of the type of features we are interested in and, in two important scenarios, demonstrate that our methodology enjoys appealing theoretical properties. We show that the performance of our proposal matches or surpasses the state of the art in the scenarios tested and present applications to three real datasets: oil price log-returns, temperature anomalies data and the UK House Price Index.
Finally, we introduce a class of univariate multiscale time series models and propose an estimation procedure to fit those models to the data. We demonstrate that our proposal, with high probability, correctly identifies important timescales, under a framework in which the largest timescale in the model diverges with the sample size. The good empirical performance of the method is illustrated in an application to high-frequency financial returns for stocks listed on the New York Stock Exchange. For all proposed methods, we provide efficient and publicly available computer implementations.

Item Type: Thesis (PhD)
Additional Information: © 2016 Rafal Baranowski
Library of Congress subject classification: H Social Sciences > HA Statistics
Sets: Departments > Statistics
Supervisor: Fryzlewicz, Piotr
