Lu, Yan (2022) Multivariate outlier detection in latent variable models. PhD thesis, London School of Economics and Political Science.
Abstract
Outliers often pose serious problems for statistical models, since they can distort the model fit and bias parameter estimates. Outliers are also worthy of attention in their own right, as they are often informative about substructures in the data. This thesis aims to develop methods for detecting multivariate outliers in latent variable modelling contexts. Outliers are defined as data subsets that deviate from a baseline model specified for the majority of the data. Under this definition, we specify one-way outliers on the basis of atypical attributes of either individuals or variables, and two-way outliers on the basis of atypical attributes of both individuals and variables. In this thesis, we develop Forward Search (FS) procedures for detecting outlying individuals, latent groups of individuals and differential item functioning (DIF) variables. The FS does not examine just one subset of the data; instead, it fits the model to a sequence of augmented subsets in order to decide which part of the data deviates from the baseline model. Outliers are identified by monitoring the effect of the sequential addition of individuals or items on the fitted model. The performance of the FS is assessed using simulated data and cross-national survey data under latent class models, factor mixture models and multiple-group latent variable models. To detect two-way outliers, the thesis proposes superimposing a latent class model component, which captures two-way outliers, on a latent factor model component, which captures normal item response behaviour. Statistical inference is carried out under a fully Bayesian framework. The detection of two-way outliers is formulated in terms of the proposed Bayesian decision rules and compound decision rules that control the local false discovery rate and the local false non-discovery rate. The proposed method proves to be particularly useful for simultaneously detecting compromised items and test takers with item pre-knowledge in educational tests.
To further improve two-way outlier detection, the model is extended within an explanatory framework that accounts for covariate effects and the relationships between latent variables.
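As an illustration of the kind of compound decision rule mentioned in the abstract, the following is a minimal sketch under stated assumptions and not the thesis's actual procedure. It assumes that a Bayesian fit has already produced, for each case, a posterior probability of being an outlier. The function name `select_by_local_fdr`, the argument `post_outlier` and the threshold `alpha` are hypothetical. The sketch flags the largest set of cases for which the average posterior probability of being a non-outlier stays at or below `alpha`; the exact rule used in the thesis may differ.

```python
from typing import List, Sequence


def select_by_local_fdr(post_outlier: Sequence[float], alpha: float) -> List[int]:
    """Return the indices of the cases flagged as outliers.

    post_outlier[i] is an assumed posterior probability that case i is an
    outlier. Cases are considered in order of decreasing posterior
    probability and are flagged as long as the average posterior probability
    of being a non-outlier among the flagged cases does not exceed alpha.
    """
    order = sorted(range(len(post_outlier)),
                   key=lambda i: post_outlier[i], reverse=True)
    flagged: List[int] = []
    running_error = 0.0  # sum of (1 - posterior) over the flagged cases
    for i in order:
        new_error = running_error + (1.0 - post_outlier[i])
        # The running average is non-decreasing along this ordering,
        # so the first violation ends the search.
        if new_error / (len(flagged) + 1) > alpha:
            break
        running_error = new_error
        flagged.append(i)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up posterior probabilities, for illustration only.
    posteriors = [0.99, 0.20, 0.95, 0.60, 0.05]
    print(select_by_local_fdr(posteriors, alpha=0.10))
```

Ordering the cases by posterior probability means each added case contributes the smallest available expected error, so stopping at the first violation yields the largest admissible set under this particular criterion.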
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | © 2022 Yan Lu |
Library of Congress subject classification: | Q Science > QA Mathematics |
Sets: | Departments > Statistics |
Supervisor: | Moustaki, Irini and Chen, Yunxiao |
URI: | http://etheses.lse.ac.uk/id/eprint/4430 |