Random rotations in machine learning

Blaser, Rico (2021) Random rotations in machine learning. PhD thesis, London School of Economics and Political Science.

Abstract

This thesis discusses applications of random rotations in machine learning. Rotations of the feature space can lead to more diverse ensembles, better predictions, less complex classifiers and smoother decision boundaries. In Chapter 1 of this thesis, the feature space is randomly rotated and one or more independent base learners are constructed on each rotation. In the case of classification, each base learner receives one vote in the final ensemble prediction; for regression, predictions are averaged. An empirical study demonstrates the efficacy of random rotations. Observing that not all rotations are equally effective, Chapter 2 is dedicated to the analysis of what makes a rotation effective and whether it is possible to emphasize such rotations in the final ensemble prediction. It is demonstrated that focusing on rotations that lead to simpler base learners yields more compact ensembles and often increases predictive accuracy. In this chapter, predictions are aggregated in a parametric fashion, giving more weight to less complex predictors in the final ensemble. Multiple parametric forms are explored. Instead of constructing one or more predictors for each rotation, it is also possible to provide multiple rotations of the feature space to a single predictor. This effectively gives a single predictor multiple simultaneous viewpoints on the same feature space. The first half of Chapter 3 explores this idea. A great benefit of this approach, when compared to the methods described in the earlier chapters, is that the aggregation of the predictions across multiple rotations becomes part of the training algorithm of the classifier, rather than being constructed exogenously. This also makes the approach viable for ensemble architectures with an interdependence between the base learners, such as boosting. Finally, an importance measure can be used not only to select the most salient features but also to determine the most helpful rotations.
A different method of combining multiple rotations is to form a meta- or stacking predictor that takes the base predictions on each rotation as inputs. This results in a generalization of the results of Chapter 2, whereby the aggregation becomes nonparametric in nature and local with respect to the decision boundary. In this context, extra care must be taken to avoid data-snooping bias. A repeated, nested cross-validation technique is described in the second half of Chapter 3 to facilitate this process. The procedure directly answers the question of whether rotations are helpful for a specific data set and provides an avenue for selecting effective rotations. Chapter 4 is concerned with the impact random rotations have had on the scientific literature and open source software community since their introduction with the publication of our initial paper on the topic.
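
As a rough illustration of the scheme described for Chapter 1 (rotate the feature space at random, fit a base learner on each rotation, combine the predictions by vote), the sketch below uses NumPy. It is not the implementation from the thesis: the decision-stump base learner, the function names and the default of 25 rotations are assumptions made for this example. The rotation is drawn by QR decomposition of a Gaussian matrix, which is one common way to sample a uniformly distributed rotation.

```python
import numpy as np


def random_rotation(d, rng):
    """Draw a d x d rotation matrix (orthogonal, det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    q = q * np.sign(np.diag(r))   # sign fix so the draw is uniform over orthogonal matrices
    if np.linalg.det(q) < 0:      # flip one axis to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


def fit_stump(X, y):
    """Fit an axis-aligned, single-split classifier for labels in {0, 1}.

    Illustrative base learner only (an assumption of this sketch).
    Returns (feature index, threshold, left label, right label).
    """
    best = (np.inf, 0, 0.0, 0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            for a, b in ((0, 1), (1, 0)):
                err = np.sum(np.where(left, a, b) != y)
                if err < best[0]:
                    best = (err, j, t, a, b)
    return best[1:]


def predict_stump(stump, X):
    j, t, a, b = stump
    return np.where(X[:, j] <= t, a, b)


def fit_rotation_ensemble(X, y, n_rotations=25, seed=0):
    """Fit one base learner per random rotation of the feature space."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_rotations):
        R = random_rotation(X.shape[1], rng)
        members.append((R, fit_stump(X @ R, y)))
    return members


def predict_rotation_ensemble(members, X):
    """Majority vote: each base learner casts one vote on its own rotation."""
    votes = np.mean([predict_stump(s, X @ R) for R, s in members], axis=0)
    return (votes >= 0.5).astype(int)
```

A stump is used here because it can only split along a coordinate axis, so each rotation gives it a different set of candidate boundaries; a base learner that is invariant to rotation would gain nothing from this construction. For regression, the vote in `predict_rotation_ensemble` would be replaced by a plain average of the base predictions.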

Item Type:	Thesis (PhD)
Additional Information:	© 2021 Rico Blaser
Library of Congress subject classification:	Q Science > QA Mathematics
Sets:	Departments > Statistics
Supervisor:	Fryzlewicz, Piotr
URI:	http://etheses.lse.ac.uk/id/eprint/4368
