Multi-view learning, a subfield of machine learning that aims to improve model performance by training on multiple views of the data, has been studied extensively over the past decades. It is typically applied in contexts where the input features naturally form multiple groups, or views. An example of a naturally multi-view context is a data set of websites, where each website is described not only by the text on the page, but also by the text of the hyperlinks pointing to the page. More recently, various studies have reported initial success in applying multi-view learning to single-view data by artificially constructing multiple views. However, a systematic study of the effectiveness of such artificially constructed views is still lacking. To bridge this gap, this thesis begins with a high-level overview of multi-view learning and the co-training algorithm, a classic semi-supervised learning algorithm that uses both the labelled and the unlabelled examples in a data set for training. The thesis then presents a web-based tool, developed in Python, that lets users experiment with multiple view construction approaches and compare their performance on various data sets. The view construction approaches supported by the tool are subsampling, Optimal Feature Set Partitioning, and a genetic algorithm. Finally, the thesis presents an empirical comparison of these approaches, not only against one another, but also against traditional single-view models. The findings show that a simple subsampling approach combined with co-training often outperforms both the other view construction approaches and traditional single-view methods.
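To make the combination of subsampling and co-training concrete, the following is a minimal sketch rather than the implementation used in the thesis or its web-based tool. It assumes that subsampling means randomly partitioning the feature indices into disjoint views, and it substitutes a simple nearest-centroid classifier for whatever base learner the thesis uses. All function names (`split_views`, `co_train`, `predict_multi`, and the helpers) are hypothetical and chosen only for illustration.

```python
import random

def split_views(n_features, n_views=2, seed=0):
    """Assumed form of subsampling: random disjoint partition of feature indices."""
    rng = random.Random(seed)
    idx = list(range(n_features))
    rng.shuffle(idx)
    return [sorted(idx[v::n_views]) for v in range(n_views)]

def project(row, view):
    """Keep only the features that belong to one view."""
    return [row[j] for j in view]

def fit_centroids(X, y):
    """Stand-in base learner: one mean vector per class (nearest centroid)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Return (label, margin); margin is the gap between the two closest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, mu)), c)
        for c, mu in centroids.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def co_train(X_lab, y_lab, X_unl, views, rounds=5, per_round=2):
    """Each view's classifier labels its most confident pool examples for a shared labelled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(rounds):
        for view in views:
            if not pool:
                break
            model = fit_centroids([project(r, view) for r in X_lab], y_lab)
            scored = [(predict(model, project(r, view)), i) for i, r in enumerate(pool)]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            chosen = scored[:per_round]
            for (label, _), i in chosen:
                X_lab.append(pool[i])
                y_lab.append(label)
            # Remove the newly labelled examples from the pool, highest index first.
            for i in sorted((i for _, i in chosen), reverse=True):
                pool.pop(i)
    # Final per-view models trained on the enlarged labelled set.
    return [fit_centroids([project(r, v) for r in X_lab], y_lab) for v in views]

def predict_multi(models, views, row):
    """Take the label from whichever view is most confident about this row."""
    best = max(
        (predict(m, project(row, v)) for m, v in zip(models, views)),
        key=lambda p: p[1],
    )
    return best[0]
```

With two labelled examples and a small unlabelled pool, a call such as `co_train(X_lab, y_lab, X_unl, split_views(4))` returns one model per view, and `predict_multi` combines them at prediction time. Combining views by highest margin is one of several reasonable choices; averaging per-view scores would work as well.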