A Classification Algorithm for High-Dimensional Data

Document
Description

With the advent of high-dimensional stored big data and streaming data, suddenly machine learning on a very large scale has become a critical need. Such machine learning should be extremely fast, should scale up easily with volume and dimension, should

With the advent of high-dimensional stored big data and streaming data, suddenly machine learning on a very large scale has become a critical need. Such machine learning should be extremely fast, should scale up easily with volume and dimension, should be able to learn from streaming data, should automatically perform dimension reduction for high-dimensional data, and should be deployable on hardware. Neural networks are well positioned to address these challenges of large scale machine learning. In this paper, we present a method that can effectively handle large scale, high-dimensional data. It is an online method that can be used for both streaming and large volumes of stored big data. It primarily uses Kohonen nets, although only a few selected neurons (nodes) from multiple Kohonen nets are actually retained in the end; we discard all Kohonen nets after training. We use Kohonen nets both for dimensionality reduction through feature selection and for building an ensemble of classifiers using single Kohonen neurons. The method is meant to exploit massive parallelism and should be easily deployable on hardware that implements Kohonen nets. Some initial computational results are presented.