Novel statistical learning methods for multi-modality heterogeneous data fusion in health care applications
With the development of computing and sensing technology, rich datasets have become available in many fields, such as health care, manufacturing, and transportation. These data often come from multiple heterogeneous sources or modalities, a phenomenon that is especially common in health care systems. While multi-modality data fusion is a promising research area, health care applications pose several special challenges. (1) Integrating biological and statistical models is a major challenge. (2) Data from every modality are commonly not available for every patient because of cost, accessibility, and other reasons. This results in a special missing-data structure in which entire modalities are missing in “blocks,” so training a predictive model on such a dataset poses a significant challenge to statistical learning. (3) It is well known that data from different modalities may contain different aspects of information about the response, and existing studies do not adequately address this problem. My dissertation develops new statistical learning models to address each of these challenges, together with application case studies using real health care datasets, presented in Chapters 2, 3, and 4, respectively. Collectively, the dissertation is expected to provide a new set of statistical learning models, algorithms, and theory that contribute to multi-modality heterogeneous data fusion, driven by the unique challenges in this area. Applying these new methods to important medical problems using real-world datasets is also expected to provide solutions to these problems and thereby contribute to the application domains.