Tree Guided Personalized Machine Learning Prediction With Applications To Precision Diagnostics

Shah, Nishtha

The proposed research is motivated by a colon cancer biomarker study that recruited case (colon cancer) and healthy control samples and quantified a large number of candidate biomarkers in each sample using a high-throughput technology called the nucleic acid-programmable protein array (NAPPA). The study aimed to identify a panel of biomarkers that accurately distinguishes cases from controls. A major challenge in analyzing this study was biomarker heterogeneity, where biomarker responses differ from sample to sample. The goal of this research is to improve prediction accuracy for the motivating study and for similar studies. Most machine learning (ML) algorithms are developed under a one-size-fits-all strategy and were not able to handle such heterogeneous data. Because they fail to capture the individuality of each subject, several standard ML algorithms tested on this dataset performed poorly, reaching only 55-61% accuracy. In contrast, the proposed personalized ML (PML) strategy aims to tailor the optimal ML model to each subject according to that subject's individual characteristics, and it yielded a highest accuracy of 72%.

Copyright Statement