Prediabetes is a major epidemic and is associated with adverse cardio-cerebrovascular outcomes. Early identification of patients who will develop rapid progression of atherosclerosis could be beneficial for improved risk stratification. In this paper, we investigate important factors impacting the prediction, using several machine learning methods, of rapid progression of carotid intima-media thickness in impaired glucose tolerance (IGT) participants.
In the Actos Now for Prevention of Diabetes (ACT NOW) study, 382 participants with IGT underwent carotid intima-media thickness (CIMT) ultrasound evaluation at baseline and at 15–18 months, and were divided into rapid progressors (RP, n = 39, 58 ± 17.5 μM change) and non-rapid progressors (NRP, n = 343, 5.8 ± 20 μM change, p < 0.001 versus RP). To deal with complex multi-modal data consisting of demographic, clinical, and laboratory variables, we propose a general data-driven framework to investigate the ACT NOW dataset. In particular, we first employed a Fisher Score-based feature selection method to identify the most effective variables and then proposed a probabilistic Bayes-based learning method for the prediction. Comparison of the methods and factors was conducted using area under the receiver operating characteristic curve (AUC) analyses and Brier score.
The experimental results show that the proposed learning methods performed well in identifying or predicting RP. Among the methods, the performance of Naïve Bayes was the best (AUC 0.797, Brier score 0.085) compared to multilayer perceptron (0.729, 0.086) and random forest (0.642, 0.10). The results also show that feature selection has a significant positive impact on the data prediction performance.
By dealing with multi-modal data, the proposed learning methods show effectiveness in predicting prediabetics at risk for rapid atherosclerosis progression. The proposed framework demonstrated utility in outcome prediction in a typical multidimensional clinical dataset with a relatively small number of subjects, extending the potential utility of machine learning approaches beyond extremely large-scale datasets.