Matching Items (5)
Description

Objectives: Prediabetes is a major epidemic and is associated with adverse cardio-cerebrovascular outcomes. Early identification of patients who will develop rapid progression of atherosclerosis could be beneficial for improved risk stratification. In this paper, we investigate important factors impacting the prediction, using several machine learning methods, of rapid progression of carotid intima-media thickness in impaired glucose tolerance (IGT) participants.

Methods: In the Actos Now for Prevention of Diabetes (ACT NOW) study, 382 participants with IGT underwent carotid intima-media thickness (CIMT) ultrasound evaluation at baseline and at 15–18 months, and were divided into rapid progressors (RP, n = 39, 58 ± 17.5 μm change) and non-rapid progressors (NRP, n = 343, 5.8 ± 20 μm change, p < 0.001 versus RP). To deal with complex multi-modal data consisting of demographic, clinical, and laboratory variables, we propose a general data-driven framework to investigate the ACT NOW dataset. In particular, we first employed a Fisher Score-based feature selection method to identify the most effective variables and then proposed a probabilistic Bayes-based learning method for the prediction. Comparison of the methods and factors was conducted using area under the receiver operating characteristic curve (AUC) analyses and Brier score.

Results: The experimental results show that the proposed learning methods performed well in identifying or predicting RP. Among the methods, Naïve Bayes performed best (AUC 0.797, Brier score 0.085), ahead of multilayer perceptron (AUC 0.729, Brier score 0.086) and random forest (AUC 0.642, Brier score 0.10). The results also show that feature selection has a significant positive impact on prediction performance.

Conclusions: By dealing with multi-modal data, the proposed learning methods show effectiveness in predicting prediabetics at risk for rapid atherosclerosis progression. The proposed framework demonstrated utility in outcome prediction in a typical multidimensional clinical dataset with a relatively small number of subjects, extending the potential utility of machine learning approaches beyond extremely large-scale datasets.
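The quantities named in the Methods above (Fisher score for ranking features, AUC and Brier score for comparing classifiers) can be illustrated with a minimal sketch. It assumes the common textbook form of the Fisher score (between-class scatter divided by within-class scatter, per feature), which may differ in detail from the variant used in the study. The function names and any data passed to them are hypothetical and are not taken from the ACT NOW dataset.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of a single feature (assumed textbook form):
    class-size-weighted between-class scatter over within-class scatter."""
    overall_mean = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    # A feature with zero within-class variance separates classes perfectly.
    return between / within if within > 0 else float("inf")


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In a workflow like the one described, features would be ranked by `fisher_score`, the top-ranked ones passed to a classifier, and the classifier's predicted probabilities scored with `auc` (higher is better) and `brier_score` (lower is better).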

Contributors: Hu, Xia (Author) / Reaven, Peter (Author) / Saremi, Aramesh (Author) / Liu, Ninghao (Author) / Abbasi, Mohammad (Author) / Liu, Huan (Author) / Migrino, Raymond Q. (Author) / DREAM 9 AML-OPC Consortium (Contributor)
Created: 2016-09-05
Description

Successful identification of directed dynamical influence in complex systems is relevant to significant problems of current interest. Traditional methods based on Granger causality and transfer entropy have limitations, such as difficulty with nonlinearity and large data requirements. Recently, a framework based on nonlinear dynamical analysis was proposed to overcome these difficulties. We find, counterintuitively, that noise can enhance the detectability of directed dynamical influence. In fact, intentionally injecting a proper amount of asymmetric noise into the available time series has the unexpected benefit of dramatically increasing confidence in ascertaining the directed dynamical influence in the underlying system. This result is established using both real data and model time series from nonlinear ecosystems. We develop a physical understanding of the beneficial role of noise in enhancing detection of directed dynamical influence.

Contributors: Jiang, Junjie (Author) / Huang, Zi-Gang (Author) / Huang, Liang (Author) / Liu, Huan (Author) / Lai, Ying-Cheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2016-04-12
Description

Location-based social networks (LBSNs) have attracted an increasing number of users in recent years, resulting in large amounts of geographical and social data. Such LBSN data provide an unprecedented opportunity to study human movement through users' socio-spatial behavior, in order to improve location-based applications such as location recommendation. Because users can check in at new places, traditional work on location prediction that relies on mining a user’s historical moving trajectories fails, as it is not designed for the cold-start problem of recommending new check-ins. Meanwhile, previous work on LBSNs that attempted to use a user’s social connections for location recommendation observed limited benefit from social network information. In this work, we propose to address the cold-start location recommendation problem by capturing the correlations between social networks and geographical distance on LBSNs with a geo-social correlation model. The experimental results on a real-world LBSN dataset demonstrate that our approach properly models the geo-social correlations of a user’s cold-start check-ins and significantly improves the location recommendation performance.

Contributors: Gao, Huiji (Author) / Tang, Jiliang (Author) / Liu, Huan (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2015-03-01
Description

Twitter is a major social media platform on which users send and read messages (“tweets”) of up to 140 characters. In recent years this communication medium has been used by those affected by crises to organize demonstrations or find relief. Because traffic on this platform is extremely heavy, with hundreds of millions of tweets sent every day, it is difficult to differentiate between times of turmoil and times of typical discussion. In this work we present a new approach to addressing this problem. We first assess several possible “thermostats” of activity on social media for their effectiveness in finding important time periods. We compare methods commonly found in the literature with a method from economics. By combining methods from computational social science with methods from economics, we introduce an approach that can effectively locate crisis events in the mountains of data generated on Twitter. We demonstrate the strength of this method by using it to locate the social events relating to the Occupy Wall Street movement protests at the end of 2011.

Contributors: Kenett, Dror Y. (Author) / Morstatter, Fred (Author) / Stanley, H. Eugene (Author) / Liu, Huan (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2014-07-30
Description

Understanding the dynamics of human movements is key to issues of significant current interest such as behavioral prediction, recommendation, and control of epidemic spreading. We collect and analyze big data sets of human movements in both cyberspace (through browsing of websites) and physical space (through mobile towers) and find a superlinear scaling relation between the mean frequency of visit $\langle f \rangle$ and its fluctuation $\sigma$: $\sigma \sim \langle f \rangle^{\beta}$ with $\beta \approx 1.2$. The probability distribution of the visiting frequency is found to be a stretched exponential function. We develop a model incorporating two essential ingredients, preferential return and exploration, and show that these are necessary for generating the scaling relation extracted from real data. A striking finding is that human movements in cyberspace and physical space are strongly correlated, indicating a distinctive behavioral identifying characteristic and implying that the behaviors in one space can be used to predict those in the other.
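A power-law scaling between the mean visit frequency and its fluctuation, as reported above, becomes a straight line on log-log axes, so the exponent can be estimated as the slope of a least-squares line through the logged values. The sketch below shows that generic approach. It is not necessarily the fitting procedure the authors used, and the function name and inputs are hypothetical.

```python
import math


def fit_scaling_exponent(mean_freqs, fluctuations):
    """Estimate beta and the prefactor c in sigma = c * <f>**beta
    by ordinary least squares on log(sigma) versus log(<f>)."""
    xs = [math.log(f) for f in mean_freqs]
    ys = [math.log(s) for s in fluctuations]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope on log-log axes
    prefactor = math.exp(y_bar - beta * x_bar)  # intercept, back-transformed
    return beta, prefactor
```

An exponent greater than 1 from such a fit corresponds to the superlinear regime: fluctuations grow faster than the mean frequency itself. All input values must be strictly positive for the logarithms to be defined.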

Contributors: Zhao, Zhidan (Author) / Huang, Zi-Gang (Author) / Huang, Liang (Author) / Liu, Huan (Author) / Lai, Ying-Cheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2014-11-12