Matching Items (101)


Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance

Description

Objectives
Prediabetes is a major epidemic and is associated with adverse cardio-cerebrovascular outcomes. Early identification of patients who will develop rapid progression of atherosclerosis could be beneficial for improved risk stratification. In this paper, we investigate important factors impacting the prediction, using several machine learning methods, of rapid progression of carotid intima-media thickness in impaired glucose tolerance (IGT) participants.
Methods
In the Actos Now for Prevention of Diabetes (ACT NOW) study, 382 participants with IGT underwent carotid intima-media thickness (CIMT) ultrasound evaluation at baseline and at 15–18 months, and were divided into rapid progressors (RP, n = 39, 58 ± 17.5 μm change) and non-rapid progressors (NRP, n = 343, 5.8 ± 20 μm change, p < 0.001 versus RP). To deal with complex multi-modal data consisting of demographic, clinical, and laboratory variables, we propose a general data-driven framework to investigate the ACT NOW dataset. In particular, we first employed a Fisher score-based feature selection method to identify the most effective variables and then applied a probabilistic Bayes-based learning method for the prediction. Comparison of the methods and factors was conducted using area under the receiver operating characteristic curve (AUC) analyses and the Brier score.
Results
The experimental results show that the proposed learning methods performed well in identifying or predicting RP. Among the methods, the performance of Naïve Bayes was the best (AUC 0.797, Brier score 0.085) compared to multilayer perceptron (0.729, 0.086) and random forest (0.642, 0.10). The results also show that feature selection has a significant positive impact on the data prediction performance.
Conclusions
By dealing with multi-modal data, the proposed learning methods show effectiveness in identifying prediabetic patients at risk of rapid atherosclerosis progression. The proposed framework demonstrated utility in outcome prediction in a typical multidimensional clinical dataset with a relatively small number of subjects, extending the potential utility of machine learning approaches beyond extremely large-scale datasets.
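The Fisher score feature selection mentioned in the methods can be sketched as below: for each feature, between-class scatter is divided by within-class scatter, and high-scoring features are retained. The toy data and variable names are illustrative, not drawn from the ACT NOW dataset.

```python
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Fisher score per feature: sum over classes of
    n_c * (class mean - overall mean)^2, divided by
    sum over classes of n_c * class variance."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall_mean = mean(col)
        num = den = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            num += len(vals) * (mean(vals) - overall_mean) ** 2
            den += len(vals) * pvariance(vals)
        scores.append(num / den if den > 0 else 0.0)
    return scores

# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[1.0, 5.0], [1.1, 3.0], [0.9, 4.0], [5.0, 4.5], [5.1, 3.5], [4.9, 5.5]]
y = [0, 0, 0, 1, 1, 1]
scores = fisher_scores(X, y)
```

Features are then ranked by score and the top-k passed to the downstream classifier.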

Date Created
  • 2016-09-05


Directed dynamical influence is more detectable with noise

Description

Successful identification of directed dynamical influence in complex systems is relevant to significant problems of current interest. Traditional methods based on Granger causality and transfer entropy have issues such as difficulty with nonlinearity and large data requirements. Recently, a framework based on nonlinear dynamical analysis was proposed to overcome these difficulties. We find, counterintuitively, that noise can enhance the detectability of directed dynamical influence. In fact, intentionally injecting a proper amount of asymmetric noise into the available time series has the unexpected benefit of dramatically increasing confidence in ascertaining the directed dynamical influence in the underlying system. This result is established based on both real data and model time series from nonlinear ecosystems. We develop a physical understanding of the beneficial role of noise in enhancing detection of directed dynamical influence.

Date Created
  • 2016-04-12


Modeling Fantasy Baseball Player Popularity Using Twitter Activity

Description

Social media is used by people every day to discuss the nuances of their lives. Major League Baseball (MLB) is a popular sport in the United States, and as such has generated a great deal of activity on Twitter. As fantasy baseball continues to grow in popularity, so does the research into better algorithms for picking players. Most of the research in this area focuses on improving the prediction of a player's individual performance. However, the crowd-sourcing power afforded by social media may enable more informed predictions about players' performances. Most amateur gamblers choose players based on popularity and personal preference. While some of these trends (particularly the long-term ones) are captured by ranking systems, this research focused on predicting the daily spikes in popularity (and therefore price or draft order) by comparing the number of mentions a player received on Twitter against their previous mention counts. In doing so, it was demonstrated that improved fantasy baseball predictions can be made by leveraging social media data.
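The spike signal described above, comparing current mentions against a player's own history, can be sketched as a simple ratio against a trailing average. The window size, threshold, and counts below are illustrative assumptions, not values from the thesis.

```python
def mention_spike_ratio(daily_mentions, window=7):
    """Ratio of the most recent day's mention count to the mean of the
    preceding `window` days; values well above 1 suggest a popularity spike."""
    if len(daily_mentions) < window + 1:
        raise ValueError("need at least window + 1 days of counts")
    recent = daily_mentions[-1]
    baseline = sum(daily_mentions[-window - 1:-1]) / window
    return recent / baseline if baseline > 0 else float("inf")

# A player averaging ~20 mentions a day who suddenly gets 120
history = [18, 22, 19, 21, 20, 23, 17, 120]
ratio = mention_spike_ratio(history)       # 120 / 20.0 = 6.0
is_spike = ratio > 3.0                     # illustrative threshold
```

A production system would normalize for league-wide Twitter volume (game days versus off days) before thresholding.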

Date Created
  • 2017-05


Sentiment Analysis of Public Perception Towards Transgender Rights on Twitter

Description

The fight for equal transgender rights is gaining traction in the public eye, but still has a lot of progress to make in the social and legal spheres. Since public opinion is critical in any civil rights movement, this study attempts to identify the most effective methods of eliciting public reactions in support of transgender rights. Topic analysis through Latent Dirichlet Allocation is performed on Twitter data, along with polarity sentiment analysis, to track which subjects elicit the strongest reactions over time. Graphing techniques are used in an attempt to visually display the trends in topics. The topic analysis techniques are effective in identifying the positive and negative trends in the data, but the graphing algorithm lacks the ability to comprehensibly display complex, higher-dimensional data.
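The polarity sentiment component can be sketched with a minimal lexicon-based scorer: count positive and negative matches and normalize. The word lists below are purely illustrative; the study's actual sentiment tooling is not specified here, and a real pipeline would use a full lexicon such as VADER alongside the LDA topics.

```python
# Illustrative lexicon; a real system would use a curated sentiment resource.
POSITIVE = {"support", "love", "equal", "proud", "rights", "win"}
NEGATIVE = {"hate", "ban", "fear", "wrong", "against", "lose"}

def polarity(text):
    """Crude polarity in [-1, 1]: (positive hits - negative hits)
    divided by total matched tokens; 0.0 when nothing matches."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0

p1 = polarity("proud to support equal rights")
p2 = polarity("hate this wrong ban")
```

Averaging such scores per LDA topic per day gives the positive/negative trend lines the abstract describes.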

Date Created
  • 2016-12

Learning Users' Visual Preferences: Building a Recommendation System for Instagram

Description

Social media users are inundated with information. This is especially true on Instagram, a social media service based on sharing photos, where missing important posts is a common issue for many users. By creating a recommendation system that learns each user's preferences and gives them a curated list of posts, the information-overload issue can be mitigated, enhancing the user experience for Instagram users. This paper explores methods for creating such a recommendation system. The proposed method employs a learning model called "Factorization Machines," which combines the advantages of linear models and latent factor models. In this work I derived features from Instagram post data, including the image, social data about the post, and information about the user who created the post. I also collected user-post interaction data describing which users "liked" which posts, and this was used in models leveraging latent factors. The proposed model successfully improves the rate of interesting content seen by the user by anywhere from 2 to 12 times.
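A second-order factorization machine scores a feature vector as a global bias plus linear terms plus pairwise interactions through latent factors. The sketch below shows the standard O(k·n) reformulation of the pairwise term, checked against a brute-force pairwise sum; all parameter values are toy numbers, not learned from Instagram data.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine:
    y = w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j,
    with the pairwise term computed per factor f as
    0.5 * [(sum_i V_if x_i)^2 - sum_i (V_if x_i)^2]."""
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        y += 0.5 * (s * s - s_sq)
    return y

def fm_predict_naive(x, w0, w, V):
    """Reference implementation: explicit sum over feature pairs."""
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y

x = [1.0, 0.0, 2.0, 1.0]                 # e.g. one-hot user/post + counts
w0, w = 0.1, [0.2, -0.1, 0.3, 0.0]
V = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2], [0.2, 0.2]]
y_fast = fm_predict(x, w0, w, V)
y_naive = fm_predict_naive(x, w0, w, V)
```

The latent vectors V are what let the model generalize "likes" to user-post pairs never observed together, which is the advantage over a purely linear model.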

Date Created
  • 2016-12


How Fake News Spreads in the U.S.: A Geographic Visualization System for Misinformation

Description

The spread of fake news (rumors) has been a growing problem on the internet in the past few years due to the rise of social media services. People sometimes share fake news articles on social media without knowing that those articles contain false information. Not knowing whether an article is fake or real is a problem because it causes social media news to lose credibility. Prior research on fake news has focused on how to detect it, but efforts to control fake news articles on the internet still face challenges: it is hard to collect large sets of fake news data, it is hard to collect the locations of people who spread fake news, and it is difficult to study the geographic distribution of fake news. To address these challenges, I examine how fake news spreads in the United States (US) by developing a geographic visualization system for misinformation. I collect a set of fake news articles from the website snopes.com. After collecting these articles, I extract the keywords from each article and store them in a file. I then use the stored keywords to search Twitter in order to find the locations of users who spread the rumors. Finally, I mark those locations on a map to show the geographic distribution of fake news. Having access to large sets of fake news data, knowing the locations of people who spread fake news, and being able to understand the geographic distribution of fake news will help efforts to address the fake news problem on the internet by providing target areas.
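The keyword-extraction step in the pipeline above can be sketched as frequency counting over non-stopword tokens. The stopword list and sample text are illustrative stand-ins; the thesis does not specify its exact extraction method here.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "is", "was",
             "that", "and", "before"}

def extract_keywords(article_text, top_n=5):
    """Return the top_n most frequent non-stopword terms in the article,
    a simple stand-in for the keyword-extraction step."""
    tokens = [t.strip(".,!?\"'()").lower() for t in article_text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

article = ("The viral claim about the mayor was false. The claim spread "
           "on social media before the mayor responded.")
keywords = extract_keywords(article, top_n=3)
```

The resulting keywords become Twitter search queries, and any geotagged or profile-location matches are plotted on the map.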

Date Created
  • 2018-05


Analysis of BoostOR: A Twitter Bot Detection Classification Algorithm

Description

The prevalence of bots, or automated accounts, on social media is a well-known problem. Some of the ways bots harm social media users include, but are not limited to, spreading misinformation, influencing topic discussions, and dispersing harmful links. Bots have affected the field of disaster relief on social media as well. These bots cause problems such as preventing rescuers from determining credible calls for help, spreading fake news and other malicious content, and generating large amounts of content which burdens rescuers attempting to provide aid in the aftermath of disasters. To address these problems, this research seeks to detect bots participating in disaster-event-related discussions and increase the recall, or number of bots removed from the network, of Twitter bot detection methods. The removal of these bots will also prevent human users from accidentally interacting with these bot accounts and being manipulated by them. To accomplish this goal, an existing bot detection classification algorithm known as BoostOR was employed. BoostOR is an ensemble learning algorithm originally designed to increase bot detection recall in a dataset, and it may help resolve the social media bot dilemma in which several different types of bots can be present in the data. BoostOR was first introduced as an adjustment to existing ensemble classifiers to increase recall. However, after testing the BoostOR algorithm on unobserved datasets, results showed that BoostOR does not perform as expected. This study attempts to improve the BoostOR algorithm by comparing it with a baseline classification algorithm, AdaBoost, and discussing the intended differences between the two. Additionally, this study presents the main factors contributing to the shortcomings of the BoostOR algorithm and proposes a solution to improve it. These recommendations should ensure that the BoostOR algorithm can be applied to new and unobserved datasets in the future.
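The AdaBoost baseline against which BoostOR is compared reweights training samples each round: misclassified samples gain weight so the next weak learner focuses on them. The sketch below shows one such round; the sample weights and outcomes are toy values, and BoostOR's own modified reweighting is not reproduced here.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost round: given current sample weights and a boolean list
    marking which samples the weak learner classified correctly, return the
    learner's vote weight alpha and the renormalized sample weights."""
    err = sum(w for w, c in zip(weights, correct) if not c)
    err = max(min(err, 1 - 1e-10), 1e-10)   # guard against 0 or 1 error
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples are up-weighted, correct ones down-weighted.
    new_w = [w * math.exp(-alpha if c else alpha)
             for w, c in zip(weights, correct)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]

weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]   # weak learner misses one sample
alpha, new_weights = adaboost_round(weights, correct)
```

Recall-oriented variants in the BoostOR family adjust this update to penalize missed bots (false negatives) more heavily than false alarms.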

Date Created
  • 2018-12


Using Machine Learning Models to Detect Fake News, Bots, and Rumors on Social Media

Description

In this paper, I introduce the fake news problem and detail how it has been exacerbated through social media. I explore current practices for fake news detection using natural language processing and current benchmarks for ranking the efficacy of various language models. Using a Twitter-specific benchmark, I attempt to reproduce the scores of six language models, demonstrating their effectiveness on seven tweet classification tasks. I explain the successes and challenges in reproducing these results and provide analysis of the future implications of fake news research.

Date Created
  • 2021-05


GCKEngine - An Algorithm for Automatic Ontology Building

Description

To facilitate the development of the Semantic Web, we propose in this thesis a general automatic ontology-building algorithm which, given a pool of potential terms and a set of relationships to include in the ontology, can utilize information gathered from Google queries to build a full ontology for a certain domain. We utilized this ontology-building algorithm as part of a larger system to tag computer tutorials for three systems with different kinds of metadata, and to index the tagged documents into a search engine. Our evaluation of the resultant search engine indicates that our automatic ontology-building algorithm is able to build relatively good-quality ontologies and utilize them to effectively apply metadata to documents.

Date Created
  • 2013-05


An Assessment of the Performance of Machine Learning Techniques When Applied to Trajectory Optimization

Description

Prior research has confirmed that supervised learning is an effective alternative to computationally costly numerical analysis. Motivated by NASA's use of abort scenario matrices to aid in mission operations and planning, this paper applies supervised learning to trajectory optimization in an effort to assess the accuracy of a less time-consuming method of producing the magnitudes of the delta-v vectors required to abort from various points along a Near Rectilinear Halo Orbit. Although the utility of the study is limited, the delta-v predictions made by a Gaussian regression model are fairly accurate after a relatively swift computation time, paving the way for more concentrated studies of this nature in the future.
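The regression task above, mapping an orbit parameter to a delta-v magnitude, can be illustrated with a Nadaraya-Watson kernel smoother using an RBF kernel, a lightweight stand-in for the Gaussian regression model the thesis evaluates (not the thesis's actual implementation). The training pairs are synthetic.

```python
import math

def rbf_kernel_regress(x_query, xs, ys, bandwidth=0.5):
    """Nadaraya-Watson kernel smoother with an RBF kernel: predict a
    weighted average of training targets, weights decaying with distance."""
    weights = [math.exp(-(x_query - xi) ** 2 / (2 * bandwidth ** 2))
               for xi in xs]
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, ys)) / total

# Synthetic "position along orbit -> abort delta-v magnitude" pairs
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.9, 0.4, 0.2, 0.5, 1.1]
pred = rbf_kernel_regress(1.5, xs, ys)   # interpolates between 0.4 and 0.2
```

Unlike a full Gaussian process, this smoother gives no predictive variance, but it shares the core idea of letting nearby training trajectories dominate the prediction.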

Date Created
  • 2018-05