GCKEngine - An Algorithm for Automatic Ontology Building

Description

To facilitate the development of the Semantic Web, we propose in this thesis a general automatic ontology-building algorithm which, given a pool of potential terms and a set of relationships to include in the ontology, uses information gathered from Google queries to build a full ontology for a given domain. We used this ontology-building algorithm as part of a larger system that tags computer tutorials for three systems with different kinds of metadata and indexes the tagged documents into a search engine. Our evaluation of the resulting search engine indicates that the algorithm builds ontologies of relatively good quality and that these ontologies can be used to apply metadata to documents effectively.

Date Created
  • 2013-05

Modeling Fantasy Baseball Player Popularity Using Twitter Activity

Description

Social media is used by people every day to discuss the nuances of their lives. Major League Baseball (MLB) is a popular sport in the United States, and as such has generated a great deal of activity on Twitter. As fantasy baseball continues to grow in popularity, so does the research into better algorithms for picking players. Most of the research done in this area focuses on improving the prediction of a player's individual performance. However, the crowd-sourcing power afforded by social media may enable more informed predictions about players' performances. Most amateur gamblers choose players based on popularity and personal preference. While some of these trends (particularly the long-term ones) are captured by ranking systems, this research focused on predicting the daily spikes in popularity (and therefore price or draft order) by comparing the number of mentions a player received on Twitter with that player's previous mention counts. In doing so, it demonstrated that fantasy baseball predictions can be improved by leveraging social media data.
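
The comparison of current mentions against previous mentions can be illustrated with a small sketch. The abstract does not say how the baseline was computed, so the 7-day window, the ratio form, and the function name `mention_spike` are all assumptions made for illustration only.

```python
from statistics import mean


def mention_spike(daily_mentions, window=7):
    """Ratio of the latest day's mentions to the mean of the preceding days.

    `daily_mentions` is a chronological list of daily mention counts.
    The window length and the ratio form are illustrative assumptions,
    not the method reported in the thesis.
    """
    if len(daily_mentions) < window + 1:
        raise ValueError("need at least window + 1 days of counts")
    # Split the most recent window + 1 days into history and the latest day.
    *history, today = daily_mentions[-(window + 1):]
    baseline = mean(history)
    # With no prior mentions the ratio is undefined; report it as infinite.
    return today / baseline if baseline > 0 else float("inf")


# Hypothetical counts: a player mentioned 10 times a day, then 30 times.
ratio = mention_spike([10] * 7 + [30])
```

A ratio well above 1 would flag a possible daily spike; the threshold to act on is a modeling choice the abstract leaves open.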

Date Created
  • 2017-05

Sentiment Analysis of Public Perception Towards Transgender Rights on Twitter

Description

The fight for equal transgender rights is gaining traction in the public eye, but still has a lot of progress to make in the social and legal spheres. Since public opinion is critical in any civil rights movement, this study attempts to identify the most effective methods to elicit public reactions in support of transgender rights. Topic analysis through Latent Dirichlet Allocation is performed on Twitter data, along with polarity sentiment analysis, to track the subjects that draw the strongest reactions over time. Graphing techniques are used in an attempt to visually display the trends in topics. The topic analysis techniques are effective in identifying the positive and negative trends in the data, but the graphing algorithm cannot display complex, higher-dimensional data comprehensibly.

Date Created
  • 2016-12

Learning Users Visual Preferences: Building a Recommendation System for Instagram

Description

Social media users are inundated with information. This is especially true on Instagram, a social media service based on sharing photos, where missing important posts is a common issue for many users. By creating a recommendation system that learns each user's preferences and gives them a curated list of posts, the information overload can be mitigated to enhance the experience of Instagram users. This paper explores methods for creating such a recommendation system. The proposed method employs a learning model called "Factorization Machines", which combines the advantages of linear models and latent factor models. In this work I derived features from Instagram post data, including the image, social data about the post, and information about the user who created the post. I also collected user-post interaction data describing which users "liked" which posts, and this was used in models leveraging latent factors. The proposed model increases the rate of interesting content seen by the user by a factor of 2 to 12.
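
To make the combination of a linear model with latent factors concrete, the sketch below evaluates the standard second-order Factorization Machine equation, y = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. It is a minimal illustration only: the feature vector, the weights, and the name `fm_predict` are hypothetical and are not taken from the thesis.

```python
def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine prediction for one feature vector.

    x  : list of feature values
    w0 : global bias
    w  : list of linear weights, one per feature
    V  : list of latent vectors, one per feature (each of length k)
    """
    n = len(x)
    k = len(V[0])
    # Linear part, as in an ordinary linear model.
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise part, using the identity
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    # which avoids the explicit double loop over feature pairs.
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical example: three features, two latent dimensions.
score = fm_predict(
    x=[1, 1, 0],
    w0=0.5,
    w=[1, 2, 3],
    V=[[1, 0], [2, 1], [5, 5]],
)
```

Because each interaction weight is the inner product of two latent vectors, the model can estimate interactions between features that rarely co-occur, which is the latent-factor advantage the abstract refers to.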

Date Created
  • 2016-12

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and the model successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work also helps explain why particular words trend on Twitter.
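
As a rough illustration of how k-nearest-neighbors and dynamic time warping (DTW) fit together, the sketch below labels a time series by majority vote among its DTW-nearest labeled examples. The data, the labels, and the function names are hypothetical; the abstract does not describe the paper's actual features or parameters.

```python
from collections import Counter


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def knn_label(query, labeled_series, k=1):
    """Majority label among the k labeled series closest to `query` under DTW."""
    ranked = sorted(labeled_series, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical hourly counts for a word: one bursty series, one flat series.
examples = [
    ([0, 1, 5, 9, 5, 1, 0], "trend"),
    ([1, 1, 1, 1, 1, 1, 1], "no_trend"),
]
label = knn_label([0, 0, 1, 5, 9, 5, 1], examples, k=1)
```

DTW lets the query match the bursty example even though its peak arrives one step later, which is why it suits trends that share a shape but differ in timing.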

Date Created
  • 2015-05

Categorizing and Discovering Social Bots

Description

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, detail different motivations for bots, describe previous work in bot detection and observation, and then perform bot detection of our own. For our bot detection, we are interested in bots on Twitter that tweet Arabic extremist-like phrases. A testing dataset is collected using the honeypot method, and five different heuristics are measured for their effectiveness in detecting bots. The model underperformed, but we have laid the groundwork for a largely untapped area of bot detection: the diffusion of extremist ideals through bots.

Date Created
  • 2015-05

Machine learning to predict rapid progression of carotid atherosclerosis in patients with impaired glucose tolerance

Description

Objectives
Prediabetes is a major epidemic and is associated with adverse cardio-cerebrovascular outcomes. Early identification of patients who will develop rapid progression of atherosclerosis could be beneficial for improved risk stratification. In this paper, we investigate important factors impacting the prediction, using several machine learning methods, of rapid progression of carotid intima-media thickness in impaired glucose tolerance (IGT) participants.
Methods
In the Actos Now for Prevention of Diabetes (ACT NOW) study, 382 participants with IGT underwent carotid intima-media thickness (CIMT) ultrasound evaluation at baseline and at 15–18 months, and were divided into rapid progressors (RP, n = 39, 58 ± 17.5 μm change) and non-rapid progressors (NRP, n = 343, 5.8 ± 20 μm change, p < 0.001 versus RP). To deal with complex multi-modal data consisting of demographic, clinical, and laboratory variables, we propose a general data-driven framework to investigate the ACT NOW dataset. In particular, we first employed a Fisher Score-based feature selection method to identify the most effective variables and then proposed a probabilistic Bayes-based learning method for the prediction. The methods and factors were compared using area under the receiver operating characteristic curve (AUC) analyses and the Brier score.
Results
The experimental results show that the proposed learning methods performed well in identifying or predicting RP. Among the methods, the performance of Naïve Bayes was the best (AUC 0.797, Brier score 0.085) compared to multilayer perceptron (0.729, 0.086) and random forest (0.642, 0.10). The results also show that feature selection has a significant positive impact on the data prediction performance.
Conclusions
By dealing with multi-modal data, the proposed learning methods show effectiveness in predicting prediabetics at risk for rapid atherosclerosis progression. The proposed framework demonstrated utility in outcome prediction in a typical multidimensional clinical dataset with a relatively small number of subjects, extending the potential utility of machine learning approaches beyond extremely large-scale datasets.
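
Two of the quantities named in the Methods, the Fisher score used for feature selection and the Brier score used for evaluation, can be sketched as follows. This is a minimal illustration with a common textbook form of each formula; the toy values and function names are hypothetical and do not come from the ACT NOW dataset or the authors' implementation.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: between-class scatter over within-class scatter.

    score = sum_c n_c * (mu_c - mu)^2 / sum_c n_c * var_c
    A larger score means the feature separates the classes more cleanly.
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return mean([(p - y) ** 2 for p, y in zip(probs, outcomes)])


# Hypothetical feature values for two classes (0 = NRP, 1 = RP).
fs = fisher_score([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])

# Hypothetical predicted probabilities of rapid progression and true outcomes.
bs = brier_score([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0])
```

Ranking features by Fisher score and keeping the top few is one common way to apply the selection step; a lower Brier score indicates better-calibrated probability estimates.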

Date Created
  • 2016-09-05

Directed Dynamical Influence is More Detectable With Noise

Description

Successful identification of directed dynamical influence in complex systems is relevant to significant problems of current interest. Traditional methods based on Granger causality and transfer entropy have issues such as difficulty with nonlinearity and large data requirements. Recently a framework based on nonlinear dynamical analysis was proposed to overcome these difficulties. We find, counterintuitively, that noise can enhance the detectability of directed dynamical influence. In fact, intentionally injecting a proper amount of asymmetric noise into the available time series has the unexpected benefit of dramatically increasing confidence in ascertaining the directed dynamical influence in the underlying system. This result is established based on both real data and model time series from nonlinear ecosystems. We develop a physical understanding of the beneficial role of noise in enhancing detection of directed dynamical influence.

Date Created
  • 2016-04-12

Addressing the Cold-Start Problem in Location Recommendation Using Geo-Social Correlations

Description

Location-based social networks (LBSNs) have attracted an increasing number of users in recent years, resulting in large amounts of geographical and social data. Such LBSN data provide an unprecedented opportunity to study human movement through users' socio-spatial behavior, in order to improve location-based applications like location recommendation. Because users can check in at new places, traditional work on location prediction that relies on mining a user’s historical moving trajectories fails, as it is not designed for the cold-start problem of recommending new check-ins. Meanwhile, previous work on LBSNs that attempted to use a user’s social connections for location recommendation observed limited help from social network information. In this work, we propose to address the cold-start location recommendation problem by capturing the correlations between social networks and geographical distance on LBSNs with a geo-social correlation model. The experimental results on a real-world LBSN dataset demonstrate that our approach properly models the geo-social correlations of a user’s cold-start check-ins and significantly improves the location recommendation performance.

Date Created
  • 2015-03-01