Search Content

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Breaking hash-tag detection algorithm for social media (Twitter)

Description

In trading, volume is a measure of how much stock has been exchanged in a given period of time. Since every stock is distinctive and has an alternate measure of shares, volume can be contrasted with historical volume inside a stock to spot changes. It is likewise used to affirm…

In trading, volume is a measure of how much stock has been exchanged in a given period of time. Since every stock is distinctive and has an alternate measure of shares, volume can be contrasted with historical volume inside a stock to spot changes. It is likewise used to affirm value patterns, breakouts, and spot potential reversals. In my thesis, I hypothesize that the concept of trading volume can be extrapolated to social media (Twitter).

The ubiquity of social media, especially Twitter, in financial market has been overly resonant in the past couple of years. With the growth of its (Twitter) usage by news channels, financial experts and pandits, the global economy does seem to hinge on 140 characters. By analyzing the number of tweets hash tagged to a stock, a strong relation can be established between the number of people talking about it, to the trading volume of the stock.

In my work, I overt this relation and find a state of the breakout when the volume goes beyond a characterized support or resistance level.

ContributorsAwasthi, Piyush (Author) / Davulcu, Hasan (Thesis advisor) / Tong, Hanghang (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created2015

Visualization tool for islamic radical and counter radical movements and their online followers in South East Asia

Description

With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, ideologies and furthermore enforcing them on others. Users have become the source for the production and dissemination of real time information. The content posted by the users can be used to understand…

With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, ideologies and furthermore enforcing them on others. Users have become the source for the production and dissemination of real time information. The content posted by the users can be used to understand them and track their behavior. Using this content of the user, data analysis can be performed to understand their social ideology and affinity towards Radical and Counter-Radical Movements. During the process of expressing their opinions people use hashtags in their messages in Twitter. These hashtags are a rich source of information in understanding the content based relationship between the online users apart from the existing context based follower and friend relationship.

An intelligent visual dash-board system is necessary which can track the activities of the users and diffusion of the online social movements, identify the hot-spots in the users' network, show the geographic foot print of the users and to understand the socio-cultural, economic and political drivers for the relationship among different groups of the users.

ContributorsGaripalli, Sravan Kumar (Author) / Davulcu, Hasan (Thesis advisor) / Shakarian, Paulo (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created2015

A timeline extraction approach to derive drug usage patterns in pregnant women using social media

Description

Proliferation of social media websites and discussion forums in the last decade has resulted in social media mining emerging as an effective mechanism to extract consumer patterns. Most research on social media and pharmacovigilance have concentrated on

Adverse Drug Reaction (ADR) identification. Such methods employ a step of drug search followed…

Proliferation of social media websites and discussion forums in the last decade has resulted in social media mining emerging as an effective mechanism to extract consumer patterns. Most research on social media and pharmacovigilance have concentrated on

Adverse Drug Reaction (ADR) identification. Such methods employ a step of drug search followed by classification of the associated text as consisting an ADR or not. Although this method works efficiently for ADR classifications, if ADR evidence is present in users posts over time, drug mentions fail to capture such ADRs. It also fails to record additional user information which may provide an opportunity to perform an in-depth analysis for lifestyle habits and possible reasons for any medical problems.

Pre-market clinical trials for drugs generally do not include pregnant women, and so their effects on pregnancy outcomes are not discovered early. This thesis presents a thorough, alternative strategy for assessing the safety profiles of drugs during pregnancy by utilizing user timelines from social media. I explore the use of a variety of state-of-the-art social media mining techniques, including rule-based and machine learning techniques, to identify pregnant women, monitor their drug usage patterns, categorize their birth outcomes, and attempt to discover associations between drugs and bad birth outcomes.

The technique used models user timelines as longitudinal patient networks, which provide us with a variety of key information about pregnancy, drug usage, and post-

birth reactions. I evaluate the distinct parts of the pipeline separately, validating the usefulness of each step. The approach to use user timelines in this fashion has produced very encouraging results, and can be employed for a range of other important tasks where users/patients are required to be followed over time to derive population-based measures.

ContributorsChandrashekar, Pramod Bharadwaj (Author) / Davulcu, Hasan (Thesis advisor) / Gonzalez, Graciela (Thesis advisor) / Hsiao, Sharon (Committee member) / Arizona State University (Publisher)

Created2016

Detecting Organizational Accounts from Twitter Based on Network and Behavioral Factors

Description

With the rise of Online Social Networks (OSN) in the last decade, social network analysis has become a crucial research topic. The OSN graphs have unique properties that distinguish them from other types of graphs. In this thesis, five month Tweet corpus collected from Bangladesh - between June 2016 and…

With the rise of Online Social Networks (OSN) in the last decade, social network analysis has become a crucial research topic. The OSN graphs have unique properties that distinguish them from other types of graphs. In this thesis, five month Tweet corpus collected from Bangladesh - between June 2016 and October 2016 is analyzed, in order to detect accounts that belong to groups. These groups consist of official and non-official twitter handles of political organizations and NGOs in Bangladesh. A set of network, temporal, spatial and behavioral features are proposed to discriminate between accounts belonging to individual twitter users, news, groups and organization leaders. Finally, the experimental results are presented and a subset of relevant features is identified that lead to a generalizable model. Detection of tiny number of groups from large network is achieved with 0.8 precision, 0.75 recall and 0.77 F1 score. The domain independent network and behavioral features and models developed here are suitable for solving twitter account classification problem in any context.

ContributorsGore, Chinmay Chandrashekhar (Author) / Davulcu, Hasan (Thesis advisor) / Hsiao, Ihan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created2017

Machine learning methods for high-dimensional imbalanced biomedical data

Description

Learning from high dimensional biomedical data attracts lots of attention recently. High dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features of biomedical data, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect…

Learning from high dimensional biomedical data attracts lots of attention recently. High dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features of biomedical data, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect the model performance. In this thesis, I focus on developing learning methods for the high-dimensional imbalanced biomedical data. In the first part, a sparse canonical correlation analysis (CCA) method is presented. The penalty terms is used to control the sparsity of the projection matrices of CCA. The sparse CCA method is then applied to find patterns among biomedical data sets and labels, or to find patterns among different data sources. In the second part, I discuss several learning problems for imbalanced biomedical data. Note that traditional learning systems are often biased when the biomedical data are imbalanced. Therefore, traditional evaluations such as accuracy may be inappropriate for such cases. I then discuss several alternative evaluation criteria to evaluate the learning performance. For imbalanced binary classification problems, I use the undersampling based classifiers ensemble (UEM) strategy to obtain accurate models for both classes of samples. A small sphere and large margin (SSLM) approach is also presented to detect rare abnormal samples from a large number of subjects. In addition, I apply multiple feature selection and clustering methods to deal with high-dimensional data and data with highly correlated features. Experiments on high-dimensional imbalanced biomedical data are presented which illustrate the effectiveness and efficiency of my methods.

ContributorsYang, Tao (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2013

BioEve: user interface framework bridging IE and IR

Description

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to…

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many systems aid in navigating through this wide collection of documents, the vastness and depth of this information overload can be overwhelming. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also facilitate discovery of the unknown information implicitly conveyed in the texts. This thesis presents the different approaches used for large scale biomedical named entity recognition, and the challenges faced in each. It also proposes BioEve: an integrative framework to fuse a faceted search with information extraction to provide a search service that addresses the user's desire for "completeness" of the query results, not just the top-ranked ones. This information extraction system enables discovery of important semantic relationships between entities such as genes, diseases, drugs, and cell lines and events from biomedical text on MEDLINE, which is the largest publicly available database of the world's biomedical journal literature. It is an innovative search and discovery service that makes it easier to search
avigate and discover knowledge hidden in life sciences literature. To demonstrate the utility of this system, this thesis also details a prototype enterprise quality search and discovery service that helps researchers with a guided step-by-step query refinement, by suggesting concepts enriched in intermediate results, and thereby facilitating the "discover more as you search" paradigm.

ContributorsKanwar, Pradeep (Author) / Davulcu, Hasan (Thesis advisor) / Dinu, Valentin (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created2010

Hidden Fear: Evaluating the Effectiveness of Messages on Social Media

Description

The development of the internet provided new means for people to communicate effectively and share their ideas. There has been a decline in the consumption of newspapers and traditional broadcasting media toward online social mediums in recent years. Social media has been introduced as a new way of increasing democratic…

The development of the internet provided new means for people to communicate effectively and share their ideas. There has been a decline in the consumption of newspapers and traditional broadcasting media toward online social mediums in recent years. Social media has been introduced as a new way of increasing democratic discussions on political and social matters. Among social media, Twitter is widely used by politicians, government officials, communities, and parties to make announcements and reach their voice to their followers. This greatly increases the acceptance domain of the medium.

The usage of social media during social and political campaigns has been the subject of a lot of social science studies including the Occupy Wall Street movement, The Arab Spring, the United States (US) election, more recently The Brexit campaign. The wide

spread usage of social media in this space and the active participation of people in the discussions on social media made this communication channel a suitable place for spreading propaganda to alter public opinion.

An interesting feature of twitter is the feasibility of which bots can be programmed to operate on this platform. Social media bots are automated agents engineered to emulate the activity of a human being by tweeting some specific content, replying to users, magnifying certain topics by retweeting them. Network on these bots is called botnets and describing the collaboration of connected computers with programs that communicates across multiple devices to perform some task.

In this thesis, I will study how bots can influence the opinion, finding which parameters are playing a role in shrinking or coalescing the communities, and finally logically proving the effectiveness of each of the hypotheses.

ContributorsAhmadi, Mohsen (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created2020

Filtering by