Search Content

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Techniques for supporting prediction of security breaches in critical cloud infrastructures using Bayesian network and Markov decision process

Description

Emerging trends in cyber system security breaches in critical cloud infrastructures show that attackers have abundant resources (human and computing power), expertise and support of large organizations and possible foreign governments. In order to greatly improve the protection of critical cloud infrastructures, incorporation of human behavior is needed to predict…

Emerging trends in cyber system security breaches in critical cloud infrastructures show that attackers have abundant resources (human and computing power), expertise and support of large organizations and possible foreign governments. In order to greatly improve the protection of critical cloud infrastructures, incorporation of human behavior is needed to predict potential security breaches in critical cloud infrastructures. To achieve such prediction, it is envisioned to develop a probabilistic modeling approach with the capability of accurately capturing system-wide causal relationship among the observed operational behaviors in the critical cloud infrastructure and accurately capturing probabilistic human (users’) behaviors on subsystems as the subsystems are directly interacting with humans. In our conceptual approach, the system-wide causal relationship can be captured by the Bayesian network, and the probabilistic human behavior in the subsystems can be captured by the Markov Decision Processes. The interactions between the dynamically changing state graphs of Markov Decision Processes and the dynamic causal relationships in Bayesian network are key components in such probabilistic modelling applications. In this thesis, two techniques are presented for supporting the above vision to prediction of potential security breaches in critical cloud infrastructures. The first technique is for evaluation of the conformance of the Bayesian network with the multiple MDPs. The second technique is to evaluate the dynamically changing Bayesian network structure for conformance with the rules of the Bayesian network using a graph checker algorithm. A case study and its simulation are presented to show how the two techniques support the specific parts in our conceptual approach to predicting system-wide security breaches in critical cloud infrastructures.

ContributorsNagaraja, Vinjith (Author) / Yau, Stephen S. (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2015

Feature selection techniques for effective model building and estimation on Twitter data to understand the political scenario in Latvia with supporting visualizations

Description

In supervised learning, machine learning techniques can be applied to learn a model on

a small set of labeled documents which can be used to classify a larger set of unknown

documents. Machine learning techniques can be used to analyze a political scenario

in a given society. A lot of research has been…

In supervised learning, machine learning techniques can be applied to learn a model on

a small set of labeled documents which can be used to classify a larger set of unknown

documents. Machine learning techniques can be used to analyze a political scenario

in a given society. A lot of research has been going on in this field to understand

the interactions of various people in the society in response to actions taken by their

organizations.

This paper talks about understanding the Russian influence on people in Latvia.

This is done by building an eeffective model learnt on initial set of documents

containing a combination of official party web-pages, important political leaders' social

networking sites. Since twitter is a micro-blogging site which allows people to post

their opinions on any topic, the model built is used for estimating the tweets sup-

porting the Russian and Latvian political organizations in Latvia. All the documents

collected for analysis are in Latvian and Russian languages which are rich in vocabulary resulting into huge number of features. Hence, feature selection techniques can

be used to reduce the vocabulary set relevant to the classification model. This thesis

provides a comparative analysis of traditional feature selection techniques and implementation of a new iterative feature selection method using EM and cross-domain

training along with supportive visualization tool. This method out performed other

feature selection methods by reducing the number of features up-to 50% along with

good model accuracy. The results from the classification are used to interpret user

behavior and their political influence patterns across organizations in Latvia using

interactive dashboard with combination of powerful widgets.

ContributorsBollapragada, Lakshmi Gayatri Niharika (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created2016

Directional prediction of stock prices using breaking news on Twitter

Description

Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index (DJI) to train a directional stock price prediction system based…

Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index (DJI) to train a directional stock price prediction system based on news content. Next, I proceed to show that information in articles indicated by breaking Tweet volumes leads to a statistically significant boost in the hourly directional prediction accuracies for the DJI stock prices mentioned in these articles. Secondly, I show that using document-level sentiment extraction does not yield a statistically significant boost in the directional predictive accuracies in the presence of other 1-gram keyword features. Thirdly I test the performance of the system on several time-frames and identify the 4 hour time-frame for both the price charts and for Tweet breakout detection as the best time-frame combination. Finally, I develop a set of price momentum based trade exit rules to cut losing trades early and to allow the winning trades run longer. I show that the Tweet volume breakout based trading system with the price momentum based exit rules not only improves the winning accuracy and the return on investment, but it also lowers the maximum drawdown and achieves the highest overall return over maximum drawdown.

ContributorsAlostad, Hana (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steven (Committee member) / Tong, Hanghang (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)

Created2016

Online ET-LDA - joint modeling of events and their related tweets with online streaming data

Description

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the…

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the popular public opinion about these events. So it would be useful to annotate the transcript with tweets. The technical challenge is to align the tweets with the correct segment of the transcript. ET-LDA by Hu et al [9] addresses this issue by modeling the whole process with an LDA-based graphical model. The system segments the transcript into coherent and meaningful parts and also determines if a tweet is a general tweet about the event or it refers to a particular segment of the transcript. One characteristic of the Hu et al’s model is that it expects all the data to be available upfront and uses batch inference procedure. But in many cases we find that data is not available beforehand, and it is often streaming. In such cases it is infeasible to repeatedly run the batch inference algorithm. My thesis presents an online inference algorithm for the ET-LDA model, with a continuous stream of tweet data and compare their runtime and performance to existing algorithms.

ContributorsAcharya, Anirudh (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created2015

Secure Mobile SDN

Description

The increasing usage of smart-phones and mobile devices in work environment and IT

industry has brought about unique set of challenges and opportunities. ARM architecture

in particular has evolved to a point where it supports implementations across wide spectrum

of performance points and ARM based tablets and smart-phones are in demand. The

enhancements to…

The increasing usage of smart-phones and mobile devices in work environment and IT

industry has brought about unique set of challenges and opportunities. ARM architecture

in particular has evolved to a point where it supports implementations across wide spectrum

of performance points and ARM based tablets and smart-phones are in demand. The

enhancements to basic ARM RISC architecture allow ARM to have high performance,

small code size, low power consumption and small silicon area. Users want their devices to

perform many tasks such as read email, play games, and run other online applications and

organizations no longer desire to provision and maintain individual’s IT equipment. The

term BYOD (Bring Your Own Device) has come into being from demand of such a work

setup and is one of the motivation of this research work. It brings many opportunities such

as increased productivity and reduced costs and challenges such as secured data access,

data leakage and amount of control by the organization.

To provision such a framework we need to bridge the gap from both organizations side

and individuals point of view. Mobile device users face issue of application delivery on

multiple platforms. For instance having purchased many applications from one proprietary

application store, individuals may want to move them to a different platform/device but

currently this is not possible. Organizations face security issues in providing such a solution

as there are many potential threats from allowing BYOD work-style such as unauthorized

access to data, attacks from the devices within and outside the network.

ARM based Secure Mobile SDN framework will resolve these issues and enable employees

to consolidate both personal and business calls and mobile data access on a single device.

To address application delivery issue we are introducing KVM based virtualization that

will allow host OS to run multiple guest OS. To address the security problem we introduce

SDN environment where host would be running bridged network of guest OS using Open

vSwitch . This would allow a remote controller to monitor the state of guest OS for making

important control and traffic flow decisions based on the situation.

ContributorsChowdhary, Ankur (Author) / Huang, Dijiang (Thesis advisor) / Tong, Hanghang (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2015

TweetSense: recommending hashtags for orphaned tweets by exploiting social signals in Twitter

Description

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with…

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with his/her friend on that topic. Finding context for an Orphaned tweet manually is challenging because of larger social graph of a user , the enormous volume of tweets generated per second, topic diversity, and limited information from tweet length of 140 characters. To help the user to get the context of an orphaned tweet, this thesis aims at building a hashtag recommendation system called TweetSense, to suggest hashtags as a context or metadata for the orphaned tweets. This in turn would increase user's social engagement and impact Twitter to maintain its monthly active online users in its social network. In contrast to other existing systems, this hashtag recommendation system recommends personalized hashtags by exploiting the social signals of users in Twitter. The novelty with this system is that it emphasizes on selecting the suitable candidate set of hashtags from the related tweets of user's social graph (timeline).The system then rank them based on the combination of features scores computed from their tweet and user related features. It is evaluated based on its ability to predict suitable hashtags for a random sample of tweets whose existing hashtags are deliberately removed for evaluation. I present a detailed internal empirical evaluation of TweetSense, as well as an external evaluation in comparison with current state of the art method.

ContributorsVijayakumar, Manikandan (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2014

Improved, scalable, and personalized context recovery system: E-TweetSense

Description

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who…

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who has chosen to subscribe to the tweets from the originator by following the originator. Although, hashtags are used to tag tweets in an effort to attach context to the tweets, many tweets do not have a hashtag. Such tweets are called orphan tweets and they adversely affect the experience of a browser.

A hashtag is a type of label or meta-data tag used in social networks and micro-blogging services which makes it easier for users to find messages with a specific theme or content. The context of a tweet can be defined as a set of one or more hashtags. Users often do not use hashtags to tag their tweets. This leads to the problem of missing context for tweets. To address the problem of missing hashtags, a statistical method was proposed which predicts most likely hashtags based on the social circle of an originator.

In this thesis, we propose to improve on the existing context recovery system by selectively limiting the candidate set of hashtags to be derived from the intimate circle of the originator rather than from every user in the social network of the originator. This helps in reducing the computation, increasing speed of prediction, scaling the system to originators with large social networks while still preserving most of the accuracy of the predictions. We also propose to not only derive the candidate hashtags from the social network of the originator but also derive the candidate hashtags based on the content of the tweet. We further propose to learn personalized statistical models according to the adoption patterns of different originators. This helps in not only identifying the personalized candidate set of hashtags based on the social circle and content of the tweets but also in customizing the hashtag adoption pattern to the originator of the tweet.

ContributorsMallapura Umamaheshwar, Tejas (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2015

ASU Electronic Theses and Dissertations

Filtering by

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Techniques for supporting prediction of security breaches in critical cloud infrastructures using Bayesian network and Markov decision process

Feature selection techniques for effective model building and estimation on Twitter data to understand the political scenario in Latvia with supporting visualizations

Directional prediction of stock prices using breaking news on Twitter

Online ET-LDA - joint modeling of events and their related tweets with online streaming data

Secure Mobile SDN

TweetSense: recommending hashtags for orphaned tweets by exploiting social signals in Twitter

Improved, scalable, and personalized context recovery system: E-TweetSense