Search Content

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

TweetSense: recommending hashtags for orphaned tweets by exploiting social signals in Twitter

Description

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with…

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with his/her friend on that topic. Finding context for an Orphaned tweet manually is challenging because of larger social graph of a user , the enormous volume of tweets generated per second, topic diversity, and limited information from tweet length of 140 characters. To help the user to get the context of an orphaned tweet, this thesis aims at building a hashtag recommendation system called TweetSense, to suggest hashtags as a context or metadata for the orphaned tweets. This in turn would increase user's social engagement and impact Twitter to maintain its monthly active online users in its social network. In contrast to other existing systems, this hashtag recommendation system recommends personalized hashtags by exploiting the social signals of users in Twitter. The novelty with this system is that it emphasizes on selecting the suitable candidate set of hashtags from the related tweets of user's social graph (timeline).The system then rank them based on the combination of features scores computed from their tweet and user related features. It is evaluated based on its ability to predict suitable hashtags for a random sample of tweets whose existing hashtags are deliberately removed for evaluation. I present a detailed internal empirical evaluation of TweetSense, as well as an external evaluation in comparison with current state of the art method.

ContributorsVijayakumar, Manikandan (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2014

An activity theoretical analysis of microblogging and blogging by Spanish L2 learners in a bridging activities framework

Description

The use of blogging tools in the second language classroom has been investigated from a variety of theoretical and methodological perspectives (Alm, 2009; Armstrong & Retterer, 2008; Dippold, 2009; Ducate & Lomicka, 2008; Elola & Oskoz, 2008; Jauregi & Banados, 2008; Lee, 2009; Petersen, Divitini, & Chabert, 2008; Pinkman, 2005;…

The use of blogging tools in the second language classroom has been investigated from a variety of theoretical and methodological perspectives (Alm, 2009; Armstrong & Retterer, 2008; Dippold, 2009; Ducate & Lomicka, 2008; Elola & Oskoz, 2008; Jauregi & Banados, 2008; Lee, 2009; Petersen, Divitini, & Chabert, 2008; Pinkman, 2005; Raith, 2009; Soares, 2008; Sun, 2009, 2012; Vurdien, 2011; Yang, 2009) and a growing number of studies examine the use of microblogging tools for language learning (Antenos-Conforti, 2009; Borau, Ullrich, Feng, & Shen, 2009; Lomicka & Lord, 2011; Perifanou, 2009). Grounded in Cultural Historical Activity Theory (Engestrom, 1987), the present study explores the outcomes of a semester-long project based on the Bridging Activities framework (Thorne & Reinhardt, 2008) and implemented in an intermediate hybrid Spanish-language course at a large public university in Arizona, in which students used microblogging and blogging tools to collect digital texts, analyze perspectives of the target culture, and participate as part of an online community of language learners with a broader audience of native speakers. The research questions are: (1) What technology is used by the students, with what frequency and for what purposes in both English and Spanish prior to beginning the project?, (2) What are students' values and attitudes toward using Twitter and Blogger as tools for learning Spanish and how do they change over time through their use in the project during the semester course?, and (3) What tensions emerge in the activity systems of the intermediate Spanish-language students throughout the process of using Twitter and Blogger for the project? What are the underlying reasons for the tensions? How are they resolved? The data was collected using pre-, post-, and periodic surveys, which included Likert and open-ended questions, as well as the participants' microblog and blog posts. The quantitative data was analyzed using descriptive statistics and the qualitative data was analyzed to identify emerging themes following the Constant Comparative Method (Glaser & Strauss, 1967). Finally, three participant outliers were selected as case studies for activity theoretical analysis in order to identify tensions and, through their resolution, evidence of expansive learning.

ContributorsAlvarado, Margaret (Author) / Lafford, Barbara (Thesis advisor) / González, Verónica (Committee member) / Cerron-Palomino, Alvaro (Committee member) / Arizona State University (Publisher)

Created2015

Event analytics on social media: challenges and solutions

Description

Social media platforms such as Twitter, Facebook, and blogs have emerged as valuable

- in fact, the de facto - virtual town halls for people to discover, report, share and

communicate with others about various types of events. These events range from

widely-known events such as the U.S Presidential debate to smaller scale,…

Social media platforms such as Twitter, Facebook, and blogs have emerged as valuable

- in fact, the de facto - virtual town halls for people to discover, report, share and

communicate with others about various types of events. These events range from

widely-known events such as the U.S Presidential debate to smaller scale, local events

such as a local Halloween block party. During these events, we often witness a large

amount of commentary contributed by crowds on social media. This burst of social

media responses surges with the "second-screen" behavior and greatly enriches the

user experience when interacting with the event and people's awareness of an event.

Monitoring and analyzing this rich and continuous flow of user-generated content can

yield unprecedentedly valuable information about the event, since these responses

usually offer far more rich and powerful views about the event that mainstream news

simply could not achieve. Despite these benefits, social media also tends to be noisy,

chaotic, and overwhelming, posing challenges to users in seeking and distilling high

quality content from that noise.

In this dissertation, I explore ways to leverage social media as a source of information and analyze events based on their social media responses collectively. I develop, implement and evaluate EventRadar, an event analysis toolbox which is able to identify, enrich, and characterize events using the massive amounts of social media responses. EventRadar contains three automated, scalable tools to handle three core event analysis tasks: Event Characterization, Event Recognition, and Event Enrichment. More specifically, I develop ET-LDA, a Bayesian model and SocSent, a matrix factorization framework for handling the Event Characterization task, i.e., modeling characterizing an event in terms of its topics and its audience's response behavior (via ET-LDA), and the sentiments regarding its topics (via SocSent). I also develop DeMa, an unsupervised event detection algorithm for handling the Event Recognition task, i.e., detecting trending events from a stream of noisy social media posts. Last, I develop CrowdX, a spatial crowdsourcing system for handling the Event Enrichment task, i.e., gathering additional first hand information (e.g., photos) from the field to enrich the given event's context.

Enabled by EventRadar, it is more feasible to uncover patterns that have not been

explored previously and re-validating existing social theories with new evidence. As a

result, I am able to gain deep insights into how people respond to the event that they

are engaged in. The results reveal several key insights into people's various responding

behavior over the event's timeline such the topical context of people's tweets does not

always correlate with the timeline of the event. In addition, I also explore the factors

that affect a person's engagement with real-world events on Twitter and find that

people engage in an event because they are interested in the topics pertaining to

that event; and while engaging, their engagement is largely affected by their friends'

behavior.

ContributorsHu, Yuheng (Author) / Kambhampati, Subbarao (Thesis advisor) / Horvitz, Eric (Committee member) / Krumm, John (Committee member) / Liu, Huan (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)

Created2014

Online ET-LDA - joint modeling of events and their related tweets with online streaming data

Description

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the…

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the popular public opinion about these events. So it would be useful to annotate the transcript with tweets. The technical challenge is to align the tweets with the correct segment of the transcript. ET-LDA by Hu et al [9] addresses this issue by modeling the whole process with an LDA-based graphical model. The system segments the transcript into coherent and meaningful parts and also determines if a tweet is a general tweet about the event or it refers to a particular segment of the transcript. One characteristic of the Hu et al’s model is that it expects all the data to be available upfront and uses batch inference procedure. But in many cases we find that data is not available beforehand, and it is often streaming. In such cases it is infeasible to repeatedly run the batch inference algorithm. My thesis presents an online inference algorithm for the ET-LDA model, with a continuous stream of tweet data and compare their runtime and performance to existing algorithms.

ContributorsAcharya, Anirudh (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created2015

Improved, scalable, and personalized context recovery system: E-TweetSense

Description

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who…

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who has chosen to subscribe to the tweets from the originator by following the originator. Although, hashtags are used to tag tweets in an effort to attach context to the tweets, many tweets do not have a hashtag. Such tweets are called orphan tweets and they adversely affect the experience of a browser.

A hashtag is a type of label or meta-data tag used in social networks and micro-blogging services which makes it easier for users to find messages with a specific theme or content. The context of a tweet can be defined as a set of one or more hashtags. Users often do not use hashtags to tag their tweets. This leads to the problem of missing context for tweets. To address the problem of missing hashtags, a statistical method was proposed which predicts most likely hashtags based on the social circle of an originator.

In this thesis, we propose to improve on the existing context recovery system by selectively limiting the candidate set of hashtags to be derived from the intimate circle of the originator rather than from every user in the social network of the originator. This helps in reducing the computation, increasing speed of prediction, scaling the system to originators with large social networks while still preserving most of the accuracy of the predictions. We also propose to not only derive the candidate hashtags from the social network of the originator but also derive the candidate hashtags based on the content of the tweet. We further propose to learn personalized statistical models according to the adoption patterns of different originators. This helps in not only identifying the personalized candidate set of hashtags based on the social circle and content of the tweets but also in customizing the hashtag adoption pattern to the originator of the tweet.

ContributorsMallapura Umamaheshwar, Tejas (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2015

Secure and privacy-preserving microblogging services: attacks and defenses

Description

Microblogging services such as Twitter, Sina Weibo, and Tumblr have been emerging and deeply embedded into people's daily lives. Used by hundreds of millions of users to connect the people worldwide and share and access information in real-time, the microblogging service has also became the target of malicious attackers due…

Microblogging services such as Twitter, Sina Weibo, and Tumblr have been emerging and deeply embedded into people's daily lives. Used by hundreds of millions of users to connect the people worldwide and share and access information in real-time, the microblogging service has also became the target of malicious attackers due to its massive user engagement and structural openness. Although existed, little is still known in the community about new types of vulnerabilities in current microblogging services which could be leveraged by the intelligence-evolving attackers, and more importantly, the corresponding defenses that could prevent both the users and the microblogging service providers from being attacked. This dissertation aims to uncover a number of challenging security and privacy issues in microblogging services and also propose corresponding defenses.

This dissertation makes fivefold contributions. The first part presents the social botnet, a group of collaborative social bots under the control of a single botmaster, demonstrate the effectiveness and advantages of exploiting a social botnet for spam distribution and digital-influence manipulation, and propose the corresponding countermeasures and evaluate their effectiveness. Inspired by Pagerank, the second part describes TrueTop, the first sybil-resilient system to find the top-K influential users in microblogging services with very accurate results and strong resilience to sybil attacks. TrueTop has been implemented to handle millions of nodes and 100 times more edges on commodity computers. The third and fourth part demonstrate that microblogging systems' structural openness and users' carelessness could disclose the later's sensitive information such as home city and age. LocInfer, a novel and lightweight system, is presented to uncover the majority of the users in any metropolitan area; the dissertation also proposes MAIF, a novel machine learning framework that leverages public content and interaction information in microblogging services to infer users' hidden ages. Finally, the dissertation proposes the first privacy-preserving social media publishing framework to let the microblogging service providers publish their data to any third-party without disclosing users' privacy and meanwhile meeting the data's commercial utilities. This dissertation sheds the light on the state-of-the-art security and privacy issues in the microblogging services.

ContributorsZhang, Jinxue (Author) / Zhang, Yanchao (Thesis advisor) / Zhang, Junshan (Committee member) / Ying, Lei (Committee member) / Ahn, Gail-Joon (Committee member) / Arizona State University (Publisher)

Created2016

Directional prediction of stock prices using breaking news on Twitter

Description

Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index (DJI) to train a directional stock price prediction system based…

Stock market news and investing tips are popular topics in Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in Dow Jones Index (DJI) to train a directional stock price prediction system based on news content. Next, I proceed to show that information in articles indicated by breaking Tweet volumes leads to a statistically significant boost in the hourly directional prediction accuracies for the DJI stock prices mentioned in these articles. Secondly, I show that using document-level sentiment extraction does not yield a statistically significant boost in the directional predictive accuracies in the presence of other 1-gram keyword features. Thirdly I test the performance of the system on several time-frames and identify the 4 hour time-frame for both the price charts and for Tweet breakout detection as the best time-frame combination. Finally, I develop a set of price momentum based trade exit rules to cut losing trades early and to allow the winning trades run longer. I show that the Tweet volume breakout based trading system with the price momentum based exit rules not only improves the winning accuracy and the return on investment, but it also lowers the maximum drawdown and achieves the highest overall return over maximum drawdown.

ContributorsAlostad, Hana (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steven (Committee member) / Tong, Hanghang (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)

Created2016

Feature selection techniques for effective model building and estimation on Twitter data to understand the political scenario in Latvia with supporting visualizations

Description

In supervised learning, machine learning techniques can be applied to learn a model on

a small set of labeled documents which can be used to classify a larger set of unknown

documents. Machine learning techniques can be used to analyze a political scenario

in a given society. A lot of research has been…

In supervised learning, machine learning techniques can be applied to learn a model on

a small set of labeled documents which can be used to classify a larger set of unknown

documents. Machine learning techniques can be used to analyze a political scenario

in a given society. A lot of research has been going on in this field to understand

the interactions of various people in the society in response to actions taken by their

organizations.

This paper talks about understanding the Russian influence on people in Latvia.

This is done by building an eeffective model learnt on initial set of documents

containing a combination of official party web-pages, important political leaders' social

networking sites. Since twitter is a micro-blogging site which allows people to post

their opinions on any topic, the model built is used for estimating the tweets sup-

porting the Russian and Latvian political organizations in Latvia. All the documents

collected for analysis are in Latvian and Russian languages which are rich in vocabulary resulting into huge number of features. Hence, feature selection techniques can

be used to reduce the vocabulary set relevant to the classification model. This thesis

provides a comparative analysis of traditional feature selection techniques and implementation of a new iterative feature selection method using EM and cross-domain

training along with supportive visualization tool. This method out performed other

feature selection methods by reducing the number of features up-to 50% along with

good model accuracy. The results from the classification are used to interpret user

behavior and their political influence patterns across organizations in Latvia using

interactive dashboard with combination of powerful widgets.

ContributorsBollapragada, Lakshmi Gayatri Niharika (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created2016

Querying for relevant people in online social networks

Description

Online social networks, including Twitter, have expanded in both scale and diversity of content, which has created significant challenges to the average user. These challenges include finding relevant information on a topic and building social ties with like-minded individuals. The fundamental question addressed by this thesis is if an individual…

Online social networks, including Twitter, have expanded in both scale and diversity of content, which has created significant challenges to the average user. These challenges include finding relevant information on a topic and building social ties with like-minded individuals. The fundamental question addressed by this thesis is if an individual can leverage social network to search for information that is relevant to him or her. We propose to answer this question by developing computational algorithms that analyze a user's social network. The features of the social network we analyze include the network topology and member communications of a specific user's social network. Determining the "social value" of one's contacts is a valuable outcome of this research. The algorithms we developed were tested on Twitter, which is an extremely popular social network. Twitter was chosen due to its popularity and a majority of the communications artifacts on Twitter is publically available. In this work, the social network of a user refers to the "following relationship" social network. Our algorithm is not specific to Twitter, and is applicable to other social networks, where the network topology and communications are accessible. My approaches are as follows. For a user interested in using the system, I first determine the immediate social network of the user as well as the social contacts for each person in this network. Afterwards, I establish and extend the social network for each user. For each member of the social network, their tweet data are analyzed and represented by using a word distribution. To accomplish this, I use WordNet, a popular lexical database, to determine semantic similarity between two words. My mechanism of search combines both communication distance between two users and social relationships to determine the search results. Additionally, I developed a search interface, where a user can interactively query the system. I conducted preliminary user study to evaluate the quality and utility of my method and system against several baseline methods, including the default Twitter search. The experimental results from the user study indicate that my method is able to find relevant people and identify valuable contacts in one's social circle based on the query. The proposed system outperforms baseline methods in terms of standard information retrieval metrics.

ContributorsXu, Ke (Author) / Sundaram, Hari (Thesis advisor) / Ye, Jieping (Committee member) / Kelliher, Aisling (Committee member) / Arizona State University (Publisher)

Created2010