RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter makes assessing the trustworthiness and relevance of tweets much more important for search. However, given the limit on tweet length, it is hard to extract ranking measures from a tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also on additional information from the Twitter ecosystem, which consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score powers two novel methods of ranking tweets, which propagate the reputation over an agreement graph based on the tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16 million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than the current state-of-the-art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches that I propose, as well as an external evaluation in comparison to the current state-of-the-art method.
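The idea of propagating reputation over an agreement graph can be illustrated with a small sketch. This is not the RAProp algorithm itself: the Jaccard token overlap used as the agreement measure, the single propagation step, the `threshold` parameter, and the sample tweets and scores are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Token-set overlap between two tweets (an assumed stand-in for content agreement)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def propagate(tweets, reputation, threshold=0.2):
    """One propagation step: each tweet gains similarity-weighted reputation
    from the other tweets whose content agrees with it."""
    scores = []
    for i, tweet in enumerate(tweets):
        gained = 0.0
        for j, other in enumerate(tweets):
            if i == j:
                continue
            weight = jaccard(tweet, other)
            if weight >= threshold:  # an edge exists only for sufficiently similar tweets
                gained += weight * reputation[j]
        scores.append(reputation[i] + gained)
    return scores


# Hypothetical data: two tweets that agree, and one isolated tweet.
tweets = [
    "earthquake hits city centre",
    "earthquake hits city",
    "win a free phone now",
]
reputation = [0.6, 0.5, 0.9]  # assumed initial reputation scores

scores = propagate(tweets, reputation)
ranking = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
```

In this toy input the isolated third tweet starts with the highest score but receives no support from other tweets, so the two mutually agreeing tweets rank above it after propagation.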

Contributors: Ravikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2013

Thai English as a variety

Description

This study is about Thai English (ThaiE), a variety of World Englishes presently spoken in Thailand as a result of the spread of English and recent Thai government policies on English communication in Thailand. In the study, I examined linguistic data of spoken ThaiE collected from multiple sources in both the U.S.A. and Thailand. The study took a qualitative approach to the data, which came from (i) English interviews and questionnaires with 12 highly educated Thai speakers of English during my fieldwork in the Southwestern U.S.A., Central Thailand, and Northeastern Thailand, (ii) English speech samples from the media in Thailand, i.e. television programs, a news report, and a talk radio program, and (iii) research articles on English as used by Thai speakers. This study describes the typology of ThaiE in terms of its morpho-syntax, phonology, and sociolinguistics, with the main focus on the structural characteristics of ThaiE. The results show that some ThaiE features are similar to features of other World Englishes, while some are unique to ThaiE. Therefore, I argue that ThaiE can at present be considered structurally a new variety of World Englishes. The findings also showed an interesting result regarding how the fieldwork interview participants viewed the notion of ThaiE. Six of these participants denied the existence of ThaiE, five believed ThaiE existed, and one was reluctant to give an answer. The study suggests that the participants' academic backgrounds, their unfamiliarity with the notion of ThaiE, and their level of social interaction with everyday persons may have influenced their answers to the main research question.

Contributors: Rogers, Uthairat (Author) / Gelderen, Elly van (Thesis advisor) / Mailhammer, Robert (Committee member) / Adams, Karen (Committee member) / Arizona State University (Publisher)

Created: 2013

Reanalysis of OE hwæðer in the left periphery

Description

Despite the vast research on language carried out by the generative linguistics of Noam Chomsky and his followers since the 1950s, historical linguistics lay outside their research interest from the start, for theoretical reasons (mainly their attention to the mental abstraction of language structure rather than language as a performed product). This study is an attempt to bridge the gap between the formalism and theoretical constructs introduced by generative grammar, whose ultimate goal is to provide not only a description but also an explanation of linguistic phenomena, and historical linguistics, which studies the evolution of language over time. This main objective is met by providing a formal account of the changes hwæðer undergoes throughout the Old English (OE) period. This seemingly inconspicuous word is of particular investigative interest in that it reflects the different stages posited by the theoretical assumptions implemented in the study, namely the economy principles responsible for what has become known as the CP cycle: the Head Preference Principle and the Late Merge Principle. Under these principles, pronominal hwæðer would raise to the specifier position for topicalization purposes; then, after frequent use in that position, it would be base-generated there under Late Merge, until its later reanalysis as the head of the Complementizer Phrase (CP) under Head Preference. Thus, I set out to classify the diverse functions of OE hwæðer by identifying and analyzing all instances recorded in the diachronic part of the Helsinki Corpus.
Both quantitative and qualitative analyses of the data have rendered the following results: 1) a fully satisfactory functional and chronological classification has been obtained by analyzing the data under investigation following a formal theoretical approach; and 2) a step-by-step historical analysis proves to be indispensable for understanding how language works at the abstract level from a historical point of view. This project is part of a growing body of research on language change which attempts to describe and explain the evolution of certain words as these change in form and function.

Contributors: Parra-Guinaldo, Víctor (Author) / Gelderen, Elly van (Thesis advisor) / Bjork, Robert (Committee member) / Nilsen, Don L. F. (Committee member) / Arizona State University (Publisher)

Created: 2013

Connecting users with similar interests for group understanding

Description

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it makes it easier to explore our connections and expand our social networks. The aggregation of people who share common interests forms social groups, which are fundamental parts of our social lives. Social behavioral analysis at a group level is an active research area and attracts considerable interest from industry. The challenges of my work mainly arise from the scale and complexity of user-generated behavioral data. The multiple types of interactions, the highly dynamic nature of social networking, and volatile user behavior mean that these data are generally large and complex. Effective and efficient approaches are required to analyze and interpret such data. My work provides effective channels to help connect the like-minded and, furthermore, to understand user behavior at a group level. The contributions of this dissertation are threefold: (1) proposing a novel representation of collective tagging knowledge via tag networks; (2) proposing the new information spreader identification problem in egocentric social networks; (3) defining group profiling as a systematic approach to understanding social groups. In sum, the research proposes novel concepts and approaches for connecting the like-minded, enables the understanding of user groups, and exposes interesting research opportunities.

Contributors: Wang, Xufei (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

The subjectification of English adjectives, and the effect of subjectivity on prenominal adjective order

Description

Linguistic subjectivity and subjectification are fields of research that are relatively new to those working in English linguistics. After a discussion of linguistic subjectivity and subjectification as they relate to English, I investigate the subjectification of a specific English adjective, and how its usage has changed over time. Subjectivity is held by many linguists of today to be the major governing factor behind the ordering of English prenominal adjectives. Through the use of a questionnaire, I investigate the effect of subjectivity on English prenominal adjective order from the perspective of the native English speaker. I then discuss the results of the questionnaire, what they mean in relation to how subjectivity affects that order, and a few of the patterns that emerged as I analyzed the data.

Contributors: Skarstedt, Luke (Author) / Gelderen, Elly van (Thesis advisor) / Bjork, Robert (Committee member) / Adams, Karen (Committee member) / Arizona State University (Publisher)

Created: 2013

When is temporal planning really temporal

Description

In this dissertation I develop a deep theory of temporal planning well-suited to analyzing, understanding, and improving the state-of-the-art implementations (as of 2012). At face value the work is strictly theoretical; nonetheless its impact is entirely real and practical. The easiest portion of that impact to highlight concerns the notable improvements to the format of the temporal fragment of the International Planning Competitions (IPCs). In particular, the theory I expound upon here is the primary cause of, and justification for, the altered (i) selection of benchmark problems, and (ii) notion of "winning temporal planner". For higher-level motivation: robotics, web service composition, industrial manufacturing, business process management, cybersecurity, space exploration, deep ocean exploration, and logistics all benefit from applying domain-independent automated planning techniques. Naturally, actually carrying out such case studies has much to offer. For example, we may extract the lesson that reasoning carefully about deadlines is crucial to planning in practice. More generally, effectively automating specifically temporal planning is well motivated by applications. Entirely abstractly, the aim is to improve the theory of automated temporal planning by distilling from its practice. My thesis is that the key feature of computational interest is concurrency. To support this, I demonstrate by way of compilation methods, worst-case counting arguments, and analysis of algorithmic properties such as completeness that the more immediately pressing computational obstacles (facing would-be temporal generalizations of classical planning systems) can be dealt with in a theoretically efficient manner. So, more accurately, the technical contribution here is to demonstrate that the computationally significant obstacle to automated temporal planning that remains is just concurrency.

Contributors: Cushing, William Albemarle (Author) / Kambhampati, Subbarao (Thesis advisor) / Weld, Daniel S. (Committee member) / Smith, David E. (Committee member) / Baral, Chitta (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2012

Utility of considering multiple alternative rectifications in data cleaning

Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. First, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that can produce multiple clean candidates for each dirty tuple, along with the probability that each is the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it.
This probabilistic reconstruction (called BayesWipe-PDB) is compared to a deterministic reconstruction (called BayesWipe-DET), where the most likely clean candidate for each tuple is chosen and the rest of the alternatives are discarded.
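The information loss described above can be made concrete with a toy example. The sketch below is not BayesWipe or Mystiq code: the candidate rows, the function names, and the parameter `k` are hypothetical, and the only assumption carried over from the abstract is that the alternatives of one tuple are mutually exclusive (tuple-disjoint), so their probabilities add.

```python
# Hypothetical clean candidates for one dirty tuple, each equally likely.
candidates = [
    ({"make": "Honda", "model": "Civic"}, 1 / 3),
    ({"make": "Honda", "model": "Accord"}, 1 / 3),
    ({"make": "Hyundai", "model": "Accent"}, 1 / 3),
]


def deterministic_reconstruction(cands):
    """Keep only the single most likely candidate (ties go to the first one listed)."""
    return [max(cands, key=lambda c: c[1])]


def probabilistic_reconstruction(cands, k):
    """Keep the k most likely candidates as mutually exclusive alternatives."""
    return sorted(cands, key=lambda c: c[1], reverse=True)[:k]


def match_probability(alternatives, predicate):
    """Probability that the tuple satisfies the predicate.
    Alternatives are disjoint, so the matching probabilities are summed."""
    return sum(p for row, p in alternatives if predicate(row))


def is_accord(row):
    return row["model"] == "Accord"


det = deterministic_reconstruction(candidates)
pdb = probabilistic_reconstruction(candidates, k=3)

det_match = match_probability(det, is_accord)  # the Accord alternative was discarded
pdb_match = match_probability(pdb, is_accord)  # the Accord alternative is still reachable
```

With three equally likely candidates, the deterministic version arbitrarily keeps one and a query for the other two can never match, while the probabilistic version still returns the tuple with probability one third. Lowering `k` trades that recall for a smaller database, which is the tension raised in the abstract.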

Contributors: Rihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2013

Shibboleth: an automated foreign accent identification program

Description

The speech of non-native (L2) speakers of a language contains phonological rules that differentiate them from native speakers. These phonological rules characterize or distinguish accents in an L2. The Shibboleth program creates combinatorial rule-sets to describe the phonological pattern of these accents and classifies L2 speakers by their native language. Training and classification are done in Shibboleth by support vector machines using a Gaussian radial basis kernel. In one experiment run using Shibboleth, the program correctly identified the native language (L1) of a speaker of unknown origin 42% of the time when there were six possible L1s in which to classify the speaker. This rate is significantly better than the 17% chance classification rate (chi-squared test: χ²(1, N = 24) = 10.800, p = .0010). In a second experiment, Shibboleth was not able to determine the native language family of a speaker of unknown origin at a rate better than chance (33-44%) when the L1 was not in the transcripts used for training the language family rule-set (chi-squared test: χ²(1, N = 18) = 1.000, p = .3173). The 318 participants for both experiments were from the Speech Accent Archive (Weinberger, 2013), and ranged in age from 17 to 80 years old. Forty percent of the speakers were female and 60% were male. The factor that most influenced correct classification was a higher age of onset for the L2. A higher number of years spent living in an English-speaking country did not have the expected positive effect on classification.
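The reported statistics can be checked with a short calculation. The sketch below assumes the tests are one-degree-of-freedom goodness-of-fit tests on correct versus incorrect classifications, and that the observed counts were 10 of 24 (about 42%) and 8 of 18 (about 44%). Those counts are inferred from the reported percentages and are not stated in the abstract; the function name `chi_squared_gof` is hypothetical.

```python
import math


def chi_squared_gof(observed_correct, n, chance_rate):
    """Goodness-of-fit test (1 degree of freedom) comparing the observed
    correct/incorrect split against the split expected by chance."""
    expected_correct = n * chance_rate
    expected_wrong = n - expected_correct
    observed_wrong = n - observed_correct
    statistic = (
        (observed_correct - expected_correct) ** 2 / expected_correct
        + (observed_wrong - expected_wrong) ** 2 / expected_wrong
    )
    # With 1 degree of freedom, the chi-squared upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Experiment 1: six candidate L1s, so chance is 1/6 (about 17%).
stat1, p1 = chi_squared_gof(observed_correct=10, n=24, chance_rate=1 / 6)

# Experiment 2: assumed three language families, so chance is 1/3 (about 33%).
stat2, p2 = chi_squared_gof(observed_correct=8, n=18, chance_rate=1 / 3)

print(f"experiment 1: chi2 = {stat1:.3f}, p = {p1:.4f}")
print(f"experiment 2: chi2 = {stat2:.3f}, p = {p2:.4f}")
```

Under these assumed counts the statistics come out to 10.800 and 1.000, matching the values reported in the abstract, which suggests the inferred counts are consistent with the reported tests.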

Contributors: Frost, Wende (Author) / Gelderen, Elly van (Thesis advisor) / Perzanowski, Dennis (Committee member) / Gee, Elisabeth (Committee member) / Arizona State University (Publisher)

Created: 2013

Negation particles and historical linguistics: what part of "not" do you not understand?

Description

There are many parts of speech and morphological items in a linguistic lexicon that may be optional in order to have a cohesive language with a complete range of expression. Negation is not one of them. Negation appears to be absolutely essential from a linguistic (and indeed, a psychological) point of view within any human language. Humans need to be able to say "No" in some fashion and to express our not doing things in various ways. In the discussions that appear in this thesis, I expound upon the historical changes that can be seen within three different language branches - North Germanic (with Gothic, Old Saxon, Old Norse, Swedish, and Icelandic), West Germanic (with English), and Celtic (with Welsh) - focusing on negation particles in particular and their position within these languages. I also examine how each of these chosen languages has seen negation shift over time in relation to Jespersen's negation cycle. Finally, I compare and contrast the results I see from these languages, demonstrating that all three follow a distinct negation cycle. I also explain how these three negation cycles are chronologically not in sync with one another and changed at different rates. This appears to be the case even within the different branches of the Germanic family.

Contributors: Loewenhagen, Angela C (Author) / Gelderen, Elly van (Committee member) / Bjork, Robert (Committee member) / Gillon, Carrie (Committee member) / Arizona State University (Publisher)

Created: 2014

TweetSense: recommending hashtags for orphaned tweets by exploiting social signals in Twitter

Description

Twitter is a micro-blogging platform where users can be social, informational, or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call such a tweet an orphaned tweet. A user is likely to want more "context" for an orphaned tweet, presumably in order to engage with his/her friend on that topic. Finding context for an orphaned tweet manually is challenging because of the large social graph of a user, the enormous volume of tweets generated per second, topic diversity, and the limited information available within the tweet length of 140 characters. To help the user get the context of an orphaned tweet, this thesis aims at building a hashtag recommendation system called TweetSense, which suggests hashtags as context or metadata for orphaned tweets. This in turn would increase users' social engagement and help Twitter maintain the monthly active users in its social network. In contrast to other existing systems, this hashtag recommendation system recommends personalized hashtags by exploiting the social signals of users in Twitter. The novelty of this system is its emphasis on selecting a suitable candidate set of hashtags from the related tweets of the user's social graph (timeline). The system then ranks them based on a combination of feature scores computed from tweet-related and user-related features. It is evaluated on its ability to predict suitable hashtags for a random sample of tweets whose existing hashtags are deliberately removed for evaluation. I present a detailed internal empirical evaluation of TweetSense, as well as an external evaluation in comparison with the current state-of-the-art method.

Contributors: Vijayakumar, Manikandan (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2014

ASU Electronic Theses and Dissertations
