Search Content

TweetSense: recommending hashtags for orphaned tweets by exploiting social signals in Twitter

Description

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with…

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with his/her friend on that topic. Finding context for an Orphaned tweet manually is challenging because of larger social graph of a user , the enormous volume of tweets generated per second, topic diversity, and limited information from tweet length of 140 characters. To help the user to get the context of an orphaned tweet, this thesis aims at building a hashtag recommendation system called TweetSense, to suggest hashtags as a context or metadata for the orphaned tweets. This in turn would increase user's social engagement and impact Twitter to maintain its monthly active online users in its social network. In contrast to other existing systems, this hashtag recommendation system recommends personalized hashtags by exploiting the social signals of users in Twitter. The novelty with this system is that it emphasizes on selecting the suitable candidate set of hashtags from the related tweets of user's social graph (timeline).The system then rank them based on the combination of features scores computed from their tweet and user related features. It is evaluated based on its ability to predict suitable hashtags for a random sample of tweets whose existing hashtags are deliberately removed for evaluation. I present a detailed internal empirical evaluation of TweetSense, as well as an external evaluation in comparison with current state of the art method.

ContributorsVijayakumar, Manikandan (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2014

Planning challenges in human-robot teaming

Description

As robotic technology and its various uses grow steadily more complex and ubiquitous, humans are coming into increasing contact with robotic agents. A large portion of such contact is cooperative interaction, where both humans and robots are required to work on the same application towards achieving common goals. These application…

As robotic technology and its various uses grow steadily more complex and ubiquitous, humans are coming into increasing contact with robotic agents. A large portion of such contact is cooperative interaction, where both humans and robots are required to work on the same application towards achieving common goals. These application scenarios are characterized by a need to leverage the strengths of each agent as part of a unified team to reach those common goals. To ensure that the robotic agent is truly a contributing team-member, it must exhibit some degree of autonomy in achieving goals that have been delegated to it. Indeed, a significant portion of the utility of such human-robot teams derives from the delegation of goals to the robot, and autonomy on the part of the robot in achieving those goals. In order to be considered truly autonomous, the robot must be able to make its own plans to achieve the goals assigned to it, with only minimal direction and assistance from the human.

Automated planning provides the solution to this problem -- indeed, one of the main motivations that underpinned the beginnings of the field of automated planning was to provide planning support for Shakey the robot with the STRIPS system. For long, however, automated planners suffered from scalability issues that precluded their application to real world, real time robotic systems. Recent decades have seen a gradual abeyance of those issues, and fast planning systems are now the norm rather than the exception. However, some of these advances in speedup and scalability have been achieved by ignoring or abstracting out challenges that real world integrated robotic systems must confront.

In this work, the problem of planning for human-hobot teaming is introduced. The central idea -- the use of automated planning systems as mediators in such human-robot teaming scenarios -- and the main challenges inspired from real world scenarios that must be addressed in order to make such planning seamless are presented: (i) Goals which can be specified or changed at execution time, after the planning process has completed; (ii) Worlds and scenarios where the state changes dynamically while a previous plan is executing; (iii) Models that are incomplete and can be changed during execution; and (iv) Information about the human agent's plan and intentions that can be used for coordination. These challenges are compounded by the fact that the human-robot team must execute in an open world, rife with dynamic events and other agents; and in a manner that encourages the exchange of information between the human and the robot. As an answer to these challenges, implemented solutions and a fielded prototype that combines all of those solutions into one planning system are discussed. Results from running this prototype in real world scenarios are presented, and extensions to some of the solutions are offered as appropriate.

ContributorsTalamadupula, Kartik (Author) / Kambhampati, Subbarao (Thesis advisor) / Baral, Chitta (Committee member) / Liu, Huan (Committee member) / Scheutz, Matthias (Committee member) / Smith, David E. (Committee member) / Arizona State University (Publisher)

Created2014

Sarcasm detection on Twitter: a behavioral modeling approach

Description

Sarcasm is a nuanced form of language where usually, the speaker explicitly states the opposite of what is implied. Imbued with intentional ambiguity and subtlety, detecting sarcasm is a difficult task, even for humans. Current works approach this challenging problem primarily from a linguistic perspective, focusing on the lexical and…

Sarcasm is a nuanced form of language where usually, the speaker explicitly states the opposite of what is implied. Imbued with intentional ambiguity and subtlety, detecting sarcasm is a difficult task, even for humans. Current works approach this challenging problem primarily from a linguistic perspective, focusing on the lexical and syntactic aspects of sarcasm. In this thesis, I explore the possibility of using behavior traits intrinsic to users of sarcasm to detect sarcastic tweets. First, I theorize the core forms of sarcasm using findings from the psychological and behavioral sciences, and some observations on Twitter users. Then, I develop computational features to model the manifestations of these forms of sarcasm using the user's profile information and tweets. Finally, I combine these features to train a supervised learning model to detect sarcastic tweets. I perform experiments to extensively evaluate the proposed behavior modeling approach and compare with the state-of-the-art.

ContributorsRajadesingan, Ashwin (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Pon-Barry, Heather (Committee member) / Arizona State University (Publisher)

Created2014

Discovering and Mitigating Social Data Bias

Description

Exabytes of data are created online every day. This deluge of data is no more apparent than it is on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people…

Exabytes of data are created online every day. This deluge of data is no more apparent than it is on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people at scales thought unimaginable only a few years ago.

Researchers and practitioners use social media to extract actionable patterns such as where aid should be distributed in a crisis. However, the validity of these patterns relies on having a representative dataset. As this dissertation shows, the data collected from social media is seldom representative of the activity of the site itself, and less so of human activity. This means that the results of many studies are limited by the quality of data they collect.

The finding that social media data is biased inspires the main challenge addressed by this thesis. I introduce three sets of methodologies to correct for bias. First, I design methods to deal with data collection bias. I offer a methodology which can find bias within a social media dataset. This methodology works by comparing the collected data with other sources to find bias in a stream. The dissertation also outlines a data collection strategy which minimizes the amount of bias that will appear in a given dataset. It introduces a crawling strategy which mitigates the amount of bias in the resulting dataset. Second, I introduce a methodology to identify bots and shills within a social media dataset. This directly addresses the concern that the users of a social media site are not representative. Applying these methodologies allows the population under study on a social media site to better match that of the real world. Finally, the dissertation discusses perceptual biases, explains how they affect analysis, and introduces computational approaches to mitigate them.

The results of the dissertation allow for the discovery and removal of different levels of bias within a social media dataset. This has important implications for social media mining, namely that the behavioral patterns and insights extracted from social media will be more representative of the populations under study.

ContributorsMorstatter, Fred (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Maciejewski, Ross (Committee member) / Carley, Kathleen M. (Committee member) / Arizona State University (Publisher)

Created2017

Representing and Reasoning about Dynamic Multi-Agent Domains: An Action Language Approach

Description

Reasoning about actions forms the basis of many tasks such as prediction, planning, and diagnosis in a dynamic domain. Within the reasoning about actions community, a broad class of languages, called action languages, has been developed together with a methodology for their use in representing and reasoning about dynamic domains.…

Reasoning about actions forms the basis of many tasks such as prediction, planning, and diagnosis in a dynamic domain. Within the reasoning about actions community, a broad class of languages, called action languages, has been developed together with a methodology for their use in representing and reasoning about dynamic domains. With a few notable exceptions, the focus of these efforts has largely centered around single-agent systems. Agents rarely operate in a vacuum however, and almost in parallel, substantial work has been done within the dynamic epistemic logic community towards understanding how the actions of an agent may effect not just his own knowledge and/or beliefs, but those of his fellow agents as well. What is less understood by both communities is how to represent and reason about both the direct and indirect effects of both ontic and epistemic actions within a multi-agent setting. This dissertation presents ongoing research towards a framework for representing and reasoning about dynamic multi-agent domains involving both classes of actions.

The contributions of this work are as follows: the formulation of a precise mathematical model of a dynamic multi-agent domain based on the notion of a transition diagram; the development of the multi-agent action languages mA+ and mAL based upon this model, as well as preliminary investigations of their properties and implementations via logic programming under the answer set semantics; precise formulations of the temporal projection, and planning problems within a multi-agent context; and an investigation of the application of the proposed approach to the representation of, and reasoning about, scenarios involving the modalities of knowledge and belief.

ContributorsGelfond, Gregory (Author) / Baral, Chitta (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Lee, Joohyung (Committee member) / Moss, Larry (Committee member) / Cao Son, Tran (Committee member) / Arizona State University (Publisher)

Created2018

A Study on Generative Adversarial Networks Exacerbating Social Data Bias

Description

Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations, such as finite network capacity and training set size, they inevitably suffer a yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world…

Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations, such as finite network capacity and training set size, they inevitably suffer a yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that this effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases which exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks using GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, unfortunately skewed toward white males, a DCGAN’s generator “imagines” faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat’s explosively popular GAN-enabled “My Twin” selfie lens, which consistently lightens the skin tone for women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unsuspectingly increase the biases in their applications.

ContributorsJain, Niharika (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Manikonda, Lydia (Committee member) / Arizona State University (Publisher)

Created2020

The What, When, and How of Strategic Movement in Adversarial Settings: A Syncretic View of AI and Security

Description

The field of cyber-defenses has played catch-up in the cat-and-mouse game of finding vulnerabilities followed by the invention of patches to defend against them. With the complexity and scale of modern-day software, it is difficult to ensure that all known vulnerabilities are patched; moreover, the attacker, with reconnaissance on their…

The field of cyber-defenses has played catch-up in the cat-and-mouse game of finding vulnerabilities followed by the invention of patches to defend against them. With the complexity and scale of modern-day software, it is difficult to ensure that all known vulnerabilities are patched; moreover, the attacker, with reconnaissance on their side, will eventually discover and leverage them. To take away the attacker's inherent advantage of reconnaissance, researchers have proposed the notion of proactive defenses such as Moving Target Defense (MTD) in cyber-security. In this thesis, I make three key contributions that help to improve the effectiveness of MTD.

First, I argue that naive movement strategies for MTD systems, designed based on intuition, are detrimental to both security and performance. To answer the question of how to move, I (1) model MTD as a leader-follower game and formally characterize the notion of optimal movement strategies, (2) leverage expert-curated public data and formal representation methods used in cyber-security to obtain parameters of the game, and (3) propose optimization methods to infer strategies at Strong Stackelberg Equilibrium, addressing issues pertaining to scalability and switching costs. Second, when one cannot readily obtain the parameters of the game-theoretic model but can interact with a system, I propose a novel multi-agent reinforcement learning approach that finds the optimal movement strategy. Third, I investigate the novel use of MTD in three domains-- cyber-deception, machine learning, and critical infrastructure networks. I show that the question of what to move poses non-trivial challenges in these domains. To address them, I propose methods for patch-set selection in the deployment of honey-patches, characterize the notion of differential immunity in deep neural networks, and develop optimization problems that guarantee differential immunity for dynamic sensor placement in power-networks.

ContributorsSengupta, Sailik (Author) / Kambhampati, Subbarao (Thesis advisor) / Bao, Tiffany (Youzhi) (Committee member) / Huang, Dijiang (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created2020

Software-defined Situation-aware Cloud Security

Description

The use of reactive security mechanisms in enterprise networks can, at times, provide an asymmetric advantage to the attacker. Similarly, the use of a proactive security mechanism like Moving Target Defense (MTD), if performed without analyzing the effects of security countermeasures, can lead to security policy and service level agreement…

The use of reactive security mechanisms in enterprise networks can, at times, provide an asymmetric advantage to the attacker. Similarly, the use of a proactive security mechanism like Moving Target Defense (MTD), if performed without analyzing the effects of security countermeasures, can lead to security policy and service level agreement violations. In this thesis, I explore the research questions 1) how to model attacker-defender interactions for multi-stage attacks? 2) how to efficiently deploy proactive (MTD) security countermeasures in a software-defined environment for single and multi-stage attacks? 3) how to verify the effects of security and management policies on the network and take corrective actions?

I propose a Software-defined Situation-aware Cloud Security framework, that, 1) analyzes the attacker-defender interactions using an Software-defined Networking (SDN) based scalable attack graph. This research investigates Advanced Persistent Threat (APT) attacks using a scalable attack graph. The framework utilizes a parallel graph partitioning algorithm to generate an attack graph quickly and efficiently. 2) models single-stage and multi-stage attacks (APTs) using the game-theoretic model and provides SDN-based MTD countermeasures. I propose a Markov Game for modeling multi-stage attacks. 3) introduces a multi-stage policy conflict checking framework at the SDN network's application plane. I present INTPOL, a new intent-driven security policy enforcement solution. INTPOL provides a unified language and INTPOL grammar that abstracts the network administrator from the underlying network controller's lexical rules. INTPOL develops a bounded formal model for network service compliance checking, which significantly reduces the number of countermeasures that needs to be deployed. Once the application-layer policy conflicts are resolved, I utilize an Object-Oriented Policy Conflict checking (OOPC) framework that identifies and resolves rule-order dependencies and conflicts between security policies.

ContributorsChowdhary, Ankur (Author) / Huang, Dijiang (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Doupe, Adam (Committee member) / Bao, Youzhi (Committee member) / Arizona State University (Publisher)

Created2020

Protecting User Privacy with Social Media Data and Mining

Description

The pervasive use of the Web has connected billions of people all around the globe and enabled them to obtain information at their fingertips. This results in tremendous amounts of user-generated data which makes users traceable and vulnerable to privacy leakage attacks. In general, there are two types of privacy…

The pervasive use of the Web has connected billions of people all around the globe and enabled them to obtain information at their fingertips. This results in tremendous amounts of user-generated data which makes users traceable and vulnerable to privacy leakage attacks. In general, there are two types of privacy leakage attacks for user-generated data, i.e., identity disclosure and private-attribute disclosure attacks. These attacks put users at potential risks ranging from persecution by governments to targeted frauds. Therefore, it is necessary for users to be able to safeguard their privacy without leaving their unnecessary traces of online activities. However, privacy protection comes at the cost of utility loss defined as the loss in quality of personalized services users receive. The reason is that this information of traces is crucial for online vendors to provide personalized services and a lack of it would result in deteriorating utility. This leads to a dilemma of privacy and utility.

Protecting users' privacy while preserving utility for user-generated data is a challenging task. The reason is that users generate different types of data such as Web browsing histories, user-item interactions, and textual information. This data is heterogeneous, unstructured, noisy, and inherently different from relational and tabular data and thus requires quantifying users' privacy and utility in each context separately. In this dissertation, I investigate four aspects of protecting user privacy for user-generated data. First, a novel adversarial technique is introduced to assay privacy risks in heterogeneous user-generated data. Second, a novel framework is proposed to boost users' privacy while retaining high utility for Web browsing histories. Third, a privacy-aware recommendation system is developed to protect privacy w.r.t. the rich user-item interaction data by recommending relevant and privacy-preserving items. Fourth, a privacy-preserving framework for text representation learning is presented to safeguard user-generated textual data as it can reveal private information.

ContributorsBeigi, Ghazaleh (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Tong, Hanghang (Committee member) / Eliassi-Rad, Tina (Committee member) / Arizona State University (Publisher)

Created2020

Using Event logs and Rapid Ethnographic Data to Mine Clinical Pathways

Description

Background: Process mining (PM) using event log files is gaining popularity in healthcare to investigate clinical pathways. But it has many unique challenges. Clinical Pathways (CPs) are often complex and unstructured which results in spaghetti-like models. Moreover, the log files collected from the electronic health record (EHR) often contain noisy…

Background: Process mining (PM) using event log files is gaining popularity in healthcare to investigate clinical pathways. But it has many unique challenges. Clinical Pathways (CPs) are often complex and unstructured which results in spaghetti-like models. Moreover, the log files collected from the electronic health record (EHR) often contain noisy and incomplete data. Objective: Based on the traditional process mining technique of using event logs generated by an EHR, observational video data from rapid ethnography (RE) were combined to model, interpret, simplify and validate the perioperative (PeriOp) CPs. Method: The data collection and analysis pipeline consisted of the following steps: (1) Obtain RE data, (2) Obtain EHR event logs, (3) Generate CP from RE data, (4) Identify EHR interfaces and functionalities, (5) Analyze EHR functionalities to identify missing events, (6) Clean and preprocess event logs to remove noise, (7) Use PM to compute CP time metrics, (8) Further remove noise by removing outliers, (9) Mine CP from event logs and (10) Compare CPs resulting from RE and PM. Results: Four provider interviews and 1,917,059 event logs and 877 minutes of video ethnography recording EHRs interaction were collected. When mapping event logs to EHR functionalities, the intraoperative (IntraOp) event logs were more complete (45%) when compared with preoperative (35%) and postoperative (21.5%) event logs. After removing the noise (496 outliers) and calculating the duration of the PeriOp CP, the median was 189 minutes and the standard deviation was 291 minutes. Finally, RE data were analyzed to help identify most clinically relevant event logs and simplify spaghetti-like CPs resulting from PM. Conclusion: The study demonstrated the use of RE to help overcome challenges of automatic discovery of CPs. It also demonstrated that RE data could be used to identify relevant clinical tasks and incomplete data, remove noise (outliers), simplify CPs and validate mined CPs.

ContributorsDeotale, Aditya Vijay (Author) / Liu, Huan (Thesis advisor) / Grando, Maria (Thesis advisor) / Manikonda, Lydia (Committee member) / Arizona State University (Publisher)

Created2020