Matching Items (5)
Filtering by
- Member of: ASU Electronic Theses and Dissertations
Description
Sarcasm is a nuanced form of language where the speaker usually explicitly states the opposite of what is implied. Because sarcasm is imbued with intentional ambiguity and subtlety, detecting it is a difficult task, even for humans. Current works approach this challenging problem primarily from a linguistic perspective, focusing on the lexical and syntactic aspects of sarcasm. In this thesis, I explore the possibility of using behavioral traits intrinsic to users of sarcasm to detect sarcastic tweets. First, I theorize the core forms of sarcasm using findings from the psychological and behavioral sciences and some observations of Twitter users. Then, I develop computational features to model the manifestations of these forms of sarcasm using the user's profile information and tweets. Finally, I combine these features to train a supervised learning model to detect sarcastic tweets. I perform experiments to extensively evaluate the proposed behavior modeling approach and compare it with the state of the art.
Contributors: Rajadesingan, Ashwin (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Pon-Barry, Heather (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, and ideologies, and even in imposing them on others. Users have become the source for the production and dissemination of real-time information. The content posted by users can be used to understand them and track their behavior. Using this content, data analysis can be performed to understand users' social ideology and their affinity towards Radical and Counter-Radical Movements. When expressing their opinions, people use hashtags in their Twitter messages. These hashtags are a rich source of information for understanding the content-based relationships among online users, apart from the existing context-based follower and friend relationships.
An intelligent visual dashboard system is needed that can track the activities of users and the diffusion of online social movements, identify hot spots in the users' network, show the geographic footprint of the users, and help explain the socio-cultural, economic, and political drivers of the relationships among different groups of users.
Contributors: Garipalli, Sravan Kumar (Author) / Davulcu, Hasan (Thesis advisor) / Shakarian, Paulo (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks such as pharmacovigilance via the use of Natural Language Processing (NLP) techniques. One of the critical steps in information extraction pipelines is Named Entity Recognition (NER), where the mentions of entities such as diseases are located in text and their entity types are identified. However, the language in social media is highly informal, and user-expressed health-related concepts are often non-technical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and advanced machine learning-based NLP techniques have been underutilized. This work explores the effectiveness of different machine learning techniques, particularly deep learning, in addressing the challenges associated with the extraction of health-related concepts from social media. Deep learning has recently attracted a lot of attention in machine learning research and has shown remarkable success in several applications, particularly imaging and speech recognition. However, deep learning techniques have thus far been relatively unexplored for biomedical text mining, and, in particular, this is the first attempt to apply deep learning to health information extraction from social media.
This work presents ADRMine, which uses a Conditional Random Field (CRF) sequence tagger for the extraction of complex health-related concepts. It utilizes a large volume of unlabeled user posts for automatic learning of embedding cluster features, a novel application of deep learning in modeling the similarity between tokens. ADRMine significantly improved medical NER performance compared to the baseline systems.
This work also presents DeepHealthMiner, a deep learning pipeline for health-related concept extraction. Most machine learning methods require sophisticated, task-specific manual feature design, which is a challenging step in processing the informal and noisy content of social media. DeepHealthMiner automatically learns classification features using neural networks and a large volume of unlabeled user posts. Using a relatively small labeled training set, DeepHealthMiner could accurately identify most of the concepts, including consumer expressions that were not observed in the training data or in the standard medical lexicons, outperforming the state-of-the-art baseline techniques.
Contributors: Nikfarjam, Azadeh (Author) / Gonzalez, Graciela (Thesis advisor) / Greenes, Robert (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Social media contains rich information shared by many users every second of every day. This information can be utilized for various purposes such as understanding user behaviors, learning the effect of social media on a community, and developing a decision-making system based on the information available. With the growing popularity of social networking sites, people can freely express their opinions and feelings, which results in a tremendous amount of user-generated data. This wealth of social media data has opened the path for researchers to study and understand users’ behaviors and mental health conditions. Several studies have shown that social media provides a means to capture an individual’s state of mind. Given the social media data and related work in this field, this work studies the scope of users’ discussion among online mental health communities. In the first part of this dissertation, this work focuses on the role of social media in mental health within the sexual abuse community. It employs natural language processing techniques to extract topics of responses and examines how diverse these topics are in order to answer research questions such as: Are responses limited to emotional support? If not, what other topics appear? What does the diversity of topics indicate? How do online responses differ from traditional responses found in the physical world? To answer these questions, this work extracts Reddit posts on rape to understand the nature of user responses to this stigmatized topic. In the second part of this dissertation, this work expands to a broader range of online communities. In particular, it investigates the potential roles of social media in mental health among five major communities: the trauma and abuse community, the psychosis and anxiety community, the compulsive disorders community, the coping and therapy community, and the mood disorders community.
This work studies how people interact with each other in each of these communities and what resources these online forums provide to users who seek help. To understand users’ behaviors, this work extracts Reddit posts from 52 related subcommunities and analyzes the linguistic behavior of each community. Experiments in this dissertation show that Reddit is a good medium for users with mental health issues to find related helpful resources. Another interesting observation is a topic cluster in users’ posts which shows that discussion and communication among users help individuals find proper resources for their problems. Moreover, results show that the anonymity of users on Reddit allows them to have discussions about different topics beyond social support, such as financial and religious support.
Contributors: Kamarudin, Nur Shazwani (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Sen, Arunabha (Committee member) / Hu, Xia (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
The pervasive use of the Web has connected billions of people all around the globe and enabled them to obtain information at their fingertips. This results in tremendous amounts of user-generated data, which makes users traceable and vulnerable to privacy leakage attacks. In general, there are two types of privacy leakage attacks for user-generated data, i.e., identity disclosure and private-attribute disclosure attacks. These attacks put users at potential risks ranging from persecution by governments to targeted fraud. Therefore, it is necessary for users to be able to safeguard their privacy without leaving unnecessary traces of their online activities. However, privacy protection comes at the cost of utility loss, defined as the loss in quality of the personalized services users receive. The reason is that these traces are crucial for online vendors to provide personalized services, and a lack of them would result in deteriorating utility. This leads to a dilemma between privacy and utility.
Protecting users' privacy while preserving utility for user-generated data is a challenging task. The reason is that users generate different types of data such as Web browsing histories, user-item interactions, and textual information. This data is heterogeneous, unstructured, noisy, and inherently different from relational and tabular data and thus requires quantifying users' privacy and utility in each context separately. In this dissertation, I investigate four aspects of protecting user privacy for user-generated data. First, a novel adversarial technique is introduced to assay privacy risks in heterogeneous user-generated data. Second, a novel framework is proposed to boost users' privacy while retaining high utility for Web browsing histories. Third, a privacy-aware recommendation system is developed to protect privacy w.r.t. the rich user-item interaction data by recommending relevant and privacy-preserving items. Fourth, a privacy-preserving framework for text representation learning is presented to safeguard user-generated textual data as it can reveal private information.
Contributors: Beigi, Ghazaleh (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Tong, Hanghang (Committee member) / Eliassi-Rad, Tina (Committee member) / Arizona State University (Publisher)
Created: 2020