Matching Items (4)
Filtering by

Clear all filters

155717-Thumbnail Image.png
Description
Exabytes of data are created online every day. This deluge of data is no more apparent than it is on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people

Exabytes of data are created online every day. This deluge of data is no more apparent than it is on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people at scales thought unimaginable only a few years ago.

Researchers and practitioners use social media to extract actionable patterns such as where aid should be distributed in a crisis. However, the validity of these patterns relies on having a representative dataset. As this dissertation shows, the data collected from social media is seldom representative of the activity of the site itself, and less so of human activity. This means that the results of many studies are limited by the quality of data they collect.

The finding that social media data is biased inspires the main challenge addressed by this thesis. I introduce three sets of methodologies to correct for bias. First, I design methods to deal with data collection bias. I offer a methodology which can find bias within a social media dataset. This methodology works by comparing the collected data with other sources to find bias in a stream. The dissertation also outlines a data collection strategy which minimizes the amount of bias that will appear in a given dataset. It introduces a crawling strategy which mitigates the amount of bias in the resulting dataset. Second, I introduce a methodology to identify bots and shills within a social media dataset. This directly addresses the concern that the users of a social media site are not representative. Applying these methodologies allows the population under study on a social media site to better match that of the real world. Finally, the dissertation discusses perceptual biases, explains how they affect analysis, and introduces computational approaches to mitigate them.

The results of the dissertation allow for the discovery and removal of different levels of bias within a social media dataset. This has important implications for social media mining, namely that the behavioral patterns and insights extracted from social media will be more representative of the populations under study.
ContributorsMorstatter, Fred (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Maciejewski, Ross (Committee member) / Carley, Kathleen M. (Committee member) / Arizona State University (Publisher)
Created2017
141320-Thumbnail Image.png
Description

This chapter integrates from cognitive neuroscience, cognitive psychology, and social psychology the basic science of bias in human judgment as relevant to judgments and decisions by forensic mental health professionals. Forensic mental health professionals help courts make decisions in cases when some question of psychology pertains to the legal issue,

This chapter integrates from cognitive neuroscience, cognitive psychology, and social psychology the basic science of bias in human judgment as relevant to judgments and decisions by forensic mental health professionals. Forensic mental health professionals help courts make decisions in cases when some question of psychology pertains to the legal issue, such as in insanity cases, child custody hearings, and psychological injuries in civil suits. The legal system itself and many people involved, such as jurors, assume mental health experts are “objective” and untainted by bias. However, basic psychological science from several branches of the discipline suggest the law’s assumption about experts’ protection from bias is wrong. Indeed, several empirical studies now show clear evidence of (unintentional) bias in forensic mental health experts’ judgments and decisions. In this chapter, we explain the science of how and why human judgments are susceptible to various kinds of bias. We describe dual-process theories from cognitive neuroscience, cognitive psychology, and social psychology that can help explain these biases. We review the empirical evidence to date specifically about cognitive and social psychological biases in forensic mental health judgments, weaving in related literature about biases in other types of expert judgment, with hypotheses about how forensic experts are likely affected by these biases. We close with a discussion of directions for future research and practice.

ContributorsNeal, Tess M.S. (Author) / Hight, Morgan (Author) / Howatt, Brian C. (Author) / Hamza, Cassandra (Author)
Created2017-04-30
161967-Thumbnail Image.png
Description
Machine learning models can pick up biases and spurious correlations from training data and projects and amplify these biases during inference, thus posing significant challenges in real-world settings. One approach to mitigating this is a class of methods that can identify filter out bias-inducing samples from the training datasets to

Machine learning models can pick up biases and spurious correlations from training data and projects and amplify these biases during inference, thus posing significant challenges in real-world settings. One approach to mitigating this is a class of methods that can identify filter out bias-inducing samples from the training datasets to force models to avoid being exposed to biases. However, the filtering leads to a considerable wastage of resources as most of the dataset created is discarded as biased. This work deals with avoiding the wastage of resources by identifying and quantifying the biases. I further elaborate on the implications of dataset filtering on robustness (to adversarial attacks) and generalization (to out-of-distribution samples). The findings suggest that while dataset filtering does help to improve OOD(Out-Of-Distribution) generalization, it has a significant negative impact on robustness to adversarial attacks. It also shows that transforming bias-inducing samples into adversarial samples (instead of eliminating them from the dataset) can significantly boost robustness without sacrificing generalization.
ContributorsSachdeva, Bhavdeep Singh (Author) / Baral, Chitta (Thesis advisor) / Liu, Huan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2021
158485-Thumbnail Image.png
Description
Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations, such as finite network capacity and training set size, they inevitably suffer a yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world

Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations, such as finite network capacity and training set size, they inevitably suffer a yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that this effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases which exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks using GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, unfortunately skewed toward white males, a DCGAN’s generator “imagines” faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat’s explosively popular GAN-enabled “My Twin” selfie lens, which consistently lightens the skin tone for women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unsuspectingly increase the biases in their applications.
ContributorsJain, Niharika (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Manikonda, Lydia (Committee member) / Arizona State University (Publisher)
Created2020