Search Content

Matching Items (2)

Filtering by

Genre: Masters Thesis

A Study on Generative Adversarial Networks Exacerbating Social Data Bias

Description

Generative Adversarial Networks are designed, in theory, to replicate the distribution of the data they are trained on. With real-world limitations, such as finite network capacity and training set size, they inevitably suffer a yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that this effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases which exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks using GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, unfortunately skewed toward white males, a DCGAN’s generator “imagines” faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat’s explosively popular GAN-enabled “My Twin” selfie lens, which consistently lightens the skin tone for women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unsuspectingly increase the biases in their applications.

ContributorsJain, Niharika (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Manikonda, Lydia (Committee member) / Arizona State University (Publisher)

Created2020

Using Event logs and Rapid Ethnographic Data to Mine Clinical Pathways

Description

Background: Process mining (PM) using event log files is gaining popularity in healthcare to investigate clinical pathways. But it has many unique challenges. Clinical Pathways (CPs) are often complex and unstructured which results in spaghetti-like models. Moreover, the log files collected from the electronic health record (EHR) often contain noisy and incomplete data. Objective: Based on the traditional process mining technique of using event logs generated by an EHR, observational video data from rapid ethnography (RE) were combined to model, interpret, simplify and validate the perioperative (PeriOp) CPs. Method: The data collection and analysis pipeline consisted of the following steps: (1) Obtain RE data, (2) Obtain EHR event logs, (3) Generate CP from RE data, (4) Identify EHR interfaces and functionalities, (5) Analyze EHR functionalities to identify missing events, (6) Clean and preprocess event logs to remove noise, (7) Use PM to compute CP time metrics, (8) Further remove noise by removing outliers, (9) Mine CP from event logs and (10) Compare CPs resulting from RE and PM. Results: Four provider interviews and 1,917,059 event logs and 877 minutes of video ethnography recording EHRs interaction were collected. When mapping event logs to EHR functionalities, the intraoperative (IntraOp) event logs were more complete (45%) when compared with preoperative (35%) and postoperative (21.5%) event logs. After removing the noise (496 outliers) and calculating the duration of the PeriOp CP, the median was 189 minutes and the standard deviation was 291 minutes. Finally, RE data were analyzed to help identify most clinically relevant event logs and simplify spaghetti-like CPs resulting from PM. Conclusion: The study demonstrated the use of RE to help overcome challenges of automatic discovery of CPs. It also demonstrated that RE data could be used to identify relevant clinical tasks and incomplete data, remove noise (outliers), simplify CPs and validate mined CPs.

ContributorsDeotale, Aditya Vijay (Author) / Liu, Huan (Thesis advisor) / Grando, Maria (Thesis advisor) / Manikonda, Lydia (Committee member) / Arizona State University (Publisher)

Created2020