Matching Items (3)
Description
The Internet has made it possible to exchange information at a rapid rate. With this extraordinary ability, media companies and various other organizations have been able to communicate thoughts and information to an extremely large audience. As a result, news subscribers are overwhelmed with biased information, which makes it very easy to be misinformed. Unfortunately, there is currently no way to stay truly informed without spending countless hours searching the Internet for different viewpoints and ultimately using that information to formulate a sound understanding. This project (nicknamed "Newsie") addresses this problem by providing news subscribers with many news sources for every topic, thereby saving them time and ultimately paving the way to a more informed society. Since one of the main goals of this project is to provide information to the largest number of people, Newsie is designed with availability in mind. Unsurprisingly, the most accessible method of communication is the Internet, and more specifically a website. Users will be able to access Newsie via a webpage and easily view the most recent headlines with their corresponding articles from several sources. Another goal of the project is to classify different articles and sources based on their bias. After reading articles, users will be able to vote on their biases. This provides a crowdsourced method of determining bias.
Contributors: Alimov, Robert Joseph (Author) / Meuth, Ryan (Thesis director) / Franceschini, Enos (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-12
Description
Generative Adversarial Networks (GANs) are designed, in theory, to replicate the distribution of the data they are trained on. Under real-world limitations, such as finite network capacity and training set size, they inevitably suffer an as-yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that this effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases that exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks using GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, unfortunately skewed toward white males, a DCGAN’s generator “imagines” faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat’s explosively popular GAN-enabled “My Twin” selfie lens, which consistently lightens the skin tone for women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unsuspectingly increase the biases in their applications.
Contributors: Jain, Niharika (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Manikonda, Lydia (Committee member) / Arizona State University (Publisher)
Created: 2020
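As an illustration of the comparison described in the abstract above (how far a generated distribution diverges from the real one), the following is a minimal sketch. It assumes that images have already been labeled with a demographic category and uses KL divergence as the measure; the abstract does not state which divergence metric the thesis uses, so the metric, the function names, and the category labels are all assumptions made for illustration.

```python
import math


def proportions(labels, categories):
    """Return the share of each category in a list of labels (hypothetical helper)."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    return [sum(1 for x in labels if x == c) / total for c in categories]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same categories.

    KL divergence is an assumed stand-in metric here, not necessarily the
    one used in the thesis. `eps` guards against division by zero when the
    generated distribution assigns no mass to a category.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

Under these assumptions, one would compute `proportions` once for the labeled training images and once for the labeled generated images, then pass the two results to `kl_divergence`. A larger value for a skewed training set than for a balanced one would be consistent with the effect the abstract describes.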
Description
Background: Process mining (PM) using event log files is gaining popularity in healthcare to investigate clinical pathways, but it poses many unique challenges. Clinical Pathways (CPs) are often complex and unstructured, which results in spaghetti-like models. Moreover, the log files collected from the electronic health record (EHR) often contain noisy and incomplete data. Objective: Building on the traditional process mining technique of using event logs generated by an EHR, observational video data from rapid ethnography (RE) were combined with those logs to model, interpret, simplify and validate the perioperative (PeriOp) CPs. Method: The data collection and analysis pipeline consisted of the following steps: (1) Obtain RE data, (2) Obtain EHR event logs, (3) Generate CP from RE data, (4) Identify EHR interfaces and functionalities, (5) Analyze EHR functionalities to identify missing events, (6) Clean and preprocess event logs to remove noise, (7) Use PM to compute CP time metrics, (8) Further remove noise by removing outliers, (9) Mine CP from event logs and (10) Compare CPs resulting from RE and PM. Results: Four provider interviews, 1,917,059 event logs, and 877 minutes of video ethnography recordings of EHR interaction were collected. When mapping event logs to EHR functionalities, the intraoperative (IntraOp) event logs were more complete (45%) than the preoperative (35%) and postoperative (21.5%) event logs. After removing the noise (496 outliers) and calculating the duration of the PeriOp CP, the median was 189 minutes and the standard deviation was 291 minutes. Finally, RE data were analyzed to help identify the most clinically relevant event logs and simplify spaghetti-like CPs resulting from PM. Conclusion: The study demonstrated the use of RE to help overcome challenges of automatic discovery of CPs. It also demonstrated that RE data could be used to identify relevant clinical tasks and incomplete data, remove noise (outliers), simplify CPs and validate mined CPs.
Contributors: Deotale, Aditya Vijay (Author) / Liu, Huan (Thesis advisor) / Grando, Maria (Thesis advisor) / Manikonda, Lydia (Committee member) / Arizona State University (Publisher)
Created: 2020
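Steps (7) and (8) of the pipeline in the abstract above (computing pathway time metrics and removing outliers) can be sketched roughly as follows. The abstract does not say which outlier rule was applied, so the 1.5 x IQR rule used here is an assumption, as are the function names and the idea that each pathway duration is already available as a number of minutes.

```python
import statistics


def remove_outliers_iqr(durations, k=1.5):
    """Drop durations outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumed choice for illustration; the thesis may
    have used a different outlier criterion.
    """
    q1, _, q3 = statistics.quantiles(durations, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [d for d in durations if low <= d <= high]


def summarize(durations):
    """Return the median and sample standard deviation of the durations."""
    return {
        "median": statistics.median(durations),
        "stdev": statistics.stdev(durations),
    }


if __name__ == "__main__":
    # Invented durations in minutes, purely illustrative (not study data).
    sample = [100, 120, 130, 150, 160, 180, 200, 2000]
    cleaned = remove_outliers_iqr(sample)
    print(summarize(cleaned))
```

In this sketch the outlier filter runs before the summary statistics, mirroring the order in which the abstract reports its results (noise removed first, then median and standard deviation of the PeriOp pathway duration).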