Filtering by

Creators: Sankar, Lalitha

A Tunable Loss Function for Robust, Rigorous, and Reliable Machine Learning

Description

In the era of big data, more and more decisions and recommendations are being made by machine learning (ML) systems and algorithms. Despite their many successes, there have been notable deficiencies in the robustness, rigor, and reliability of these ML systems, which have had detrimental societal impacts. In the next generation of ML, these significant challenges must be addressed through careful algorithmic design, and it is crucial that practitioners and meta-algorithms have the necessary tools to construct ML models that align with human values and interests. In an effort to help address these problems, this dissertation studies a tunable loss function called α-loss for the ML setting of classification. The α-loss is a hyperparameterized loss function originating from information theory that continuously interpolates between the exponential (α = 1/2), log (α = 1), and 0-1 (α = ∞) losses, hence providing a holistic perspective of several classical loss functions in ML. Furthermore, the α-loss exhibits distinct operating characteristics in different regimes of α; notably, for α > 1, α-loss robustly trains models when noisy training data is present. Thus, the α-loss can provide robustness to ML systems for classification tasks, and this has bearing in many applications, e.g., social media, finance, academia, and medicine; indeed, results are presented where α-loss produces more robust logistic regression models for COVID-19 survey data with gains over state-of-the-art algorithmic approaches.
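The interpolation described above can be made concrete with a small sketch. The abstract does not state the formula, so the closed form used here, α/(α − 1) · (1 − p^((α − 1)/α)) with p the probability assigned to the correct label, is taken from the form commonly given in the α-loss literature and should be read as an assumption. The function names `alpha_loss` and `sigmoid` are illustrative only.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued margin to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def alpha_loss(p_true: float, alpha: float) -> float:
    """alpha-loss incurred when probability p_true is assigned to the correct label.

    Assumed closed form (not stated in the abstract):
        alpha / (alpha - 1) * (1 - p_true ** ((alpha - 1) / alpha))
    with the limits alpha -> 1 (log-loss) and alpha -> infinity (1 - p_true).
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if math.isinf(alpha):
        # Limit alpha -> infinity: a smooth surrogate of the 0-1 loss.
        return 1.0 - p_true
    if math.isclose(alpha, 1.0):
        # Limit alpha -> 1: the usual log-loss.
        return -math.log(p_true)
    exponent = (alpha - 1.0) / alpha
    return alpha / (alpha - 1.0) * (1.0 - p_true ** exponent)
```

Under this form, with p = sigmoid(z) for a margin z, `alpha_loss(sigmoid(z), 0.5)` reduces to exp(−z), which is the exponential loss named in the abstract.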

Contributors: Sypherd, Tyler (Author) / Sankar, Lalitha (Thesis advisor) / Berisha, Visar (Committee member) / Dasarathy, Gautam (Committee member) / Kosut, Oliver (Committee member) / Arizona State University (Publisher)

Created: 2022

A Secure Protocol for Contact Tracing and Hotspots Histogram Computation

Description

Contact tracing has been shown to be effective in limiting the rate of spread of infectious diseases like COVID-19. Several solutions have been proposed and deployed, based either on the exchange of random, anonymous tokens between users’ mobile devices via Bluetooth or on users’ location traces. These solutions require the user device to download the tokens (or traces) of infected users from the server. The user tokens are matched with infected users’ tokens to determine an exposure event. These solutions are vulnerable to a range of security and privacy issues and require large downloads, which motivates an efficient protocol with strong privacy guarantees. Moreover, these solutions are based solely on proximity between user devices, while COVID-19 can spread from common surfaces as well. Knowledge of areas with a large number of visits by infected users (hotspots) can help inform users to avoid those areas and thereby reduce surface transmission. This thesis proposes a strongly secure system for contact tracing and hotspots histogram computation. The contact tracing protocol uses a combination of Bluetooth Low Energy and Global Positioning System (GPS) location data. A novel and deployment-friendly Delegated Private Set Intersection Cardinality protocol is proposed for efficient and secure server-aided matching of tokens. Secure aggregation techniques are used to allow the server to learn areas of high risk from location traces of diagnosed users, without revealing any individual user’s location history.
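To illustrate the general idea behind secure aggregation of a hotspots histogram, the following is a minimal sketch of pairwise additive masking, a generic textbook technique. It is not the protocol proposed in the thesis, which the abstract does not specify. The names `MODULUS`, `mask_histograms`, and `aggregate` are hypothetical, and the masks are generated in one place purely for demonstration; in a real deployment each pair of users would derive its shared mask privately.

```python
import secrets

# Hypothetical modulus; it only needs to exceed the largest possible bin count.
MODULUS = 2**32


def mask_histograms(histograms):
    """Return masked copies of per-user histograms.

    For every pair of users (i, j) and every bin, a random value r is added to
    user i's bin and subtracted from user j's bin (mod MODULUS). Each masked
    histogram on its own looks random, but the masks cancel in the total.
    """
    masked = [list(h) for h in histograms]
    n_users = len(masked)
    n_bins = len(masked[0]) if masked else 0
    for i in range(n_users):
        for j in range(i + 1, n_users):
            for b in range(n_bins):
                r = secrets.randbelow(MODULUS)
                masked[i][b] = (masked[i][b] + r) % MODULUS
                masked[j][b] = (masked[j][b] - r) % MODULUS
    return masked


def aggregate(masked):
    """Server side: sum the masked histograms bin by bin (mod MODULUS)."""
    return [sum(column) % MODULUS for column in zip(*masked)]
```

For example, three users with visit counts `[1, 0, 2]`, `[0, 0, 1]`, and `[3, 1, 0]` over three locations yield an aggregate of `[4, 1, 3]`, while the server only ever sees the masked vectors.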

Contributors: Surana, Chetan (Author) / Trieu, Ni (Thesis advisor) / Sankar, Lalitha (Committee member) / Berisha, Visar (Committee member) / Zhao, Ming (Committee member) / Arizona State University (Publisher)

Created: 2021

Predicting COVID-19 Using Self-Reported Survey Data

Description

Infectious diseases spread at a rapid rate due to the increasing mobility of the human population. It is important to have a variety of containment and assessment strategies to prevent and limit their spread. In the ongoing COVID-19 pandemic, telehealth services including daily health surveys are used to study the prevalence and severity of the disease. Daily health surveys can also help to study the progression and fluctuation of symptoms, as recalling, tracking, and explaining symptoms to doctors can often be challenging for patients. Data aggregates collected from the daily health surveys can be used to identify the surge of a disease in a community. This thesis enhances a well-known boosting algorithm, XGBoost, to predict COVID-19 from the anonymized self-reported survey responses provided by Carnegie Mellon University (CMU) - Delphi research group in collaboration with Facebook. Despite the tremendous COVID-19 surge in the United States, this survey dataset is highly imbalanced, with 84% negative COVID-19 cases and 16% positive cases. It is difficult to learn from an imbalanced dataset, especially when the dataset could also be noisy, as is common in self-reported surveys. This thesis addresses these challenges by enhancing XGBoost with a tunable loss function, α-loss, that interpolates between the exponential loss (α = 1/2), the log-loss (α = 1), and the 0-1 loss (α = ∞). Results show that tuning XGBoost with α-loss can enhance performance over the standard XGBoost with log-loss (α = 1).
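Gradient-boosting libraries such as XGBoost accept custom objectives that supply per-example first and second derivatives of the loss. The sketch below derives those two quantities for a margin-based α-loss; it is an illustration under stated assumptions, not the thesis's implementation. It assumes the closed form α/(α − 1) · (1 − σ(z)^((α − 1)/α)) from the α-loss literature, where σ is the logistic function and z is the margin (label in {−1, +1} times the raw score). The function name `alpha_loss_grad_hess` is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued margin to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def alpha_loss_grad_hess(margin: float, alpha: float):
    """First and second derivative of the margin-based alpha-loss w.r.t. the margin.

    With beta = 1 - 1/alpha and s = sigmoid(margin), differentiating the
    assumed closed form gives
        grad = -s**beta * (1 - s)
        hess =  s**beta * (1 - s) * (s - beta * (1 - s))
    These expressions also cover alpha = 1 (beta = 0) and alpha = inf (beta = 1).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    beta = 1.0 - 1.0 / alpha  # 1 / inf evaluates to 0.0, so beta = 1 at alpha = inf
    s = sigmoid(margin)
    grad = -(s ** beta) * (1.0 - s)
    hess = (s ** beta) * (1.0 - s) * (s - beta * (1.0 - s))
    return grad, hess
```

At α = 1 these reduce to the familiar log-loss derivatives −(1 − s) and s(1 − s). For α > 1 the second derivative can become negative at low-confidence points because the loss is no longer convex in the margin, so a practical objective would likely need to clip or otherwise regularize it; how the thesis handles this is not stated in the abstract.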

Contributors: Vikash Babu, Gokulan (Author) / Sankar, Lalitha (Thesis advisor) / Berisha, Visar (Committee member) / Zhao, Ming (Committee member) / Trieu, Ni (Committee member) / Arizona State University (Publisher)

Created: 2021
