Barrett, The Honors College at Arizona State University proudly showcases the work of undergraduate honors students by sharing this collection exclusively with the ASU community.

Barrett accepts high-performing, academically engaged undergraduate students and works with them in collaboration with all of the other academic units at Arizona State University. All Barrett students complete a thesis or creative project, which is an opportunity to explore an intellectual interest and produce an original piece of scholarly research. The thesis or creative project is supervised and defended in front of a faculty committee. Students are able to engage with professors who are nationally recognized in their fields and committed to working with honors students. Completing a Barrett thesis or creative project is an opportunity for undergraduate honors students to contribute to the ASU academic community in a meaningful way.

Description

In this thesis, six computer-simulation experiments were conducted to replicate the negative association between sample size and accuracy that is repeatedly found in the ML literature, by accounting for data leakage and publication bias. Understanding why this negative association occurs is critical because published studies have repeatedly reported overoptimistic accuracies for ML models, leading to results that cannot be reproduced despite multiple trials and experiments. After the negative association was replicated, parametric curves (learning curves with a parametric function) were fitted to the empirical learning curves in order to evaluate performance. The accuracies were found to vary significantly when the sample size is small, but to show little to no variation when the sample size is large. In other words, at a large sample size the empirical learning curves with data leakage and publication bias reached the same accuracy as the learning curve without data leakage.
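
The effect described above can be illustrated with a minimal sketch. This is not the thesis's experimental setup: the function name `simulate_reported_accuracy`, the true accuracy of 0.7, the number of attempts per paper, and the rule "report only the best attempt" are all assumptions chosen for illustration. Picking the best of several runs scored on the same test set stands in, loosely, for both selection on the test data and selective reporting.

```python
import numpy as np


def simulate_reported_accuracy(sample_sizes, true_accuracy=0.7,
                               attempts_per_paper=10, n_papers=2000, seed=0):
    """Mean reported accuracy per test-set size under best-of-k reporting.

    Hypothetical setup: every model has the same true accuracy, each
    simulated paper scores `attempts_per_paper` runs on n test samples,
    and only the highest observed accuracy is reported.
    """
    rng = np.random.default_rng(seed)
    reported = {}
    for n in sample_sizes:
        # Number of correct predictions out of n, for every run of every paper.
        correct = rng.binomial(n, true_accuracy,
                               size=(n_papers, attempts_per_paper))
        observed = correct / n
        # Selective reporting: keep only the best run in each paper.
        best = observed.max(axis=1)
        reported[n] = float(best.mean())
    return reported


results = simulate_reported_accuracy([20, 100, 500, 2500])
for n, acc in results.items():
    print(f"n = {n:5d}  mean reported accuracy = {acc:.3f}")
```

Under these assumptions the mean reported accuracy falls toward the true value as the test set grows, because sampling noise, and therefore the room for optimistic selection, shrinks roughly like one over the square root of the sample size.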

Contributors: Kottooru, Rishab (Author) / Berisha, Visar (Thesis director) / Dasarathy, Gautam (Committee member) / Saidi, Pouria (Committee member) / Barrett, The Honors College (Contributor) / Chemical Engineering Program (Contributor)
Created: 2023-05
Description

Classification in machine learning is crucial to solving many of the problems the world faces today. It is therefore key to understand one’s problem and develop an efficient model to reach a solution. One technique that improves model selection, and thus eases problem solving, is estimation of the Bayes Error Rate. This paper presents the development and analysis of two methods for estimating the Bayes Error Rate on a given set of data in order to evaluate performance. The first method takes a “global” approach, looking at the data as a whole; the second is more “local”, partitioning the data at the outset and then building up to a Bayes Error Rate estimate for the whole. One of the methods is found to estimate the true Bayes Error Rate accurately when the dataset is high-dimensional, while the other is accurate at large sample size. This second conclusion, in particular, can have significant ramifications for “big data” problems, as one would be able to clarify the distribution with an accurate estimation of the Bayes Error Rate by using this method.
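
For readers unfamiliar with the quantity being estimated, the sketch below computes the Bayes Error Rate in a toy case where it is known in closed form. It does not implement either of the thesis's two methods, whose details are not given here. The setting (two one-dimensional Gaussian classes with equal priors and a shared standard deviation) and the function names are assumptions made for illustration.

```python
import math

import numpy as np


def bayes_error_two_gaussians(mu0, mu1, sigma):
    """Exact Bayes Error Rate for two equal-prior 1-D Gaussians with shared sigma.

    The optimal rule thresholds at the midpoint of the means, which gives
    an error of Phi(-d / 2), where d = |mu1 - mu0| / sigma.
    """
    d = abs(mu1 - mu0) / sigma
    # Phi(-x) = 0.5 * erfc(x / sqrt(2)), evaluated at x = d / 2.
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0)))


def empirical_error_of_optimal_rule(mu0, mu1, sigma, n=200_000, seed=0):
    """Monte Carlo error of the midpoint-threshold rule on n simulated points."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                      # equal class priors
    x = rng.normal(np.where(y == 1, mu1, mu0), sigma)   # class-conditional draws
    threshold = (mu0 + mu1) / 2.0
    # Predict the class whose mean lies on the same side of the threshold as x.
    pred = (x > threshold) if mu1 > mu0 else (x < threshold)
    return float(np.mean(pred.astype(int) != y))


exact = bayes_error_two_gaussians(0.0, 2.0, 1.0)
approx = empirical_error_of_optimal_rule(0.0, 2.0, 1.0)
print(f"exact Bayes error: {exact:.4f}, Monte Carlo estimate: {approx:.4f}")
```

In this toy case the class distributions are known, so the Monte Carlo figure only checks the closed form. The estimation problem addressed in the paper is harder because the distributions must be inferred from the data.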

Contributors: Lattus, Robert (Author) / Dasarathy, Gautam (Thesis director) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)
Created: 2021-12