Barrett, The Honors College Thesis/Creative Project Collection
Barrett, The Honors College at Arizona State University proudly showcases the work of undergraduate honors students by sharing this collection exclusively with the ASU community.
Barrett accepts high-performing, academically engaged undergraduate students and works with them in collaboration with all of the other academic units at Arizona State University. All Barrett students complete a thesis or creative project, which is an opportunity to explore an intellectual interest and produce an original piece of scholarly research. The thesis or creative project is supervised by a faculty committee and defended in front of it. Students are able to engage with professors who are nationally recognized in their fields and committed to working with honors students. Completing a Barrett thesis or creative project is an opportunity for undergraduate honors students to contribute to the ASU academic community in a meaningful way.
Leveraging Machine Learning and Wireless Sensing for Robot Localization - Location Variance Analysis
Modern communication networks depend heavily on an estimate of the communication channel, which represents the distortions that a transmitted signal undergoes as it travels to a receiver. A channel can become quite complicated due to signal reflections, delays, and other undesirable effects and, as a result, varies significantly from one location to another. This localization system seeks to take advantage of this distinctness by feeding channel information into a machine learning algorithm, which is trained to associate channels with their respective locations. A device in need of localization would then only need to calculate a channel estimate and pass it to this algorithm to obtain its location.
As an additional step, this report investigates the effect of location noise. After the localization system described above shows promising results, the team demonstrates that it is robust to noise on its location labels. This robustness suggests that the system could be deployed in a continued learning environment, in which some user agents report their estimated (noisy) locations over a wireless communication network, so that the model can be released without extensive prior data collection.
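The idea of mapping channel estimates to locations, and of tolerating noisy location labels, can be illustrated with a minimal sketch. It is not the thesis's model: a k-nearest-neighbor average stands in for the machine learning algorithm, channels are reduced to short hypothetical feature vectors, and the function name `knn_locate` is an assumption made for illustration.

```python
import math

def knn_locate(channel, training_set, k=3):
    """Estimate (x, y) as the mean label of the k nearest channel fingerprints.

    training_set is a list of (channel_features, (x, y)) pairs. The labels may
    be noisy; averaging over k neighbors is what dampens that noise here.
    """
    nearest = sorted(training_set, key=lambda pair: math.dist(channel, pair[0]))[:k]
    xs = [location[0] for _, location in nearest]
    ys = [location[1] for _, location in nearest]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Hypothetical data: three fingerprints near the origin with noisy labels,
# and two fingerprints from a distant location.
training = [
    ((1.0, 0.0), (0.0, 0.0)),
    ((0.9, 0.1), (0.2, -0.2)),
    ((1.1, 0.0), (-0.2, 0.2)),
    ((0.0, 1.0), (5.0, 5.0)),
    ((0.1, 0.9), (5.1, 4.9)),
]
estimate = knn_locate((1.0, 0.05), training, k=3)
```

In this toy data the label noise on the three nearby fingerprints cancels in the average, which is the intuition behind robustness to noisy location reports; a trained neural network would achieve a similar effect through fitting rather than explicit averaging.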
This project considers FPGA implementations of the feedforward pass of multilayer perceptrons (MLPs) and convolutional neural networks (CNNs). While FPGAs provide significant performance improvements, they come at a substantial financial cost. We explore options for implementing these algorithms on a smaller budget. We successfully implement a multilayer perceptron that identifies handwritten digits from the MNIST dataset on a student-level DE10-Lite FPGA with a test accuracy of 91.99%. We also apply our trained network to external image data loaded through a webcam and a Raspberry Pi, but we observe lower test accuracy on these images. We then consider the requirements for implementing a more elaborate convolutional neural network on the same FPGA. The study deems the CNN implementation feasible with respect to memory requirements and basic architecture. We suggest that a CNN implementation on the same FPGA is worth further exploration.
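For readers unfamiliar with the feedforward pass that such a design computes, the following is a minimal software sketch. It assumes ReLU hidden activations and an argmax over the output layer, neither of which is stated in the abstract, and the names `dense` and `mlp_predict` and the tiny weights are hypothetical. Integer weights are used only to hint at the fixed-point arithmetic typical of hardware.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a dot product plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def mlp_predict(inputs, layers):
    """Run the feedforward pass and return the index of the largest output."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # no activation on the output layer
            activations = relu(activations)
    return max(range(len(activations)), key=activations.__getitem__)

# Hypothetical 2-input, 2-hidden, 2-output network with integer weights.
layers = [
    ([[1, -1], [-1, 1]], [0, 0]),
    ([[1, 0], [0, 1]], [0, 0]),
]
```

With these toy weights, `mlp_predict([3, 1], layers)` selects class 0 and `mlp_predict([1, 3], layers)` selects class 1. An MNIST classifier would use 784 inputs and 10 outputs, with each multiply-accumulate in `dense` corresponding to work the hardware must schedule.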
Methods: The standard NLP process was used for this study in which a gold standard was reached through matched paired annotations of the forum text in brat and a neural network was trained on the content. Following the annotation process, adjudication occurred to increase the inter-annotator agreement. Categories were developed by local physicians to describe the questions and three pilots were run to test the best way to categorize the questions.
Results: The inter-annotator agreement, calculated via F-score, before adjudication for a 0.7 threshold was 0.378 for the annotation activity. After adjudication at a threshold of 0.7, the inter-annotator agreement increased to 0.560. Pilots 1, 2, and 3 of the categorization activity had an inter-annotator agreement of 0.375, 0.5, and 0.966 respectively.
Discussion: The inter-annotator agreement of the annotation activity may have been low initially because the annotators were students who may not have been as invested in the project as was necessary to annotate the text accurately. In addition, everyone interprets the text slightly differently, which may have contributed to the differences in the matched pairs’ annotations. The F-score variation for the categorization activity was due partly to the different ways the instructions were delivered and partly to the participants' areas of study. The first pilot did not mandate the use of the original context located in brat, and the instructions were provided in the form of a downloadable document. The participants were computer science graduate students. The second pilot also delivered the instructions via a document, but it strongly suggested that the context be used to gain an understanding of the questions’ meanings. The participants were also computer science graduate students, who, in a discussion of their results after the pilot, expressed that they did not have a good understanding of the medical jargon in the posts. The final pilot used a combination of students with and without a medical background, required the use of the context, and included verbal instructions in combination with the written ones. The combination of these factors increased the F-score significantly. For a full-scale experiment, students with a medical background should be used to categorize the questions.
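To make the agreement measure concrete, here is a minimal sketch of a pairwise F-score between two annotators. The abstract does not define its 0.7 threshold; this sketch assumes it is a minimum span-overlap ratio for two annotations to count as a match, and the `(start, end, label)` span layout and function names are hypothetical.

```python
def overlap_ratio(span_a, span_b):
    """Intersection over union of two (start, end) character spans."""
    intersection = max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    union = max(span_a[1], span_b[1]) - min(span_a[0], span_b[0])
    return intersection / union if union > 0 else 0.0

def pairwise_f_score(annotations_a, annotations_b, threshold=0.7):
    """F-score treating annotator A as the reference and B as the prediction."""
    if not annotations_a or not annotations_b:
        return 0.0
    unmatched_b = list(annotations_b)
    matches = 0
    for start_a, end_a, label_a in annotations_a:
        for candidate in unmatched_b:
            start_b, end_b, label_b = candidate
            same_label = label_a == label_b
            if same_label and overlap_ratio((start_a, end_a), (start_b, end_b)) >= threshold:
                matches += 1
                unmatched_b.remove(candidate)  # each annotation matches at most once
                break
    precision = matches / len(annotations_b)
    recall = matches / len(annotations_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations as (start, end, label) spans.
annotator_a = [(0, 10, "Q"), (20, 30, "Q"), (40, 50, "S")]
annotator_b = [(0, 10, "Q"), (21, 30, "Q"), (40, 44, "S"), (60, 70, "Q")]
score = pairwise_f_score(annotator_a, annotator_b)
```

Here two of the spans match at the 0.7 threshold, giving precision 2/4, recall 2/3, and an F-score of 4/7. Because the F-score is symmetric in precision and recall, swapping the two annotators yields the same value.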
The e-commerce market utilizes information to target customers and drive business. More and more online services have become available, allowing consumers to make purchases and interact with an online system. For example, Amazon is one of the largest Internet-based retail companies. As people shop through this website, Amazon gathers huge amounts of data on its customers from personal information to shopping history to viewing history. After purchasing a product, the customer may leave reviews and give a rating based on their experience. Performing analytics on all of this data can provide insights into making more informed business and marketing decisions that can lead to business growth and also improve the customer experience.
For this thesis, I trained binary classification models on a publicly available product review dataset from Amazon to predict whether a review has a positive or negative sentiment. The sentiment analysis process involves analyzing and encoding human language, then extracting the sentiment from the resulting values. In the business world, sentiment analysis provides value by revealing insights into customer opinions and behaviors. In this thesis, I explain how to perform a sentiment analysis and analyze several different machine learning models. The algorithms whose results I compare are KNN, Logistic Regression, Decision Trees, Random Forest, Naïve Bayes, Linear Support Vector Machines, and Support Vector Machines with an RBF kernel.
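As a rough illustration of the encode-then-classify process described above, the sketch below implements one of the listed models, Naïve Bayes, in plain Python. It assumes a whitespace bag-of-words encoding and add-one smoothing; the thesis's actual preprocessing and libraries are not stated, and the function names and the four training reviews are invented for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple encoding)."""
    return text.lower().split()

def train_naive_bayes(reviews):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in reviews:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, doc_counts, vocabulary

def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, docs in doc_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(docs / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                count = word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training reviews.
reviews = [
    ("great product works well", "pos"),
    ("love it great value", "pos"),
    ("terrible quality broke fast", "neg"),
    ("bad product waste of money", "neg"),
]
model = train_naive_bayes(reviews)
```

With this toy model, `predict(model, "great value")` returns `"pos"` and `predict(model, "terrible waste")` returns `"neg"`. A comparison like the one in the thesis would train each candidate model on the same encoded reviews and evaluate them on a held-out set.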
(LC-MS/MS) is used to identify and quantify peptides and proteins. LC-MS/MS produces mass spectra, which must be searched by one or more engines, which employ algorithms to match spectra to theoretical spectra derived from a reference database. These engines identify and characterize proteins and their component peptides. By training a convolutional neural network on a dataset of over 6 million MS/MS spectra derived from human proteins, we aim to create a tool that can quickly and effectively identify spectra as peptides prior to database searching. This can significantly reduce search space and thus run time for database searches, thereby accelerating LC-MS/MS-based proteomics data acquisition. Additionally, by training neural networks on labels derived from the search results of three different database search engines, we aim to examine and compare which features are best identified by individual search engines, a neural network, or a combination of these.