Data Management Behind Machine Learning

Description

This thesis explores artificial intelligence by examining how a single-layer artificial neural network works through a simple housing price classification example, while also considering its impact from a data management perspective at both the software and hardware levels. To begin, the widely accepted model of an artificial neuron is broken down into its key components, and the function of each is analyzed by relating it back to its biological counterpart. The role of a neuron is then described in the context of a neural network, with equal emphasis on how a single neuron is trained and how an entire network is trained. Using supervised learning, the neural network is trained on three main housing features: the total number of rooms, the number of bathrooms, and the square footage. After being trained on most of the generated data set, the network is tested for accuracy on the remainder of the data set by observing how closely its computed output for each set of inputs matches the target value. From a programming perspective, the artificial neuron is implemented in C so that it is more closely tied to the operating system, which makes the profiler data collected during the program's execution more precise. The program breaks each stage of the neuron's training process into a distinct function. In addition, the struct data type serves as the underlying data structure for the project, representing both the neuron itself and its training and test data. Once the neuron is fully trained, its test results are graphed to show how well it learned from its training set. Finally, the profiler data is analyzed to describe how the program behaved from a data management perspective at the software and hardware levels.
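The training loop described above can be sketched in a few lines. This is a minimal sketch, not the thesis's C program: it assumes a perceptron-style neuron with a step activation, and the data generator, its pricing rule, and the 80/20 split are hypothetical stand-ins for the generated housing data set. The dataclass plays the role the C struct plays in the thesis, bundling the neuron's weights and bias.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Neuron:
    """One artificial neuron: a weight per input plus a bias (like a C struct)."""
    weights: List[float]
    bias: float = 0.0

    def activate(self, inputs: List[float]) -> int:
        # Weighted sum of the inputs, then a step activation: 1 = "high price" class.
        total = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1 if total > 0 else 0

    def train(self, samples, targets, learning_rate: float = 0.1, epochs: int = 100) -> None:
        # Perceptron rule: move each weight by learning_rate * (target - output) * input.
        for _ in range(epochs):
            for inputs, target in zip(samples, targets):
                error = target - self.activate(inputs)
                self.weights = [w + learning_rate * error * x
                                for w, x in zip(self.weights, inputs)]
                self.bias += learning_rate * error


def make_houses(count: int, seed: int = 0) -> List[Tuple[List[float], int]]:
    """Hypothetical synthetic data: (rooms, bathrooms, sq ft), each scaled to [0, 1]."""
    rng = random.Random(seed)
    data = []
    while len(data) < count:
        rooms = rng.randint(1, 8) / 8
        baths = rng.randint(1, 4) / 4
        sqft = rng.uniform(500, 4000) / 4000
        score = 0.3 * rooms + 0.3 * baths + 0.4 * sqft  # assumed pricing rule
        if abs(score - 0.5) < 0.05:
            continue  # skip borderline houses so the two classes separate cleanly
        data.append(([rooms, baths, sqft], 1 if score > 0.5 else 0))
    return data


def accuracy(neuron: Neuron, data) -> float:
    """Fraction of examples whose computed output matches the target value."""
    hits = sum(neuron.activate(x) == y for x, y in data)
    return hits / len(data)


houses = make_houses(200)
train_set, test_set = houses[:160], houses[160:]  # most data trains, the rest tests
neuron = Neuron(weights=[0.0, 0.0, 0.0])
neuron.train([x for x, _ in train_set], [y for _, y in train_set])
print(f"test accuracy: {accuracy(neuron, test_set):.2f}")
```

Scaling every feature into [0, 1] keeps square footage from dominating the weighted sum; a sigmoid activation with gradient descent would be the usual alternative to the step function.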

Contributors: Richards, Nicholas Giovanni (Author) / Miller, Phillip (Thesis director) / Meuth, Ryan (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Exploring Computational Thinking in 9-12 Education: Developing a Computer Science Curriculum for Bioscience High School

Description

Bioscience High School, a small STEAM (Science, Technology, Engineering, Arts, Math) magnet high school in Downtown Phoenix, has been pushing to establish a computer science curriculum for all of its students from freshman to senior year. The school's Mision (Mission and Vision) is to: "...provide a rigorous, collaborative, and relevant academic program emphasizing an innovative, problem-based curriculum that develops literacy in the sciences, mathematics, and the arts, thus cultivating critical thinkers, creative problem-solvers, and compassionate citizens, who are able to thrive in our increasingly complex and technological communities." Computational thinking is an important part of developing the kind of future problem solver Bioscience High School aims to produce. The school is unique in that every student has a computer available to use. It therefore makes sense for the school to add computer science to its curriculum, since one of its goals is to use its resources to their full potential. However, the school's attempt to integrate computer science has fallen short because its math and science teachers lack expertise in the field. The lack of training and support has postponed the development of the program, and the school urgently needs someone with expertise in the field to help restart it. As a result, I decided to create a course that teaches students the concepts of computational thinking and their application through Scratch and Arduino programming.

Contributors: Liu, Deming (Author) / Meuth, Ryan (Thesis director) / Nakamura, Mutsumi (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Comparing and Analyzing Electromyography and Electroencephalography

Description

Electromyography (EMG) and Electroencephalography (EEG) are techniques used to detect electrical activity produced by the human body. EMG detects electrical activity in the skeletal muscles, while EEG detects electrical activity from the scalp. The purpose of this study is to capture different types of EMG and EEG signals, to determine whether the signals can be distinguished from one another, and to process them into output signals that trigger events in prosthetics. Results from the study suggest that power spectral density (PSD) estimates can be used to compare signals that differ significantly, such as those from the wrist, scalp, and fingers, but cannot fully distinguish between closely related signals, such as those from two different fingers. The signals that were identified could be translated into physical output simulated on the Arduino circuit.
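To illustrate what comparing signals by their PSD estimates means, here is a minimal sketch using a plain one-sided periodogram computed with a direct DFT. The abstract does not say which estimator was used; in practice a library routine such as SciPy's `scipy.signal.welch` (Welch's method) is the usual choice. The synthetic sine waves, the 200 Hz sampling rate, and the helper names are hypothetical stand-ins for recorded EMG/EEG channels.

```python
import cmath
import math
from typing import List, Tuple


def periodogram(signal: List[float], fs: float) -> List[Tuple[float, float]]:
    """One-sided periodogram PSD estimate as (frequency_hz, power) pairs."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset first
    estimate = []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient for frequency bin k (O(n^2), fine for short clips).
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2 / (fs * n)
        if 0 < k < n / 2:
            power *= 2  # fold the mirrored negative-frequency half into this bin
        estimate.append((k * fs / n, power))
    return estimate


def dominant_frequency(signal: List[float], fs: float) -> float:
    """Frequency of the bin holding the most power."""
    freq, _ = max(periodogram(signal, fs), key=lambda pair: pair[1])
    return freq


def sine(freq_hz: float, fs: float, seconds: float) -> List[float]:
    """Hypothetical stand-in for a recorded channel: a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / fs) for t in range(int(fs * seconds))]


FS = 200.0  # assumed sampling rate in Hz
slow = sine(10, FS, 1.0)
fast = sine(40, FS, 1.0)
print(dominant_frequency(slow, FS), dominant_frequency(fast, FS))
```

Signals whose power sits in clearly different frequency bands are easy to tell apart this way; two signals with overlapping spectra, like the two fingers in the study, produce nearly identical estimates, which is consistent with the reported limitation.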

Contributors: Janis, William Edward (Author) / LaBelle, Jeffrey (Thesis director) / Santello, Marco (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2013-12

Predicting Sneaker Resale Prices using Machine Learning

Description

This thesis explores machine learning by attempting to create an application that accurately predicts whether a sneaker will resell at a profit. To begin, I researched different machine learning algorithms to determine which would be best suited to this project. After deciding on an artificial neural network, I moved on to collecting data from StockX and Twitter. StockX is a platform where individuals can post and resell shoes, and it also provides statistics and analytics about each pair. I used StockX to retrieve data about each shoe for the network's feature variables: gender, brand, and retail price. I also retrieved the average deadstock price for each shoe, which is the mean price at which new, unworn pairs are selling on StockX. I compared this with the retail price to determine whether a shoe has, on average, been selling at a profit. I used Twitter’s API to retrieve links to different shoes on StockX, along with the number of favorites and retweets each link received. These metrics were used to account for the ‘hype’ around a shoe, since shoes have traditionally been more profitable the more hype surrounds them. After preprocessing the data, I trained the model on a randomly selected 80% of the data. On average, the model achieved about 65-70% accuracy when tested on the remaining 20%. Once the model was optimized, I saved it and uploaded it to a web application that takes user input for the five feature variables (gender, brand, retail price, favorites, and retweets), evaluates the data point with the model, and outputs the model's confidence that the shoe will generate a profit.

From a technical perspective, I used Python for the whole project, along with HTML/CSS for the application's front end. As for key packages, I used Keras, an open-source neural network library, to build the model, and scikit-learn’s (sklearn) various subpackages for data preprocessing. All charts and graphs were made with the data visualization libraries matplotlib and seaborn. These charts provided insight into what the final dataset looked like: the brand distribution was relatively close to what was expected, while the gender distribution was heavily skewed. Future work on this project would involve expanding the dataset, fully automating the data retrieval process, and deploying the project to the cloud so that anyone can use the application.
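The labeling rule and the random 80/20 split described above can be sketched without any ML library. This is a minimal sketch, assuming each shoe is a dictionary; the field names, the sample rows, and the seed are hypothetical, and the real values came from StockX and Twitter. The target is 1 when the average deadstock price exceeds the retail price.

```python
import random
from typing import Dict, List, Tuple

# The five feature variables named in the abstract (field names are assumptions).
FEATURES = ["gender", "brand", "retail", "favorites", "retweets"]


def label_profitable(shoes: List[Dict]) -> List[Dict]:
    """Target = 1 if the average deadstock price is above retail, else 0."""
    return [dict(shoe, profitable=int(shoe["deadstock_avg"] > shoe["retail"]))
            for shoe in shoes]


def train_test_split(rows: List[Dict], train_fraction: float = 0.8,
                     seed: int = 7) -> Tuple[List[Dict], List[Dict]]:
    """Shuffle randomly, then keep train_fraction of the rows for training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows: gender, brand, retail, favorites, retweets, deadstock_avg.
raw = [
    ("men", "Nike", 170, 2400, 900, 310),
    ("women", "adidas", 220, 150, 40, 190),
    ("men", "Jordan", 190, 5100, 2300, 420),
    ("men", "New Balance", 130, 80, 12, 125),
    ("women", "Nike", 110, 600, 150, 160),
]
shoes = label_profitable([dict(zip(FEATURES + ["deadstock_avg"], r)) for r in raw])

train_rows, test_rows = train_test_split(shoes)
X_train = [[row[f] for f in FEATURES] for row in train_rows]
y_train = [row["profitable"] for row in train_rows]
print(len(train_rows), len(test_rows), y_train)
```

The categorical features (gender, brand) would still need to be encoded as numbers, for example with one-hot encoding, before reaching a Keras network; the abstract says preprocessing used sklearn but does not name the encoding.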

Contributors: Shah, Shail (Author) / Meuth, Ryan (Thesis director) / Nakamura, Mutsumi (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Utilizing Machine Learning Methods to Model Cryptocurrency

Description

Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data, and a new column is created to measure the price difference, allowing a more accurate analysis of the change in price. Eight relevant variables are selected using cross-validation: the total number of bitcoins, the total size of the blockchain, the hash rate, the mining difficulty, the revenue from mining, transaction fees, the cost of transactions, and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with all eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable. Then the random forests technique is applied to the data using all eight variables. This results in a correlation between the model and data of 9.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 9.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests and compared against a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data was 0.06957639 for Bitcoin and -0.171125 for Ethereum. In conclusion, the results confirm that accurate in-sample models of cryptocurrencies can be built by applying the random forests method to a dataset. Out-of-sample modeling is more difficult, although in some cases it exceeds the benchmark for typical financial data.
It should also be noted that cryptocurrency data has properties similar to those of other financial datasets, which suggests future potential for modeling cryptocurrency systems within the financial world.
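The cleaning step, the price-difference column, and the correlation metric used to score the models can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not define the price difference, so it is assumed here to be the change from the previous entry, and the records, field names, and "predicted" values are hypothetical. Note that a Pearson correlation always lies between -1 and 1 by construction.

```python
import math
from typing import Dict, List, Optional


def drop_missing(rows: List[Dict[str, Optional[float]]]) -> List[Dict[str, float]]:
    """Cleaning step: discard any entry with a missing (None) value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def add_price_difference(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Assumed definition: change in price versus the previous remaining entry."""
    return [dict(cur, price_diff=cur["price"] - prev["price"])
            for prev, cur in zip(rows, rows[1:])]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between model output and data; always in [-1, 1]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily records (only two of the eight variables shown).
raw = [
    {"price": 6400.0, "hash_rate": 41.0},
    {"price": 6550.0, "hash_rate": None},  # missing value, so this entry is dropped
    {"price": 6500.0, "hash_rate": 43.5},
    {"price": 6720.0, "hash_rate": 44.0},
    {"price": 6610.0, "hash_rate": 42.8},
]
rows = add_price_difference(drop_missing(raw))
observed = [r["price_diff"] for r in rows]
predicted = [90.0, 200.0, -60.0]  # hypothetical output of a fitted tree model
print(observed, round(pearson(predicted, observed), 3))
```

A tree or random forest model would be fit on the selected variables to produce `predicted`; only the scoring side is shown here.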

Contributors: Browning, Jacob Christian (Author) / Meuth, Ryan (Thesis director) / Jones, Donald (Committee member) / McCulloch, Robert (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Machine Learning Enabled Analytics for Health-Related Demographics: a Case Study Identifying Important Factors in Cardiac Disease

Description

The use of machine learning for analytics has grown rapidly in the past few years because of its ability to identify hidden insights in data. It also has a wide range of applications in healthcare, from improving image recognition in CT scans to extracting semantic meaning from thousands of medical-form PDFs. The BioElectrical Systems and Technology Lab is currently developing a biosensor whose data is retrieved and analyzed manually. As a proof of concept, this project uses a neural network architecture to automatically parse and classify a cardiac disease data set and to explore health-related factors affecting cardiac disease in patients of all ages.

Contributors: Murella, Akhila Sainagaki (Author) / Blain-Christen, Jennifer (Thesis director) / Meuth, Ryan (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / School of the Arts, Media and Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Web Interfacing Water Cooler System

Description

Abstract
This work details the process of designing and implementing an embedded system that takes measurements from a water cooler and posts the data to a publicly accessible web server. It embraces the Web 4.0 (Internet of Things) mindset of making everyday appliances web accessible. The project was designed to meet the needs of a local faculty member who wanted to know the water level in his office water cooler, potentially sparing him the disappointment of discovering an empty container.

This project uses an Arduino microcontroller, an ESP8266 Wi-Fi module, and a variety of sensors to detect water levels in a filtered water unit located on the fourth floor of the Brickyard Building (BYENG) at Arizona State University. This implementation will not interfere with the existing system for storing and transferring water. The system is expected to resolve water levels to within +/- 1.5 liters. It will send this information to a purpose-built web service that anyone with internet access can reach. The interface will display current water levels and attempt to predict when the water will run out. In the short term, this information will help individuals on the floor know when they can get water from the system. Over time, the data the system gathers will map the drinking trends of the floor, allowing water deliveries to be scheduled more consistently with the demand of those working on the floor.
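One simple way to "predict when the water will run out" is to fit a straight line to recent level readings and extrapolate to zero. This is a minimal sketch, assuming the web service stores (time in hours, level in liters) pairs; the function name and sample readings are hypothetical, and the abstract does not say which prediction method was used. A least-squares fit also helps smooth out the +/- 1.5 liter sensor uncertainty.

```python
from typing import List, Optional, Tuple


def predict_empty_time(readings: List[Tuple[float, float]]) -> Optional[float]:
    """Fit level = intercept + slope * t by least squares; return t where level hits 0."""
    n = len(readings)
    if n < 2:
        return None  # need at least two readings to estimate a trend
    mean_t = sum(t for t, _ in readings) / n
    mean_level = sum(level for _, level in readings) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in readings)
    if sxx == 0:
        return None  # all readings taken at the same instant
    slope = sum((t - mean_t) * (level - mean_level) for t, level in readings) / sxx
    if slope >= 0:
        return None  # level flat or rising (e.g. just refilled): no depletion forecast
    intercept = mean_level - slope * mean_t
    return -intercept / slope


# Hypothetical hourly readings in liters after a fresh container is installed.
readings = [(0.0, 18.0), (1.0, 16.5), (2.0, 15.0), (3.0, 13.5)]
print(predict_empty_time(readings))
```

In practice only readings since the last refill should be fed in, since a refill breaks the linear trend; the same accumulated history is what would reveal the floor's drinking patterns for delivery scheduling.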

Contributors: Enriquez, Alexander (Author) / Meuth, Ryan (Thesis director) / Burger, Kevin (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

US Forest Fire Size Prediction using Machine Learning

Description

The number of extreme wildfires is rising globally, and predicting the size of a fire can help officials make appropriate decisions to mitigate the risk the fire poses to the environment and to people. This study uses machine learning methods to predict the burned area of fires in the United States from attributes such as the time, weather, and location of the fire.

Contributors: Prabagaran, Padma (Author, Co-author) / Meuth, Ryan (Thesis director) / McCulloch, Robert (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-12

Improving Crowdsourcing-Based Stock Price Predictions through Expanded Input Elicitation and Machine Learning

Description

This study aims to combine the wisdom of crowds with machine learning (ML) to make more accurate stock price predictions for a select set of stocks. Unlike prior work, this study uses different input elicitation techniques to improve crowd performance. In addition, ML is used to support the crowd, and its influence on the crowd is tested by priming participants with suggestions from an ML model. Lastly, market conditions and stock popularity are observed to better understand crowd behavior.

Contributors: Bhogaraju, Harika (Author) / Escobedo, Adolfo R (Thesis director) / Meuth, Ryan (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-12

Machine Learning: A Sentiment Analysis of Customer Reviews

Description

Machine learning is the process of training a computer with algorithms to learn from data and make informed predictions. In a world where large amounts of data are constantly collected, machine learning is an important tool for analyzing this data to find patterns and extract useful information. Machine learning applications extend to numerous fields; however, for this thesis I chose to focus on machine learning from a business perspective, specifically e-commerce.

The e-commerce market uses information to target customers and drive business. More and more online services have become available, allowing consumers to make purchases and interact with online systems. For example, Amazon is one of the largest Internet-based retail companies. As people shop on its website, Amazon gathers huge amounts of data on its customers, from personal information to shopping and viewing history. After purchasing a product, a customer may leave a review and a rating based on their experience. Performing analytics on all of this data can provide insights for more informed business and marketing decisions, which can lead to business growth and an improved customer experience.

For this thesis, I trained binary classification models on a publicly available Amazon product review dataset to predict whether a review has a positive or negative sentiment. The sentiment analysis process involves analyzing and encoding human language, then extracting the sentiment from the resulting values. In the business world, sentiment analysis provides value by revealing insights into customer opinions and behavior. In this thesis, I explain how to perform a sentiment analysis and analyze several different machine learning models. The algorithms compared are K-nearest neighbors (KNN), Logistic Regression, Decision Trees, Random Forest, Naïve Bayes, Linear Support Vector Machines, and Support Vector Machines with an RBF (radial basis function) kernel.
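To make "encoding the human language, then extracting the sentiment" concrete, here is a minimal sketch of one of the listed algorithms, multinomial Naïve Bayes, over a bag-of-words encoding with add-one (Laplace) smoothing. It is written from scratch for illustration and is not the thesis's pipeline; the tokenizer, class names, and the four sample reviews are hypothetical stand-ins for the Amazon dataset.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Encode a review as lowercase word tokens (a simple bag-of-words)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, reviews: List[str], labels: List[str]) -> "NaiveBayesSentiment":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # class priors come from these counts
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(reviews, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best, best_score = self.classes[0], -math.inf
        for c in self.classes:
            counts = self.word_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            # Sum log-probabilities to avoid underflow on long reviews.
            score = math.log(self.doc_counts[c] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


reviews = [
    "Great product, works perfectly",
    "Love it, great value",
    "Terrible quality, broke after a day",
    "Waste of money, terrible",
]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesSentiment().fit(reviews, labels)
print(model.predict("works great"), model.predict("terrible, broke"))
```

Real pipelines typically add stop-word removal and TF-IDF weighting, and compare models on a held-out test split rather than on a handful of training sentences.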

Contributors: Madaan, Shreya (Author) / Meuth, Ryan (Thesis director) / Nakamura, Mutsumi (Committee member) / Computer Science and Engineering Program (Contributor) / Dean, W.P. Carey School of Business (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05
