Machine Learning Methods for Prediction of Physical System Behavior

Description

The advancement and marked increase in the use of computing devices in health care for large-scale and personal medical use has transformed the field of medicine and health care into a data-rich domain. This surge in the availability of data has allowed domain experts to investigate, study and discover inherent patterns in diseases from new perspectives and, in turn, further the field of medicine. Storage and analysis of this data in real time aids in enhancing the response time and efficiency of doctors and health care specialists. However, due to the time-critical nature of most life-threatening diseases, there is a growing need to make informed decisions prior to the occurrence of any fatal outcome. Alongside time sensitivity, analyzing data specific to diseases and their effects on an individual basis leads to more efficient prognosis and rapid deployment of cures. The primary challenge in addressing both of these issues arises from the time-varying and time-sensitive nature of the data being studied and in the ability to successfully predict anomalous events using only observed data.

This dissertation introduces adaptive machine learning algorithms that aid in the prediction of anomalous situations arising from abnormalities present in patients diagnosed with certain types of diseases. Emphasis is given to the adaptation and development of algorithms on an individual basis to improve the accuracy of all predictions made. The main objectives are to learn the underlying representation of the data using empirical methods and enhance it using domain knowledge. The learned model is then utilized as a guide for statistical machine learning methods to predict the occurrence of anomalous events in the near future. Further enhancement of the learned model is achieved by tuning the objective function of the algorithm to incorporate domain knowledge.

Along with anomaly forecasting using multi-modal data, this dissertation also investigates the use of univariate time series data for predicting the onset of diseases using Bayesian nonparametrics.

Contributors: Das, Subhasish (Author) / Gupta, Sandeep K.S. (Thesis advisor) / Banerjee, Ayan (Committee member) / Indic, Premananda (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created: 2022

Improving Crowdsourcing-Based Stock Price Predictions through Expanded Input Elicitation and Machine Learning

Description

This study aims to combine the wisdom of crowds with machine learning (ML) to make more accurate stock price predictions for a select set of stocks. Unlike prior work, this study uses different input elicitation techniques to improve crowd performance. In addition, machine learning is used to support the crowd. The influence of ML on the crowd is tested by priming participants with suggestions from an ML model. Lastly, market conditions and stock popularity are observed to better understand crowd behavior.

Contributors: Bhogaraju, Harika (Author) / Escobedo, Adolfo R (Thesis director) / Meuth, Ryan (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-12

Analyzing Failure Modes of Inscrutable Machine Learning Models

Description

Machine learning models, and neural networks in particular, are well known for being inscrutable. From image classification tasks and generative techniques for data augmentation to general-purpose natural language models, neural networks are currently the algorithm of choice riding the top of the current artificial intelligence (AI) wave, having experienced a greater boost in popularity than any other machine learning solution. However, because their design is based on the optimization of millions of parameters, it is very difficult to understand how their decisions are influenced or why (and when) they fail. While some works aim at explaining neural network decisions or making systems inherently interpretable, the great majority of state-of-the-art machine learning works prioritize performance over interpretability, effectively becoming black boxes. Hence, there is still uncertainty in the decision boundaries of these already deployed solutions, whose predictions should still be analyzed and taken with care. This becomes even more important when these models are used in sensitive scenarios such as medicine, criminal justice, settings with inherent social biases, or settings where egregious mispredictions can negatively impact the system or human trust down the line. Thus, the aim of this work is to provide a comprehensive analysis of the failure modes of state-of-the-art neural networks from three domains: large image classifiers and their misclassifications, generative adversarial networks when used for data augmentation, and transformer networks applied to structured representations and reasoning about actions and change.

Contributors: Olmo Hernandez, Alberto (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Li, Baoxin (Committee member) / Sengupta, Sailik (Committee member) / Arizona State University (Publisher)

Created: 2022

An Evaluation of Machine Learning Algorithms for Cardiovascular Disease Detection

Description

This thesis aims to advance healthcare and heart disease prevention by utilizing the Python programming language and various machine learning algorithms for heart disease detection. Being one of the main causes of death worldwide, cardiovascular disease is a serious global health concern. One person passes away from cardiovascular disease every 33 seconds in the United States alone. Because cardiovascular disease is the leading cause of death, early identification is critical for early intervention and prevention. The study addresses key research questions, including the role of machine learning in enhancing heart disease detection, a comparative analysis of six machine learning models, and the importance of predictive indicators. By leveraging machine learning algorithms for medical data interpretation, the thesis contributes insights into early disease detection.

Contributors: La, Nikki (Author) / Sheehan, Connor (Thesis director) / Connor, Dylan (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2024-05

PyAntiPhish: A Python-Based Machine Learning Detector of Phishing Websites and An Examination of Relevant URL-Based Features

Description

Phishing is one of the most common and effective attack vectors in modern cybercrime. Rather than targeting a technical vulnerability in a computer system, phishing attacks target human behavioral or emotional tendencies through manipulative emails, text messages, or phone calls. Through PyAntiPhish, I attempt to create my own version of an anti-phishing solution through a series of experiments testing different machine learning classifiers and URL features. With an end-goal implementation as a Chromium browser extension utilizing Python-based machine learning classifiers (those available via the scikit-learn library), my project uses a combination of Python, TypeScript, and Node.js, as well as AWS Lambda and API Gateway, to act as a solution capable of blocking phishing attacks from the web browser.
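
As a rough illustration of what "URL-based features" can mean in this setting, the sketch below turns a URL into a small feature dictionary using only the Python standard library. The function name `extract_url_features` and the particular features chosen are assumptions made for illustration; the abstract does not state which features PyAntiPhish actually uses.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Map a URL to simple lexical features (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Total number of characters in the URL.
        "url_length": len(url),
        # Number of dots in the host name (more dots means more subdomains).
        "num_dots": host.count("."),
        # True when the host looks like a dotted IPv4 address, not a domain.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # True when the scheme is HTTPS.
        "uses_https": parsed.scheme == "https",
        # True when an "@" appears anywhere in the URL.
        "has_at_symbol": "@" in url,
        # Number of hyphens in the host name.
        "num_hyphens_in_host": host.count("-"),
    }


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/login"))
```

A dictionary like this could then be converted to a numeric vector and passed to a classifier; that step is omitted here because the abstract does not specify it.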

Contributors: Yang, Branden (Author) / Osburn, Steven (Thesis director) / Malpe, Adwith (Committee member) / Ahn, Gail-Joon (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2024-05

Data-Driven Sustainability: A Machine Learning Approach to Assessing ESG Performance in B Corporations

Description

The purpose of this research is to create predictive models for a leading sustainability certification: the B Corporation certification issued by the non-profit company B Lab based on the B Impact Assessment. This certification is one of many that are currently being used to assess sustainability in the corporate world, and this research seeks to understand the relationships between a corporation's characteristics (e.g., market, size, country) and the B Certification. The data used for the analysis comes from a B Lab upload to data.world, providing descriptive information on each company, current certification status, and B Impact Assessment scores. Further data engineering was used to include attributes on publicly traded status and years certified. A comparison of Logistic Regression and Random Forest Classification machine learning methods produced a predictive model with 87.58% accuracy in discerning between certified and de-certified B Corporations.

Contributors: Brandwick, Katelynn (Author) / Samara, Marko (Thesis director) / Tran, Samantha (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2024-05

Applications of Machine Learning to Botanical Classification

Description

In the field of botany, it is often necessary for plants to be identified based on their phenotypical characteristics, whether in person or using previously collected image samples. This work can be tedious and challenging for a human botanist to complete, as datasets can be large and several species of plants strongly resemble each other. Various machine learning techniques, both supervised and unsupervised, can address this task with varying degrees of accuracy and efficiency thanks to their ability to identify subtle patterns in data. The objective of this research is both to review previous studies that measure the effectiveness of various machine learning methods for plant identification and to build and test various models in order to compare the accuracies and efficiencies of the set of techniques. A review of the existing literature found that any of the studied machine learning techniques can yield a high level of accuracy when used in the correct situations and on a suitable dataset. The results gathered from the models built in this research show that, all else being equal, complex convolutional neural networks perform the best on this task, yielding an accuracy of 85.4% on the larger dataset. The other models tested, in descending order of accuracy on the same dataset, are k-nearest neighbors, random forest, k-means clustering, and a decision tree classifier.

Contributors: Olsen, Laela (Author) / Carter, Lynn Robert (Thesis director) / Bhargav, Vishnu (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2024-05

Utilization of Deep Neural Networks to Investigate Sex-Dependent and Cerebellar Modulation Impacts on Social Behavior in Mice

Description

The cerebellum is recognized for its role in motor movement, balance, and more recently, social behavior. Cerebellar injury at birth and during critical periods reduces social preference in animal models and increases the risk of autism in humans. Social behavior is commonly assessed with the three-chamber test, where a mouse travels between chambers that contain a conspecific and an object confined under a wire cup. However, this test is unable to quantify interactive behaviors between pairs of mice, which could not be tracked until the recent development of machine learning programs that track animal behavior. In this study, both the three-chamber test and a novel freely-moving social interaction test assessed social behavior in untreated male and female mice, as well as in male mice injected with hM3Dq (excitatory) DREADDs. In the three-chamber test, significant differences were found in the time spent (female: p < 0.05, male: p < 0.001) and distance traveled (female: p < 0.05, male: p < 0.001) in the chamber with the familiar conspecific, compared to the chamber with the object, for untreated male, untreated female, and mice with activated hM3Dq DREADDs. A social memory test was added, where the object was replaced with a novel mouse. Untreated male mice spent significantly more time (p < 0.05) and traveled a greater distance (p < 0.05) in the chamber with the novel mouse, while male mice with activated hM3Dq DREADDs spent more time (p < 0.05) in the chamber with the familiar conspecific. Data from the freely-moving social interaction test were used to calculate freely-moving interactive behaviors between pairs of mice and interactions with an object. No sex differences were found, but mice with excited hM3Dq DREADDs engaged in significantly more anogenital sniffing (p < 0.05) and side-side contact (p < 0.05) behaviors.

Together, these results indicate how machine learning allows for nuanced insights into how both sex and chemogenetic excitation impact social behavior in freely-moving mice.

Contributors: Nelson, Megan (Author) / Verpeut, Jessica (Thesis director) / Bimonte-Nelson, Heather (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor) / School of Life Sciences (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2024-05

Implementation of Machine Learning on Low Power Microcontrollers

Description

Machine learning has been increasingly integrated into several new areas, namely those related to vision processing and language learning models. Implementing these processes in new products has demanded increasingly expensive memory usage and computational requirements. Microcontrollers can lower this increasing cost. However, implementing such a system on a microcontroller is difficult, and the system has to be pared down appropriately in order to find the right balance between optimization of the system and allocation of the resources present in it. A proof of concept that these algorithms can be implemented on such a system will be attempted in order to find points of contention in the construction of such a system on such limited hardware, as well as the steps taken to enable the usage of machine learning on a limited system such as the general-purpose MSP430 from Texas Instruments.

Contributors: Malcolm, Ian (Author) / Allee, David (Thesis director) / Spanias, Andreas (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)

Created: 2024-05

Novel Data-driven Emulator for Predicting Microstructure Evolutions

Description

Phase-field (PF) models are one of the most powerful tools to simulate microstructural evolution in metallic materials, polymers, and ceramics. However, existing PF approaches rely on rigorous mathematical model development, sophisticated numerical schemes, and high-performance computing for accuracy. Although recently developed surrogate microstructure models employ deep-learning techniques and reconstruction of microstructures from lower-dimensional data, their accuracy is fairly limited as spatio-temporal information is lost in the pursuit of dimensional reduction. Given these limitations, a novel data-driven emulator (DDE) for extrapolation prediction of microstructural evolution is presented, which combines an image-based convolutional and recurrent neural network (CRNN) with tensor decomposition, while leveraging previously obtained PF datasets for training. To assess the robustness of the DDE, its emulation sequences and scaling behavior are compared with phase-field simulations for several noisy initial states. In conclusion, the effectiveness of the microstructure emulation technique is explored in the context of accelerating runtime, along with an emphasis on its trade-off with accuracy.

Meanwhile, an interpolation DDE has also been tested, which is based on obtaining a low-dimensional representation of the microstructures via tensor decomposition and subsequently predicting the microstructure evolution in the low-dimensional space using Gaussian process regression (GPR). Once the microstructure predictions are obtained in the low-dimensional space, a hybrid input-output phase retrieval algorithm is employed to reconstruct the microstructures. As proof of concept, the results on microstructure prediction for spinodal decomposition are presented, although the method itself is agnostic of the material parameters.

Results show that GPR-based DDE models are able to predict microstructure evolution sequences that closely resemble the true microstructures (average normalized mean square error of 6.78 × 10⁻⁷) at time scales half of those employed in obtaining training data. This data-driven microstructure emulator opens new avenues to predict microstructural evolution by leveraging phase-field simulations and physical experimentation where the time resolution is often quite large due to limited resources and physical constraints, such as the phase coarsening experiments previously performed in microgravity. Future work is also discussed, including the intended combined use of these two approaches for 3D microstructure prediction.
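
For readers unfamiliar with the kind of error figure quoted in this abstract, the sketch below computes one common form of normalized mean square error between a true and a predicted 2D field: the sum of squared differences divided by the sum of squared true values. This normalization is an assumption; the abstract does not say which definition the thesis uses, and the function name `normalized_mse` is hypothetical.

```python
def normalized_mse(true_field, pred_field):
    """Sum of squared errors divided by sum of squared true values.

    Both arguments are 2D lists of numbers with identical shapes.
    This normalization is an assumed convention, not the thesis's.
    """
    squared_error = 0.0
    squared_truth = 0.0
    for true_row, pred_row in zip(true_field, pred_field):
        for t, p in zip(true_row, pred_row):
            squared_error += (t - p) ** 2
            squared_truth += t ** 2
    if squared_truth == 0.0:
        raise ValueError("true_field must contain a nonzero value")
    return squared_error / squared_truth


if __name__ == "__main__":
    truth = [[1.0, 2.0], [3.0, 4.0]]
    guess = [[1.0, 2.0], [3.0, 5.0]]
    print(normalized_mse(truth, guess))
```

Averaging this quantity over every frame of a predicted sequence would give a single sequence-level figure comparable in spirit to the one reported.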

Contributors: Wu, Peichen (Author) / Ankit, Kumar (Thesis advisor) / Iquebal, Ashif (Committee member) / Jiao, Yang (Committee member) / Zhuang, Houlong (Committee member) / Arizona State University (Publisher)

Created: 2024
