Quantum Machine Learning for Brain Tumor Detection

Description

Recent advances in quantum computing have broadened the techniques available for addressing existing computing problems. One area of interest is the emerging field of machine learning. The intersection of these fields, quantum machine learning, has the potential to support high-impact work such as that in the health industry. Use cases in previous research include detecting illnesses in medical imaging through image classification. In this work, we explore a hybrid quantum-classical approach to classifying brain Magnetic Resonance Imaging (MRI) images for brain tumor detection, using public Kaggle datasets. More specifically, we aim to assess the performance and utility of a hybrid model composed of a classical pretrained portion and a quantum variational circuit. We compare these results to purely classical approaches, one utilizing transfer learning and one without, on the stated datasets. While more research is needed to prove generalized quantum advantage, our work shows potential quantum advantages in validation accuracy and sensitivity for the specified task, particularly when training with limited data in a minimally skewed dataset under specific conditions. Using IBM's Qiskit Runtime Estimator with built-in error mitigation, our experiments on a physical quantum system confirmed some of the results generated through simulations.

Contributors: Diaz, Maryannette (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Examining the usage of New NLP Techniques to process Raw Text Data Entries

Description

In 2018, Google researchers published the BERT (Bidirectional Encoder Representations from Transformers) model, which has since served as a starting point for hundreds of NLP (Natural Language Processing) experiments and derivative models. BERT was pretrained on masked-language modelling and next-sentence prediction, but its capabilities extend to more common NLP tasks, such as language inference and text classification. Naralytics is a company that seeks to use natural language to sort the users who write text into multiple categories, a modified version of classification. However, the text that Naralytics seeks to draw from exceeds the maximum length of 512 tokens that BERT supports. This report therefore surveys several BERT derivatives that seek to address this problem and then implements a solution that addresses the multiple concerns attached to this kind of model.

Contributors: Ngo, Nicholas (Author) / Carter, Lynn (Thesis director) / Lee, Gyou-Re (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Economics Program in CLAS (Contributor)

Created: 2023-05
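
The 512-token limit mentioned in the abstract above is often worked around by splitting a long input into overlapping windows and classifying each window separately. The sketch below illustrates that generic sliding-window idea only; it is not necessarily the solution the thesis implements. The function name `chunk_token_ids` and the `stride` and `num_special` parameters are hypothetical, and the default of two reserved slots assumes one [CLS] and one [SEP] token per window.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], max_len: int = 512,
                    stride: int = 128, num_special: int = 2) -> List[List[int]]:
    """Split a long token sequence into overlapping windows.

    Each window holds at most max_len - num_special tokens, leaving room
    for the special tokens added to every window. Consecutive windows
    share `stride` tokens so context at a boundary is not lost.
    """
    window = max_len - num_special
    if window <= 0:
        raise ValueError("max_len must exceed num_special")
    if not 0 <= stride < window:
        raise ValueError("stride must be in [0, window)")
    step = window - stride
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window reached the end of the sequence
        start += step
    return chunks


# Hypothetical usage: 1200 dummy token ids stand in for a tokenized document.
windows = chunk_token_ids(list(range(1200)))
print([len(w) for w in windows])
```

Per-window predictions would still need to be combined (for example by averaging scores), and that aggregation choice is left open here.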

Deriving Intrinsic Baseball Pitcher Value By Predicting Pitcher Performance From Individual Pitch Metrics

Description

Historically, the predominant strategy for evaluating baseball pitchers has been through statistics created directly from the offensive production against the pitcher, such as ERA. Such statistics are inherently relative to the abilities and competition level of the opposing offense and the field defense, which the pitcher has no control over, making it difficult to compare pitchers across leagues. In this paper, I use cutting-edge pitch-tracking data to develop a pitch evaluation model that is intrinsic to the attributes of the pitches themselves and is not influenced directly by the outcome of each individual pitch. I train four different classifiers to predict the probability of each pitch belonging to different subsets of outcomes, then multiply the probability of each outcome by that outcome's average run value to arrive at an expected run value for the pitch. I compare the performance of each classifier to a baseline, examine the most impactful features, and compare the top pitchers identified by the model to those identified by a different baseball statistics resource, ultimately concluding that three of the four classification models are productive and that the overall intrinsic evaluation model accurately identifies the sport's top performers.

Contributors: Smith, Roman (Author) / Shakarian, Paulo (Thesis director) / Macdonald, Brian (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05
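
The expected-run-value step described in the abstract above is a probability-weighted sum: each outcome's predicted probability is multiplied by that outcome's average run value, and the products are added. The sketch below shows only that arithmetic. The outcome names and every number are placeholders invented for illustration, not values from the thesis, and the function name `expected_run_value` is hypothetical.

```python
from typing import Dict


def expected_run_value(outcome_probs: Dict[str, float],
                       run_values: Dict[str, float]) -> float:
    """Return the probability-weighted average run value of one pitch."""
    total = sum(outcome_probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * run_values[outcome] for outcome, p in outcome_probs.items())


# Placeholder numbers for illustration only; not taken from the thesis.
run_values = {"strike": -0.05, "ball": 0.05, "in_play": 0.10}
pitch_probs = {"strike": 0.5, "ball": 0.3, "in_play": 0.2}
print(expected_run_value(pitch_probs, run_values))
```

In a setup like the one the abstract describes, `pitch_probs` would come from the trained classifiers and `run_values` from historical averages.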

Founders Lab: Simple Stocks

Description

This thesis project focuses on the creation and assessment of the "Simple Stocks" app, a straightforward investment tool developed for people who are new to investing and find the complexities of the stock market hard to understand. We identified a significant gap in the availability of easy-to-understand resources and information for beginner investors, which led us to design an app that provides clear and simple data, professional advice from financial analysts, and an advanced machine learning feature to predict stock trends. The "Simple Stocks" app also incorporates a voting feature, allowing users to see what other investors think about specific stocks. This functionality not only helps users make informed decisions but also encourages a sense of community, as users can learn from each other's experiences and opinions. By creating a supportive environment, the app promotes a more approachable and enjoyable experience for those who are new to investing. Following the successful release of the "Simple Stocks" app on the App Store, our current objectives include expanding the user base and looking into various ways to generate income. One possible approach is to collaborate with other companies and establish an advertising-based revenue model, which would benefit both parties by attracting more users and increasing profits.

Contributors: Biyani, Saloni (Author) / Karuppiah, Meena (Co-author) / Kancherla, Sohan (Co-author) / Byrne, Jared (Thesis director) / Lee, Christopher (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Machine Learning-Based Approach to Predictive Modeling for Energy Access

Description

Energy poverty is a pressing issue in agricultural areas that affects the livelihoods of millions of people worldwide. The lack of access to modern energy services in rural communities hinders the development of the agricultural sector and limits economic opportunities. To address this issue, this thesis aims to develop a predictive modeling framework using machine learning techniques to identify feasible interventions that can improve energy access in specific rural agricultural regions. Machine learning plays a pivotal role in addressing energy poverty in rural agricultural regions. By leveraging the power of advanced data analytics and predictive modeling, machine learning algorithms can analyze vast datasets related to energy usage, agricultural practices, geographic factors, and socioeconomic conditions. These algorithms can uncover valuable insights and patterns that are not readily apparent through traditional analytical methods. Moreover, machine learning enables the development of predictive models that can forecast energy demand and identify optimal strategies for improving energy access in rural areas. These models can take into account various variables, such as crop cycles, weather conditions, and community needs, to recommend interventions that are tailored to the specific requirements of each region.

Contributors: Konatam, Saisumana (Author) / Osburn, Steven (Thesis director) / Kerner, Hanah (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-12

A Graph-Based Machine Learning Approach to Realistic Traffic Volume Generation

Description

In this work, we explore the potential for realistic and accurate generation of hourly traffic volume with machine learning (ML), using the ground-truth data of Manhattan road segments collected by the New York State Department of Transportation (NYSDOT). Specifically, we ask whether we can develop an ML algorithm that generalizes the existing NYSDOT data to all road segments in Manhattan. We address this question by introducing a supervised learning task of multi-output regression, in which ML algorithms use road segment attributes to predict hourly traffic volume. We consider four ML algorithms (K-Nearest Neighbors, Decision Tree, Random Forest, and Neural Network) and tune hyperparameters by evaluating the performance of each algorithm with 10-fold cross-validation. Ultimately, we conclude that neural networks are the best-performing models and require the least amount of testing time. Lastly, we provide insight into the quantification of "trustworthiness" in a model, followed by brief discussions on interpreting model performance, suggesting potential project improvements, and identifying the biggest takeaways. Overall, we hope our work can serve as an effective baseline for realistic traffic volume generation and open new directions in the processes of supervised dataset generation and ML algorithm design.

Contributors: Otstot, Kyle (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-05
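
The evaluation procedure named in the abstract above, multi-output regression scored with 10-fold cross-validation, can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than the thesis's pipeline: the data are synthetic, the K-Nearest Neighbors regressor is a bare-bones stand-in, mean absolute error is an assumed metric, and the function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x: List[Vector], train_y: List[Vector],
                query: Vector, n_neighbors: int = 3) -> Vector:
    """Average the output vectors of the nearest training rows."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:n_neighbors]
    n_out = len(train_y[0])
    return [sum(train_y[i][j] for i in nearest) / len(nearest)
            for j in range(n_out)]


def cross_val_mae(xs: List[Vector], ys: List[Vector],
                  k: int = 10, n_neighbors: int = 3) -> float:
    """Mean absolute error averaged over k held-out folds (needs len(xs) >= k)."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(xs)) if i not in held_out]
        abs_errs = []
        for i in fold:
            pred = knn_predict(train_x, train_y, xs[i], n_neighbors)
            abs_errs.extend(abs(p - t) for p, t in zip(pred, ys[i]))
        fold_errors.append(sum(abs_errs) / len(abs_errs))
    return sum(fold_errors) / len(fold_errors)


# Synthetic stand-in data: 2 made-up segment attributes, 24 hourly outputs.
rng = random.Random(1)
xs = [[rng.random(), rng.random()] for _ in range(100)]
ys = [[x[0] * (hour + 1) + x[1] for hour in range(24)] for x in xs]

# Hyperparameter tuning: keep the neighbor count with the lowest CV error.
scores = {n: cross_val_mae(xs, ys, n_neighbors=n) for n in (1, 3, 5)}
best_n = min(scores, key=scores.get)
print(best_n, scores[best_n])
```

The same loop applies to any of the four model families; only `knn_predict` would be swapped out.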

An Introduction to Unstructured Case Management

Description

In the age of information, collecting and processing large amounts of data is an integral part of running a business. From training artificial intelligence to driving decision making, the applications of data are far-reaching. However, many types of data are difficult to process, unstructured data in particular. Unstructured data is “information that either does not have a predefined data model or is not organized in a pre-defined manner” (Balducci & Marinova 2018). Such data are difficult to put into spreadsheets and relational databases due to their lack of numeric values and often come in the form of text fields written by the consumers (Wolff, R. 2020). The goal of this project is to help in the development of a machine learning model to aid CommonSpirit Health and ServiceNow, which is why an approach using unstructured data was selected. This paper provides a general overview of the process of unstructured data management and explores some existing implementations and their efficacy. It then discusses our approach to converting unstructured cases into usable data, which were used to develop an artificial intelligence model that is estimated to be worth $400,000 and to save CommonSpirit Health $1,200,000 in organizational impact.

Contributors: Bergsagel, Matteo (Author) / De Waard, Jan (Co-author) / Chavez-Echeagaray, Maria Elena (Thesis director) / Burns, Christopher (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-05

The Efficacy of Different Timesteps in Data when Predicting Cryptocurrency Prices

Description

This thesis serves as an experimental investigation into the potential of machine learning through attempting to predict the future price of a cryptocurrency. Through the use of web scraping, short-interval data was collected on both Bitcoin and Dogecoin. Dogecoin was the dataset eventually used in this thesis due to its relative stability compared to Bitcoin. At the time of data collection, Bitcoin had become a much more frequent topic in the media and had more significant fluctuations as a result. The data was processed into three separate, consistent timesteps and used to generate predictive models. The models were able to accurately predict test data given all the preceding test data but were unable to autoregressively predict future data given only the first set of test data points. Ultimately, this project helps illustrate the complexities of extended future price prediction when using simple models like linear regression.

Contributors: Murwin, Andrew (Author) / Bryan, Chris (Thesis director) / Ghayekhloo, Samira (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-12
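
The abstract above contrasts two ways of using a fitted model: predicting each test point from the true preceding values, and predicting autoregressively, where each prediction is fed back in as the next input. The sketch below shows that difference with a one-lag linear model, y[t] = a * y[t-1] + b. The one-lag form is an assumption chosen for brevity, not the thesis's model, and all function names and numbers are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y[t] = a * y[t-1] + b (series must not be constant)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def one_step_predictions(test: List[float], a: float, b: float) -> List[float]:
    """Predict each test point from the true preceding test point."""
    return [a * prev + b for prev in test[:-1]]


def autoregressive_rollout(first: float, steps: int,
                           a: float, b: float) -> List[float]:
    """Predict `steps` points, feeding each prediction back in as input."""
    preds, current = [], first
    for _ in range(steps):
        current = a * current + b
        preds.append(current)
    return preds


# Made-up numbers for illustration only.
a, b = fit_ar1([1.0, 2.0, 4.0, 8.0, 16.0])
test = [3.0, 5.0, 4.0]
print(one_step_predictions(test, a, b))                   # uses the true 3.0, then the true 5.0
print(autoregressive_rollout(test[0], len(test) - 1, a, b))  # uses 3.0, then its own output
```

In the rollout, any error in one prediction becomes part of the next input, which is one reason long-horizon forecasts from simple models tend to drift.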

Comparison of Different Circuit Ansatz to Optimize Quantum Machine Learning Performance

Description

The field of quantum computing is an exciting area of research that allows quantum-mechanical phenomena such as superposition, interference, and entanglement to be utilized in solving complex computing problems. One real-world application of quantum computing is machine learning. In this thesis, I explore the effects of choosing different circuit ansätze and optimizers on the performance of a variational quantum classifier tasked with binary classification.

Contributors: Hsu, Brightan (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-12

US Forest Fire Size Prediction using Machine Learning

Description

The number of extreme wildfires is on the rise globally, and predicting the size of a fire will help officials make appropriate decisions to mitigate the risk the fire poses to the environment and to people. This study uses machine learning methods to predict the burned area of fires in the United States from attributes such as the time, weather, and location of the fire.

Contributors: Prabagaran, Padma (Author, Co-author) / Meuth, Ryan (Thesis director) / McCulloch, Robert (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-12
