Matching Items (140)

Description

Breast cancer is one of the most common types of cancer worldwide. Early detection and diagnosis are crucial for improving the chances of successful treatment and survival. In this thesis, several machine learning algorithms were evaluated and compared for predicting breast cancer malignancy from diagnostic features extracted from digitized images of fine-needle aspirates of breast tissue. Breast cancer diagnosis typically involves a combination of mammography, ultrasound, and biopsy. Machine learning algorithms, however, can assist in detection and diagnosis by analyzing large amounts of data and identifying patterns that may not be discernible to the human eye. Using these algorithms, healthcare professionals can potentially detect breast cancer at an earlier stage, leading to more effective treatment and better patient outcomes. The results showed that the gradient boosting classifier performed best, achieving an accuracy of 96% on the test set, indicating that this algorithm can be a useful tool for healthcare professionals in the early detection and diagnosis of breast cancer.
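The pipeline described above can be sketched with scikit-learn, whose bundled Wisconsin breast cancer dataset is built from the same kind of fine-needle-aspirate features. The thesis's exact model configuration and data split are not specified, so this is a minimal illustrative sketch rather than a reproduction:

```python
# Minimal sketch of a gradient-boosting classifier on fine-needle-aspirate
# features; hyperparameters and split are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```

With default hyperparameters this setup typically reaches test accuracy in the mid-90% range, consistent with the figure reported above.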

ContributorsMallya, Aatmik (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)
Created2023-05
Description

In the last two decades, fantasy sports have grown massively in popularity. Fantasy football in particular is the most popular fantasy sport in the United States. People spend hours every year building, researching, and perfecting their teams to compete with others for money or bragging rights. One problem, however, is that National Football League (NFL) players are human and will not perform the same as they did last week or last season. This creates a need for a machine learning model that predicts when players will have a tough game and when they can perform above average. This report discusses the history and science of fantasy football, gathering large amounts of player data, transforming that data into more insightful features, building a machine learning model, and using the resulting tool in a real-world situation. The initial model made accurate predictions for quarterbacks and running backs but not for receivers and tight ends. Subsequent improvements reduced the mean absolute error to below one for all positions, resulting in a successful model for all four.
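The per-position error criterion above can be illustrated with a small sketch. The player point values below are invented, and this assumes the report's "mean average error" refers to the standard mean absolute error:

```python
# Illustrative per-position MAE check; the point values are made up.
from sklearn.metrics import mean_absolute_error

actual    = {"QB": [22.1, 18.4], "RB": [14.0, 9.5]}
predicted = {"QB": [21.5, 19.0], "RB": [13.2, 10.1]}

# MAE per position: mean of |actual - predicted| over that position's players.
mae = {pos: mean_absolute_error(actual[pos], predicted[pos]) for pos in actual}
# The report's success criterion: every position's MAE below one fantasy point.
all_below_one = all(v < 1.0 for v in mae.values())
```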

ContributorsCase, Spencer (Author) / Johnson, Jarod (Co-author) / Kostelich, Eric (Thesis director) / Zhuang, Houlong (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor) / Mechanical and Aerospace Engineering Program (Contributor)
Created2023-05
Description

Climate is a critical determinant of agricultural productivity, and the ability to accurately predict this productivity is necessary to provide guidance regarding food security and agricultural management. Previous predictions vary in approach due to the myriad factors influencing agricultural productivity, but they generally suggest long-term declines in productivity and agricultural land suitability under climate change. In this paper, I relate predicted climate changes to yield for three major United States crops, namely corn, soybeans, and wheat, under a moderate emissions scenario. Adopting a data-driven approach, I used three machine learning methods, random forest (RF), extreme gradient boosting (XGB), and artificial neural networks (ANN), both for comparative analysis and as an ensemble. I omitted the western US due to the region's susceptibility to water stress and the prevalence of artificial irrigation as a means of compensating for dry conditions. Considering climate alone, the models suggest an ensemble-mean decline in crop yield of 23.4% for corn, 19.1% for soybeans, and 7.8% for wheat between the years 2017 and 2100. These results emphasize the potential negative impacts of climate change on the current agricultural industry as a result of shifting bioclimatic conditions.
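The ensemble-mean idea above can be sketched as follows. The data here is synthetic, and scikit-learn's GradientBoostingRegressor stands in for XGBoost so the sketch has no external dependencies; the thesis's actual features and model settings are not reproduced:

```python
# Sketch of an RF/boosting/ANN ensemble mean on synthetic climate-like data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # stand-ins for temperature/precipitation features
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

models = [
    RandomForestRegressor(n_estimators=50, random_state=0),
    GradientBoostingRegressor(random_state=0),   # stand-in for XGB
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
]
for m in models:
    m.fit(X, y)

# Ensemble prediction = mean of the three models' per-sample predictions.
preds = np.mean([m.predict(X) for m in models], axis=0)
```

Averaging the members' yield predictions, rather than picking one model, is what produces the "ensemble mean" decline figures quoted above.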

ContributorsSwarup, Shray (Author) / Eikenberry, Steffen (Thesis director) / Mahalov, Alex (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created2023-05
Description

Graph neural networks (GNNs) offer a potential method of bypassing the Kohn-Sham equations in density functional theory (DFT) calculations by learning both the Hohenberg-Kohn (HK) mapping from external potential to electron density and the mapping from electron density to energy, allowing calculations of much larger atomic systems and time scales and enabling large-scale MD simulations with DFT-level accuracy. In this work, we investigate the feasibility of GNNs learning the HK map from the external potential, approximated as Gaussians, to the electron density 𝑛(𝑟), and the mapping from 𝑛(𝑟) to the energy density 𝑒(𝑟), using PyTorch Geometric. We develop a graph representation for densities on radial grid points and determine that a k-nearest-neighbor algorithm for determining node connections is an effective approach compared to a distance-cutoff model, with average graph sizes of 6.31 MB and 32.0 MB for datasets with 𝑘 = 10 and 𝑘 = 50 respectively. Furthermore, we develop two GNNs in PyTorch Geometric and demonstrate decreases in training loss for the 𝑛(𝑟)-to-𝑒(𝑟) mapping of 8.52 · 10^14 and 3.10 · 10^14 for the 𝑘 = 10 and 𝑘 = 20 datasets respectively, suggesting the model could be further trained and optimized to learn the electron-density-to-energy functional.
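The k-nearest-neighbor node-connection scheme above can be sketched without PyTorch Geometric; this numpy-only version builds the same kind of directed edge index (each node linked to its k nearest neighbors) from stand-in coordinates rather than actual radial grid points:

```python
# numpy-only sketch of k-NN graph construction for GNN node connectivity.
import numpy as np

def knn_edge_index(points: np.ndarray, k: int) -> np.ndarray:
    """Return a (2, N*k) array of directed edges: each node -> its k nearest."""
    n = len(points)
    # Pairwise Euclidean distances between all nodes.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]  # k smallest distances per node
    src = np.repeat(np.arange(n), k)
    return np.stack([src, nbrs.ravel()])

pts = np.random.default_rng(0).normal(size=(100, 3))
edges = knn_edge_index(pts, k=10)
```

Unlike a distance-cutoff rule, k-NN fixes the edge count at N·k, which is why the dataset sizes quoted above scale directly with k.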

ContributorsHayes, Matthew (Author) / Muhich, Christopher (Thesis director) / Oswald, Jay (Committee member) / Barrett, The Honors College (Contributor) / Chemical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created2023-05
Description

Machine learning has a nearly unlimited number of applications whose potential has yet to be fully harnessed and realized. This thesis outlines two domains in which machine learning can be utilized and demonstrates the execution of one methodology in each. The first domain is self-play in video games, where a neural model is described that teaches a computer to complete a level of Super Mario World (1990) on its own. The model was inspired by the academic paper “Evolving Neural Networks through Augmenting Topologies”, written by Kenneth O. Stanley and Risto Miikkulainen of the University of Texas at Austin; the implementation described here is by YouTuber SethBling of the California Institute of Technology. The second domain is cybersecurity, where an algorithm is described from the academic paper “Process Based Volatile Memory Forensics for Ransomware Detection”, written by Asad Arfeen, Muhammad Asim Khan, Obad Zafar, and Usama Ahsan. This algorithm utilizes Python and the Volatility framework to detect malicious software in an infected system.

ContributorsBallecer, Joshua (Author) / Yang, Yezhou (Thesis director) / Luo, Yiran (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created2023-05
Description

We present in this paper a method to compare the scene classification accuracy of C-band synthetic aperture radar (SAR) and optical images utilizing both classical and quantum computing algorithms. This REU study uses data from the Sentinel satellites. The dataset contains (i) synthetic aperture radar images collected from the Sentinel-1 satellite and (ii) optical images, for the same areas as the SAR images, collected from the Sentinel-2 satellite. We utilize classical neural networks to classify four classes of images. We then use quantum convolutional neural networks and deep learning techniques to improve the system's training and classification accuracy. A hybrid quantum-classical model trained on the Sentinel-1/2 dataset is proposed, and its performance is compared against that of the classical model in terms of classification accuracy.

ContributorsMiller, Leslie (Author) / Spanias, Andreas (Thesis director) / Uehara, Glen (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)
Created2023-05
Description

Although a relatively new technology, machine learning has rapidly demonstrated its many uses. One potential application is the diagnosis of ailments in medical imaging. Ideally, through classification methods, a computer program would be able to identify different medical conditions when provided with an X-ray or other such scan. This would be very beneficial for overworked doctors and could serve as an aid for giving accurate diagnoses. For this thesis project, five machine-learning algorithms were tested on two datasets containing 5,856 lung X-ray scans labeled as either “Pneumonia” or “Normal”. The goal was to determine which algorithm achieved the highest accuracy, as well as how preprocessing the data affected the accuracy of the models. The following supervised-learning methods were tested: support vector machines, logistic regression, decision trees, random forests, and a convolutional neural network. Each model was tuned independently for maximum performance before accuracy metrics were generated to compare the models against each other. Additionally, the effect of resizing images on model performance was investigated. Overall, the convolutional neural network proved to be the superior model for pneumonia detection, with 91% accuracy; after resizing the images to 28x28, its accuracy decreased to 85%. The random forest model performed second best. The 28x28 PneumoniaMNIST dataset achieved higher accuracy with the traditional machine learning models than the HD Chest X-Ray dataset, and resizing the Chest X-ray images to 28x28 or larger had minimal effect on traditional model performance.
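The multi-model comparison above can be sketched with scikit-learn. The small digits dataset stands in for the X-ray data (the actual datasets are PneumoniaMNIST and an HD chest X-ray collection), and default hyperparameters stand in for the thesis's per-model tuning:

```python
# Sketch: fit several classical classifiers on the same split and compare
# held-out accuracy; digits images stand in for the X-ray scans.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # 8x8 images flattened to 64 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "svm": SVC(),
    "logreg": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
```

Ranking the entries of `scores` mirrors how the thesis pits the five models against each other on a shared test set.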

ContributorsVollkommer, Margie (Author) / Spanias, Andreas (Thesis director) / Sivaraman Narayanaswamy, Vivek (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor)
Created2023-05
Description
Recent advances in quantum computing have broadened the set of techniques available for addressing existing computing problems. One area of interest is the emerging field of machine learning. The intersection of these fields, quantum machine learning, has the potential for high-impact work in areas such as the health industry. Use cases seen in previous research include the detection of illnesses in medical imaging through image classification. In this work, we explore a hybrid quantum-classical approach to classifying brain Magnetic Resonance Imaging (MRI) images for brain tumor detection, utilizing public Kaggle datasets. More specifically, we aim to assess the performance and utility of a hybrid model composed of a classical pretrained portion and a quantum variational circuit. We compare these results to purely classical approaches, one utilizing transfer learning and one without, on the same datasets. While more research is needed to establish generalized quantum advantage, our work shows potential quantum advantages in validation accuracy and sensitivity for this task, particularly when training with limited data in a minimally skewed dataset under specific conditions. Using IBM's Qiskit Runtime Estimator with built-in error mitigation, our experiments on a physical quantum system confirmed some of the results generated through simulations.
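The variational-circuit component above can be illustrated with a toy numpy simulation rather than Qiskit: a single-qubit RY(θ) rotation followed by a Z-expectation measurement, which is the basic building block such circuits repeat across many qubits and parameters:

```python
# Toy stand-in for a quantum variational layer: RY(theta) on |0>, then <Z>.
import numpy as np

def ry_z_expectation(theta: float) -> float:
    """<Z> after applying RY(theta) to |0>; analytically this equals cos(theta)."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ z @ state)

# In a hybrid model this expectation value feeds a classical loss; its gradient
# can be obtained on hardware via the parameter-shift rule:
#   d<Z>/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2
theta = 0.3
grad = (ry_z_expectation(theta + np.pi / 2) - ry_z_expectation(theta - np.pi / 2)) / 2
```

The parameter-shift gradient is what lets the quantum layer be trained jointly with the classical pretrained portion by ordinary gradient descent.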
ContributorsDiaz, Maryannette (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created2023-05
Description

In 2018, Google researchers published the BERT (Bidirectional Encoder Representations from Transformers) model, which has since served as a starting point for hundreds of NLP (Natural Language Processing) experiments and derivative models. BERT was pretrained on masked-language modeling and next-sentence prediction, but its capabilities extend to common NLP tasks such as language inference and text classification. Naralytics is a company that seeks to use natural language to categorize the users who create text into multiple categories, a modified form of classification. However, the texts that Naralytics draws from exceed the maximum input length of 512 tokens that BERT supports, so this report discusses research into several BERT derivatives that address this problem and then implements a solution that addresses the concerns attached to this kind of model.
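One common workaround for the 512-token limit, independent of which BERT derivative is chosen, is to split long inputs into overlapping windows, classify each window, and aggregate the results. A whitespace-level sketch (a real pipeline would use a BERT tokenizer, and the stride value here is an illustrative assumption):

```python
# Sliding-window chunking so each piece fits BERT's 512-token input limit.
def window_chunks(tokens, max_len=512, stride=256):
    """Yield overlapping token windows, each no longer than max_len."""
    if len(tokens) <= max_len:
        yield tokens
        return
    start = 0
    while start < len(tokens):
        yield tokens[start:start + max_len]
        if start + max_len >= len(tokens):
            break      # the final window already covers the tail
        start += stride

tokens = ["tok"] * 1000          # stand-in for a tokenized long document
chunks = list(window_chunks(tokens, max_len=512, stride=256))
```

The overlap (stride < max_len) keeps context that straddles a window boundary visible to at least one window; per-window predictions can then be averaged or max-pooled into a single user-level category.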

ContributorsNgo, Nicholas (Author) / Carter, Lynn (Thesis director) / Lee, Gyou-Re (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Economics Program in CLAS (Contributor)
Created2023-05
ContributorsBernstein, Daniel (Author) / Pizziconi, Vincent (Thesis director) / Glattke, Kaycee (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor)
Created2023-05