An Evaluation of Machine Learning Algorithms for Cardiovascular Disease Detection

Description

This thesis aims to advance healthcare and heart disease prevention by using the Python programming language and several machine learning algorithms for heart disease detection. Cardiovascular disease is one of the main causes of death worldwide and a serious global health concern. In the United States alone, one person dies from cardiovascular disease every 33 seconds. Because it is the leading cause of death, early identification is critical for early intervention and prevention. The study addresses key research questions, including the role of machine learning in enhancing heart disease detection, a comparative analysis of six machine learning models, and the importance of predictive indicators. By applying machine learning algorithms to the interpretation of medical data, the thesis contributes insights into early disease detection.

Contributors: La, Nikki (Author) / Sheehan, Connor (Thesis director) / Connor, Dylan (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2024-05

Data-Driven Sustainability: A Machine Learning Approach to Assessing ESG Performance in B Corporations

Description

The purpose of this research is to create predictive models for a leading sustainability certification: the B Corporation certification, issued by the non-profit B Lab on the basis of the B Impact Assessment. This certification is one of many currently used to assess sustainability in the corporate world, and this research seeks to understand the relationships between a corporation's characteristics (e.g. market, size, country) and the B Certification. The data used for the analysis come from a B Lab upload to data.world, which provides descriptive information on each company, current certification status, and B Impact Assessment scores. Further data engineering added attributes for publicly traded status and years certified. After comparing Logistic Regression and Random Forest Classification methods, the study produced a predictive model that distinguishes certified from de-certified B Corporations with 87.58% accuracy.

Contributors: Brandwick, Katelynn (Author) / Samara, Marko (Thesis director) / Tran, Samantha (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2024-05

Utilization of Deep Neural Networks to Investigate Sex-Dependent and Cerebellar Modulation Impacts on Social Behavior in Mice

Description

The cerebellum is recognized for its role in motor movement, balance, and more recently, social behavior. Cerebellar injury at birth and during critical periods reduces social preference in animal models and increases the risk of autism in humans. Social behavior is commonly assessed with the three-chamber test, where a mouse travels between chambers that contain a conspecific and an object confined under a wire cup. However, this test is unable to quantify interactive behaviors between pairs of mice, which could not be tracked until the recent development of machine learning programs that track animal behavior. In this study, both the three-chamber test and a novel freely-moving social interaction test assessed social behavior in untreated male and female mice, as well as in male mice injected with hM3Dq (excitatory) DREADDs. In the three-chamber test, significant differences were found in the time spent (female: p < 0.05, male: p < 0.001) and distance traveled (female: p < 0.05, male: p < 0.001) in the chamber with the familiar conspecific, compared to the chamber with the object, for untreated male, untreated female, and mice with activated hM3Dq DREADDs. A social memory test was added, where the object was replaced with a novel mouse. Untreated male mice spent significantly more time (p < 0.05) and traveled a greater distance (p < 0.05) in the chamber with the novel mouse, while male mice with activated hM3Dq DREADDs spent more time (p < 0.05) in the chamber with the familiar conspecific. Data from the freely-moving social interaction test was used to calculate freely-moving interactive behaviors between pairs of mice and interactions with an object. No sex differences were found, but mice with excited hM3Dq DREADDs engaged in significantly more anogenital sniffing (p < 0.05) and side-side contact (p < 0.05) behaviors. Together, these results indicate that machine learning allows nuanced insight into how both sex and chemogenetic excitation impact social behavior in freely-moving mice.

Contributors: Nelson, Megan (Author) / Verpeut, Jessica (Thesis director) / Bimonte-Nelson, Heather (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor) / School of Life Sciences (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2024-05

Agora: Introducing the Internet’s Opinion to Traditional Stock Analysis and Prediction

Description

This project incorporates sentiment analysis into traditional stock analysis to enhance stock rating predictions by drawing on opinions about stocks expressed on the Internet. Headlines from eight major news publications and conversations from Yahoo! Finance’s “Conversations” feature were parsed with the Valence Aware Dictionary and sEntiment Reasoner (VADER) natural language processing package to determine numerical polarities representing positivity or negativity for a given stock ticker. These polarities were paired with stock metrics typically observed by stock analysts to form the feature set for a Logistic Regression machine learning model. The model was trained on roughly 1500 major stocks to produce a binary classification of “Buy” or “Not Buy” for each stock, and its results were inserted into the back-end of the Agora Web UI, which emulates search engine behavior specifically for stocks listed on the NYSE and NASDAQ. The model reported an accuracy of 82.5%, and for most major stocks the model’s prediction correlated with stock analysts’ ratings. Given the volatility of the stock market and the propensity for hive-mind behavior in online forums, the Logistic Regression model would benefit from incorporating historical stock data and more sources of opinion to balance any subjectivity in the model.

Contributors: Rao, Jayanth (Author) / Ramaraju, Venkat (Co-author) / Bansal, Ajay (Thesis director) / Smith, James (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2021-12

Moving Target Defense: Defending against Adversarial Defense

Description

A defense-by-randomization framework is proposed as an effective defense mechanism against different types of adversarial attacks on neural networks. Experiments were conducted by selecting combinations of differently constructed image classification neural networks to observe which combinations, when applied to this framework, were most effective in maximizing classification accuracy. Furthermore, the reasons why particular combinations were more effective than others are explored.
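The general idea of defense by randomization can be illustrated with a minimal sketch. It assumes, purely for illustration, that the defense picks one model at random from a pool for each query, so an attacker cannot tune an input against a single fixed network. The abstract does not state the selection rule, and the class name `RandomizedEnsemble` and the toy threshold "classifiers" below are hypothetical stand-ins, not the thesis's networks.

```python
import random


class RandomizedEnsemble:
    """Hypothetical wrapper: answers each query with a randomly chosen model."""

    def __init__(self, models, seed=None):
        if not models:
            raise ValueError("at least one model is required")
        self.models = list(models)
        # A private RNG keeps the selection reproducible when a seed is given.
        self.rng = random.Random(seed)

    def predict(self, x):
        # Assumption: uniform random selection per query.
        model = self.rng.choice(self.models)
        return model(x)


# Toy stand-ins for differently constructed image classifiers:
# each maps a score in [0, 1] to a binary label with its own threshold.
def threshold_low(x):
    return int(x > 0.4)


def threshold_mid(x):
    return int(x > 0.5)


def threshold_high(x):
    return int(x > 0.6)


ensemble = RandomizedEnsemble([threshold_low, threshold_mid, threshold_high], seed=0)

# Inputs far from every decision boundary get the same label from any model.
labels = [ensemble.predict(0.9) for _ in range(5)]
print(labels)

# Inputs near a boundary may be labeled differently from query to query,
# which is what makes a fixed adversarial perturbation harder to craft.
print(ensemble.predict(0.55))
```

In this toy setting the models agree on clear inputs and disagree only near their decision boundaries, which loosely mirrors why the choice of which networks to combine could affect overall accuracy.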

Contributors: Mazboudi, Yassine Ahmad (Author) / Yang, Yezhou (Thesis director) / Ren, Yi (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Economics Program in CLAS (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Improving Peptide Identification in Shotgun Proteomics Using Deep Neural Networks

Description

In shotgun proteomics, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is used to identify and quantify peptides and proteins. LC-MS/MS produces mass spectra, which must be searched by one or more engines, which employ algorithms to match spectra to theoretical spectra derived from a reference database. These engines identify and characterize proteins and their component peptides. By training a convolutional neural network on a dataset of over 6 million MS/MS spectra derived from human proteins, we aim to create a tool that can quickly and effectively identify spectra as peptides prior to database searching. This can significantly reduce search space and thus run time for database searches, thereby accelerating LC-MS/MS-based proteomics data acquisition. Additionally, by training neural networks on labels derived from the search results of three different database search engines, we aim to examine and compare which features are best identified by individual search engines, a neural network, or a combination of these.

Contributors: Whyte, Cameron Stafford (Author) / Suren, Jayasuriya (Thesis director) / Gil, Speyer (Committee member) / Patrick, Pirrotte (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Exploring Financial Credit Contracts Using Natural Language Processing Techniques

Description

Natural Language Processing (NLP) techniques have increasingly been used in finance, accounting, and economics research to analyze text-based information more efficiently and effectively than primarily human-centered methods. The literature is rich with computational textual analysis techniques applied to consistent annual or quarterly financial filings, with promising results in identifying similarities between documents and firms and in relating this information to other economic phenomena. Building upon the knowledge gained from previous research and extending the application of NLP methods to other categories of financial documents, this project explores financial credit contracts, seeking to better understand the information provided through their textual data by assessing patterns and relationships between documents and firms. The main methods used throughout this project are Term Frequency-Inverse Document Frequency (to represent each document as a numerical vector), Cosine Similarity (to measure the similarity between contracts), and K-Means Clustering (to organically derive clusters of documents based on the text included in the contract itself). Using these methods, the dimensions analyzed are various grouping methodologies (external industry classifications and text-derived classifications), various granularities (document-wise and firm-wise), various financial documents associated with a single firm (the relationship between credit contracts and 10-K product descriptions), and how various mean cosine similarity distributions change over time.
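The first two methods named above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions: whitespace tokenization, a plain `log(N / df)` inverse document frequency, and three invented one-line "contracts". The helper names `tfidf_vectors` and `cosine_similarity` are hypothetical, and none of this reflects the project's actual preprocessing or data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency (assumed unsmoothed).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example texts, not taken from any real contract.
contracts = [
    "revolving credit facility with floating interest rate",
    "term loan credit facility with fixed interest rate",
    "lease agreement for office equipment",
]

vecs = tfidf_vectors(contracts)
sim_credit = cosine_similarity(vecs[0], vecs[1])  # two credit agreements
sim_lease = cosine_similarity(vecs[0], vecs[2])   # credit agreement vs. lease
print(sim_credit > sim_lease)
```

The two credit agreements share weighted terms and so score higher than the credit/lease pair, which shares none. Vectors like these could then be passed to a clustering routine such as K-Means, which is omitted here for brevity.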

Contributors: Liu, Jeremy J (Author) / Wahal, Sunil (Thesis director) / Bharath, Sreedhar (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / School for the Future of Innovation in Society (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Artificial Intelligence with Graph Neural Networks Applied to a Risk-like Board Game

Description

This project aspires to develop an AI capable of playing on a variety of maps in a Risk-like board game. While AI has been successfully applied to many other board games, such as Chess and Go, most research is confined to a single board and is inflexible to topological changes. Further, almost all of these games are played on a rectangular grid. In contrast, this project develops an AI player, referred to as GG-net, to play the online strategy game Warzone, which is based on the classic board game Risk. Warzone is played on a wide variety of irregularly shaped maps. Prior research has struggled to create an effective AI for Risk-like games due to the immense branching factor. The most successful attempts tended to rely on manually restricting the set of actions the AI considered while also engineering useful features for the AI to consider. GG-net uses no human knowledge, relying instead on a genetic algorithm combined with a graph neural network. Together, these methods allow GG-net to perform competitively across a multitude of maps. GG-net outperformed the built-in rule-based AI by 413 Elo (representing an 80.7% chance of winning) and an approach based on AlphaZero using graph neural networks by 304 Elo (representing a 74.2% chance of winning). This same advantage holds across both seen and unseen maps. GG-net appears to be a strong opponent on both small and medium maps. On large maps with hundreds of territories, however, inefficiencies in GG-net become more significant and GG-net struggles against the rule-based approach. Overall, GG-net was able to successfully learn the game and generalize across maps of a similar size, although further work is required for GG-net to become more successful on large maps.

Contributors: Bauer, Andrew (Author) / Yang, Yezhou (Thesis director) / Harrison, Blake (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-05

Convoluted Processes: The Use and Misuse of Machine Learning in Data Analysis and Prediction

Description

With the rapid increase of technological capabilities, particularly in processing power and speed, the usage of machine learning is becoming increasingly widespread, especially in fields where real-time assessment of complex data is extremely valuable. This surge in popularity of machine learning gives rise to an abundance of potential research and projects on further broadening applications of artificial intelligence. From these opportunities comes the purpose of this thesis. Our work seeks to meaningfully increase our understanding of current capabilities of machine learning and the problems they can solve. One extremely popular application of machine learning is in data prediction, as machines are capable of finding trends that humans often miss. Our effort to this end was to examine the CVE dataset and attempt to predict future entries with Random Forests. The second area of interest lies within the great promise being demonstrated by neural networks in the field of autonomous driving. We sought to understand the research being put out by the most prominent bodies within this field and to implement a model on one of the largest standing datasets, Berkeley DeepDrive 100k. This thesis describes our efforts to build, train, and optimize a Random Forest model on the CVE dataset and a convolutional neural network on the Berkeley DeepDrive 100k dataset. We document these efforts with the goal of growing our knowledge on (and usage of) machine learning in these topics.

Contributors: Selzer, Cora (Author) / Smith, Zachary (Co-author) / Ingram-Waters, Mary (Thesis director) / Rendell, Dawn (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-05

Predicting Self-Correction Attempts with FACT, an Automated Teaching Assistant for Algebra Classes

Description

Machine learning is a rapidly growing field, no doubt in part because of its countless applications to other fields, including pedagogy and the creation of computer-aided tutoring systems. To extend the functionality of FACT, an automated teaching assistant, we want to predict, using metadata produced by student activity, whether a student is capable of fixing their own mistakes. Logs were collected from previous FACT trials with middle school math teachers and students. The data was converted to time series sequences for deep learning, and ordinary features were extracted for statistical machine learning. Ultimately, deep learning models attained an accuracy of 60%, while tree-based methods attained an accuracy of 65%, showing that some correlation, although small, exists between how a student fixes their mistakes and whether their correction is correct.

Contributors: Zhou, David (Author) / VanLehn, Kurt (Thesis director) / Wetzel, Jon (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-05