Matching Items (147)
Description

This project incorporates sentiment analysis into traditional stock analysis to enhance stock rating predictions by drawing on online opinion about individual stocks. Headlines from eight major news publications and conversations from Yahoo! Finance’s “Conversations” feature were parsed with the Valence Aware Dictionary and sEntiment Reasoner (VADER) natural language processing package to produce numerical polarities representing positivity or negativity for a given stock ticker. These polarities were paired with stock metrics typically tracked by analysts to form the feature set for a Logistic Regression machine learning model. The model was trained on roughly 1,500 major stocks to make a binary classification between a “Buy” and “Not Buy” rating for each stock, and its results were inserted into the back-end of the Agora Web UI, which emulates search engine behavior specifically for stocks listed on the NYSE and NASDAQ. The model reported an accuracy of 82.5%, and for most major stocks its predictions correlated with stock analysts’ ratings. Given the volatility of the stock market and the propensity for hive-mind behavior in online forums, the Logistic Regression model’s performance would benefit from incorporating historical stock data and more sources of opinion to balance any subjectivity in the model.
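The pipeline described above can be sketched end to end. This is a minimal illustration, not the thesis’s code: the tiny lexicon, the stock metrics, and the logistic-regression weights are all invented stand-ins (the real system uses the full VADER lexicon and coefficients learned from roughly 1,500 stocks).

```python
import math

# Toy sentiment lexicon standing in for VADER's valence dictionary
# (illustrative words and scores only; the real lexicon is far larger).
LEXICON = {"surges": 2.1, "beats": 1.8, "strong": 1.5,
           "plunges": -2.3, "misses": -1.7, "weak": -1.4}

def polarity(headline: str) -> float:
    """Sum lexicon valences and squash into [-1, 1], roughly
    mirroring VADER's normalized 'compound' score."""
    s = sum(LEXICON.get(w.lower().strip(".,!"), 0.0) for w in headline.split())
    return s / math.sqrt(s * s + 15)  # VADER-style normalization constant

def buy_probability(pol: float, pe_ratio: float, eps_growth: float) -> float:
    """Logistic model over [polarity, P/E ratio, EPS growth]. The
    weights are made up for illustration; the thesis learned its own
    coefficients from training data."""
    z = 2.0 * pol - 0.05 * pe_ratio + 1.5 * eps_growth + 0.1
    return 1 / (1 + math.exp(-z))

p = polarity("Ticker beats estimates on strong quarter")
rating = "Buy" if buy_probability(p, pe_ratio=18.0, eps_growth=0.12) >= 0.5 else "Not Buy"
```

A deployed version would replace the hand-set weights with coefficients fitted by a logistic regression library on labeled examples.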

Contributors: Ramaraju, Venkat (Author) / Rao, Jayanth (Co-author) / Bansal, Ajay (Thesis director) / Smith, James (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2021-12
Description

This project incorporates sentiment analysis into traditional stock analysis to enhance stock rating predictions by drawing on online opinion about individual stocks. Headlines from eight major news publications and conversations from Yahoo! Finance’s “Conversations” feature were parsed with the Valence Aware Dictionary and sEntiment Reasoner (VADER) natural language processing package to produce numerical polarities representing positivity or negativity for a given stock ticker. These polarities were paired with stock metrics typically tracked by analysts to form the feature set for a Logistic Regression machine learning model. The model was trained on roughly 1,500 major stocks to make a binary classification between a “Buy” and “Not Buy” rating for each stock, and its results were inserted into the back-end of the Agora Web UI, which emulates search engine behavior specifically for stocks listed on the NYSE and NASDAQ. The model reported an accuracy of 82.5%, and for most major stocks its predictions correlated with stock analysts’ ratings. Given the volatility of the stock market and the propensity for hive-mind behavior in online forums, the Logistic Regression model’s performance would benefit from incorporating historical stock data and more sources of opinion to balance any subjectivity in the model.

Contributors: Rao, Jayanth (Author) / Ramaraju, Venkat (Co-author) / Bansal, Ajay (Thesis director) / Smith, James (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2021-12
Description

The increasing demand for clean energy solutions requires not just expansion but also improvements in the efficiency of renewable sources such as solar. This requires analytics for each panel regarding voltage, current, temperature, and irradiance. This project involves the development of machine learning algorithms, along with a data logger, for photovoltaic (PV) monitoring and control. Machine learning is used for fault classification; once a fault is detected, the system can change its configuration to minimize power losses. Fault detection accuracy was demonstrated to exceed 90%, and topology reconfiguration was shown to increase power output by as much as 5%.
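The detect-then-reconfigure loop can be sketched in miniature. Everything here is invented for illustration: the fault centroids, measurements, and topology actions are hypothetical stand-ins, and a nearest-centroid rule takes the place of the thesis’s trained classifier.

```python
import math

# Illustrative per-fault centroids over the four logged quantities:
# voltage (V), current (A), temperature (degC), irradiance (W/m^2).
CENTROIDS = {
    "normal":       (36.0, 8.2, 45.0, 950.0),
    "shading":      (33.0, 4.1, 44.0, 400.0),
    "ground_fault": (12.0, 9.5, 55.0, 940.0),
}

def classify(v, i, t, g):
    """Assign a measurement to the nearest fault centroid (Euclidean
    distance on raw features; a real model would normalize the
    features and learn the decision boundaries from data)."""
    return min(CENTROIDS, key=lambda k: math.dist((v, i, t, g), CENTROIDS[k]))

def reconfigure(fault: str) -> str:
    """Choose a topology response to the detected fault; these
    mappings are illustrative stand-ins for the control logic."""
    return {"shading": "bypass shaded string",
            "ground_fault": "isolate faulted panel"}.get(fault, "keep topology")

fault = classify(33.5, 4.3, 43.0, 420.0)   # low current + low irradiance
action = reconfigure(fault)
```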

Contributors: Navas, John (Author) / Spanias, Andreas (Thesis director) / Rao, Sunil (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description
Background: As social media platforms continue to grow, the constantly increasing amount of freely available, user-generated data they receive becomes increasingly valuable. One apparent use of this content is public health surveillance, such as increasing our understanding of substance abuse. In this study, Facebook was used to monitor nicotine addiction through the public support groups users join to aid their quitting process.

Objective: The main objective of this project was to gain a better understanding of the mechanisms of nicotine addiction online and to provide a content analysis of Facebook posts obtained from "quit smoking" support groups.

Methods: Using the Facebook Application Programming Interface (API) for Python, a sample of 9,970 posts was collected in October 2015, along with each user's name and the number of likes and comments their post received. The posts were then manually classified by one annotator into one of three categories: positive, negative, or neutral. Positive posts describe current quits, negative posts discuss relapsing, and neutral posts are those that could not be used to train the classifiers, including posts where users had yet to attempt a quit, ads, random questions, etc. The performance of two machine learning algorithms on this corpus of manually labeled posts was then compared. The classification goal was to test the plausibility of a natural language processing machine learning classifier that distinguishes relapse (negative) from quitting-success (positive) posts within a set of smoking-related posts.

Results: Of the 9,970 manually labeled posts, 6,254 (62.7%) were labeled positive, 1,249 (12.5%) negative, and 2,467 (24.8%) neutral. Since neutral posts are irrelevant to the classification task, 7,503 posts were used to train the classifiers: 83.4% positive and 16.6% negative. The SVM classifier was 84.1% accurate and 84.1% precise, with a recall of 1 and an F-score of 0.914. The MNB classifier was 82.8% accurate and 82.8% precise, with a recall of 1 and an F-score of 0.906.

Conclusions: The Facebook surveillance results give a small peek into the behavior of those looking to quit smoking. Ultimately, what makes Facebook a useful tool for public health surveillance is its extremely large and diverse user base and easily obtainable information. That so many people are willing to use Facebook support groups to aid their quitting process demonstrates that the platform can teach us much about quitting and smoking behavior.
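The simpler of the two classifiers compared above, multinomial Naive Bayes, is easy to sketch from scratch. The four training posts below are invented stand-ins for the labeled Facebook corpus; Laplace (add-one) smoothing handles words unseen in a class.

```python
import math
from collections import Counter, defaultdict

# Tiny invented corpus standing in for the manually labeled posts.
train = [
    ("day 30 smoke free feeling great", "positive"),
    ("one week without a cigarette so proud", "positive"),
    ("caved and bought a pack last night", "negative"),
    ("relapsed again after two weeks", "negative"),
]

# Count class priors and per-class word frequencies.
class_counts = Counter(lbl for _, lbl in train)
word_counts = defaultdict(Counter)
for text, lbl in train:
    word_counts[lbl].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Multinomial Naive Bayes with Laplace smoothing: pick the class
    maximizing log P(class) + sum of log P(word | class)."""
    def log_score(lbl):
        total = sum(word_counts[lbl].values())
        lp = math.log(class_counts[lbl] / len(train))
        for w in text.split():
            lp += math.log((word_counts[lbl][w] + 1) / (total + len(vocab)))
        return lp
    return max(class_counts, key=log_score)
```

With thousands of real posts rather than four toy ones, the same scheme (typically via a library implementation) yields the MNB results reported above.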
Contributors: Molina, Daniel Antonio (Author) / Li, Baoxin (Thesis director) / Tian, Qiongjie (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description
This thesis project focused on determining the primary causes of flight delays within the United States and then building a machine learning model on the collected flight data to find a more efficient flight route from Phoenix Sky Harbor International Airport in Phoenix, Arizona to Harry Reid International Airport in Las Vegas, Nevada. In collaboration with Honeywell Aerospace as part of the Ira A. Fulton Schools of Engineering Capstone Course, CSE 485 and 486, this project used open source data from FlightAware and the United States Bureau of Transportation Statistics to identify five primary causes of flight delays and determine whether any of them could be addressed using machine learning. The machine learning model was a three-layer feedforward neural network focused on reducing the impact of late-arriving aircraft on the Phoenix-to-Las Vegas route. Evaluation metrics used to judge the efficiency and success of the model include Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-Squared score. The benefits of this project are wide-ranging for both consumers and corporations: consumers can arrive at their destination earlier than expected, giving them a better experience with the airline, while the airline can take credit for the customer's satisfaction and reduce fuel usage, making its flights more environmentally friendly. This project contributes to the field of aviation by showing that flights can be made more efficient through the use of open source data.
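The three evaluation metrics named above are straightforward to compute for any regression model. A minimal sketch, with hypothetical delay values in minutes (not figures from the project):

```python
def mse(y, p):
    """Mean Squared Error: average of squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def mae(y, p):
    """Mean Absolute Error: average of absolute prediction errors."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def r2(y, p):
    """R-Squared: 1 minus (residual sum of squares / total sum of
    squares); 1.0 is a perfect fit, 0.0 matches predicting the mean."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Hypothetical actual vs. predicted arrival delays (minutes).
actual    = [12.0, 5.0, 30.0, 0.0, 18.0]
predicted = [10.0, 7.0, 26.0, 3.0, 16.0]
```

In practice these would be applied to the neural network's predictions on a held-out test split of the flight data.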
Created: 2024-05
Description
Manually determining the health of a plant requires time and expertise from a human; automating this process with machine learning could provide significant benefits to the agricultural field. Detecting and classifying health defects in crops by analyzing visual data with computer vision tools can accomplish this. In this paper, the task is completed using two existing machine learning architectures, ResNet-50 and CapsNet, which take images of crops as input and return a classification denoting the health defect the crop suffers from. Specifically, the models analyze the images to determine whether a nutritional deficiency or disease is present and, if so, identify it. The purpose of this project is to apply the proven deep learning architecture ResNet-50 to the data as a baseline for comparison with the less researched CapsNet architecture. This comparison highlights differences in the two architectures' performance on a complex dataset with many classes. The report details the data pipeline, including dataset collection and validation as well as preprocessing and application to the models. Methods of improving the models' accuracy are also recorded and analyzed to provide further insight into the comparison. The ResNet-50 model achieved an accuracy of 100% after being trained on the nutritional deficiency dataset and 88.5% on the disease dataset. The CapsNet model achieved an accuracy of 90% on the nutritional deficiency dataset but only 70% on the disease dataset. The ResNet-50 model outperformed CapsNet; however, CapsNet shows promise for future implementations. With larger, more complete datasets and improvements to the design of capsule networks, capsule networks will likely provide exceptional performance for complex image classification tasks.
Contributors: Christner, Drew (Author) / Carter, Lynn (Thesis director) / Ghayekhloo, Samira (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2024-05
Description
The InceptionTime model is a deep learning architecture adapted for time series regression. For the first time, Read Montague’s lab at Virginia Tech has developed methods to measure neurotransmitters in the human brain using InceptionTime to analyze fast-scan cyclic voltammetry (FSCV) data. FSCV has existed for decades and has previously been used to study concentrations of the neurotransmitter dopamine. Unlike older analysis techniques such as principal component regression, however, InceptionTime can distinguish between monoamine neurotransmitters such as dopamine, norepinephrine, and serotonin, thereby vastly increasing FSCV’s utility. This paper investigates the InceptionTime model and its applications in FSCV experiments, and provides background on the electrochemical concepts integral to understanding the value of this research.
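InceptionTime's defining trick, parallel convolutions at several kernel lengths so the model sees patterns at multiple time scales, can be sketched in miniature. Everything here is illustrative: the kernels are fixed averaging filters rather than learned weights, and the "sweep" is a synthetic stand-in for an FSCV voltammogram.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of a signal
    with a filter."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def inception_features(signal):
    """Inception-style module: apply filters of several lengths in
    parallel, then max-pool each branch. Real InceptionTime learns
    many filters per branch; these fixed averaging kernels only
    illustrate the multi-scale idea."""
    kernels = {10: [0.1] * 10, 20: [0.05] * 20, 40: [0.025] * 40}
    return [max(conv1d(signal, k)) for k in kernels.values()]

# Synthetic 100-sample sweep with a 20-sample peak, standing in for
# one FSCV measurement.
sweep = [0.0] * 40 + [1.0] * 20 + [0.0] * 40
feats = inception_features(sweep)
# A concentration estimate would come from a regression head applied
# to features like these.
```

The 20-sample kernel matches the peak's width, while the 40-sample branch responds only half as strongly, which is exactly the scale sensitivity the architecture exploits.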
Contributors: Ager, Katrina (Author) / McClure, Samuel (Thesis director) / Brewer, Gene (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor)
Created: 2024-05
Description
Supply chain management is a complex field shaped by a variety of ever-changing factors, and artificial intelligence can create significant value and drive efficiency if organizations implement it effectively. This thesis examines the different types of AI by functionality and capability and provides a brief overview of the history of artificial intelligence. Several supply chain functions, including demand forecasting, inventory management, route optimization, supply transparency, and safety and sustainability, were analyzed before and after the addition of AI systems. After examining AI missteps and successes of recent years, a detailed roadmap was created to help decision-makers navigate the numerous complexities of implementing AI technology within a business to improve the supply chain.
Contributors: Hildebrand, Ryan (Author) / Printezis, Antonios (Thesis director) / Pofahl, Geoffrey (Committee member) / Barrett, The Honors College (Contributor) / School of International Letters and Cultures (Contributor) / Department of Supply Chain Management (Contributor)
Created: 2024-05
Description
This thesis explores strategies to enhance visibility and engagement within local music ecosystems using a data-driven approach that leverages streaming platform data. It employs a two-pronged approach, consisting of a Proof of Concept (PoC) and a Business Model Canvas (BMC). The PoC involves the development and refinement of two novel machine learning-based music recommendation algorithms, specifically tailored for local stakeholders in the Valley Metro area. Empirical testing of these algorithms has shown a significant potential increase in visibility and engagement for local music events. Utilizing these results, the study proposes informed revisions to the existing streaming BMC, aiming to better support local music ecosystems through strategic enhancements derived from the validated PoC findings.
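One common starting point for such a recommender is content-based filtering: rank local acts by the cosine similarity between a listener's taste vector (built from streaming history) and each artist's feature vector. This sketch is a hypothetical illustration, not the thesis's algorithms; the artist names, genre features, and profile values are all invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical genre-feature vectors: [indie, electronic, jazz].
user_profile = [0.8, 0.1, 0.3]
local_artists = {
    "Desert Echo":   [0.9, 0.0, 0.2],
    "Synth Saguaro": [0.1, 0.9, 0.0],
    "Cactus Trio":   [0.2, 0.1, 0.9],
}

# Rank local acts by similarity to the listener's profile, most
# similar first; the top results would surface local events.
ranked = sorted(local_artists,
                key=lambda a: cosine(user_profile, local_artists[a]),
                reverse=True)
```

A production system would derive the feature vectors from streaming-platform data (audio features, co-listening patterns) rather than hand-set genre weights.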
Contributors: Ellini, Andre (Author) / Clarkin, Michael (Co-author) / Bradley, Robert (Co-author) / Mancenido, Michelle (Thesis director) / Sirugudi, Kumar (Committee member) / Barrett, The Honors College (Contributor) / Mechanical and Aerospace Engineering Program (Contributor)
Created: 2024-05
Description
The performance of modern machine learning algorithms depends on the selection of a set of hyperparameters, such as the learning rate or the number of layers in a dense neural network. AutoML is a branch of optimization that has produced important contributions in this area. Within AutoML, multi-fidelity approaches, which eliminate poorly performing configurations after evaluating them at low budgets, are among the most effective. However, these algorithms' performance depends strongly on how effectively they allocate the computational budget across hyperparameter configurations. We first present Parameter Optimization with Conscious Allocation 1.0 (POCA 1.0), a Hyperband-based algorithm for hyperparameter optimization that adaptively allocates the input budget to the hyperparameter configurations it generates following a Bayesian sampling scheme. We then present its successor, Parameter Optimization with Conscious Allocation 2.0 (POCA 2.0), which follows POCA 1.0's successful philosophy while using a time-series model to reduce wasted computational cost and providing a more flexible framework. We compare POCA 1.0 and 2.0 to their nearest competitor, BOHB, at optimizing the hyperparameters of a multilayer perceptron and find that both POCA algorithms exceed BOHB in low-budget hyperparameter optimization while performing similarly in high-budget scenarios.
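The multi-fidelity core that Hyperband-style methods (including POCA) build on is successive halving. The sketch below is a generic illustration, not POCA's implementation: the `evaluate` function is a synthetic stand-in for training a model at a given budget, and each configuration carries a made-up `quality` field in place of real hyperparameters.

```python
import random

def evaluate(config, budget):
    """Synthetic loss that improves (decreases) with budget. A real
    evaluation would train a model for `budget` epochs and return its
    validation loss."""
    return config["quality"] + 1.0 / budget

def successive_halving(configs, min_budget=1, eta=3):
    """Evaluate all configurations at a small budget, keep the best
    1/eta fraction, and re-evaluate the survivors at eta times the
    budget, repeating until one configuration remains."""
    budget = min_budget
    while len(configs) > 1:
        scored = sorted(configs, key=lambda c: evaluate(c, budget))
        configs = scored[:max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]

random.seed(0)
pool = [{"id": i, "quality": random.random()} for i in range(9)]
best = successive_halving(pool)
```

Hyperband runs several such brackets with different trade-offs between the number of configurations and the starting budget; POCA's contribution, per the abstract, is deciding *how* to allocate budget across configurations via Bayesian sampling (1.0) and a time-series model (2.0).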
Contributors: Inman, Joshua (Author) / Sankar, Lalitha (Thesis director) / Pedrielli, Giulia (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2024-05