Hilliker Thesis Paper

Contributors: Hilliker, Jacob (Author) / Li, Baoxin (Thesis director) / Libman, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2021-12

Characterizing the Performance of Machine Learning Algorithms: A Study and Novel Techniques

Description

Classification in machine learning is crucial to solving many of the problems the world faces today. It is therefore key to understand one’s problem and develop an efficient model to reach a solution. One technique that improves model selection, and thus eases problem solving, is estimation of the Bayes Error Rate. This paper develops and analyzes two methods for estimating the Bayes Error Rate on a given set of data to evaluate performance. The first method takes a “global” approach, looking at the data as a whole; the second is more “local”, partitioning the data at the outset and then building up to a Bayes Error Estimation of the whole. One of the methods is found to estimate the true Bayes Error Rate accurately when the dataset is high-dimensional, while the other is accurate at large sample sizes. This second conclusion, in particular, can have significant ramifications for “big data” problems, as one would be able to clarify the distribution with an accurate estimation of the Bayes Error Rate by using this method.
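
As a rough illustration of what estimating the Bayes Error Rate involves, the sketch below uses the classical two-class nearest-neighbor (Cover-Hart) bounds. This is not either of the thesis's two methods, which the abstract does not detail; the function names and the one-dimensional Gaussian test problem are assumptions chosen only so that the true Bayes error is known in closed form.

```python
import math
import random


def true_bayes_error(delta, sigma=1.0):
    """Exact Bayes error for two equal-prior 1-D Gaussians whose means are
    `delta` apart and which share standard deviation `sigma`."""
    return 0.5 * math.erfc(delta / (2.0 * sigma * math.sqrt(2.0)))


def loo_1nn_error(points):
    """Leave-one-out 1-nearest-neighbor error on a list of (x, label) pairs."""
    errors = 0
    for i, (x, y) in enumerate(points):
        nearest = min(
            (p for j, p in enumerate(points) if j != i),
            key=lambda p: abs(p[0] - x),
        )
        errors += nearest[1] != y
    return errors / len(points)


def cover_hart_bounds(nn_error):
    """Two-class bounds on the Bayes error implied by an asymptotic 1-NN error.

    From R* <= R_nn <= 2 R* (1 - R*), solving for R* gives
    (1 - sqrt(1 - 2 R_nn)) / 2 <= R* <= R_nn.
    """
    nn_error = min(max(nn_error, 0.0), 0.5)  # clamp to the valid range
    lower = (1.0 - math.sqrt(1.0 - 2.0 * nn_error)) / 2.0
    return lower, nn_error


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical synthetic data, not thesis data
    data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(200)]
    data += [(rng.gauss(1.0, 1.0), 1) for _ in range(200)]
    low, high = cover_hart_bounds(loo_1nn_error(data))
    print("estimated bounds:", low, high)
    print("true Bayes error:", true_bayes_error(2.0))
```

The bounds hold only asymptotically, so on a finite sample they are an approximation; this is one reason the dependence of an estimator's accuracy on sample size and dimension matters.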

Contributors: Lattus, Robert (Author) / Dasarathy, Gautam (Thesis director) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)

Created: 2021-12

Agora: Introducing the Internet’s Opinion to Traditional Stock Analysis and Prediction

Description

This project aims to incorporate sentiment analysis into traditional stock analysis, enhancing stock rating predictions by drawing on opinions about various stocks from the Internet. Headlines from eight major news publications and conversations from Yahoo! Finance’s “Conversations” feature were passed through the Valence Aware Dictionary and sEntiment Reasoner (VADER) natural language processing package to determine numerical polarities representing positivity or negativity for a given stock ticker. These polarities were paired with stock metrics typically observed by stock analysts to form the feature set for a Logistic Regression machine learning model. The model was trained on roughly 1500 major stocks to produce a binary classification of “Buy” or “Not Buy” for each stock, and the results of the model were inserted into the back-end of the Agora Web UI, which emulates search engine behavior specifically for stocks listed on the NYSE and NASDAQ. The model reported an accuracy of 82.5%, and for most major stocks the model’s prediction correlated with stock analysts’ ratings. Given the volatility of the stock market and the propensity for hive-mind behavior in online forums, the performance of the Logistic Regression model would benefit from incorporating historical stock data and more sources of opinion to balance any subjectivity in the model.
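
The pipeline described above (text polarity plus stock metrics feeding a logistic regression) can be sketched in a few lines. This is a minimal illustration, not the project's code: the tiny `TOY_LEXICON` stands in for VADER, the second feature is a hypothetical normalized stock metric, and the training rows are invented for the example.

```python
import math

# Hypothetical stand-in for a sentiment lexicon; VADER's real lexicon and
# scoring rules are far richer than this.
TOY_LEXICON = {"surge": 1.0, "beat": 0.8, "gain": 0.6,
               "miss": -0.8, "plunge": -1.0, "lawsuit": -0.7}


def polarity(headline):
    """Average lexicon score of the known words in a headline (0.0 if none)."""
    scores = [TOY_LEXICON[w] for w in headline.lower().split() if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Map the model's probability to the binary rating used in the abstract."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "Buy" if p >= 0.5 else "Not Buy"


# Invented rows: [headline polarity, hypothetical normalized stock metric]
X = [[0.9, 0.2], [0.6, 0.1], [-0.8, -0.1], [-1.0, -0.3]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

A call such as `predict(w, b, [polarity("Shares surge after earnings beat"), 0.15])` then yields a rating for a new ticker.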

Contributors: Rao, Jayanth (Author) / Ramaraju, Venkat (Co-author) / Bansal, Ajay (Thesis director) / Smith, James (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2021-12

FPGA Machine Learning: MLP and CNN Feedforward with Minimal Hardware Resources

Description

Machine learning is a powerful tool for processing and understanding the vast amounts of data produced by sensors every day. Machine learning has found use in a wide variety of fields, from making medical predictions through correlations invisible to the human eye to classifying images in computer vision applications. A wide range of machine learning algorithms have been developed to attempt to solve these problems, each with different accuracy, throughput, and energy efficiency. However, even after they are trained, these algorithms require substantial computation to make a prediction. General-purpose CPUs are not well optimized for this task, so other hardware solutions have been developed over time, including the use of a GPU, FPGA, or ASIC.

This project considers FPGA implementations of MLP and CNN feedforward. While FPGAs provide significant performance improvements, they come at a substantial financial cost. We explore the options for implementing these algorithms on a smaller budget. We successfully implement a multilayer perceptron that identifies handwritten digits from the MNIST dataset on a student-level DE10-Lite FPGA with a test accuracy of 91.99%. We also apply our trained network to external image data loaded through a webcam and a Raspberry Pi, but we observe lower test accuracy on these images. Later, we consider the requirements for implementing a more elaborate convolutional neural network on the same FPGA. The study deems the CNN implementation feasible in terms of memory requirements and basic architecture. We suggest that the CNN implementation on the same FPGA is worthy of further exploration.
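
For readers unfamiliar with the term, the MLP feedforward pass mentioned above is just a chain of matrix-vector products with a nonlinearity in between. The sketch below shows that computation in software; the ReLU activation, the layer sizes, and the weights are assumptions for illustration, since the abstract does not state the network's architecture.

```python
def dense(weights, biases, inputs):
    """One fully connected layer: each output is a dot product plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Rectified linear activation, applied element-wise."""
    return [max(0.0, v) for v in values]


def feedforward(layers, inputs):
    """Run inputs through every (weights, biases) layer.

    ReLU is applied after each hidden layer; the final layer's raw scores
    are returned unchanged.
    """
    activation = inputs
    for index, (weights, biases) in enumerate(layers):
        activation = dense(weights, biases, activation)
        if index < len(layers) - 1:
            activation = relu(activation)
    return activation


def classify(layers, inputs):
    """Predicted class = index of the largest output score."""
    scores = feedforward(layers, inputs)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 2-2-2 network; an MNIST classifier would take 784 inputs
# and produce 10 output scores.
toy_layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
```

Every multiply-accumulate in `dense` corresponds to hardware work on an FPGA, which is why the layer sizes drive the resource and memory requirements the abstract discusses.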

Contributors: Lythgoe, Zachary James (Author) / Allee, David (Thesis director) / Hartin, Olin (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12

Using Natural Language Processing to Identify Questions and Answers Written by People Addicted to Opioids

Description

Background: Natural Language Processing models have previously been trained to locate questions and answers in forum settings, but on topics such as cancer and diabetes. Studies have also used filtering methods to understand themes in forum settings regarding opioid use. However, no studies have trained an NLP model to locate the questions that people addicted to opioids ask their peers and the answers they receive in forums. A variety of annotation tools are available to aid the data collection needed to train NLP models; for academic purposes, brat is the best of these tools. This study will inform clinical practice by showing clinicians the inner thoughts of their patients who are addicted to opioids, so that they can have more meaningful conversations during appointments, including ones the patient may be too afraid to start.

Methods: The standard NLP process was used for this study: a gold standard was reached through matched-pair annotations of the forum text in brat, and a neural network was trained on the content. Following the annotation process, adjudication occurred to increase the inter-annotator agreement. Categories were developed by local physicians to describe the questions, and three pilots were run to test the best way to categorize the questions.

Results: At a threshold of 0.7, the inter-annotator agreement for the annotation activity, calculated via F-score, was 0.378 before adjudication and increased to 0.560 after adjudication. Pilots 1, 2, and 3 of the categorization activity had inter-annotator agreements of 0.375, 0.5, and 0.966, respectively.

Discussion: The inter-annotator agreement of the annotation activity may have been low initially because the annotators were students who may not have been as invested in the project as necessary to annotate the text accurately. Also, because everyone interprets the text slightly differently, differing interpretations may have contributed to the differences in the matched pairs’ annotations. The F-score variation for the categorization activity was due partly to the different delivery systems of the instructions and partly to the area of study of the participants. The first pilot did not mandate the use of the original context located in brat, and the instructions were provided in the form of a downloadable document. The participants were computer science graduate students. The second pilot also had the instructions delivered via a document, but it was strongly suggested that the context be used to gain an understanding of the questions’ meanings. The participants were also computer science graduate students who, in a discussion of their results after the pilot, expressed that they did not have a good understanding of the medical jargon in the posts. The final pilot used a combination of students with and without a medical background, required the use of the context, and included verbal instructions in combination with the written ones. The combination of these factors increased the F-score significantly. For a full-scale experiment, students with a medical background should be used to categorize the questions.
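
The F-score agreement figures reported in the results can be made concrete with a small sketch. It assumes that two annotated spans "agree" when their character ranges overlap with a Jaccard ratio at or above the threshold, and that one annotator is treated as the reference; the abstract does not say how overlap was measured at the 0.7 threshold, so both choices are assumptions, as are the function names.

```python
def jaccard(span_a, span_b):
    """Overlap ratio of two (start, end) character spans."""
    intersection = max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    union = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - intersection
    return intersection / union if union else 0.0


def agreement_f_score(spans_a, spans_b, threshold=0.7):
    """F-score between two annotators' span sets.

    Annotator A is treated as the reference. Each A span is greedily paired
    with the best-overlapping unused B span and counts as a match only if
    the overlap reaches `threshold`.
    """
    if not spans_a or not spans_b:
        return 0.0
    unused_b = list(spans_b)
    matches = 0
    for a in spans_a:
        best = max(unused_b, key=lambda b: jaccard(a, b), default=None)
        if best is not None and jaccard(a, best) >= threshold:
            matches += 1
            unused_b.remove(best)
    if matches == 0:
        return 0.0
    precision = matches / len(spans_b)
    recall = matches / len(spans_a)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, adjudication raises the score by bringing span boundaries closer together or by removing spans that only one annotator marked.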

Contributors: Pawlik, Katie (Author) / Devarakonda, Murthy (Thesis director) / Murcko, Anita (Committee member) / Green, Ellen (Committee member) / College of Health Solutions (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12

Burn, Baby, Burn: the Centralia Mine Fire

Description

The Centralia Council, representing a small Pennsylvania borough, arranged for an illegal controlled burn of the Centralia landfill in late May 1962. It happened the same way every year. As Memorial Day drew closer, the Council contracted volunteer firefighters to burn the top layer of refuse in the landfill in preparation for the day’s festivities, but intentionally burning landfills violated state law. A tangle of events over the years saw the “controlled” burn develop into an underground mine fire and then into a coal seam fire. Excavation costs lie far beyond the state’s budget, and Pennsylvania plans to let the fire burn until its natural end, anticipated in another 240 years. The tangled mess of poor decisions over 21 years raises one question: did the people or the fire kill Centralia?

This paper’s field of study falls at the intersection of geology and fire science, history, social conflict, public service ethics, and collaborative failures. I explore how a series of small choices snowballed into a full, government-funded relocation effort after attempts at controlling the anthracite coal seam fire failed. Geology and fire science worked in tandem during the mine fire, influencing each other and complicating the firefighting efforts. The fire itself was a unique challenge. The history of Centralia played a large role in the government and community response efforts. I use the borough and regional history to contextualize the social conflict that divided Centralia. Social conflict impaired the community’s ability to unify and form a therapeutic community, and in turn, it damaged community-government relationships. The government agencies involved in the mine fire response did their own damage to community relationships by pursuing their own interests. Agencies worried about their brand image, and politicians worried about re-election. I study how these ethical failures impacted the situation. Finally, I look at a few examples of collaborative failures on the part of the government and the community. Over the course of my research, it became apparent that the people killed Centralia, not the fire.

Contributors: Landes, Jazmyne (Author) / Bentley, Margaretha (Thesis director) / Gutierrez, Veronica (Committee member) / School of Public Affairs (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12

Farms of the Future: Food Security in a Changing World

Description

The purpose of this thesis is to imagine and predict the ways in which humans will utilize technology to feed the world population in the 21st century, in spite of significant challenges we have not faced before. This project will first thoroughly identify and explain the most pressing challenges the future will bring, climate change and population growth, both of which are projected to worsen as time goes on. To guide the prediction of how technology will impact the 21st century, a theoretical framework will be established, based upon the green revolution of the 20th century. The theoretical framework will summarize this important historical event and analyze current thought concerning the socio-economic impacts of the agricultural technologies introduced during this time. Special attention will be paid to the unequal distribution of the benefits of this green revolution, and particularly to how it affected small rural farmers. Analysis of the technologies introduced during the green revolution will be used to predict how 21st century technologies will further shape the agricultural sector. Then, the world’s current food crisis will be compared to the crisis that preceded the green revolution. A “second green revolution” is predicted, and the agricultural and economic impact of these advances is theorized based upon analysis of farming advances in the 20th century.

Contributors: Wilson, Joshua J (Author) / Strumsky, Deborah (Thesis director) / Benjamin, Victor (Committee member) / Department of Supply Chain Management (Contributor) / School of Sustainability (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Moving Target Defense: Defending against Adversarial Defense

Description

A defense-by-randomization framework is proposed as an effective defense mechanism against different types of adversarial attacks on neural networks. Experiments were conducted by selecting combinations of differently constructed image classification neural networks to observe which combinations, applied to this framework, were most effective in maximizing classification accuracy. Furthermore, the reasons why particular combinations were more effective than others are explored.
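
One simple way to picture defense by randomization is a wrapper that answers each query with a classifier drawn at random from a pool, so an attacker cannot know which network an adversarial input must fool. The sketch below assumes uniform per-query selection and treats each classifier as a plain callable; the abstract does not specify the selection strategy, so the class and its behavior are illustrative assumptions only.

```python
import random


class RandomizedDefense:
    """Answer each query with a randomly chosen classifier from a pool."""

    def __init__(self, models, seed=None):
        if not models:
            raise ValueError("at least one model is required")
        self.models = list(models)
        self.rng = random.Random(seed)  # seedable for reproducible experiments

    def predict(self, sample):
        """Pick one model uniformly at random and return its prediction."""
        model = self.rng.choice(self.models)
        return model(sample)

    def accuracy(self, samples, labels):
        """Fraction of samples classified correctly under random selection."""
        correct = sum(self.predict(s) == y for s, y in zip(samples, labels))
        return correct / len(labels)
```

Comparing `accuracy` on clean and adversarial inputs for different model pools mirrors, in miniature, the kind of combination experiment the abstract describes.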

Contributors: Mazboudi, Yassine Ahmad (Author) / Yang, Yezhou (Thesis director) / Ren, Yi (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Economics Program in CLAS (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

HelperBot: An Adaptive AI for Teaching Advanced Fighting Game Techniques

Description

Popular competitive fighting games such as Super Smash Brothers and Street Fighter have some of the steepest learning curves in the gaming industry. These incredibly technical games require the full attention of the player and often take years to master completely. This barrier to entry prevents newer players from enjoying the competitive social environment that such games offer, creating a rift between casual and competitive players. Learning the rules can sometimes be more difficult than playing the game itself. Truly mastering these concepts requires personal attention from someone who deeply understands the core mechanics that operate behind the scenes.
Meanwhile, machine learning is growing more advanced by the day. Online retailers like Amazon run complex algorithms to recommend future purchases and monitor price changes. Mobile phones use neural networks to interpret speech. GPS apps track anonymous motion data in smartphones to give real-time traffic estimates. Artificial intelligence is becoming increasingly ubiquitous because of its versatility in analyzing and solving human problems; it follows, then, that a machine could learn how to teach humans skills and techniques. HelperBot is a platform fighting game project that employs this cutting-edge learning technology to close the skill gap between novice and veteran gamers as quickly and seamlessly as possible.

Contributors: Palermo, Seth Daniel (Author) / Olson, Loren (Thesis director) / Marinelli, Donald (Committee member) / Arts, Media and Engineering Sch T (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

An Investigation of Morality in Driving Situations as a Basis for Determining Autonomous Vehicle Ethics

Description

As urban populations increase, so does the demand for innovative transportation solutions that reduce traffic congestion, reduce pollution, and reduce inequalities by providing mobility for all kinds of people. One emerging solution is self-driving vehicles, which have been touted as a safer driving method that reduces fatalities due to driving accidents. While completely automated vehicles are still in the testing and development phase, the United Nations predicts their full debut by 2030 [1]. While much effort goes into creating the technology that executes decisions, such as controls, communications, and sensing, engineers often leave ethics as an afterthought. The truth is that autonomous vehicles are imperfect systems that will still encounter possible crash scenarios even if all systems are working perfectly. Because of this, ethical machine learning must be considered and implemented to avoid an ethical catastrophe that could delay or completely halt future autonomous vehicle development. This paper presents an experiment for determining a more complete view of human morality and how this translates into ideal driving behaviors.
This paper analyzes responses to deviated Trolley Problem scenarios [5] in a simulated driving environment and to still images from MIT’s moral machine website [8] to better understand how humans respond to various crashes. Participants’ driving habits and personal values were also collected; however, the bulk of that analysis is not included here. The results of the simulation indicate that, for the most part, people in driving scenarios would rather sacrifice themselves than people outside of the vehicle. The moral machine scenarios indicate that this willingness to self-sacrifice changes: the tendency to harm one’s own vehicle was not as strong when passengers were introduced. Further supporting this idea is the importance placed on Family Security over any other value.
Suggestions for implementing ethics into autonomous vehicle crashes stem from the results of this experiment but are dependent on more research and greater sample sizes. Once enough data is collected and analyzed, a moral baseline for humans’ moral domain may be agreed upon, quantified, and turned into hard rules governing how self-driving cars should act in different scenarios. With these hard rules as boundary conditions, artificial intelligence should provide training and incremental learning for scenarios that cannot be determined by the rules. Finally, the neural networks that make decisions in artificial intelligence must move from their current “black box” state to something more traceable. This will allow researchers to understand why an autonomous vehicle made a certain decision and to make adjustments as needed.

Contributors: Beaulieu, Natalie Nicole (Author) / Berman, Spring (Thesis director) / Cooke, Nancy (Committee member) / Watts College of Public Service & Community Solut (Contributor) / School for Engineering of Matter,Transport & Enrgy (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05
