Using Machine Learning Models to Detect Fake News, Bots, and Rumors on Social Media

Description

In this paper, I introduce the fake news problem and detail how it has been exacerbated by social media. I explore current practices for fake news detection using natural language processing and current benchmarks for ranking the efficacy of various language models. Using a Twitter-specific benchmark, I attempt to reproduce the scores of six language models on seven tweet classification tasks. I explain the successes and challenges in reproducing these results and analyze the future implications of fake news research.

Contributors: Chang, Ariz Bay (Author) / Liu, Huan (Thesis director) / Tahir, Anique (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

PayPal - Social Injustice Index

Description

Social injustice is a familiar topic, yet one that is very hard to define, because injustice issues are difficult to predict and tough to understand. They negatively affect communities because they directly violate human rights and span a wide range of areas. For instance, injustice issues can relate to unfair labor practices, racism, gender bias, politics, etc. This leaves numerous individuals wondering how they can make sense of social injustice issues and perhaps make efforts to stop them from occurring in the future. In an attempt to understand the rather complicated nature of social injustice, this thesis takes a data-driven approach to defining a social injustice index for a specific country, India. The thesis is an attempt to quantify and track social injustice through social media to see the current social climate. This was accomplished by developing a web scraper to collect hate speech data from Twitter. The tweets collected were then classified by their level of hate and presented on a choropleth map of India. Ultimately, a user viewing the ‘India Social Injustice Index’ map should be able to view an index score for a desired state in India with a single click. This thesis hopes to make it simple for any user viewing the social injustice map to make better sense of injustice issues.

Contributors: Deosthali, Shefali (Author) / Chavez-Echeagaray, Maria Elena (Thesis director) / Mathews, Nicolle (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-05

Using ML to Predict Online Course Ratings

Description

The pandemic that hit in 2020 boosted the growth of online learning, including a boom in Massive Open Online Courses (MOOCs). To support this situation, it would be helpful to have tools that help students choose between different courses and help instructors understand what students need. One such tool is an online course ratings predictor. Using the predictor, online course instructors can learn which qualities the majority of course takers deem important and adjust their lesson plans to fit those qualities. Meanwhile, students can use it to choose a course by comparing the ratings. This research aims to find the best way to predict the rating of online courses using machine learning (ML). To create the ML model, different combinations of the following inputs are used: the length of the course, the number of materials it contains, the price of the course, the number of students taking the course, the course’s difficulty level, the use of jargon or technical terms in the course description, the course instructors’ rating, the number of reviews the instructors received, and the number of classes the instructors have created on the same platform. The output of the model is the average rating of a course. Data from 350 courses are used for this model: 280 for training, 35 for testing, and the last 35 for validation. After trying out different machine learning models, the wide neural network model consistently gives the best training results, while the medium tree model gives the best testing results. However, further research needs to be conducted, as none of the results are accurate, with a test R-squared of only 0.51 for the tree model.
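The 280/35/35 split and the R-squared score mentioned in this abstract can be illustrated with a minimal sketch. This is not the author's code: the helper names `split_indices` and `r_squared`, the random shuffle, and the seed are all assumptions made for illustration, and the abstract does not say how the courses were assigned to each subset.

```python
import random


def split_indices(n_items, n_train, n_test, seed=0):
    """Shuffle item indices and cut them into train/test/validation parts.

    Hypothetical helper: the shuffle and the seed are assumptions,
    not details taken from the thesis.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains
    return train, test, validation


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# 350 courses: 280 for training, 35 for testing, 35 for validation.
train, test, validation = split_indices(350, 280, 35)
```

An R-squared of 1.0 means the predictions match the ratings exactly, while 0.0 means the model does no better than always predicting the mean rating, which puts the reported 0.51 roughly halfway between the two.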

Contributors: Widodo, Herlina (Author) / VanLehn, Kurt (Thesis director) / Craig, Scotty (Committee member) / Barrett, The Honors College (Contributor) / Department of Management and Entrepreneurship (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2021-12

Comparison of Machine Learning Algorithms for Predicting Breast Cancer Malignancy

Description

Breast cancer is one of the most common types of cancer worldwide. Early detection and diagnosis are crucial for improving the chances of successful treatment and survival. In this thesis, many different machine learning algorithms were evaluated and compared to predict breast cancer malignancy from diagnostic features extracted from digitized images of breast tissue samples, called fine-needle aspirates. Breast cancer diagnosis typically involves a combination of mammography, ultrasound, and biopsy. However, machine learning algorithms can assist in the detection and diagnosis of breast cancer by analyzing large amounts of data and identifying patterns that may not be discernible to the human eye. By using these algorithms, healthcare professionals can potentially detect breast cancer at an earlier stage, leading to more effective treatment and better patient outcomes. The results showed that the gradient boosting classifier performed the best, achieving an accuracy of 96% on the test set. This indicates that this algorithm can be a useful tool for healthcare professionals in the early detection and diagnosis of breast cancer, potentially leading to improved patient outcomes.

Contributors: Mallya, Aatmik (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Using Machine Learning to Predict Performance in the NFL

Description

In the last two decades, fantasy sports have grown massively in popularity. Fantasy football in particular is the most popular fantasy sport in the United States. People spend hours upon hours every year building, researching, and perfecting their teams to compete with others for money or bragging rights. One problem, however, is that National Football League (NFL) players are human and will not perform the same as they did last week or last season. Because of this, there is a need for a machine learning model that helps predict when players will have a tough game or when they can perform above average. This report discusses the history and science of fantasy football, the gathering of large amounts of player data, the manipulation of that information to create more insightful data points, the creation of a machine learning model, and the use of this tool in a real-world situation. The initial model produced accurate predictions for quarterbacks and running backs but not for receivers and tight ends. Subsequent improvements reduced the mean absolute error to below one for all positions, resulting in a successful model for all four positions.
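The error metric cited in this abstract is read here as mean absolute error (MAE), which is an assumption about the author's intent. A minimal sketch of how that metric is computed follows; the function name and the sample numbers are hypothetical and are not data from the report.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up fantasy point totals for three games, for illustration only.
actual_points = [10.0, 20.0, 15.0]
predicted_points = [11.0, 19.0, 15.5]
mae = mean_absolute_error(actual_points, predicted_points)
```

An MAE below one, as reported, would mean the predictions are off by less than one unit of the predicted quantity on average.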

Contributors: Case, Spencer (Author) / Johnson, Jarod (Co-author) / Kostelich, Eric (Thesis director) / Zhuang, Houlong (Committee member) / Barrett, The Honors College (Contributor) / Department of Psychology (Contributor) / Mechanical and Aerospace Engineering Program (Contributor)

Created: 2023-05

Modeling Future Cropland Viability in the Eastern Continental United States

Description

Climate is a critical determinant of agricultural productivity, and the ability to accurately predict this productivity is necessary to provide guidance regarding food security and agricultural management. Previous predictions vary in approach due to the myriad factors influencing agricultural productivity but generally suggest long-term declines in productivity and agricultural land suitability under climate change. In this paper, I relate predicted climate changes to yield for three major United States crops, namely corn, soybeans, and wheat, using a moderate emissions scenario. Adopting a data-driven approach, I used three machine learning methods, random forest (RF), extreme gradient boosting (XGB), and artificial neural networks (ANN), to perform a comparative analysis and build an ensemble. I omitted the western US due to the region's susceptibility to water stress and the prevalence of artificial irrigation as a means to compensate for dry conditions. Considering only climate, the model's results suggest an ensemble mean decline in crop yield of 23.4% for corn, 19.1% for soybeans, and 7.8% for wheat between 2017 and 2100. These results emphasize the potential negative impacts of climate change on the current agricultural industry as a result of shifting bioclimatic conditions.

Contributors: Swarup, Shray (Author) / Eikenberry, Steffen (Thesis director) / Mahalov, Alex (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Improving Quantum Mechanical Calculations Using Graph Neural Networks to Predict Energies from Atomic Structure

Description

Graph neural networks (GNNs) offer a potential method of bypassing the Kohn-Sham equations in density functional theory (DFT) calculations by learning the Hohenberg-Kohn (HK) mapping of electron density to energy, allowing for calculations of much larger atomic systems and time scales and enabling large-scale MD simulations with DFT-level accuracy. In this work, we investigate the feasibility of using GNNs to learn the HK map from the external potential, approximated as Gaussians, to the electron density 𝑛(𝑟), and the mapping from 𝑛(𝑟) to the energy density 𝑒(𝑟), using PyTorch Geometric. We develop a graph representation for densities on radial grid points and determine that a k-nearest neighbor algorithm for determining node connections is an effective approach compared to a distance cutoff model, with average graph sizes of 6.31 MB and 32.0 MB for datasets with 𝑘 = 10 and 𝑘 = 50, respectively. Furthermore, we develop two GNNs in PyTorch Geometric and demonstrate a decrease in training loss for the 𝑛(𝑟) to 𝑒(𝑟) mapping of 8.52 · 10^14 and 3.10 · 10^14 for the 𝑘 = 10 and 𝑘 = 20 datasets, respectively, suggesting the model could be further trained and optimized to learn the electron density to energy functional.
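The two ways of connecting grid points that this abstract compares, k-nearest neighbors and a distance cutoff, can be sketched in plain Python. This is an illustrative sketch only: the function names, the brute-force neighbor search, and the toy points are assumptions, and the thesis itself reports using PyTorch Geometric rather than this code.

```python
import math


def knn_edges(points, k):
    """Connect each point to its k nearest neighbors (directed edges)."""
    edges = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, nearest first.
        by_distance = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in by_distance[:k])
    return edges


def cutoff_edges(points, radius):
    """Connect every ordered pair of distinct points within `radius`."""
    return [
        (i, j)
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= radius
    ]


# Toy 3D grid points (made up): three close together, one far away.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
knn = knn_edges(points, k=2)
cut = cutoff_edges(points, radius=1.5)
```

With k-nearest neighbors every node gets exactly k outgoing edges, so the edge count grows linearly with k and no node is left isolated. With a distance cutoff, the edge count depends on the local point density, and a distant point such as the last one above ends up with no edges at all.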

Contributors: Hayes, Matthew (Author) / Muhich, Christopher (Thesis director) / Oswald, Jay (Committee member) / Barrett, The Honors College (Contributor) / Chemical Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2023-05

Machine Learning and Mario Speedruns

Description

Machine learning has a nearly infinite number of applications, whose potential has yet to be fully harnessed and realized. This thesis outlines two areas in which machine learning can be used and demonstrates the execution of one methodology in each area. The first area is self-play in video games, where a neural model that teaches a computer to complete a level of Super Mario World (1990) on its own is researched and described. The neural model in question was inspired by the academic paper “Evolving Neural Networks through Augmenting Topologies”, written by Kenneth O. Stanley and Risto Miikkulainen of the University of Texas at Austin. The model actually described is from YouTuber SethBling of the California Institute of Technology. The second area is cybersecurity, where an algorithm is described from the academic paper “Process Based Volatile Memory Forensics for Ransomware Detection”, written by Asad Arfeen, Muhammad Asim Khan, Obad Zafar, and Usama Ahsan. This algorithm uses Python and the Volatility framework to detect malicious software in an infected system.

Contributors: Ballecer, Joshua (Author) / Yang, Yezhou (Thesis director) / Luo, Yiran (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2023-05

Quantum Machine Learning for Optical and SAR Classification

Description

In this paper, we present a method for comparing the scene classification accuracy of C-band synthetic aperture radar (SAR) and optical images using both classical and quantum computing algorithms. This REU study uses data from the Sentinel satellites. The dataset contains (i) synthetic aperture radar images collected from the Sentinel-1 satellite and (ii) optical images of the same area as the SAR images, collected from the Sentinel-2 satellite. We use classical neural networks to classify four classes of images. We then use quantum convolutional neural networks and deep learning techniques with the aim of reaching a higher classification accuracy. A hybrid quantum-classical model trained on the Sentinel1-2 dataset is proposed, and its performance is compared against the classical model in terms of classification accuracy.

Contributors: Miller, Leslie (Author) / Spanias, Andreas (Thesis director) / Uehara, Glen (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)

Created: 2023-05

Evaluation of Machine Learning Techniques for Pneumonia Detection

Description

Although a relatively new technology, machine learning has rapidly demonstrated its many uses. One potential application of machine learning is the diagnosis of ailments in medical imaging. Ideally, through classification methods, a computer program would be able to identify different medical conditions when provided with an X-ray or other such scan. This would be very beneficial for overworked doctors and could act as an aid in giving accurate diagnoses. For this thesis project, five different machine learning algorithms were tested on two datasets containing 5,856 lung X-ray scans labeled as either “Pneumonia” or “Normal”. The goal was to determine which algorithm achieved the highest accuracy, as well as how preprocessing the data affected the accuracy of the models. The following supervised learning methods were tested: support vector machines, logistic regression, decision trees, random forest, and a convolutional neural network (CNN). Each model was tuned independently to achieve maximum performance before accuracy metrics were generated to compare the models against each other. Additionally, the effect of resizing images on model performance was investigated. Overall, the convolutional neural network proved to be the superior model for pneumonia detection, with 91% accuracy. After resizing to 28x28, CNN accuracy decreased to 85%. The random forest model performed second best. The traditional machine learning models achieved higher accuracy on the 28x28 PneumoniaMNIST dataset than on the HD Chest X-Ray dataset. Resizing the Chest X-Ray images had minimal effect on traditional model performance when they were resized to 28x28 or larger.

Contributors: Vollkommer, Margie (Author) / Spanias, Andreas (Thesis director) / Sivaraman Narayanaswamy, Vivek (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor)

Created: 2023-05
