Search Content

PayPal - Social Injustice Index

Description

Social injustice issues are a familiar, yet very arduous topic to define. This is because they are difficult to predict and tough to understand. Injustice issues negatively affect communities because they directly violate human rights and they span a wide range of areas. For instance, injustice issues can relate to…

Social injustice issues are a familiar, yet very arduous topic to define. This is because they are difficult to predict and tough to understand. Injustice issues negatively affect communities because they directly violate human rights and they span a wide range of areas. For instance, injustice issues can relate to unfair labor practices, racism, gender bias, politics etc. This leaves numerous individuals wondering how they can make sense of social injustice issues and perhaps take efforts to stop them from occurring in the future. In an attempt to understand the rather complicated nature of social injustice, this thesis takes a data driven approach to define a social injustice index for a specific country, India. The thesis is an attempt to quantify and track social injustice through social media to see the current social climate. This was accomplished by developing a web scraper to collect hate speech data from Twitter. The tweets collected were then classified by their level of hate and presented on a choropleth map of India. Ultimately, a user viewing the ‘India Social Injustice Index’ map should be able to simply view an index score for a desired state in India through a single click. This thesis hopes to make it simple for any user viewing the social injustice map to make better sense of injustice issues.

ContributorsDeosthali, Shefali (Author) / Chavez-Echeagaray, Maria Elena (Thesis director) / Mathews, Nicolle (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created2022-05

Using ML to Predict Online Course Ratings

Description

The pandemic that hit in 2020 has boosted the growth of online learning that involves the booming of Massive Open Online Course (MOOC). To support this situation, it will be helpful to have tools that can help students in choosing between the different courses and can help instructors to understand…

The pandemic that hit in 2020 has boosted the growth of online learning that involves the booming of Massive Open Online Course (MOOC). To support this situation, it will be helpful to have tools that can help students in choosing between the different courses and can help instructors to understand what the students need. One of those tools is an online course ratings predictor. Using the predictor, online course instructors can learn the qualities that majority course takers deem as important, and thus they can adjust their lesson plans to fit those qualities. Meanwhile, students will be able to use it to help them in choosing the course to take by comparing the ratings. This research aims to find the best way to predict the rating of online courses using machine learning (ML). To create the ML model, different combinations of the length of the course, the number of materials it contains, the price of the course, the number of students taking the course, the course’s difficulty level, the usage of jargons or technical terms in the course description, the course’s instructors’ rating, the number of reviews the instructors got, and the number of classes the instructors have created on the same platform are used as the inputs. Meanwhile, the output of the model would be the average rating of a course. Data from 350 courses are used for this model, where 280 of them are used for training, 35 for testing, and the last 35 for validation. After trying out different machine learning models, wide neural networks model constantly gives the best training results while the medium tree model gives the best testing results. However, further research needs to be conducted as none of the results are not accurate, with 0.51 R-squared test result for the tree model.

ContributorsWidodo, Herlina (Author) / VanLehn, Kurt (Thesis director) / Craig, Scotty (Committee member) / Barrett, The Honors College (Contributor) / Department of Management and Entrepreneurship (Contributor) / Computer Science and Engineering Program (Contributor)

Created2021-12

A Graph-Based Machine Learning Approach to Realistic Traffic Volume Generation

Description

In this work, we explore the potential for realistic and accurate generation of hourly traffic volume with machine learning (ML), using the ground-truth data of Manhattan road segments collected by the New York State Department of Transportation (NYSDOT). Specifically, we address the following question– can we develop a ML algorithm…

In this work, we explore the potential for realistic and accurate generation of hourly traffic volume with machine learning (ML), using the ground-truth data of Manhattan road segments collected by the New York State Department of Transportation (NYSDOT). Specifically, we address the following question– can we develop a ML algorithm that generalizes the existing NYSDOT data to all road segments in Manhattan?– by introducing a supervised learning task of multi-output regression, where ML algorithms use road segment attributes to predict hourly traffic volume. We consider four ML algorithms– K-Nearest Neighbors, Decision Tree, Random Forest, and Neural Network– and hyperparameter tune by evaluating the performances of each algorithm with 10-fold cross validation. Ultimately, we conclude that neural networks are the best-performing models and require the least amount of testing time. Lastly, we provide insight into the quantification of “trustworthiness” in a model, followed by brief discussions on interpreting model performance, suggesting potential project improvements, and identifying the biggest takeaways. Overall, we hope our work can serve as an effective baseline for realistic traffic volume generation, and open new directions in the processes of supervised dataset generation and ML algorithm design.

ContributorsOtstot, Kyle (Author) / De Luca, Gennaro (Thesis director) / Chen, Yinong (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created2022-05

An Introduction to Unstructured Case Management

Description

In the age of information, collecting and processing large amounts of data is an integral part of running a business. From training artificial intelligence to driving decision making, the applications of data are far-reaching. However, it is difficult to process many types of data; namely, unstructured data. Unstructured data is…

In the age of information, collecting and processing large amounts of data is an integral part of running a business. From training artificial intelligence to driving decision making, the applications of data are far-reaching. However, it is difficult to process many types of data; namely, unstructured data. Unstructured data is “information that either does not have a predefined data model or is not organized in a pre-defined manner” (Balducci & Marinova 2018). Such data are difficult to put into spreadsheets and relational databases due to their lack of numeric values and often come in the form of text fields written by the consumers (Wolff, R. 2020). The goal of this project is to help in the development of a machine learning model to aid CommonSpirit Health and ServiceNow, hence why this approach using unstructured data was selected. This paper provides a general overview of the process of unstructured data management and explores some existing implementations and their efficacy. It will then discuss our approach to converting unstructured cases into usable data that were used to develop an artificial intelligence model which is estimated to be worth $400,000 and save CommonSpirit Health $1,200,000 in organizational impact.

ContributorsBergsagel, Matteo (Author) / De Waard, Jan (Co-author) / Chavez-Echeagaray, Maria Elena (Thesis director) / Burns, Christopher (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created2022-05

A Survey Study of the Current Understanding of Artificial Intelligence and Machine Learning in Medical Practice among Healthcare Professionals and the Lay Public

Description

Artificial intelligence (AI) and machine learning (ML) is rapidly evolving with enormous impact on a wide range of individual and societal matters including in health care, now and in the future. The goal of this research project is to assess the current knowledge level of AI and ML in health…

Artificial intelligence (AI) and machine learning (ML) is rapidly evolving with enormous impact on a wide range of individual and societal matters including in health care, now and in the future. The goal of this research project is to assess the current knowledge level of AI and ML in health care among healthcare professionals and the lay public. Results from this research will identify knowledge gaps and educational opportunities to improve future use and applications of AI and ML in health care.

ContributorsShen, Maria (Author) / Martin, Thomas (Thesis director) / Wheatley-Guy, Courtney (Committee member) / Barrett, The Honors College (Contributor) / College of Health Solutions (Contributor)

Created2022-05

Agora: Introducing the Internet's Opinion to Traditional Stock Analysis and Prediction.

Description

This project aims to incorporate the aspect of sentiment analysis into traditional stock analysis to enhance stock rating predictions by applying a reliance on the opinion of various stocks from the Internet. Headlines from eight major news publications and conversations from Yahoo! Finance’s “Conversations” feature were parsed through the Valence…

This project aims to incorporate the aspect of sentiment analysis into traditional stock analysis to enhance stock rating predictions by applying a reliance on the opinion of various stocks from the Internet. Headlines from eight major news publications and conversations from Yahoo! Finance’s “Conversations” feature were parsed through the Valence Aware Dictionary for Sentiment Reasoning (VADER) natural language processing package to determine numerical polarities which represented positivity or negativity for a given stock ticker. These generated polarities were paired with stock metrics typically observed by stock analysts as the feature set for a Logistic Regression machine learning model. The model was trained on roughly 1500 major stocks to determine a binary classification between a “Buy” or “Not Buy” rating for each stock, and the results of the model were inserted into the back-end of the Agora Web UI which emulates search engine behavior specifically for stocks found in NYSE and NASDAQ. The model reported an accuracy of 82.5% and for most major stocks, the model’s prediction correlated with stock analysts’ ratings. Given the volatility of the stock market and the propensity for hive-mind behavior in online forums, the performance of the Logistic Regression model would benefit from incorporating historical stock data and more sources of opinion to balance any subjectivity in the model.

ContributorsRamaraju, Venkat (Author) / Rao, Jayanth (Co-author) / Bansal, Ajay (Thesis director) / Smith, James (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created2021-12

Comparative Evaluation of Generative Machine Learning Models for Jazz Improvisation using Numerical Metrics

Description

Standardization is sorely lacking in the field of musical machine learning. This thesis project endeavors to contribute to this standardization by training three machine learning models on the same dataset and comparing them using the same metrics. The music-specific metrics utilized provide more relevant information for diagnosing the shortcomings of…

Standardization is sorely lacking in the field of musical machine learning. This thesis project endeavors to contribute to this standardization by training three machine learning models on the same dataset and comparing them using the same metrics. The music-specific metrics utilized provide more relevant information for diagnosing the shortcomings of each model.

ContributorsHilliker, Jacob (Author) / Li, Baoxin (Thesis director) / Libman, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created2021-12

Hilliker Thesis Paper

ContributorsHilliker, Jacob (Author) / Li, Baoxin (Thesis director) / Libman, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created2021-12

Missing Metrics Python Program

ContributorsHilliker, Jacob (Author) / Li, Baoxin (Thesis director) / Libman, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created2021-12

Characterizing the Performance of Machine Learning Algorithms: A Study and Novel Techniques

Description

Classification in machine learning is quite crucial to solve many problems that the world is presented with today. Therefore, it is key to understand one’s problem and develop an efficient model to achieve a solution. One technique to achieve greater model selection and thus further ease in problem solving is…

Classification in machine learning is quite crucial to solve many problems that the world is presented with today. Therefore, it is key to understand one’s problem and develop an efficient model to achieve a solution. One technique to achieve greater model selection and thus further ease in problem solving is estimation of the Bayes Error Rate. This paper provides the development and analysis of two methods used to estimate the Bayes Error Rate on a given set of data to evaluate performance. The first method takes a “global” approach, looking at the data as a whole, and the second is more “local”—partitioning the data at the outset and then building up to a Bayes Error Estimation of the whole. It is found that one of the methods provides an accurate estimation of the true Bayes Error Rate when the dataset is at high dimension, while the other method provides accurate estimation at large sample size. This second conclusion, in particular, can have significant ramifications on “big data” problems, as one would be able to clarify the distribution with an accurate estimation of the Bayes Error Rate by using this method.

ContributorsLattus, Robert (Author) / Dasarathy, Gautam (Thesis director) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)

Created2021-12

Barrett, The Honors College Thesis/Creative Project Collection

Filtering by

PayPal - Social Injustice Index

Using ML to Predict Online Course Ratings

A Graph-Based Machine Learning Approach to Realistic Traffic Volume Generation

An Introduction to Unstructured Case Management

A Survey Study of the Current Understanding of Artificial Intelligence and Machine Learning in Medical Practice among Healthcare Professionals and the Lay Public

Agora: Introducing the Internet's Opinion to Traditional Stock Analysis and Prediction.

Comparative Evaluation of Generative Machine Learning Models for Jazz Improvisation using Numerical Metrics

Hilliker Thesis Paper

Missing Metrics Python Program

Characterizing the Performance of Machine Learning Algorithms: A Study and Novel Techniques