Filtering by
- Creators: Computer Science and Engineering Program
- Creators: Voorhees, Matthew
- Member of: Barrett, The Honors College Thesis/Creative Project Collection
Over the past couple of years, attention to the prevalence of hate speech and misinformation on the internet has increased. Many lawmakers feel that repealing or reforming Section 230 of the Communications Decency Act is the way forward, considering that the law has been used to protect companies from liability in the past. In this podcast series, I explain what Section 230 is, how it affects us, and what changes are being proposed. In doing so, I wish to shed light on how the problems of the internet are not solely in the hands of social media giants and a 26-word law, but also in the hands of all the users who make up our global community.
The role of technology in shaping modern society has become increasingly important in the context of current democratic politics, especially when examined through the lens of social media. Twitter is a prominent social media platform used as a political medium, contributing to political movements such as #OccupyWallStreet, #MeToo, and #BlackLivesMatter. Using the #BlackLivesMatter movement as an illustrative case to establish patterns in Twitter usage, this thesis aims to answer the question, “To what extent is Twitter an accurate representation of ‘real life’ in terms of performative activism and user engagement?” The discussion of Twitter is contextualized by research on Twitter’s use in politics, both as a mobilizing force and as a potential means to divide and mislead. Twitter data containing #BlackLivesMatter is collected at intervals between 2014 and 2020 and analyzed. The discussion of findings centers on the role of performative activism in social mobilization on Twitter. The analysis shows patterns in the data indicating that performative activism can skew the real picture of civic engagement, which can affect how public opinion shapes future public policy and mobilization.
Social injustice is a familiar topic, yet a very difficult one to define, because injustice issues are hard to predict and hard to understand. They negatively affect communities because they directly violate human rights, and they span a wide range of areas: unfair labor practices, racism, gender bias, politics, and so on. This leaves many individuals wondering how they can make sense of social injustice issues and perhaps act to stop them from occurring in the future. To understand the rather complicated nature of social injustice, this thesis takes a data-driven approach to defining a social injustice index for a specific country, India. The thesis attempts to quantify and track social injustice through social media in order to capture the current social climate. This was accomplished by developing a web scraper to collect hate speech data from Twitter. The collected tweets were then classified by their level of hate and presented on a choropleth map of India. Ultimately, a user viewing the ‘India Social Injustice Index’ map should be able to view an index score for any state in India with a single click. This thesis hopes to make it simple for any user viewing the social injustice map to make better sense of injustice issues.
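As a rough illustration of the aggregation step described above, the following sketch turns classified tweets into a per-state score that a choropleth map could display. It is not the thesis's actual method: the three-level hate scale, the `injustice_index` function, the 0 to 100 scaling, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed scale for illustration only: 0 = none, 1 = offensive, 2 = hateful.
MAX_LEVEL = 2


def injustice_index(classified_tweets):
    """Return {state: score in 0..100} from (state, hate_level) pairs.

    The score is the mean hate level for a state, rescaled to 0..100.
    This formula is a hypothetical stand-in, not the thesis's definition.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for state, level in classified_tweets:
        if not 0 <= level <= MAX_LEVEL:
            raise ValueError(f"hate level out of range: {level}")
        totals[state] += level
        counts[state] += 1
    return {
        state: round(100 * totals[state] / (MAX_LEVEL * counts[state]), 1)
        for state in totals
    }


# Made-up sample input: each tuple is (state, classifier-assigned hate level).
sample = [
    ("Kerala", 0),
    ("Kerala", 1),
    ("Punjab", 2),
    ("Punjab", 2),
    ("Punjab", 0),
]
scores = injustice_index(sample)
print(scores)
```

A mapping of this shape (state name to a single number) is what a choropleth layer typically needs, which is why the sketch stops at the dictionary rather than drawing the map.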
On January 5, 2020, the World Health Organization (WHO) reported on an outbreak of pneumonia of unknown cause in Wuhan, China. Two weeks later, a 35-year-old Washington resident checked into a local urgent care clinic with a 4-day cough and fever. Laboratory testing would confirm this individual as the first case of the novel coronavirus in the U.S., and on January 20, 2020, the Centers for Disease Control and Prevention (CDC) reported this case to the public. In the days and weeks that followed, Twitter, a social media platform with 450 million monthly active users as of 2020, gave many American residents the opportunity to share their thoughts on the developing pandemic online. Social media sites like Twitter are a prominent source of discourse surrounding contemporary political issues, allowing for direct communication between users in real time. As more population centers around the world gain access to the internet, most democratic discussion, both national and international, will take place in online spaces. The activity of elected officials as private citizens in these online spaces is often overlooked. I find the ability of publics (which philosopher John Dewey defines as groups of people with shared needs) to communicate effectively and monitor the interests of political elites online to be lacking. To best align the interests of officials and citizens, and to achieve transparency between publics and elected officials, we need an efficient way to measure and record these interests. In this thesis, I find that natural language processing methods such as sentiment analysis can provide an effective means of gauging the attitudes of politicians towards contemporary issues.
For this project, I set a number of learning goals by which to judge its success. I needed to learn how to use the supporting libraries that would help me design this system. I also learned how to use the Twitter API and how to build the infrastructure around it that would allow me to collect large amounts of data for machine learning. Finally, I needed to become familiar with common machine learning libraries in Python in order to create the algorithms and pipelines needed to make predictions based on Twitter data.
This paper details the steps and decisions involved in collecting this data and applying machine learning algorithms to it. I determined how to create labelled data from pre-existing Botometer ratings, and what level of confidence I needed in a rating before using it to label training data. I used the scikit-learn library to build classifiers to detect bots, and applied a number of pre-processing routines, including natural language processing and data analysis techniques, to refine the classifiers’ precision. I eventually moved to remotely hosted versions of the system on Amazon Web Services (AWS) instances to collect larger amounts of data and train more advanced classifiers. The paper then details my final implementation of a user-facing server, hosted on AWS and interfacing over Gmail’s IMAP server.
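The labelling idea described above, keeping only accounts whose pre-existing rating is confident enough, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each account already has a bot-likelihood score in the range 0 to 1, and the threshold values, function names, and sample accounts are hypothetical. It does not call Botometer or assume anything about its real output format.

```python
def label_account(bot_score, bot_threshold=0.8, human_threshold=0.2):
    """Map a bot-likelihood score in [0, 1] to 'bot', 'human', or None.

    Scores between the two thresholds are treated as too uncertain
    to use as training labels. Both thresholds are assumed values.
    """
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError(f"score out of range: {bot_score}")
    if bot_score >= bot_threshold:
        return "bot"
    if bot_score <= human_threshold:
        return "human"
    return None


def build_training_set(scored_accounts, **thresholds):
    """Keep only the accounts that receive a confident label."""
    labelled = []
    for account_id, score in scored_accounts:
        label = label_account(score, **thresholds)
        if label is not None:
            labelled.append((account_id, label))
    return labelled


# Made-up scores for three hypothetical accounts.
scored = [("acct_a", 0.95), ("acct_b", 0.05), ("acct_c", 0.50)]
training = build_training_set(scored)
print(training)
```

Widening the gap between the two thresholds trades training-set size for label quality: fewer accounts survive, but the ones that do are less likely to be mislabelled.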
Finally, I lay out the current and future development of this system, including more advanced classifiers, better data analysis, conversion to third-party Twitter data collection systems, and user features. I describe what I have learned from this exercise and what I hope to continue working on.