Over the past couple of years, attention to the prevalence of hate speech and misinformation on the internet has increased. Many lawmakers feel that repealing or reforming Section 230 of the Communications Decency Act is the way forward, considering that the law has been used in the past to protect companies from liability. In this podcast series, I explain what Section 230 is, how it affects us, and what changes are being proposed. In doing so, I hope to shed light on how the problems of the internet are not solely in the hands of social media giants and a 26-word law, but also in the hands of all the users who make up our global community.
The role of technology in shaping modern society has become increasingly important in the context of current democratic politics, especially when examined through the lens of social media. Twitter is a prominent social media platform used as a political medium, contributing to political movements such as #OccupyWallStreet, #MeToo, and #BlackLivesMatter. Using the #BlackLivesMatter movement as an illustrative case to establish patterns in Twitter usage, this thesis aims to answer the question: “To what extent is Twitter an accurate representation of ‘real life’ in terms of performative activism and user engagement?” The discussion of Twitter is contextualized by research on Twitter’s use in politics, both as a mobilizing force and in its potential to divide and mislead. Twitter data containing #BlackLivesMatter is collected and analyzed over intervals of time between 2014 and 2020. The discussion of findings centers on the role of performative activism in social mobilization on Twitter. The analysis shows patterns in the data indicating that performative activism can skew the real picture of civic engagement, which can in turn affect how public opinion shapes future public policy and mobilization.
Social injustice is a familiar topic, yet one that is very difficult to define, because injustice issues are hard to predict and hard to understand. They negatively affect communities because they directly violate human rights, and they span a wide range of areas. For instance, injustice issues can relate to unfair labor practices, racism, gender bias, politics, and more. This leaves many individuals wondering how they can make sense of social injustice issues and perhaps make efforts to stop them from occurring in the future. In an attempt to understand the complicated nature of social injustice, this thesis takes a data-driven approach to defining a social injustice index for a specific country, India. The thesis attempts to quantify and track social injustice through social media in order to gauge the current social climate. This was accomplished by developing a web scraper to collect hate speech data from Twitter. The collected tweets were then classified by their level of hate and presented on a choropleth map of India. Ultimately, a user viewing the ‘India Social Injustice Index’ map should be able to view an index score for a desired state in India with a single click. This thesis hopes to make it simple for any user viewing the social injustice map to make better sense of injustice issues.
For this project, I set a number of learning goals by which to judge its success. I needed to learn how to use the supporting libraries that would help me design this system. I also needed to learn how to use the Twitter API and to create the infrastructure behind it that would allow me to collect large amounts of data for machine learning. Finally, I needed to become familiar with common machine learning libraries in Python in order to create the algorithms and pipelines necessary to make predictions based on Twitter data.
This paper details the steps and decisions involved in collecting this data and applying it to machine learning algorithms. I determined how to create labelled data using pre-existing Botometer ratings, and what level of confidence I needed in order to label data for training. I used the scikit-learn library to create the algorithms that best detect these bots. I used a number of pre-processing routines to refine the classifiers’ precision, including natural language processing and data analysis techniques. I eventually moved to remotely hosted versions of the system on Amazon Web Services (AWS) instances to collect larger amounts of data and train more advanced classifiers. This led to my final implementation: a user-facing server, hosted on AWS and interfacing over Gmail’s IMAP server.
The current and future development of this system is then laid out, including more advanced classifiers, better data analysis, conversion to third-party Twitter data collection systems, and user features. I detail what I have learned from this exercise and what I hope to continue working on.