Social injustice is a familiar yet difficult topic to define, because injustice issues are hard to predict and tough to understand. They negatively affect communities because they directly violate human rights, and they span a wide range of areas, such as unfair labor practices, racism, gender bias, and political discrimination. This leaves many individuals wondering how they can make sense of social injustice issues and perhaps work to prevent them in the future. In an attempt to understand the complicated nature of social injustice, this thesis takes a data-driven approach to defining a social injustice index for a specific country, India. The thesis attempts to quantify and track social injustice through social media in order to gauge the current social climate. This was accomplished by developing a web scraper to collect hate speech data from Twitter. The collected tweets were then classified by their level of hate and presented on a choropleth map of India. Ultimately, a user viewing the ‘India Social Injustice Index’ map should be able to view an index score for any desired state in India with a single click. This thesis hopes to make it simple for any user viewing the map to make better sense of injustice issues.
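The abstract does not publish the scoring formula behind the index, but the aggregation step it describes (classified tweets rolled up into one score per state) could be sketched as follows; the state names, scores, and averaging rule here are invented for illustration:

```python
from collections import defaultdict

# Hypothetical input: each scraped tweet has already been classified
# with a hate score in [0, 1] and geotagged with an Indian state.
tweets = [
    {"state": "Maharashtra", "hate_score": 0.8},
    {"state": "Maharashtra", "hate_score": 0.2},
    {"state": "Kerala", "hate_score": 0.1},
]

def injustice_index(tweets):
    """Average the per-tweet hate scores within each state."""
    by_state = defaultdict(list)
    for t in tweets:
        by_state[t["state"]].append(t["hate_score"])
    return {state: sum(scores) / len(scores) for state, scores in by_state.items()}

print(injustice_index(tweets))
# → {'Maharashtra': 0.5, 'Kerala': 0.1}
```

A mapping library could then color each state of a choropleth by its score, so the single-click lookup the abstract describes reduces to a dictionary read.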
The aim of this project is to understand the basic algorithmic components of the transformer deep learning architecture. At a high level, a transformer is a machine learning model that adopts a self-attention mechanism, which weighs the significant parts of sequential input data; this makes it very useful for solving problems in natural language processing and computer vision. Earlier approaches to these problems, such as convolutional neural networks and recurrent neural networks, suffer from the vanishing gradient problem when an input becomes too long (which essentially means the network loses its memory and halts learning) and are generally slow to train. The transformer architecture’s features enable a much better “memory” and faster training, which makes it a better-suited architecture for these problems. Most of this project will be spent producing a survey that captures the current state of research on the transformer, along with the background material needed to understand it. First, I will do a keyword search of the most well-cited and up-to-date peer-reviewed publications on transformers to understand them conceptually. Next, I will investigate the programming frameworks required to implement the architecture. Using these, I will implement a simplified version of the architecture or follow an accessible guide or tutorial. Once the programming aspect of the architecture is understood, I will implement a transformer based on the academic paper “Attention Is All You Need”. I will then slightly tweak this model, using my understanding of the architecture, to improve performance. Once finished, the details of the implementation (i.e., successes, failures, process, and inner workings) will be evaluated and reported, along with the fundamental concepts surveyed.
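The self-attention mechanism described above, in its scaled dot-product form from “Attention Is All You Need”, can be sketched in a few lines of NumPy; the toy dimensions and random input below are illustrative only:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weigh the values V by the similarity of queries Q to keys K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_q, seq_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V, weights

# Toy example: a sequence of 3 tokens, each a 4-dimensional vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)       # self-attention: Q = K = V
print(w.sum(axis=-1))                                # each row of weights sums to 1
```

Because every token attends to every other token directly, no information has to survive a long recurrent chain, which is why the vanishing-gradient “memory loss” the paragraph mentions does not arise in the same way.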
The motivation behind this project is to explore the rapidly growing area of AI algorithms; the transformer in particular was chosen because it is a major milestone for engineering with AI and software. Since their introduction, transformers have provided a very effective way of solving natural language processing tasks, allowing related applications to succeed at high speed while maintaining accuracy. This type of model can also be applied to more cutting-edge natural language processing applications, such as extracting semantic information from a text description and generating an image to satisfy it.
This research investigates the attitudes of students towards chatbots and their potential use in finding career resources. Survey data from two sources were analyzed using descriptive statistics and correlation analysis. The first survey found that students had a neutral attitude towards chatbots, but that understanding of chatbots was a key factor in increasing their usage. The survey data suggested that chatbots could provide quick and convenient access to information and personalized recommendations, but that their effectiveness for career resource searches may be limited. The second survey found that students who were more satisfied with the quality of resources from the career office were more likely to use chatbots, while students who felt more prepared to explore their career options were less likely to use them. These results suggest that the W. P. Carey Career Office could benefit from offering more and better resources to prepare students for exploring their career options, and could explore the use of chatbots to enhance the quality of those resources and increase student satisfaction. Further research is needed to confirm these suggestions and to explore other factors that may affect chatbot use and satisfaction with career office resources.
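The correlation analysis mentioned above amounts to computing a Pearson coefficient between paired survey items; the Likert-scale responses below are invented stand-ins, not the study's data:

```python
from math import sqrt

# Hypothetical 1-5 Likert responses: satisfaction with career-office
# resources vs. self-reported likelihood of using a chatbot.
satisfaction = [2, 3, 4, 4, 5, 1, 3, 5]
chatbot_use = [1, 3, 4, 5, 5, 2, 2, 4]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(satisfaction, chatbot_use)
print(round(r, 3))  # positive r: more satisfied students report more chatbot use
```

A coefficient near +1 would correspond to the first finding (satisfaction tracks chatbot use), while the preparedness result would show up as a negative coefficient on that item.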
On January 5, 2020, the World Health Organization (WHO) reported on the outbreak of pneumonia of unknown cause in Wuhan, China. Two weeks later, a 35-year-old Washington resident checked into a local urgent care clinic with a 4-day cough and fever. Laboratory testing would confirm this individual as the first case of the novel coronavirus in the U.S., and on January 20, 2020, the Centers for Disease Control and Prevention (CDC) reported this case to the public. In the days and weeks to follow, Twitter, a social media platform with 450 million active monthly users as of 2020, provided many American residents the opportunity to share their thoughts on the developing pandemic online. Social media sites like Twitter are a prominent source of discourse surrounding contemporary political issues, allowing for direct communication between users in real time. As more population centers around the world gain access to the internet, most democratic discussion, both nationally and internationally, will take place in online spaces. The activity of elected officials as private citizens in these online spaces is often overlooked. I find the ability of publics (which philosopher John Dewey defines as groups of people with shared needs) to communicate effectively and monitor the interests of political elites online to be lacking. To best align the interests of officials and citizens, and to achieve transparency between publics and elected officials, we need an efficient way to measure and record these interests. Through this thesis, I found that natural language processing methods like sentiment analysis can provide an effective means of gauging the attitudes of politicians towards contemporary issues.
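The core idea of lexicon-based sentiment analysis (the family that includes tools such as VADER) can be illustrated with a toy scorer; the lexicon, weights, and example tweets below are invented and far simpler than what a real analysis of politicians' tweets would use:

```python
# Toy sentiment lexicon: word -> polarity. Real lexicons contain
# thousands of weighted entries plus rules for negation and emphasis.
LEXICON = {"great": 1, "support": 1, "safe": 1,
           "crisis": -1, "failure": -1, "hoax": -1}

def sentiment(text):
    """Average the lexicon polarities over the words of the text."""
    words = text.lower().split()
    score = sum(LEXICON.get(w, 0) for w in words)
    # Normalize by length and clamp to [-1, 1].
    return max(-1.0, min(1.0, 5 * score / max(len(words), 1)))

tweets = ["Our response has been great and citizens are safe",
          "This crisis is a failure of leadership"]
for t in tweets:
    print(round(sentiment(t), 2), "|", t)
```

Averaging such scores over an official's tweets on a topic yields the kind of attitude measure the thesis proposes, with positive values indicating approving language and negative values indicating critical language.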
In 2018, Google researchers published the BERT (Bidirectional Encoder Representations from Transformers) model, which has since served as a starting point for hundreds of NLP (Natural Language Processing) experiments and derivative models. BERT was pre-trained on masked-language modeling and next-sentence prediction, but its capabilities extend to more common NLP tasks, such as language inference and text classification. Naralytics is a company that seeks to use natural language to categorize the users who produce text into multiple categories, a modified version of classification. However, the text that Naralytics seeks to draw from exceeds the maximum length of 512 tokens that BERT supports, so this report discusses research into several BERT derivatives that address this problem, and then implements a solution that addresses the concerns attached to this kind of model.
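One common workaround for the 512-token limit, used in several long-document approaches, is to split the input into overlapping windows that each fit the model, classify every window, and pool the predictions. A minimal sketch of the windowing step, with illustrative window and stride values rather than Naralytics' actual settings:

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a token sequence into overlapping chunks of at most `window` tokens."""
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # last chunk reached the end of the sequence
        start += stride
    return chunks

tokens = list(range(1000))            # stand-in for a tokenized document
chunks = sliding_windows(tokens)
print([len(c) for c in chunks])       # every chunk fits BERT's 512-token limit
```

The overlap (here half a window) keeps sentences that straddle a chunk boundary visible to at least one window; the per-chunk predictions can then be averaged or max-pooled into one label for the whole text.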