Researchers and practitioners use social media to extract actionable patterns, such as where aid should be distributed in a crisis. However, the validity of these patterns relies on having a representative dataset. As this dissertation shows, the data collected from social media is seldom representative of activity on the site itself, and even less so of human activity in general. This means that the results of many studies are limited by the quality of the data they collect.
The finding that social media data is biased motivates the main challenge addressed by this thesis. I introduce three sets of methodologies to correct for bias. First, I design methods to deal with data collection bias. I offer a methodology that detects bias within a social media dataset by comparing the collected stream with other data sources. The dissertation also outlines a data collection strategy, based on crawling, that minimizes the amount of bias in the resulting dataset. Second, I introduce a methodology to identify bots and shills within a social media dataset. This directly addresses the concern that the users of a social media site are not representative of the real-world population. Applying these methodologies allows the population under study on a social media site to better match that of the real world. Finally, the dissertation discusses perceptual biases, explains how they affect analysis, and introduces computational approaches to mitigate them.
The results of the dissertation allow for the discovery and removal of different levels of bias within a social media dataset. This has important implications for social media mining, namely that the behavioral patterns and insights extracted from social media will be more representative of the populations under study.
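As a rough illustration of the idea of comparing collected data against another source, the sketch below scores how far the hashtag distribution of a collected sample drifts from that of a reference source. The abstract does not say which comparison measure the dissertation uses; the Jensen-Shannon divergence, the function names, and the use of hashtags as the unit of comparison are all assumptions made for this example.

```python
import math
from collections import Counter


def distribution(items, vocabulary):
    """Relative frequency of each vocabulary entry in items."""
    if not items:
        raise ValueError("items must not be empty")
    counts = Counter(items)
    total = len(items)
    return [counts[v] / total for v in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), between 0 and 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def sample_bias(sample_tags, reference_tags):
    """Hypothetical bias score: divergence between the hashtag
    distribution of the collected sample and that of a reference."""
    vocabulary = sorted(set(sample_tags) | set(reference_tags))
    p = distribution(sample_tags, vocabulary)
    q = distribution(reference_tags, vocabulary)
    return js_divergence(p, q)


if __name__ == "__main__":
    collected = ["#flood", "#flood", "#aid", "#news"]
    reference = ["#flood", "#aid", "#aid", "#news", "#shelter"]
    print(sample_bias(collected, reference))
```

Under these assumptions, a score near 0 suggests the sample resembles the reference source, while a score near 1 suggests the two share almost no hashtags in common.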
The process of learning a new skill can be time-consuming and difficult for both the teacher and the student, especially when it comes to computer modeling. With so many terms and functionalities to become familiar with, the task can overwhelm even the most knowledgeable student. The purpose of this paper is to describe the methodology used to create a new set of curricula for those learning to use the Dynamic Traffic Simulation Package with Multi-Resolution Modeling (DLSim). The current DLSim curriculum conveys information through high-concept terms and complicated graphics. This paper aims to provide a streamlined set of curricula for new users of DLSim, including lesson plans and improved infographics.