In this Barrett Honors Thesis, I developed a model to quantify the complexity of Sankey diagrams, which are a type of visualization technique that shows flow between groups. To do this, I created a carefully controlled dataset of synthetic Sankey diagrams of varying sizes as study stimuli. Then, a pair of online crowdsourced user studies were conducted and analyzed. User performance for Sankey diagrams of varying size and features (number of groups, number of timesteps, and number of flow crossings) were algorithmically modeled as a formula to quantify the complexity of these diagrams. Model accuracy was measured based on the performance of users in the second crowdsourced study. The results of my experiment conclusively demonstrates that the algorithmic complexity formula I created closely models the visual complexity of the Sankey Diagrams in the dataset.
To do this, a 4-week long pilot curriculum was created, implemented, and tested through an optional class at I Am Zambia, available to women who had already graduated from the year-long I Am Zambia Academy program. A total of 18 women ages 18-24 chose to enroll in the course. There were a total of 10 lessons, taught over 20 class period. These lessons covered four main computational thinking frameworks: introduction to computational thinking, algorithmic thinking, pseudocode, and debugging. Knowledge retention was tested through the use of a CS educational tool, QuizIt, created by the CSI Lab of School of Computing, Informatics and Decision Systems Engineering at Arizona State University. Furthermore, pre and post tests were given to assess the successfulness of the curriculum in teaching students the aforementioned concepts. 14 of the 18 students successfully completed the pre and post test.
Limitations of this study and suggestions for how to improve this curriculum in order to extend it into a year long course are also presented at the conclusion of this paper.
In this dissertation, I seek to advance both the knowledge of limitations in current technologies used in practice as well as the mechanisms that can be used for large-scale support. The overall research question I explore is: “How can we support large-scale creative collaboration in distributed online communities?” I first advance existing support techniques by evaluating the impact of active support in brainstorming performance. Furthermore, I leverage existing theoretical models of individual idea generation as well as recommender system techniques to design CrowdMuse, a novel adaptive large-scale idea generation system. CrowdMuse models users in order to adapt itself to each individual. I evaluate the system’s efficacy through two large-scale studies. I also advance knowledge of current large-scale practices by examining common communication channels under the lens of Creativity Support Tools, yielding a list of creativity bottlenecks brought about by the affordances of these channels. Finally, I connect both ends of this dissertation by deploying CrowdMuse in an Open Source online community for two weeks. I evaluate their usage of the system as well as its perceived benefits and issues compared to traditional communication tools.
This dissertation makes the following contributions to the field of large-scale creativity: 1) the design and evaluation of a first-of-its-kind adaptive brainstorming system; 2) the evaluation of the effects of active inspirations compared to simple idea exposure; 3) the development and application of a set of creativity support design heuristics to uncover creativity bottlenecks; and 4) an exploration of large-scale brainstorming systems’ usefulness to online communities.
Adverse Drug Reaction (ADR) identification. Such methods employ a step of drug search followed by classification of the associated text as consisting an ADR or not. Although this method works efficiently for ADR classifications, if ADR evidence is present in users posts over time, drug mentions fail to capture such ADRs. It also fails to record additional user information which may provide an opportunity to perform an in-depth analysis for lifestyle habits and possible reasons for any medical problems.
Pre-market clinical trials for drugs generally do not include pregnant women, and so their effects on pregnancy outcomes are not discovered early. This thesis presents a thorough, alternative strategy for assessing the safety profiles of drugs during pregnancy by utilizing user timelines from social media. I explore the use of a variety of state-of-the-art social media mining techniques, including rule-based and machine learning techniques, to identify pregnant women, monitor their drug usage patterns, categorize their birth outcomes, and attempt to discover associations between drugs and bad birth outcomes.
The technique used models user timelines as longitudinal patient networks, which provide us with a variety of key information about pregnancy, drug usage, and post-
birth reactions. I evaluate the distinct parts of the pipeline separately, validating the usefulness of each step. The approach to use user timelines in this fashion has produced very encouraging results, and can be employed for a range of other important tasks where users/patients are required to be followed over time to derive population-based measures.
This thesis develops a unique type of textual features that generalize triplets extracted from text, by clustering them into high-level concepts. These concepts are utilized as features to detect frames in text. Compared to uni-gram and bi-gram based models, classification and clustering using generalized concepts yield better discriminating features and a higher classification accuracy with a 12% boost (i.e. from 74% to 83% F-measure) and 0.91 clustering purity for Frame/Non-Frame detection.
The automatic discovery of complex causal chains among interlinked events and their participating actors has not yet been thoroughly studied. Previous studies related to extracting causal relationships from text were based on laborious and incomplete hand-developed lists of explicit causal verbs, such as “causes" and “results in." Such approaches result in limited recall because standard causal verbs may not generalize well to accommodate surface variations in texts when different keywords and phrases are used to express similar causal effects. Therefore, I present a system that utilizes generalized concepts to extract causal relationships. The proposed algorithms overcome surface variations in written expressions of causal relationships and discover the domino effects between climate events and human security. This semi-supervised approach alleviates the need for labor intensive keyword list development and annotated datasets. Experimental evaluations by domain experts achieve an average precision of 82%. Qualitative assessments of causal chains show that results are consistent with the 2014 IPCC report illuminating causal mechanisms underlying the linkages between climatic stresses and social instability.