Data visualization is essential for communicating complex information to diverse audiences. However, a gap persists between visualization design objectives and the understanding of non-expert users with limited experience. This dissertation addresses challenges in designing for non-experts, referred to as the D.U.C.K. bridge: (i) user unfamiliarity with DATA analysis domains, (ii) variation in user UNDERSTANDING mechanisms, (iii) catering to individual differences in CREATING visualizations, and (iv) promoting KNOWLEDGE synthesis and application. By developing human-driven principles and tools, this research aims to enhance visualization creation and consumption by non-experts.

Leveraging linked interactive visualizations, this dissertation explores the iterative education of non-experts navigating unfamiliar DATA realms. VAIDA guides crowd workers in creating better NLP benchmarks through real-time visual feedback. Similarly, LeaderVis allows users to interactively customize AI leaderboards and select model configurations suited to their applications. Both systems demonstrate how visual analytics can flatten the learning curve associated with complex data and technologies.

Next, this dissertation examines how individuals internalize real-world visualizations, either as images or as information. Experimental studies investigate the impact of design elements on perception across visualization types and styles, and an LSTM model predicts the framing of the recall process. The findings reveal mechanisms that shape the UNDERSTANDING of visualizations, enabling the design of tailored approaches to improve recall and comprehension among non-experts.

This research also investigates how known design principles apply to CREATING visualizations for underrepresented populations. Findings reveal that multilingual individuals prefer varying text volumes depending on the annotation language, and that older age groups engage more emotionally with affective visualizations than younger age groups. Additionally, underlying cognitive processes, such as mind wandering, affect recall focus. These insights guide the development of more inclusive visualization solutions for diverse user demographics.

This dissertation concludes by presenting projects aimed at preserving cognitive and affective KNOWLEDGE synthesized through visual analysis. The first project examines the impact of data visualizations in VR on personal viewpoints about climate change, offering insights for using VR in public scientific education. The second project introduces LINGO, which enables the creation of diverse natural language prompts for generative models across multiple languages, potentially facilitating custom visualization creation via streamlined prompting.
Machine learning models are increasingly being deployed in real-world applications where their predictions are used to make critical decisions in a variety of domains. The proliferation of such models has led to a burgeoning need to ensure their reliability and safety, given the potential negative consequences of model vulnerabilities. The complexity of machine learning models, along with the extensive data sets they analyze, can result in unpredictable and unintended outcomes. Model vulnerabilities may manifest due to errors in data input, algorithm design, or model deployment, which can have significant implications for both individuals and society. To prevent such negative outcomes, it is imperative to identify model vulnerabilities at an early stage in the development process. This aids in guaranteeing the integrity, dependability, and safety of the models, thus mitigating potential risks and enabling the full potential of these technologies to be realized. However, enumerating vulnerabilities can be challenging due to the complexity of the real-world environment. Visual analytics, situated at the intersection of human-computer interaction, computer graphics, and artificial intelligence, offers a promising approach for achieving high interpretability of complex black-box models, thus reducing the cost of obtaining insights into potential model vulnerabilities. This research is devoted to designing novel visual analytics methods that support the identification and analysis of model vulnerabilities. Specifically, generalizable visual analytics frameworks are instantiated to explore vulnerabilities in machine learning models concerning security (adversarial attacks and data perturbation) and fairness (algorithmic bias). Finally, a visual analytics approach is proposed that enables domain experts to explain and diagnose, in a human-in-the-loop fashion, how a model improves once its identified vulnerabilities are addressed. The proposed methods hold the potential to enhance the security and fairness of machine learning models deployed in critical real-world applications.
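For concreteness, the following is a minimal sketch of one kind of security vulnerability such frameworks probe: a fast gradient sign method (FGSM) perturbation of an input to a logistic model, whose effects a visual analytics system could then help inspect. The model, weights, and epsilon are hypothetical placeholders, not the dissertation's systems.

import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.3):
    # Gradient of the cross-entropy loss of a logistic model p = sigmoid(w.x + b) w.r.t. the input x.
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w
    # One-step fast gradient sign perturbation.
    return x + eps * np.sign(grad_x)

# Hypothetical example: a point near the decision boundary can flip its prediction after perturbation.
w, b = np.array([1.0, -2.0]), 0.0
x, y = np.array([0.2, 0.05]), 1.0
x_adv = fgsm_perturb(x, w, b, y)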
Artificial Intelligence (AI) systems have achieved outstanding performance and have been found to be better than humans at various tasks, such as sentiment analysis and face recognition. However, the majority of these state-of-the-art AI systems use complex Deep Learning (DL) methods, which present challenges for human experts in designing and evaluating such models with respect to privacy, fairness, and robustness. Recent examination of DL models reveals that their representations may include information that could lead to privacy violations, unfairness, and robustness issues. This results in AI systems that are potentially untrustworthy from a socio-technical standpoint. Trustworthiness in AI is defined by a set of model properties such as freedom from discriminatory bias, protection of users' sensitive attributes, and lawful decision-making. The characteristics of trustworthy AI can be grouped into three categories: Reliability, Resiliency, and Responsibility. Past research has shown that the successful integration of an AI model depends on its trustworthiness. Thus, it is crucial for organizations and researchers to build trustworthy AI systems to facilitate the seamless integration and adoption of intelligent technologies. The main issue with existing AI systems is that they are primarily trained to improve technical measures, such as accuracy on a specific task, without regard for socio-technical measures. The aim of this dissertation is to propose methods for improving the trustworthiness of AI systems through representation learning. DL models' representations contain information about a given input and can be used for tasks such as detecting fake news on social media or predicting the sentiment of a review. The findings of this dissertation significantly expand the scope of trustworthy AI research and establish a new paradigm for modifying data representations to balance between properties of trustworthy AI. Specifically, this research investigates techniques such as reinforcement learning for understanding trustworthiness with respect to users' privacy, fairness, and robustness in classification tasks like cyberbullying detection and fake news detection. Since most social measures in trustworthy AI cannot be used to fine-tune or train an AI model directly, the main contribution of this dissertation lies in using reinforcement learning to alter an AI system's behavior based on non-differentiable social measures.
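To illustrate the final point, here is a minimal, hypothetical REINFORCE-style sketch in which a non-differentiable social measure (a demographic parity score) is treated as a scalar reward for updating a simple decision policy. The data, policy, and measure are placeholders rather than the dissertation's models.

import numpy as np

rng = np.random.default_rng(0)

def parity_reward(decisions, sensitive):
    # Non-differentiable social measure: 1 minus the demographic parity gap across groups.
    rates = [decisions[sensitive == g].mean() for g in np.unique(sensitive)]
    return 1.0 - (max(rates) - min(rates))

X = rng.normal(size=(200, 3))                 # toy inputs
sensitive = rng.integers(0, 2, size=200)      # toy sensitive attribute
theta = np.zeros(3)                           # policy parameters

for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-X @ theta))      # Bernoulli decision policy
    a = (rng.random(len(X)) < p).astype(float)
    r = parity_reward(a, sensitive)           # reward cannot be backpropagated through
    grad_log_pi = X.T @ (a - p)               # score-function gradient of the sampled decisions
    theta += 0.01 * r * grad_log_pi / len(X)  # REINFORCE update scaled by the social reward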
Molecular Dynamics (MD) simulations are ubiquitous throughout the physical sciences; they are critical in understanding how particle structures evolve over time given a particular energy function. A software package called ParSplice introduced a new method to generate these simulations in parallel that has significantly inflated their length. Typically, simulations are short discrete Markov chains, only capturing a few microseconds of a particle's behavior and containing tens of thousands of transitions between states; in contrast, a typical ParSplice simulation can be as long as a few milliseconds, containing tens of millions of transitions. Naturally, sifting through data of this size is impossible by hand, and there are a number of visualization systems that provide comprehensive and intuitive analyses of particle structures throughout MD simulations. However, no visual analytics systems have been built that can manage the simulations that ParSplice produces. To analyze these large datasets, I built a visual analytics system that provides multiple coordinated views that simultaneously describe the data temporally, within its structural context, and based on its properties. The system provides fluid and powerful user interactions regardless of the size of the data, allowing the user to drill down into the dataset to get detailed insights, as well as run and save various calculations, most notably the Nudged Elastic Band method. The system also allows the comparison of multiple trajectories, revealing more information about the general behavior of particles at different temperatures, energy states, etc.
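As a rough illustration of the scale problem, the kind of aggregation such a system performs can be sketched as collapsing a long state trajectory into transition counts and per-state dwell times. The helper below is a hypothetical simplification, not part of the described system.

import numpy as np
from collections import Counter, defaultdict

def summarize_trajectory(states, times):
    # Count state-to-state transitions and accumulate the time spent in each state.
    transitions = Counter(zip(states[:-1], states[1:]))
    dwell = defaultdict(float)
    for s, t0, t1 in zip(states[:-1], times[:-1], times[1:]):
        dwell[s] += t1 - t0
    return transitions, dict(dwell)

# Hypothetical trajectory: state ids with timestamps (e.g., in picoseconds).
states = np.array([0, 0, 1, 2, 1, 1, 0])
times = np.array([0.0, 1.5, 2.0, 4.5, 5.0, 6.0, 8.0])
counts, dwell = summarize_trajectory(states, times)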
The focus of my honors thesis is to find ways to use deep learning in tandem with tools from statistical mechanics to derive new ways to solve problems in biophysics. More specifically, I have been interested in finding transition pathways between two known states of a biomolecule, because understanding the mechanisms by which proteins fold and ligands bind is crucial to creating new medicines and understanding biological processes. In this thesis, I work with individuals in the Singharoy lab to develop a formulation that uses reinforcement learning and sampling-based robotics planning to derive low free-energy transition pathways between two known states. Our formulation uses Jarzynski's equality and the stiff-spring approximation to obtain point estimates of energy and to construct an informed path search with atomistic resolution. At the core of this framework is our first attempt at using policy-driven adaptive steered molecular dynamics (SMD) to control our molecular dynamics simulations. We show that both the reinforcement learning (RL) and robotics-planning realizations of the RL-guided framework can solve for pathways on toy analytical surfaces and on alanine dipeptide.
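Jarzynski's equality relates an equilibrium free-energy difference to nonequilibrium work measurements, dF = -kT ln<exp(-W/kT)>, with the average taken over repeated steered-MD pulls. Below is a minimal sketch of that point estimator; the work values are placeholders and this is not the lab's implementation.

import numpy as np

def jarzynski_free_energy(work, kT=0.593):
    # dF = -kT * ln < exp(-W / kT) >, evaluated with a log-sum-exp for numerical stability.
    w = np.asarray(work, dtype=float) / kT
    return -kT * (np.logaddexp.reduce(-w) - np.log(len(w)))

# Hypothetical work values (kcal/mol, kT at ~298 K) from repeated pulls along one path segment.
dF = jarzynski_free_energy([3.1, 2.7, 4.0, 2.9, 3.5])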
Education has been at the forefront of many issues in Arizona over the past several years, with concerns over a lack of funding sparking the Red for Ed movement. However, despite the push for educational change, there remain many barriers to education, including a lack of visibility into how Arizona schools are performing at the legislative district level. While there are sources of information released at the school district level, many of these are limited and can become obscure to legislators when school districts lie on the boundary between two different legislative districts. Moreover, much of this information is in the form of raw spreadsheets and is often fragmented across government websites and educational organizations. As such, a visualization dashboard that clearly identifies schools and their relative performance within each legislative district would be an extremely valuable tool for legislative bodies and the Arizona public. Although this dashboard and research are rough drafts of a larger concept, they would ideally increase transparency regarding public information about these districts and allow legislators to utilize the dashboard as a tool for greater understanding and more effective policymaking.
Short-notice disasters such as hurricanes involve uncertainties in many facets, from the time of occurrence to the magnitude of impacts. Failure to incorporate these uncertainties can reduce the effectiveness of emergency responses. In the case of a hurricane, uncertainties and corresponding impacts can quickly cascade over the course of the storm. Over the past decades, various storm forecast models have been developed to predict storm uncertainties; however, access to these models is limited. Hence, as the first part of this research, a data-driven simulation model is developed with the aim of generating spatial-temporal predicted storm hazards for each possible hurricane track modeled. The simulation model represents uncertainty in a storm's movement and its associated potential hazards in the form of a probabilistic scenario tree, where each branch is associated with a scenario-level storm track and weather profile. Storm hazards, such as strong winds, torrential rain, and storm surge, can inflict significant damage on the road network and affect the population's ability to move during the storm event. A cascading network failure algorithm is introduced in the second part of the research. The algorithm takes the scenario-level storm hazards and predicts uncertainties in mobility states over the storm event. In the third part of the research, a methodology is proposed to generate a sequence of actions that simultaneously solves evacuation flow scheduling and route selection to minimize the total flow time, or makespan, of the evacuation process from origins to destinations in the resulting stochastic time-dependent network. The methodology is implemented for the 2017 Hurricane Irma case study to recommend an evacuation policy for Manatee County, FL. The results are compared with evacuation plans for assumed scenarios; the research suggests that evacuation recommendations based on single scenarios reduce the effectiveness of the evacuation procedure. The overall contributions of the research presented here are new methodologies to: (1) predict and visualize the spatial-temporal impacts of an oncoming storm event, (2) predict uncertainties in the impacts on transportation infrastructure and mobility, and (3) determine the quickest evacuation schedule and routes under uncertainty within the resulting stochastic transportation networks.
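A minimal sketch of the scenario-tree representation described above: each node carries a storm state and the probability of reaching it, and children branch over possible track and intensity deviations, so the leaves form a discrete distribution over spatial-temporal hazard scenarios. The branching model and all names are illustrative assumptions, not the dissertation's simulation model.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ScenarioNode:
    time_step: int
    position: Tuple[float, float]          # (lat, lon) of the storm center
    intensity: float                       # e.g., maximum sustained wind (mph)
    prob: float                            # probability of reaching this node
    children: List["ScenarioNode"] = field(default_factory=list)

def expand(node, branches, horizon):
    # Recursively branch over (d_lat, d_lon, d_intensity, branch_probability) alternatives.
    if node.time_step == horizon:
        return
    for dlat, dlon, dint, p in branches:
        child = ScenarioNode(node.time_step + 1,
                             (node.position[0] + dlat, node.position[1] + dlon),
                             max(0.0, node.intensity + dint),
                             node.prob * p)
        node.children.append(child)
        expand(child, branches, horizon)

root = ScenarioNode(0, (25.0, -80.0), 110.0, 1.0)          # hypothetical initial storm state
expand(root, [(0.5, -0.5, -5.0, 0.6), (0.7, -0.2, 5.0, 0.4)], horizon=3)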
Social media has become an important means of user-centered information sharing and communication in a gamut of domains, including news consumption, entertainment, marketing, public relations, and many more. The low cost, easy access, and rapid dissemination of information on social media draw a large audience but also exacerbate the wide propagation of disinformation, including fake news, i.e., news with intentionally false information. Disinformation on social media is growing fast in volume and can have detrimental societal effects. Despite the importance of this problem, our understanding of disinformation in social media is still limited. Recent advancements in computational approaches to detecting disinformation and fake news have shown some early promising results. Novel challenges remain abundant due to the complexity, diversity, dynamics, and multi-modality of disinformation, as well as the costs of fact-checking and annotation.
Social media data opens the door to interdisciplinary research and allows one to collectively study large-scale human behaviors that would otherwise be impossible to observe. For example, user engagements with information such as news articles, including posting about, commenting on, or recommending the news on social media, contain rich information. However, because social media data is big, incomplete, noisy, and unstructured, with abundant social relations, relying solely on user engagements can be sensitive to noisy user feedback. To alleviate the problem of limited labeled data, it is important to combine content with this new (but weak) type of information as supervision signals, i.e., weak social supervision, to advance fake news detection.
The goal of this dissertation is to understand disinformation by proposing and exploiting weak social supervision for learning with little labeled data, and to effectively detect disinformation via innovative research and novel computational methods. In particular, I investigate learning with weak social supervision for understanding disinformation through the following computational tasks: bringing in heterogeneous social context as auxiliary information for effective fake news detection; discovering explanations of fake news from social media for explainable fake news detection; modeling multiple sources of weak social supervision for early fake news detection; and transferring knowledge across domains with adversarial machine learning for cross-domain fake news detection. The findings of the dissertation significantly expand the boundaries of disinformation research and establish a novel paradigm of learning with weak social supervision that has important implications for broad applications in social media.
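As a concrete, simplified illustration of the weak-social-supervision recipe (not the dissertation's specific detectors): a fake news classifier can be trained on a small labeled set plus noisy labels derived from engagement signals, with the weak term down-weighted to reflect its lower reliability. All names and the weighting scheme are assumptions made for the sketch.

import numpy as np

def bce(p, y):
    # Binary cross-entropy between predicted probabilities p and labels y.
    eps = 1e-9
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)).mean()

def combined_loss(p_labeled, y_labeled, p_weak, y_weak, lam=0.3):
    # Supervised loss on labeled news plus a down-weighted loss on weak labels
    # inferred from social engagements (e.g., crowd credibility signals).
    return bce(p_labeled, y_labeled) + lam * bce(p_weak, y_weak)

# Hypothetical predictions and labels for the two sources of supervision.
loss = combined_loss(np.array([0.9, 0.2]), np.array([1.0, 0.0]),
                     np.array([0.7, 0.6, 0.1]), np.array([1.0, 1.0, 0.0]))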
Collecting accurate collective decisions via crowdsourcing is challenging due to cognitive biases, varying worker expertise, and varying subjective scales. This work investigates new ways to determine collective decisions by prompting users to provide input in multiple formats. A crowdsourced task is created that aims to determine ground truth by collecting information in two different ways: rankings and numerical estimates. Results indicate that accurate collective decisions can be achieved with fewer people when ordinal and cardinal information is collected and aggregated together using consensus-based, multimodal models. We also show that presenting users with larger problems produces more valuable ordinal information and is a more efficient way to collect an aggregate ranking. As a result, we suggest that multi-format input elicitation be more widely considered in future crowdsourcing work and incorporated into future platforms to improve accuracy and efficiency.
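One simple way to aggregate the two input formats is sketched below with hypothetical names and weighting (the paper's consensus-based, multimodal models are more involved): ordinal rankings are combined with a Borda-style count, cardinal estimates are averaged per item, and the two normalized scores are blended.

import numpy as np

def borda_scores(rankings, n_items):
    # Each ranking lists item ids from best to worst; higher score means better consensus rank.
    scores = np.zeros(n_items)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += n_items - 1 - position
    return scores

def combine(rankings, estimates, n_items, w=0.5):
    # Blend normalized ordinal (Borda) scores with normalized mean cardinal estimates.
    ordinal = borda_scores(rankings, n_items)
    cardinal = np.array([np.mean(estimates[i]) for i in range(n_items)])
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-9)
    return w * norm(ordinal) + (1 - w) * norm(cardinal)

# Hypothetical: 3 items, two workers' rankings, and per-item numerical estimates.
consensus = combine([[0, 2, 1], [0, 1, 2]], {0: [9, 8], 1: [4], 2: [5, 6]}, 3)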
Component simulation models, such as agent-based models, may depend on spatial data associated with geographic locations. Composition of such models can be achieved using a Geographic Knowledge Interchange Broker (GeoKIB) enabled with spatial-temporal data transformation functions, each of which is responsible for a set of interactions between two independent models. The use of autonomous interaction models allows model composition without alteration of the composed component models. An interaction model must handle differences in the spatial resolutions between models, in addition to differences in their temporal input/output data types and resolutions.
A generalized GeoKIB was designed that regulates unidirectional spatially-based interactions between composed models. Different input and output data types are used for the interaction model, depending on whether data transfer should be passive or active. Synchronization of time-tagged input/output values is made possible by depending on a discrete simulation clock. An algorithm supporting spatial conversion is developed to transform any two-dimensional geographic data map between different region specifications. Maps belonging to the composed models can have different regions, map cell sizes, or boundaries. The GeoKIB can be extended based on the specifications of the models to be composed and the target application domain.
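Below is a minimal sketch of the kind of spatial conversion such an algorithm performs, under the simplifying assumptions that both maps are uniform grids over the same bounding box and that averaging the covered source cells approximates area weighting; the function name and approach are illustrative, not the GeoKIB implementation.

import numpy as np

def regrid(src, target_shape):
    # Resample a 2D value grid to a new cell resolution by averaging the source cells
    # that fall under each target cell (an approximation of area-weighted conversion).
    src_rows, src_cols = src.shape
    out = np.zeros(target_shape)
    row_edges = np.linspace(0, src_rows, target_shape[0] + 1)
    col_edges = np.linspace(0, src_cols, target_shape[1] + 1)
    for i in range(target_shape[0]):
        for j in range(target_shape[1]):
            r0, r1 = int(np.floor(row_edges[i])), int(np.ceil(row_edges[i + 1]))
            c0, c1 = int(np.floor(col_edges[j])), int(np.ceil(col_edges[j + 1]))
            out[i, j] = src[r0:r1, c0:c1].mean()
    return out

# Hypothetical: convert a 4x4 source map to a coarser 2x2 target map.
coarse = regrid(np.arange(16.0).reshape(4, 4), (2, 2))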
Two separate, simplistic models were created to demonstrate model composition via the GeoKIB. An interaction model was created for each of the two directions the composed models interact. This exemplar is developed to demonstrate composition and simulation of geographic-based component models.