LEARNING FREE ENERGY PATHWAYS THROUGH DEEP LEARNING

Description

The focus of my honors thesis is to find ways to use deep learning in tandem with tools from statistical mechanics to derive new ways to solve problems in biophysics. More specifically, I’ve been interested in finding transition pathways between two known states of a biomolecule, because understanding the mechanisms by which proteins fold and ligands bind is crucial to creating new medicines and understanding biological processes. In this thesis, I work with members of the Singharoy lab to develop a formulation that uses reinforcement learning and sampling-based robotics planning to derive low free energy transition pathways between two known states. Our formulation uses Jarzynski’s equality and the stiff-spring approximation to obtain point estimates of energy and to construct an informed path search with atomistic resolution. At the core of this framework is our first-ever attempt to use policy-driven adaptive steered molecular dynamics (SMD) to control our molecular dynamics simulations. We show that both the reinforcement learning (RL) and robotics planning realizations of the RL-guided framework can solve for pathways on toy analytical surfaces and alanine dipeptide.
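As a rough illustration of the Jarzynski step mentioned in this abstract, the sketch below estimates a free energy difference from repeated nonequilibrium work measurements using exp(-ΔF/kT) = ⟨exp(-W/kT)⟩. The function name, the sample work values, and the choice of kT units are illustrative assumptions and are not taken from the thesis.

```python
import math


def jarzynski_free_energy(work_values, kT=1.0):
    """Estimate a free energy difference from nonequilibrium work samples.

    Uses Jarzynski's equality, exp(-dF/kT) = <exp(-W/kT)>, where the
    average runs over repeated pulls (e.g. steered MD trajectories).
    The inputs here are hypothetical; units must match between W and kT.
    """
    if not work_values:
        raise ValueError("need at least one work sample")
    # Log-sum-exp trick: subtract the largest exponent for stability.
    scaled = [-w / kT for w in work_values]
    largest = max(scaled)
    total = sum(math.exp(s - largest) for s in scaled)
    log_mean = largest + math.log(total / len(scaled))
    return -kT * log_mean


if __name__ == "__main__":
    # Made-up work values (in units of kT) from three hypothetical pulls.
    print(jarzynski_free_energy([1.0, 3.0, 2.0]))
```

Because the exponential average is dominated by low-work trajectories, the estimate never exceeds the mean work, which is one reason a small number of samples can give a biased result.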

Contributors: Ho, Nicholas (Author) / Maciejewski, Ross (Thesis director) / Singharoy, Abhishek (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-12

A Visual Analytics Workflow for Detecting Transition Regions in Large Scale Molecular Dynamics Simulations

Description

Molecular Dynamics (MD) simulations are ubiquitous throughout the physical sciences; they are critical in understanding how particle structures evolve over time given a particular energy function. A software package called ParSplice introduced a new method to generate these simulations in parallel that has significantly increased their length. Typically, simulations are short discrete Markov chains, only capturing a few microseconds of a particle’s behavior and containing tens of thousands of transitions between states; in contrast, a typical ParSplice simulation can be as long as a few milliseconds, containing tens of millions of transitions. Naturally, sifting through data of this size is impossible by hand, and there are a number of visualization systems that provide comprehensive and intuitive analyses of particle structures throughout MD simulations. However, no visual analytics systems have been built that can manage the simulations that ParSplice produces. To analyze these large datasets, I built a visual analytics system that provides multiple coordinated views that simultaneously describe the data temporally, within its structural context, and based on its properties. The system provides fluid and powerful user interactions regardless of the size of the data, allowing the user to drill down into the dataset to get detailed insights, as well as run and save various calculations, most notably the Nudged Elastic Band method. The system also allows the comparison of multiple trajectories, revealing more information about the general behavior of particles at different temperatures, energy states, etc.

Contributors: Hnatyshyn, Rostyslav (Author) / Maciejewski, Ross (Thesis advisor) / Bryan, Chris (Committee member) / Ahrens, James (Committee member) / Arizona State University (Publisher)

Created: 2022

Explaining the Vulnerabilities of Machine Learning through Visual Analytics

Description

Machine learning models are increasingly being deployed in real-world applications where their predictions are used to make critical decisions in a variety of domains. The proliferation of such models has led to a burgeoning need to ensure the reliability and safety of these models, given the potential negative consequences of model vulnerabilities. The complexity of machine learning models, along with the extensive data sets they analyze, can result in unpredictable and unintended outcomes. Model vulnerabilities may manifest due to errors in data input, algorithm design, or model deployment, which can have significant implications for both individuals and society. To prevent such negative outcomes, it is imperative to identify model vulnerabilities at an early stage in the development process. This will aid in guaranteeing the integrity, dependability, and safety of the models, thus mitigating potential risks and enabling the full potential of these technologies to be realized. However, enumerating vulnerabilities can be challenging due to the complexity of the real-world environment. Visual analytics, situated at the intersection of human-computer interaction, computer graphics, and artificial intelligence, offers a promising approach for achieving high interpretability of complex black-box models, thus reducing the cost of obtaining insights into potential vulnerabilities of models. This research is devoted to designing novel visual analytics methods to support the identification and analysis of model vulnerabilities. Specifically, generalizable visual analytics frameworks are instantiated to explore vulnerabilities in machine learning models concerning security (adversarial attacks and data perturbation) and fairness (algorithmic bias). Finally, a visual analytics approach is proposed to enable domain experts to explain and diagnose, in a human-in-the-loop fashion, the model improvements obtained by addressing identified vulnerabilities. The proposed methods hold the potential to enhance the security and fairness of machine learning models deployed in critical real-world applications.

Contributors: Xie, Tiankai (Author) / Maciejewski, Ross (Thesis advisor) / Liu, Huan (Committee member) / Bryan, Chris (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created: 2023

Representation Learning for Trustworthy AI

Description

Artificial Intelligence (AI) systems have achieved outstanding performance and have been found to be better than humans at various tasks, such as sentiment analysis and face recognition. However, the majority of these state-of-the-art AI systems use complex Deep Learning (DL) methods, which make it challenging for human experts to design and evaluate such models with respect to privacy, fairness, and robustness. Recent examination of DL models reveals that representations may include information that could lead to privacy violations, unfairness, and robustness issues. This results in AI systems that are potentially untrustworthy from a socio-technical standpoint. Trustworthiness in AI is defined by a set of model properties such as non-discriminatory bias, protection of users’ sensitive attributes, and lawful decision-making. The characteristics of trustworthy AI can be grouped into three categories: Reliability, Resiliency, and Responsibility. Past research has shown that the successful integration of an AI model depends on its trustworthiness. Thus, it is crucial for organizations and researchers to build trustworthy AI systems to facilitate the seamless integration and adoption of intelligent technologies. The main issue with existing AI systems is that they are primarily trained to improve technical measures such as accuracy on a specific task but do not take socio-technical measures into account. The aim of this dissertation is to propose methods for improving the trustworthiness of AI systems through representation learning. DL models’ representations contain information about a given input and can be used for tasks such as detecting fake news on social media or predicting the sentiment of a review. The findings of this dissertation significantly expand the scope of trustworthy AI research and establish a new paradigm for modifying data representations to balance between properties of trustworthy AI.

Specifically, this research investigates multiple techniques such as reinforcement learning for understanding trustworthiness in users’ privacy, fairness, and robustness in classification tasks like cyberbullying detection and fake news detection. Since most social measures in trustworthy AI cannot be used to fine-tune or train an AI model directly, the main contribution of this dissertation lies in using reinforcement learning to alter an AI system’s behavior based on non-differentiable social measures.

Contributors: Mosallanezhad, Ahmadreza (Author) / Liu, Huan (Thesis advisor) / Mancenido, Michelle (Thesis advisor) / Doupe, Adam (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created: 2023

A Spatial Decision Support System for Oil Spill Response and Recovery

Description

Coastal areas are susceptible to man-made disasters, such as oil spills, which not only have a dreadful impact on the lives of coastal communities and businesses but also have lasting and hazardous consequences. The United States coastal areas, especially the Gulf of Mexico, have witnessed devastating oil spills of varied sizes and durations that resulted in major economic and ecological losses. These disasters affected the oil, housing, forestry, tourism, and fishing industries with overall costs exceeding billions of dollars (Baade et al. (2007); Smith et al. (2011)). Extensive research has been done with respect to oil spill simulation techniques, spatial optimization models, and innovative strategies to deal with spill response and planning efforts. However, most of the research done in those areas is done independently of each other, leaving a conceptual void between them.

In the following work, this thesis presents a Spatial Decision Support System (SDSS), which efficiently integrates the independent facets of spill modeling techniques and spatial optimization to enable officials to investigate and explore the various options to clean up an offshore oil spill to make a more informed decision. This thesis utilizes the Blowout and Spill Occurrence Model (BLOSOM) developed by Sim et al. (2015) to simulate hypothetical oil spill scenarios, followed by the Oil Spill Cleanup and Operational Model (OSCOM) developed by Grubesic et al. (2017) to spatially optimize the response efforts. The results of this combination are visualized in the SDSS, featuring geographical maps, so the boat ramps from which the response should be launched can be easily identified along with the amount of oil that hits the shore, thereby visualizing the intensity of the impact of the spill in the coastal areas for various cleanup targets.

Contributors: Pydi Medini, Prannoy Chandra (Author) / Maciejewski, Ross (Thesis advisor) / Grubesic, Anthony (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)

Created: 2018

The Perception of Graph Properties In Graph Layouts

Description

When looking at drawings of graphs, questions about graph density, community structures, local clustering, and other graph properties may be of critical importance for analysis. While graph layout algorithms have focused on optimizing layout properties such as edge crossings and symmetry, not much is known about how these algorithms relate to a user’s ability to perceive graph properties for a given graph layout. This study applies previously established methodologies for perceptual analysis to identify which graph drawing layout will help the user best perceive a particular graph property. A large-scale (n = 588) crowdsourced experiment is conducted to investigate whether the perception of two graph properties (graph density and average local clustering coefficient) can be modeled using Weber’s law. Three graph layout algorithms from three representative classes (Force Directed - FD, Circular, and Multi-Dimensional Scaling - MDS) are studied, and the results of this experiment establish the precision of judgment for these graph layouts and properties. The findings demonstrate that the perception of graph density can be modeled with Weber’s law. Furthermore, the perception of the average clustering coefficient can be modeled as an inverse of Weber’s law, and the MDS layout showed a significantly different precision of judgment than the FD layout.
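For readers unfamiliar with Weber’s law, it states that the just-noticeable difference (JND) in a stimulus grows in proportion to the stimulus intensity, JND = k · I. The sketch below fits the constant k from paired measurements by least squares through the origin. The function names and the numbers are hypothetical and are not the study’s data or its exact fitting procedure.

```python
def fit_weber_fraction(intensities, jnds):
    """Fit k in JND = k * I by least squares through the origin.

    intensities: stimulus levels (e.g. graph density values)
    jnds: measured just-noticeable differences at those levels
    Both inputs are hypothetical illustration data.
    """
    if len(intensities) != len(jnds):
        raise ValueError("inputs must have the same length")
    numerator = sum(i * j for i, j in zip(intensities, jnds))
    denominator = sum(i * i for i in intensities)
    if denominator == 0:
        raise ValueError("at least one intensity must be nonzero")
    return numerator / denominator


def predicted_jnd(intensity, k):
    """JND predicted by Weber's law for a given intensity."""
    return k * intensity


if __name__ == "__main__":
    # Made-up measurements in which the JND is 10% of the intensity.
    k = fit_weber_fraction([0.1, 0.2, 0.4], [0.01, 0.02, 0.04])
    print(k, predicted_jnd(0.5, k))
```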

Contributors: Soni, Utkarsh (Author) / Maciejewski, Ross (Thesis advisor) / Kobourov, Stephen (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)

Created: 2018

A Framework for Spatial Database Explanations

Description

In the last few years, there has been a tremendous increase in the use of big data. Most of this data is hard to understand because of its size and dimensions. The importance of this problem can be emphasized by the fact that the Big Data Research and Development Initiative was announced by the United States administration in 2012 to address problems faced by the government. Various states and cities in the US gather spatial data about incidents like police calls for service.

When we query large amounts of data, the results may raise many questions. For example, when we look at arithmetic relationships between queries in heterogeneous data, there are a lot of differences. How can we explain what factors account for these differences? If we define the observation as an arithmetic relationship between queries, this kind of problem can be solved by aggravation or intervention. Aggravation views the value of our observation for different sets of tuples, while intervention looks at the value of the observation after removing sets of tuples. We call the predicates that represent these tuples explanations. Observations by themselves have limited importance. For example, if we observe a large number of taxi trips in a specific area, we might ask: Why are there so many trips here? Explanations attempt to answer these kinds of questions.
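The difference between aggravation and intervention can be sketched on a toy table. In the example below, the observation is a ratio between two count queries, aggravation evaluates it only on the tuples matching a candidate predicate, and intervention evaluates it after removing those tuples. The table, the column names, and the specific ratio are hypothetical and serve only to illustrate the two definitions given in the abstract.

```python
def observation(rows):
    """Hypothetical observation: ratio of night trips to day trips."""
    night = sum(1 for r in rows if r["period"] == "night")
    day = sum(1 for r in rows if r["period"] == "day")
    return night / day if day else float("inf")


def aggravation(rows, predicate):
    """Value of the observation restricted to tuples matching the predicate."""
    return observation([r for r in rows if predicate(r)])


def intervention(rows, predicate):
    """Value of the observation after removing tuples matching the predicate."""
    return observation([r for r in rows if not predicate(r)])


# Toy trip table: zone A has 3 night trips and 1 day trip,
# zone B has 1 night trip and 2 day trips.
TRIPS = (
    [{"zone": "A", "period": "night"}] * 3
    + [{"zone": "A", "period": "day"}]
    + [{"zone": "B", "period": "night"}]
    + [{"zone": "B", "period": "day"}] * 2
)


def in_zone_a(row):
    """Candidate explanation: the trip happened in zone A."""
    return row["zone"] == "A"


if __name__ == "__main__":
    print(observation(TRIPS))              # whole table
    print(aggravation(TRIPS, in_zone_a))   # only zone A tuples
    print(intervention(TRIPS, in_zone_a))  # zone A tuples removed
```

In this toy table, restricting to zone A raises the ratio and removing zone A lowers it, so the predicate "zone = A" would rank as a strong candidate explanation under either view.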

While aggravation and intervention are designed for non-spatial data, we propose a new approach for explaining spatially heterogeneous data. Our approach expands on aggravation and intervention while using spatial partitioning/clustering to improve explanations for spatial data. Our proposed approach was evaluated against a real-world taxi dataset as well as a synthetic disease outbreak dataset. The approach was found to outperform aggravation in precision and recall while outperforming intervention in precision.

Contributors: Tahir, Anique (Author) / Elsayed, Mohamed (Thesis advisor) / Hsiao, Ihan (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created: 2018

Learning from task heterogeneity in social media

Description

In recent years, the rise in social media usage, both vertically in terms of the number of users by platform and horizontally in terms of the number of platforms per user, has led to a data explosion.

User-generated social media content provides an excellent opportunity to mine data of interest and to build resourceful applications. The rise in the number of healthcare-related social media platforms and the volume of healthcare knowledge available online in the last decade has resulted in increased social media usage for personal healthcare. In the United States, nearly ninety percent of adults in the 50-75 age group have used social media to seek and share health information. Motivated by the growth of social media usage, this thesis focuses on healthcare-related applications, studies various challenges posed by social media data, and addresses them through novel and effective machine learning algorithms.

The major challenges for effectively and efficiently mining social media data to build functional applications include: (1) Data reliability and acceptance: most social media data (especially in the context of healthcare-related social media) is not regulated, and little has been studied on the benefits of healthcare-specific social media; (2) Data heterogeneity: social media data is generated by users with both demographic and geographic diversity; (3) Model transparency and trustworthiness: most existing machine learning models for addressing heterogeneity are considered black box models, and few provide explanations of their behavior that would allow users to trust them.

In response to these challenges, three main research directions have been investigated in this thesis: (1) Analyzing social media influence on healthcare: to study the real-world impact of social media as a source to offer or seek support for patients with chronic health conditions; (2) Learning from task heterogeneity: to propose various models and algorithms that are adaptable to new social media platforms and robust to dynamic social media data, specifically on modeling user behaviors, identifying similar actors across platforms, and adapting black box models to a specific learning scenario; (3) Explaining heterogeneous models: to interpret predictive models in the presence of task heterogeneity. In this thesis, novel algorithms with theoretical analysis from various aspects (e.g., time complexity, convergence properties) have been proposed. The effectiveness and efficiency of the proposed algorithms are demonstrated by comparison with state-of-the-art methods and relevant case studies.

Contributors: Nelakurthi, Arun Reddy (Author) / He, Jingrui (Thesis advisor) / Cook, Curtiss B (Committee member) / Maciejewski, Ross (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created: 2019

Optimization Model and Algorithm for the Design of Connected and Compact Conservation Reserves

Description

Conservation planning is fundamental to guarantee the survival of endangered species and to preserve the ecological values of some ecosystems. Planning land acquisitions increasingly requires a landscape approach to mitigate the negative impacts of spatial threats such as urbanization, agricultural development, and climate change. In this context, landscape connectivity and compactness are vital characteristics for the effective functionality of conservation reserves. Connectivity allows species to travel across landscapes, facilitating the flow of genes across populations from different protected areas. Compactness measures the spatial dispersion of protected sites, which can be used to mitigate risk factors associated with species leaving and re-entering the reserve. This research proposes an optimization model to identify areas to protect while enforcing connectivity and compactness. In the suggested projected area, this research builds upon existing methods and develops an alternative metric of compactness that penalizes the selection of patches of land with few protected neighbors. The new metric is referred to as leaf because it intends to minimize the number of selected areas with 1 neighboring protected area. The model includes budget and minimum selected area constraints to reflect realistic financial and ecological requirements. Using a lexicographic approach, the model can improve the compactness of conservation reserves obtained by other methods. The use of the model is illustrated by solving instances of up to 1100 patches.
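The leaf idea described in this abstract can be illustrated by counting, for a candidate reserve, the selected patches that have one selected neighbor. The sketch below does only that counting step; it is not the thesis’s optimization model. The adjacency structure, patch labels, and function name are hypothetical, and reading "1 neighboring protected area" as exactly one selected neighbor is an assumption.

```python
def count_leaves(adjacency, selected):
    """Count selected patches with exactly one selected neighbor.

    adjacency: dict mapping each patch to an iterable of adjacent patches
    selected: iterable of patches chosen for protection
    Both inputs are hypothetical illustration data.
    """
    chosen = set(selected)
    leaves = 0
    for patch in chosen:
        protected_neighbors = chosen & set(adjacency.get(patch, ()))
        if len(protected_neighbors) == 1:
            leaves += 1
    return leaves


# A strip of three patches (a - b - c) and a 2x2 block (p, q, r, s).
STRIP = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
BLOCK = {"p": ["q", "r"], "q": ["p", "s"], "r": ["p", "s"], "s": ["q", "r"]}

if __name__ == "__main__":
    print(count_leaves(STRIP, ["a", "b", "c"]))       # elongated reserve
    print(count_leaves(BLOCK, ["p", "q", "r", "s"]))  # compact reserve
```

The elongated strip has leaf patches at both ends while the compact block has none, which matches the intuition that fewer leaves indicate a more compact reserve.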

Contributors: Ravishankar, Shreyas (Author) / Sefair, Jorge A (Thesis advisor) / Askin, Ronald (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created: 2019

Visualizing Network Structures in the Food, Energy, and Water Nexus

Description

In recent years, the food, energy, and water (FEW) nexus has become a topic of considerable importance and has spurred research in many scientific and technical fields. This increased interest stems from the high level, and broad area, of impact that could occur in the long term if the interactions between these complex FEW sectors are incorrectly or only partially defined. For this reason, a significant amount of interdisciplinary collaboration is needed to accurately define these interactions and produce viable solutions to help sustain and secure resources within these sectors. Providing tools that effectively promote interdisciplinary collaboration would allow for the development of a better understanding of FEW nexus interactions, support FEW policy-making under uncertainty, facilitate identification of critical design requirements for FEW visualizations, and encourage proactive FEW visualization design.

The goal of this research will be the completion of 3 primary objectives: (i) specify visualization design requirements relating to the FEW nexus; (ii) develop visualization approaches for the FEW nexus; and (iii) provide a comparison of current FEW visualization approaches against the proposed visualization approach. These objectives will be accomplished by reviewing graph-based visualization, network evolution, and visual analysis of volume data tasks; holding discussions with domain experts; examining currently used visualization methods in FEW research; and conducting a user study. This will provide a more thorough and representative depiction of the FEW nexus, as well as a basis for further research in the area of FEW visualization. This research will enhance collaboration between policymakers and domain experts in an attempt to encourage in-depth nexus research that will help support informed policy-making and promote future resource security.

Contributors: Mathis, Brandon (Author) / Maciejewski, Ross (Thesis advisor) / Mascaro, Giuseppe (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2019