The focus of my honors thesis is to combine deep learning with tools from statistical mechanics to derive new ways of solving problems in biophysics. More specifically, I am interested in finding transition pathways between two known states of a biomolecule, because understanding the mechanisms by which proteins fold and ligands bind is crucial to creating new medicines and understanding biological processes. In this thesis, I work with members of the Singharoy lab to develop a formulation that uses reinforcement learning (RL) and sampling-based robotics planning to derive low free energy transition pathways between two known states. Our formulation uses Jarzynski’s equality and the stiff-spring approximation to obtain point estimates of energy and to construct an informed path search with atomistic resolution. At the core of this framework is our first attempt to use policy-driven adaptive steered molecular dynamics (SMD) to control our molecular dynamics simulations. We show that both the RL and robotics planning realizations of the RL-guided framework can solve for pathways on toy analytical surfaces and alanine dipeptide.
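As a rough illustration of how Jarzynski’s equality turns nonequilibrium work measurements into a free energy point estimate, the sketch below evaluates ΔF = -(1/β) ln⟨exp(-βW)⟩ over a set of work samples. The function name, the sample values, and the use of reduced units (β = 1) are assumptions made for illustration only; this is not the thesis implementation, and it does not model the stiff-spring approximation or the SMD simulations that would produce the work values.

```python
import math
from typing import Sequence


def jarzynski_free_energy(work_values: Sequence[float], beta: float) -> float:
    """Estimate dF = -(1/beta) * ln( mean( exp(-beta * W) ) ).

    work_values: work accumulated along independent pulling trajectories
                 (hypothetical inputs, e.g. from repeated SMD runs).
    beta:        inverse temperature 1/(k_B T), in units matching the work.
    """
    if len(work_values) == 0:
        raise ValueError("need at least one work sample")
    exponents = [-beta * w for w in work_values]
    # Log-sum-exp trick: factor out the largest exponent to avoid overflow.
    largest = max(exponents)
    mean_shifted = sum(math.exp(x - largest) for x in exponents) / len(exponents)
    log_mean = largest + math.log(mean_shifted)
    return -log_mean / beta


# Illustrative work samples in reduced units (assumed, not measured data).
samples = [1.0, 3.0]
estimate = jarzynski_free_energy(samples, beta=1.0)
average_work = sum(samples) / len(samples)
```

Because the exponential average is dominated by low-work trajectories, the estimate never exceeds the average work, which is consistent with the second law.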
Oil spills not only have a dreadful impact on the lives of coastal communities and businesses but also
have lasting and hazardous consequences. The United States coastal areas, especially
the Gulf of Mexico, have witnessed devastating oil spills of varied sizes and durations
that resulted in major economic and ecological losses. These disasters affected the oil,
housing, forestry, tourism, and fishing industries with overall costs exceeding billions
of dollars (Baade et al., 2007; Smith et al., 2011). Extensive research has been
done on oil spill simulation techniques, spatial optimization models, and
innovative strategies for spill response and planning. However, most
of this research has been conducted independently, leaving a
conceptual void between these areas.
This thesis presents a Spatial Decision Support System
(SDSS) that efficiently integrates the independent facets of spill modeling techniques
and spatial optimization, enabling officials to explore the various
options for cleaning up an offshore oil spill and to make a more informed decision. The
thesis uses the Blowout and Spill Occurrence Model (BLOSOM), developed by Sim
et al. (2015), to simulate hypothetical oil spill scenarios, followed by the Oil Spill
Cleanup and Operational Model (OSCOM), developed by Grubesic et al. (2017), to
spatially optimize the response efforts. The results of this combination are visualized
in the SDSS on geographical maps, so that the boat ramps from which the response
should be launched can be easily identified, along with the amount of oil that reaches the
shore, which indicates the intensity of the spill's impact on coastal areas
for various cleanup targets.
Querying large amounts of data often raises further questions. For example, arithmetic relationships between queries over heterogeneous data can differ considerably from one part of the data to another. How can we explain which factors account for these differences? If we define an observation as an arithmetic relationship between queries, this kind of problem can be addressed by aggravation or intervention. Aggravation examines the value of the observation when it is restricted to a given set of tuples, while intervention examines the value of the observation after that set of tuples is removed. We call the predicates that describe these sets of tuples explanations. Observations by themselves have limited importance. For example, if we observe a large number of taxi trips in a specific area, we might ask: why are there so many trips here? Explanations attempt to answer these kinds of questions.
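The difference between aggravation and intervention can be sketched on a tiny example. In the sketch below, the observation is assumed to be the ratio of two count queries (trips in zone A over trips in zone B), and the candidate explanation is the predicate "period is night". The table, column names, and function names are all hypothetical and chosen only to illustrate the two definitions; this is not the approach proposed in the thesis.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def observation(rows: List[Row]) -> float:
    """Ratio of two count queries: trips in zone A over trips in zone B."""
    count_a = sum(1 for r in rows if r["zone"] == "A")
    count_b = sum(1 for r in rows if r["zone"] == "B")
    return count_a / count_b if count_b else float("inf")


def aggravate(rows: List[Row], pred: Predicate) -> float:
    """Observation restricted to the tuples that satisfy the predicate."""
    return observation([r for r in rows if pred(r)])


def intervene(rows: List[Row], pred: Predicate) -> float:
    """Observation after removing the tuples that satisfy the predicate."""
    return observation([r for r in rows if not pred(r)])


# Hypothetical taxi trips: zone A is busier than zone B, mostly at night.
trips: List[Row] = (
    [{"zone": "A", "period": "night"}] * 4
    + [{"zone": "A", "period": "day"}] * 2
    + [{"zone": "B", "period": "night"}] * 1
    + [{"zone": "B", "period": "day"}] * 2
)


def is_night(row: Row) -> bool:
    return row["period"] == "night"


baseline = observation(trips)               # 6 / 3 = 2.0
night_only = aggravate(trips, is_night)     # 4 / 1 = 4.0
without_night = intervene(trips, is_night)  # 2 / 2 = 1.0
```

Here the predicate is a plausible explanation under both views: restricting to night trips amplifies the ratio, and removing them brings it back to parity.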
While aggravation and intervention are designed for non-spatial data, we propose a new approach for explaining spatially heterogeneous data. Our approach builds on aggravation and intervention and uses spatial partitioning and clustering to improve explanations for spatial data. The proposed approach was evaluated on a real-world taxi dataset as well as a synthetic disease outbreak dataset. It was found to outperform aggravation in precision and recall, and to outperform intervention in precision.
User-generated social media content provides an excellent opportunity to mine data of interest and to build resourceful applications. The rise in the number of healthcare-related social media platforms and in the volume of healthcare knowledge available online over the last decade has resulted in increased social media usage for personal healthcare. In the United States, nearly ninety percent of adults in the 50-75 age group have used social media to seek and share health information. Motivated by the growth of social media usage, this thesis focuses on healthcare-related applications, studies various challenges posed by social media data, and addresses them through novel and effective machine learning algorithms.
The major challenges for effectively and efficiently mining social media data to build functional applications include: (1) Data reliability and acceptance: most social media data (especially in the context of healthcare-related social media) is not regulated, and the benefits of healthcare-specific social media have been little studied; (2) Data heterogeneity: social media data is generated by users with both demographic and geographic diversity; (3) Model transparency and trustworthiness: most existing machine learning models for addressing heterogeneity are black box models, and few provide explanations of their behavior that would allow users to trust them.
In response to these challenges, three main research directions have been investigated in this thesis: (1) Analyzing social media influence on healthcare: to study the real-world impact of social media as a source to offer or seek support for patients with chronic health conditions; (2) Learning from task heterogeneity: to propose various models and algorithms that are adaptable to new social media platforms and robust to dynamic social media data, specifically for modeling user behaviors, identifying similar actors across platforms, and adapting black box models to a specific learning scenario; (3) Explaining heterogeneous models: to interpret predictive models in the presence of task heterogeneity. In this thesis, novel algorithms with theoretical analysis from various aspects (e.g., time complexity, convergence properties) have been proposed. The effectiveness and efficiency of the proposed algorithms are demonstrated by comparison with state-of-the-art methods and relevant case studies.
The goal of this research is to complete three primary objectives: (i) specify visualization design requirements relating to the food-energy-water (FEW) nexus; (ii) develop visualization approaches for the FEW nexus; and (iii) provide a comparison of current FEW visualization approaches against the proposed visualization approach. These objectives will be accomplished by reviewing graph-based visualization, network evolution, and visual analysis of volume data tasks; holding discussions with domain experts; examining the visualization methods currently used in FEW research; and conducting a user study. This will provide a more thorough and representative depiction of the FEW nexus, as well as a basis for further research in the area of FEW visualization. This research aims to enhance collaboration between policymakers and domain experts in order to encourage in-depth nexus research that will help support informed policy-making and promote future resource security.