Description
Sequence alignment is an essential method in bioinformatics and the basis of many analyses, including phylogenetic inference, ancestral sequence reconstruction, and gene annotation. Sequence artifacts and errors made in alignment reconstruction can impact downstream analyses, leading to erroneous conclusions in comparative and functional genomic studies. While such errors are eventually fixed in the reference genomes of model organisms, many genomes used by researchers contain these artifacts, often forcing researchers to discard large amounts of data to prevent artifacts from impacting results. I developed COATi, a statistical, codon-aware pairwise aligner designed to align protein-coding sequences in the presence of artifacts commonly introduced by sequencing or annotation errors, such as early stop codons and abiological frameshifts. Unlike common sequence aligners, which rely on amino acid translations, only model insertions and deletions between codons, or lack a statistical model, COATi combines a codon substitution model specifically designed for protein-coding regions, a complex insertion-deletion model, and a sequencing base-calling error step. The alignment algorithm is based on finite state transducers (FSTs), computational machines well-suited for modeling sequence evolution. I show that COATi outperforms available methods using a simulated empirical pairwise alignment dataset as a benchmark. The FST-based model and alignment algorithm in COATi are resource-intensive for sequences longer than a few kilobases. To address this constraint, I developed an approximate model compatible with traditional dynamic programming alignment algorithms. I describe how the original codon substitution model is transformed to build an approximate model and how the alignment algorithm is implemented by modifying the popular Gotoh algorithm. I simulated a benchmark of alignments and measured how well the marginal models approximate the original method. Finally, I present a novel tool for analyzing sequence alignments. Available metrics can measure the similarity between two alignments or the column uncertainty within an alignment but cannot produce a site-specific comparison of two or more alignments. AlnDotPlot is an R software package inspired by traditional dot plots that can provide valuable insights when comparing pairwise alignments. I describe AlnDotPlot and showcase its utility in displaying a single alignment, comparing different pairwise alignments, and summarizing alignment space.
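
As a point of reference for the dynamic-programming side of this work, the sketch below shows the classic Gotoh affine-gap recursion that the approximate model modifies. It is a minimal, score-only illustration in Python; the nucleotide-level match/mismatch scheme and gap penalties are placeholder assumptions and do not reflect COATi's codon-aware model.

```python
# A minimal sketch of the classic Gotoh affine-gap alignment recursion.
# Scores and gap penalties are illustrative placeholders only.

def gotoh_score(a, b, match=2, mismatch=-3, gap_open=-5, gap_extend=-1):
    """Return the optimal global alignment score with affine gap costs."""
    n, m = len(a), len(b)
    NEG = float("-inf")
    # M: a[i] aligned with b[j]; X: gap in b (consumes a); Y: gap in a (consumes b).
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])

print(gotoh_score("ATGGCGTAA", "ATGGTAA"))
```
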
Contributors: Garcia Mesa, Juan Jose (Author) / Cartwright, Reed A (Thesis advisor) / Taylor, Jesse (Committee member) / Pavlic, Theodore (Committee member) / Ozkan, Banu (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
As technological advancements in silicon, sensors, and actuation continue, the development of robotic swarms is shifting from the domain of science fiction to reality. Many swarm applications, such as environmental monitoring, precision agriculture, disaster response, and lunar prospecting, will require controlling numerous robots with limited capabilities and information to redistribute among multiple states, such as spatial locations or tasks. A scalable control approach is to program the robots with stochastic control policies such that the robot population in each state evolves according to a mean-field model, which is independent of the number and identities of the robots. Using this model, the control policies can be designed to stabilize the swarm to the target distribution. To avoid the need to reprogram the robots for different target distributions, the robot control policies can be defined to depend only on the presence of a “leader” agent, whose control policy is designed to guide the swarm to a particular distribution. This dissertation presents a novel deep reinforcement learning (deep RL) approach to designing control policies that redistribute a swarm as quickly as possible over a strongly connected graph, according to a mean-field model in the form of the discrete-time Kolmogorov forward equation. In the leader-based strategies, the leader determines its next action based on its observations of robot populations and shepherds the swarm over the graph by probabilistically repelling nearby robots. The scalability of this approach with the swarm size is demonstrated with leader control policies that are designed using two tabular Temporal-Difference learning algorithms, trained on a discretization of the swarm distribution. To improve the scalability of the approach with robot population and graph size, control policies for both leader-based and leaderless strategies are designed using an actor-critic deep RL method that is trained on the swarm distribution predicted by the mean-field model. In the leaderless strategy, the robots’ control policies depend only on their local measurements of nearby robot populations. The control approaches are validated for different graph and swarm sizes in numerical simulations, 3D robot simulations, and experiments on a multi-robot testbed.
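
For readers unfamiliar with the mean-field model referenced above, the sketch below iterates a discrete-time Kolmogorov forward equation for a swarm distribution over a small graph. The three-vertex transition matrix is an illustrative assumption, not one of the dissertation's control policies.

```python
import numpy as np

# Mean-field model sketch: the fraction of robots at each graph vertex evolves
# under a column-stochastic transition matrix induced by the robots' stochastic
# control policies. Values below are illustrative only.
P = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.7, 0.2],
              [0.1, 0.1, 0.7]])   # P[j, i] = probability a robot moves from vertex i to j

x = np.array([1.0, 0.0, 0.0])     # initial swarm distribution over the vertices

for t in range(50):
    x = P @ x                     # x_{t+1} = P x_t (discrete-time forward equation)

print(x)                          # approaches the stationary distribution of P
```
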
Contributors: Kakish, Zahi Mousa (Author) / Berman, Spring (Thesis advisor) / Yong, Sze Zheng (Committee member) / Marvi, Hamid (Committee member) / Pavlic, Theodore (Committee member) / Pratt, Stephen (Committee member) / Ben Amor, Hani (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
The concept of multi-scale, heterogeneous modeling is well known to be central to the complexities of natural and built systems. Therefore, whole models whose parts have different spatiotemporal scales are preferred to those specified with a monolithic, tightly integrated modeling approach. To build simulation frameworks that are expressive and flexible, model composability is crucial: a whole model's structure and behavior must be concisely specified according to those of its parts and their interactions. To undertake spatiotemporal model composability, a breast cancer cell chemotaxis exemplar is used. In breast cancer biology, CXCR4+ and CXCR7+ receptor-bearing cells and CXCL12-secreting cells are implicated in the spread of normal and malignant cells. As discrete entities, these can be modeled using Agent-Based Modeling (ABM). Receptor-ligand binding together with chemokine diffusion regulates the cells' movement gradient; these continuous processes can be modeled with Ordinary Differential Equations (ODEs) and Partial Differential Equations (PDEs). A customized, text-based BrSimulator exists to model and simulate this kind of breast cancer phenomenon. To build a multi-scale, spatiotemporal simulation framework supporting model composability, this research proposes composable cellular automata (CCA) modeling. Toward this goal, the Cellular Automata DEVS (CA-DEVS) model is used, and the novel Composable Cellular Automata DEVS (CCA-DEVS) modeling is proposed. The DEVS-Suite simulator is extended to support CA and CCA Parallel DEVS models, introducing new capabilities for controlled, modular run-time animation and superdense time trajectory visualization. Furthermore, this research proposes using the Knowledge Interchange Broker (KIB) approach to model and simulate the interactions between separate geo-referenced CCA models developed in the DEVS and Modelica modeling languages. To demonstrate the proposed model composability approach and its use in the extended DEVS-Suite simulator, the breast cancer cell chemotaxis exemplar and other models have been studied. The BrSimulator is used as a proxy for evaluating the proposed approach within an integrated DEVS-Suite and OpenModelica simulator. Simulation experiments show that the composition of spatiotemporal ABM, ODE, and PDE models reproduces the behaviors of the same model developed in the BrSimulator.
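
As a rough illustration of the continuous ingredient in this kind of composed model, the sketch below performs an explicit finite-difference diffusion update for a chemokine field on a 2D grid. The grid size, diffusion coefficient, and point source are assumptions for illustration only and are unrelated to the BrSimulator's actual parameters or the CCA-DEVS formalism.

```python
import numpy as np

# One explicit Euler step of 2D diffusion with zero-flux boundaries,
# the kind of local update a cellular-automaton cell could apply.
def diffuse(field, D=0.1, dt=1.0, dx=1.0):
    padded = np.pad(field, 1, mode="edge")
    laplacian = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:] -
                 4.0 * field) / dx**2
    return field + D * dt * laplacian

chemokine = np.zeros((50, 50))
chemokine[25, 25] = 100.0           # a single illustrative CXCL12-secreting source
for _ in range(200):
    chemokine = diffuse(chemokine)
print(chemokine.max(), chemokine.sum())  # total mass is conserved with zero-flux edges
```
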
Contributors: Zhang, Chao (Author) / Sarjoughian, Hessam S (Thesis advisor) / Crook, Sharon (Committee member) / Collofello, James (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
The world is filled with systems of entities that collaborate in motion, both natural and engineered. These cooperative distributed systems are capable of sophisticated emergent behavior arising from the comparatively simple interactions of their members. A model system for emergent collective behavior is programmable matter, a physical substance capable of autonomously changing its properties in response to user input or environmental stimuli. This dissertation studies distributed and stochastic algorithms that control the local behaviors of individual modules of programmable matter to induce complex collective behavior at the macroscale. It consists of four parts. In the first, the canonical amoebot model of programmable matter is proposed. A key goal of this model is to bring algorithmic theory closer to the physical realities of programmable matter hardware, especially with respect to concurrency and energy distribution. Two protocols are presented that together extend sequential, energy-agnostic algorithms to the more realistic concurrent, energy-constrained setting without sacrificing correctness, assuming the original algorithms satisfy certain conventions. In the second part, stateful distributed algorithms using amoebot memory and communication are presented for leader election, object coating, convex hull formation, and hexagon formation. The first three algorithms are proven to have linear runtimes when assuming a simplified sequential setting. The final algorithm for hexagon formation is instead proven to be correct under unfair asynchronous adversarial activation, the most general of all adversarial activation models. In the third part, distributed algorithms are combined with ideas from statistical physics and Markov chain design to replace algorithm reliance on memory and communication with biased random decisions, gaining inherent self-stabilizing and fault-tolerant properties. Using this stochastic approach, algorithms for compression, shortcut bridging, and separation are designed and analyzed. Finally, a two-pronged approach to "programming" physical ensembles is presented. This approach leverages the physics of local interactions to pair theoretical abstractions of self-organizing particle systems with experimental robot systems of active granular matter that intentionally lack digital computation and communication. By physically embodying the salient features of an algorithm in robot design, the algorithm's theoretical analysis can predict the robot ensemble's behavior. This approach is applied to phototaxing, aggregation, dispersion, and object transport.
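
To make the stochastic approach concrete, the sketch below shows a Metropolis-style acceptance rule of the kind used for compression: candidate moves that increase a particle's neighbor count are favored by a bias parameter lam > 1. It is a simplified, assumption-laden illustration; the actual amoebot algorithm also enforces local connectivity conditions before any move is considered.

```python
import random

# Metropolis-style filter sketch: accept a candidate move with probability
# min(1, lam^(gain in neighbors)), biasing the particle toward compression.
def accept_move(neighbors_before, neighbors_after, lam=4.0):
    gain = neighbors_after - neighbors_before
    return random.random() < min(1.0, lam ** gain)

# Example: a move from 2 neighbors down to 1 is accepted roughly 1/4 of the time.
print(sum(accept_move(2, 1) for _ in range(10000)) / 10000)
```
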
Contributors: Daymude, Joshua (Author) / Richa, Andréa W (Thesis advisor) / Scheideler, Christian (Committee member) / Randall, Dana (Committee member) / Pavlic, Theodore (Committee member) / Gil, Stephanie (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
A complex social system, whether artificial or natural, can possess its macroscopic properties as a collective, which may change in real time as a result of local behavioral interactions among a number of agents in it. If a reliable indicator is available to abstract the macrolevel states, decision makers could use it to take a proactive action, whenever needed, in order for the entire system to avoid unacceptable states or converge to desired ones. In realistic scenarios, however, there can be many challenges in learning a model of dynamic global states from interactions of agents, such as 1) high complexity of the system itself, 2) absence of holistic perception, 3) variability of group size, 4) biased observations on state space, and 5) identification of salient behavioral cues. In this dissertation, I introduce useful applications of macrostate estimation in complex multi-agent systems and explore effective deep learning frameworks to address the inherited challenges. First of all, Remote Teammate Localization (ReTLo) is developed in multi-robot teams, in which an individual robot can use its local interactions with a nearby robot as an information channel to estimate the holistic view of the group. Within the problem, I will show (a) learning a model of a modular team can generalize to all others to gain the global awareness of the team of variable sizes, and (b) active interactions are necessary to diversify training data and speed up the overall learning process. The complexity of the next focal system escalates to a colony of over 50 individual ants undergoing an 18-day social stabilization after a chaotic event. I will utilize this natural platform to demonstrate, in contrast to (b), (c) monotonic samples only from “before chaos” can be sufficient to model the panicked society, and (d) the model can also be used to discover salient behaviors to precisely predict macrostates.
Contributors: Choi, Taeyeong (Author) / Pavlic, Theodore (Thesis advisor) / Richa, Andrea (Committee member) / Ben Amor, Heni (Committee member) / Yang, Yezhou (Committee member) / Liebig, Juergen (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Biodiversity has been declining over the last decades due to habitat loss, landscape deterioration, environmental change, and human-related activities. In addition to its economic and cultural value, biodiversity plays an important role in keeping an environment’s ecosystem in balance. Disrupting such processes can reduce the provision of natural resources such as food and water, which in turn poses a direct threat to human health. Protecting and restoring natural areas is fundamental to preserving biodiversity and mitigating the effects of ongoing environmental change. Unfortunately, it is impossible to protect every critical area due to resource limitations, requiring the use of advanced decision tools for the design of conservation plans. This dissertation studies three problems on the design of wildlife corridors and reserves that include patch-specific conservation decisions under spatial, operational, ecological, and biological requirements. In addition to the ecological impact of each problem’s solution, this dissertation contributes a set of formulations, valid inequalities, and pre-processing and solution algorithms for optimization problems with spatial requirements. The first problem is a utility-based corridor design problem to connect fragmented habitats, where each patch has a utility value reflecting its quality. The corridor must satisfy geometry requirements such as connectivity and minimum width. We propose a mixed-integer programming (MIP) model to maximize the total utility of the corridor under the given geometry requirements as well as a budget constraint reflecting the acquisition (or restoration) cost of the selected patches. To overcome the computational difficulty of large-scale instances, we develop multiple acceleration techniques, including a branch-and-cut algorithm enhanced with problem-specific valid inequalities and a bound-improving heuristic triggered at each integer node in the branch-and-bound exploration. We test the proposed model and solution algorithm using large-scale fabricated instances and a real case study for the design of an ecological corridor for the Florida Panther. Our modeling framework is able to solve instances of up to 1500 patches within 2 hours to optimality or with a small optimality gap. The second problem introduces species movement across the fragmented landscape into the corridor design problem. The premise is that dispersal dynamics, if available, must inform the design to account for the corridor’s usage by the species. To this end, we propose a spatial discrete-time absorbing Markov chain (DTMC) approach to represent species dispersal and develop short- and long-term landscape usage metrics. We explore two different types of design problems: open and closed corridors. An open corridor is a sequence of landscape patches used by the species to disperse out of a habitat. For this case, we devise a dynamic programming algorithm that implicitly enumerates possible corridors and finds the one of maximum probability. The second variant is to find a closed corridor of maximum probability that connects two fragmented habitats. To solve this variant, we extend the framework from the utility-based corridor design problem by blending the recursive Markov chain equations with a nonlinear network flow formulation. The third problem leverages the DTMC approach to explore a reserve design problem with spatial requirements such as connectivity and compactness. We approximate compactness using the concept of maximum reserve diameter, i.e., the largest distance allowed between two patches in the reserve. To solve this problem, we devise a two-stage approach that balances the trade-off between reserve usage probability and compactness. The first stage detects a subset of patches of maximum usage probability, while the second stage imposes the geometry requirements on the optimal solution obtained from the first stage. To overcome the computational difficulty of large-scale landscapes, we develop tailored solution algorithms, including a warm-up heuristic to initialize the branch-and-bound exploration, problem-specific valid inequalities, and a decomposition strategy that sequentially solves smaller problems on landscape partitions.
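
As background for the DTMC machinery used in the second and third problems, the sketch below computes standard absorbing-chain quantities: the fundamental matrix N = (I - Q)^-1 of expected patch visits and the absorption probabilities B = NR. The three-patch transition values are illustrative assumptions, not data from the case studies.

```python
import numpy as np

# Absorbing DTMC sketch: Q holds transitions among transient landscape patches,
# R holds transitions into absorbing states (e.g., reaching a target habitat or dying).
Q = np.array([[0.1, 0.4, 0.2],
              [0.3, 0.1, 0.3],
              [0.2, 0.3, 0.1]])
R = np.array([[0.3, 0.0],
              [0.1, 0.2],
              [0.0, 0.4]])

N = np.linalg.inv(np.eye(3) - Q)   # expected number of visits to each patch
B = N @ R                          # probability of each absorption outcome

print(N[0])  # expected visits to patches 1-3 when dispersal starts in patch 1
print(B[0])  # probability of each absorbing outcome when starting in patch 1
```
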
Contributors: Wang, Chao (Author) / Sefair, Jorge A. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pavlic, Theodore (Committee member) / Tong, Daoqin (Committee member) / Arizona State University (Publisher)
Created: 2021