Search Content

Environmental Learning for Robot Path-Planning Via Pareto Evolution

Description

Robots are often used in long-duration scenarios, such as on the surface of Mars,where they may need to adapt to environmental changes. Typically, robots have been built specifically for single tasks, such as moving boxes in a warehouse…

Robots are often used in long-duration scenarios, such as on the surface of Mars,where they may need to adapt to environmental changes. Typically, robots have been built specifically for single tasks, such as moving boxes in a warehouse or surveying construction sites. However, there is a modern trend away from human hand-engineering and toward robot learning. To this end, the ideal robot is not engineered,but automatically designed for a specific task. This thesis focuses on robots which learn path-planning algorithms for specific environments. Learning is accomplished via genetic programming. Path-planners are represented as Python code, which is optimized via Pareto evolution. These planners are encouraged to explore curiously and efficiently. This research asks the questions: “How can robots exhibit life-long learning where they adapt to changing environments in a robust way?”, and “How can robots learn to be curious?”.

ContributorsSaldyt, Lucas P (Author) / Ben Amor, Heni (Thesis director) / Pavlic, Theodore (Committee member) / Computer Science and Engineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created2021-05

The Individual Contribution to Cooperative Transport in the ant Novomessor albisetosus

Description

The desert ant, Novomessor albisetosus, is an ideal model system for studying collective transport in ants and self-organized cooperation in natural systems. Small teams collect and stabilize around objects encountered by these colonies in the field, and the teams carry them in straight paths at a regulated velocity back to…

The desert ant, Novomessor albisetosus, is an ideal model system for studying collective transport in ants and self-organized cooperation in natural systems. Small teams collect and stabilize around objects encountered by these colonies in the field, and the teams carry them in straight paths at a regulated velocity back to nearby nest entrances. The puzzling finding that teams are slower than individuals contrasts other cases of cooperative transport in ants. The statistical distribution of speeds has been found to be consistent with the slowest-ant model, but the key assumption that individual ants consistently vary in speed has not been tested. To test this, information is needed about the natural distribution of individual ant speeds in colonies and whether some ants are intrinsically slow or fast. To investigate the natural, individual-level variation in ants carrying loads, data were collected on single workers carrying fig seeds in arenas separated from other workers. Using three separate, small arenas, the instantaneous speed of each seed-laden worker was recorded when she picked up a fig seed and transported within the arena. Instantaneous speeds were measured by dividing the distance traveled in each frame by how much time had passed.
There were nine ants who transported a fig seed numerous times and there was a clear variation in their average instantaneous speed. Within an ant, slightly varying speeds were found as well, but within-ant speeds were not as varied as speed across ants. These results support the conclusion that there is intrinsic variation in the speed of an individual which supports the slowest-ant model, but this may require further experimentation to test thoroughly. This information aids in the understanding of the natural variation of ants cooperatively carrying larger loads in groups.

ContributorsCastro, Samantha (Author) / Pavlic, Theodore (Thesis director) / Pratt, Stephen (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2020-12

Design of a Graph Neural Network Coupled with an Advantage Actor-Critic Reinforcement Learning Algorithm for Multi-Agent Navigation

Description

A Graph Neural Network (GNN) is a type of neural network architecture that operates on data consisting of objects and their relationships, which are represented by a graph. Within the graph, nodes represent objects and edges represent associations between those objects. The representation of relationships and correlations between data is…

A Graph Neural Network (GNN) is a type of neural network architecture that operates on data consisting of objects and their relationships, which are represented by a graph. Within the graph, nodes represent objects and edges represent associations between those objects. The representation of relationships and correlations between data is unique to graph structures. GNNs exploit this feature of graphs by augmenting both forms of data, individual and relational, and have been designed to allow for communication and sharing of data within each neural network layer. These benefits allow each node to have an enriched perspective, or a better understanding, of its neighbouring nodes and its connections to those nodes. The ability of GNNs to efficiently process high-dimensional node data and multi-faceted relationships among nodes gives them advantages over neural network architectures such as Convolutional Neural Networks (CNNs) that do not implicitly handle relational data. These quintessential characteristics of GNN models make them suitable for solving problems in which the correspondences among input data are needed to produce an accurate and precise representation of these data. GNN frameworks may significantly improve existing communication and control techniques for multi-agent tasks by implicitly representing not only information associated with the individual agents, such as agent position, velocity, and camera data, but also their relationships with one another, such as distances between the agents and their ability to communicate with one another. One such task is a multi-agent navigation problem in which the agents must coordinate with one another in a decentralized manner, using proximity sensors only, to navigate safely to their intended goal positions in the environment without collisions or deadlocks. The contribution of this thesis is the design of an end-to-end decentralized control scheme for multi-agent navigation that utilizes GNNs to prevent inter-agent collisions and deadlocks. The contributions consist of the development, simulation and evaluation of the performance of an advantage actor-critic (A2C) reinforcement learning algorithm that employs actor and critic networks for training that simultaneously approximate the policy function and value function, respectively. These networks are implemented using GNN frameworks for navigation by groups of 3, 5, 10 and 15 agents in simulated two-dimensional environments. It is observed that in $40\%$ to $50\%$ of the simulation trials, between 70$\%$ to 80$\%$ of the agents reach their goal positions without colliding with other agents or becoming trapped in deadlocks. The model is also compared to a random run simulation, where actions are chosen randomly for the agents and observe that the model performs notably well for smaller groups of agents.

ContributorsAyalasomayajula, Manaswini (Author) / Berman, Spring (Thesis advisor) / Mian, Sami (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)

Created2022

Combining learning with knowledge-rich planning allows for efficient multi-agent solutions to the problem of perpetual sparse rewards

Description

This work has improved the quality of the solution to the sparse rewards problemby combining reinforcement learning (RL) with knowledge-rich planning. Classical methods for coping with sparse rewards during reinforcement learning modify the reward landscape so as to better guide the learner. In contrast, this work combines RL with a planner in order…

This work has improved the quality of the solution to the sparse rewards problemby combining reinforcement learning (RL) with knowledge-rich planning. Classical methods for coping with sparse rewards during reinforcement learning modify the reward landscape so as to better guide the learner. In contrast, this work combines RL with a planner in order to utilize other information about the environment. As the scope for representing environmental information is limited in RL, this work has conflated a model-free learning algorithm – temporal difference (TD) learning – with a Hierarchical Task Network (HTN) planner to accommodate rich environmental information in the algorithm. In the perpetual sparse rewards problem, rewards reemerge after being collected within a fixed interval of time, culminating in a lack of a well-defined goal state as an exit condition to the problem. Incorporating planning in the learning algorithm not only improves the quality of the solution, but the algorithm also avoids the ambiguity of incorporating a goal of maximizing profit while using only a planning algorithm to solve this problem. Upon occasionally using the HTN planner, this algorithm provides the necessary tweak toward the optimal solution. In this work, I have demonstrated an on-policy algorithm that has improved the quality of the solution over vanilla reinforcement learning. The objective of this work has been to observe the capacity of the synthesized algorithm in finding optimal policies to maximize rewards, awareness of the environment, and the awareness of the presence of other agents in the vicinity.

ContributorsNandan, Swastik (Author) / Pavlic, Theodore (Thesis advisor) / Das, Jnaneshwar (Thesis advisor) / Berman, Spring (Committee member) / Arizona State University (Publisher)

Created2022

LanSAR – Language-commanded Scene-aware Action Response

Description

Robot motion and control remains a complex problem both in general and inthe field of machine learning (ML). Without ML approaches, robot controllers are typically designed manually, which can take considerable time, generally requiring accounting for a range of edge cases and often producing models highly constrained to specific tasks. ML can decrease…

Robot motion and control remains a complex problem both in general and inthe field of machine learning (ML). Without ML approaches, robot controllers are typically designed manually, which can take considerable time, generally requiring accounting for a range of edge cases and often producing models highly constrained to specific tasks. ML can decrease the time it takes to create a model while simultaneously allowing it to operate on a broader range of tasks. The utilization of neural networks to learn from demonstration is, in particular, an approach with growing popularity due to its potential to quickly fit the parameters of a model to mimic training data. Many such neural networks, especially in the realm of transformer-based architectures, act more as planners, taking in an initial context and then generating a sequence from that context one step at a time. Others hybridize the approach, predicting a latent plan and conditioning immediate actions on that plan. Such approaches may limit a model’s ability to interact with a dynamic environment, needing to replan to fully update its understanding of the environmental context. In this thesis, Language-commanded Scene-aware Action Response (LanSAR) is proposed as a reactive transformer-based neural network that makes immediate decisions based on previous actions and environmental changes. Its actions are further conditioned on a language command, serving as a control mechanism while also narrowing the distribution of possible actions around this command. It is shown that LanSAR successfully learns a strong representation of multimodal visual and spatial input, and learns reasonable motions in relation to most language commands. It is also shown that LanSAR can struggle with both the accuracy of motions and understanding the specific semantics of language commands

ContributorsHardy, Adam (Author) / Ben Amor, Heni (Thesis advisor) / Srivastava, Siddharth (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)

Created2024

An exploration of statistical modelling methods on simulation data case study: biomechanical predator-prey simulations

Description

Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more…

Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decrease in cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called “in-silicomics”.

As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate how other biomechanical parameters relating to the agility of animals in predator–prey pairs are better predictors of pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, new modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and instances in the leaves of classification trees provide the basis for future simulation with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques combined with methods from industrial Design of Experiments will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics.

ContributorsSeto, Christian (Author) / Pavlic, Theodore (Thesis advisor) / Li, Jing (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created2018

Optimization Models and Algorithms for Wildlife Corridor and Reserve Design in Conservation Planning

Description

Biodiversity has been declining during the last decades due to habitat loss, landscape deterioration, environmental change, and human-related activities. In addition to its economic and cultural value, biodiversity plays an important role in keeping an environment’s ecosystem in balance. Disrupting such processes can reduce the provision of natural resources such…

Biodiversity has been declining during the last decades due to habitat loss, landscape deterioration, environmental change, and human-related activities. In addition to its economic and cultural value, biodiversity plays an important role in keeping an environment’s ecosystem in balance. Disrupting such processes can reduce the provision of natural resources such as food and water, which in turn yields a direct threat to human health. Protecting and restoring natural areas is fundamental to preserve biodiversity and to mitigate the effects of ongoing environmental change. Unfortunately, it is impossible to protect every critical area due to resource limitations, requiring the use of advanced decision tools for the design of conservation plans. This dissertation studies three problems on the design of wildlife corridors and reserves that include patch-specific conservation decisions under spatial, operational, ecological, and biological requirements. In addition to the ecological impact of each problem’s solution, this dissertation contributes a set of formulations, valid inequalities, and pre-processing and solution algorithms for optimization problems with spatial requirements. The first problem is a utility-based corridor design problem to connect fragmented habitats, where each patch has a utility value reflecting its quality. The corridor must satisfy geometry requirements such as a connectivity and minimum width. We propose a mix-integer programming (MIP) model to maximize the total utility of the corridor under the given geometry requirements as well as a budget constraint to reflect the acquisition (or restoration) cost of the selected patches. To overcome the computational difficulty when solving large-scale instances, we develop multiple acceleration techniques, including a brand-and-cut algorithm enhanced with problem-specific valid inequalities and a bound-improving heuristic triggered at each integer node in the branch-and-bound exploration. We test the proposed model and solution algorithm using large-scale fabricated instances and a real case study for the design of an ecological corridor for the Florida Panther. Our modeling framework is able to solve instances of up to 1500 patches within 2 hours to optimality or with a small optimality gap. The second problem introduces the species movement across the fragmented landscape into the corridor design problem. The premise is that dispersal dynamics, if available, must inform the design to account for the corridor’s usage by the species. To this end, we propose a spatial discrete-time absorbing Markov chain (DTMC) approach to represent species dispersal and develop short- and long-term landscape usage metrics. We explore two different types of design problems: open and closed corridors. An open corridor is a sequence of landscape patches used by the species to disperse out of a habitat. For this case, we devise a dynamic programming algorithm that implicitly enumerates possible corridors and finds that of maximum probability. The second problem is to find a closed corridor of maximum probability that connects two fragmented habitats. To solve this problem variant, we extended the framework from the utility-based corridor design problem by blending the recursive Markov chain equations with a network flow nonlinear formulation. The third problem leverages on the DTMC approach to explore a reserve design problem with spatial requirements like connectivity and compactness. We approximate the compactness using the concept of maximum reserve diameter, i.e., the largest distance allowed between two patch in the reserve. To solve this problem, we devise a two-stage approach that balances the trade-off between reserve usage probability and compactness. The first stage's problem is to detect a subset of patches of maximum usage probability, while the second stage's problem imposes the geometry requirements on the optimal solution obtained from the first stage. To overcome the computational difficulty of large-scale landscapes, we develop tailored solution algorithms, including a warm-up heuristic to initialize the branch-and-bound exploration, problem-specific valid inequalities, and a decomposition strategy that sequentially solves smaller problems on landscape partitions.

ContributorsWang, Chao (Author) / Sefair, Jorge A. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pavlic, Theodore (Committee member) / Tong, Daoqin (Committee member) / Arizona State University (Publisher)

Created2021

Tragedy Plus Time: Capturing Human Unintended Activities from Weakly-Labeled Videos

Description

In videos that contain actions performed unintentionally, agents do not achieve their desired goals. In such videos, it is challenging for computer vision systems to understand high-level concepts such as goal-directed behavior. On the other hand, from a very early age, humans are able to understand the relation between an…

In videos that contain actions performed unintentionally, agents do not achieve their desired goals. In such videos, it is challenging for computer vision systems to understand high-level concepts such as goal-directed behavior. On the other hand, from a very early age, humans are able to understand the relation between an agent and their ultimate goal even if the action gets disrupted or unintentional effects occur. Inculcating this ability in artificially intelligent agents would make them better social learners by not just learning from their own mistakes, i.e, reinforcement learning, but also learning from other's mistakes. For example, this could greatly reduce the search space for artificially intelligent agents for finding the correct action sequence when trying to achieve a new goal, since they would be able to learn from others what not to do as well as how/when actions result in undesired outcomes.To validate this ability of deep learning models to perform this task, the Weakly Augmented Oops (W-Oops) dataset is proposed, built upon the Oops dataset. W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 33 unintentional video-level activity labels collected through human annotations. Inspired by previous methods on tasks such as weakly supervised action localization which show promise for achieving good localization results without ground truth segment annotations, this paper proposes a weakly supervised algorithm for localizing the goal-directed as well as the unintentional temporal region of a video using only video-level labels. In particular, an attention mechanism based strategy is employed that predicts the temporal regions which contributes the most to a classification task, leveraging solely video-level labels. Meanwhile, our designed overlap regularization allows the model to focus on distinct portions of the video for inferring the goal-directed and unintentional activity, while guaranteeing their temporal ordering. Extensive quantitative experiments verify the validity of our localization method.

ContributorsChakravarthy, Arnav (Author) / Yang, Yezhou (Thesis advisor) / Davulcu, Hasan (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)

Created2021

Filtering by