Learning Complex Behaviors from Simple Ones: An analysis of Behavior-based Modular Design for RL Agents

Description

Traditional Reinforcement Learning (RL) learns policies with respect to the reward available from the environment, but learning in a complex domain sometimes requires knowledge that comes from a wide range of experience. In behavior-based robotics, it is observed that a complex behavior can be described by a combination of simpler behaviors. It is tempting to apply a similar idea, so that simpler behaviors can be combined in a meaningful way to produce a complex behavior. Such an approach would enable faster learning and a modular design of behaviors. Complex behaviors can in turn be combined with other behaviors to create even more advanced behaviors, resulting in a rich set of possibilities. As in RL, the combined behavior can keep evolving by interacting with the environment. The method requires a reasonable set of simple behaviors to be specified. In this research, I present an algorithm that combines behaviors such that the resulting behavior has characteristics of each individual behavior. This approach is inspired by behavior-based robotics, such as the subsumption architecture and motor schema-based design. The combination algorithm outputs n weights to combine behaviors linearly. The weights are state dependent and change dynamically at every step in an episode. This idea is tested on discrete and continuous environments such as OpenAI’s “Lunar Lander” and “Biped Walker”. Results are compared with related approaches such as multi-objective RL, hierarchical RL, transfer learning, and basic RL. It is observed that the combination of behaviors is a novel way of learning that helps the agent achieve the required characteristics. Because a combination is learned for each state, the agent is able to learn faster and more efficiently than with other similar approaches. The agent demonstrates characteristics of multiple behaviors, which helps it learn and adapt to the environment. Future directions are also suggested as possible extensions to this research.
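
The state-dependent linear combination described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each simple behavior maps a state to a continuous action vector, that the n weights come from a softmax over a linear function of the state, and that the combined action is the weighted sum of the behaviors' actions. The names `combine_behaviors`, `softmax_weights`, and the parameter matrix `W` are hypothetical.

```python
import numpy as np


def softmax_weights(state, W):
    """Return n non-negative weights that sum to 1 for the given state.

    W is a hypothetical (n, state_dim) parameter matrix; in a learned
    system it would be trained, here it is simply supplied by the caller.
    """
    logits = W @ state
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


def combine_behaviors(state, behaviors, W):
    """Linearly combine the actions proposed by n simple behaviors.

    Each behavior is assumed to be a callable mapping a state to an
    action vector of the same dimension.
    """
    actions = np.stack([b(state) for b in behaviors])  # shape (n, action_dim)
    weights = softmax_weights(state, W)                # shape (n,)
    return weights @ actions                           # shape (action_dim,)


if __name__ == "__main__":
    # Two toy behaviors that ignore the state and push along different axes.
    go_right = lambda s: np.array([1.0, 0.0])
    go_up = lambda s: np.array([0.0, 1.0])

    state = np.array([0.3, -0.2, 0.5])
    W = np.zeros((2, 3))  # all-zero parameters give equal weights
    print(combine_behaviors(state, [go_right, go_up], W))
```

Because the weights are recomputed from the current state on every call, the mixture can shift from one behavior to another within an episode, which is the property the abstract emphasizes.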

Contributors: Vora, Kevin Jatin (Author) / Zhang, Yu (Thesis advisor) / Yang, Yezhou (Committee member) / Praharaj, Sarbeswar (Committee member) / Arizona State University (Publisher)

Created: 2021

Autonomous System Control of Multiple Robotic Arms Collaboration via Machine Learning

Description

Multi-arm collaboration is the problem of controlling multiple robotic arms so that they work together on the same task. During the collaboration, the agent is required to avoid all possible collisions between any parts of the robotic arms. Thus, incentivizing collaboration and preventing collisions are the two principles that the agent follows during the training process. Nowadays, more and more applications, both in industry and in daily life, require at least two arms rather than a single arm. A dual-arm robot meets the needs of many more types of tasks, such as folding clothes at home, making a hamburger on a grill, or picking and placing a product in a warehouse. The applications in this thesis all involve object pushing. This thesis focuses on how to train the agent to push an object as far away as possible. Reinforcement Learning (RL), a type of Machine Learning (ML), is used to train the agent to generate optimal actions. Deep Deterministic Policy Gradient (DDPG) and Hindsight Experience Replay (HER) are the two RL methods used in this thesis.

Contributors: Lin, Steve (Author) / Ben Amor, Hani (Thesis advisor) / Redkar, Sangram (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)

Created: 2023

Dynamic Potential Fields for Flexible Behavior-based Swarm Control via Reinforcement Learning

Description

In this thesis work, a novel learning approach to the problem of controlling a quadcopter (drone) swarm is explored. To deal with large swarm sizes, swarm control is often achieved in a distributed fashion by combining different behaviors such that each behavior implements some desired swarm characteristic, such as avoiding obstacles or staying close to neighbors. One common approach in distributed swarm control uses potential fields. A limitation of this approach is that the potential fields often depend statically on a set of control parameters that are manually specified a priori. This thesis introduces Dynamic Potential Fields for flexible swarm control. These potential fields are modulated by a set of dynamic control parameters (DCPs) that can change under different environmental situations. Focusing only on these DCPs simplifies the learning problem and makes it feasible for practical use. This approach uses soft actor-critic (SAC), where the actor only determines how to modify the DCPs in the current situation, resulting in more flexible swarm control. The results show that the DCP approach allows the drones to better traverse environments with obstacles compared to several state-of-the-art swarm control methods with a fixed set of control parameters. This approach also obtained a higher safety score, a measure commonly used to assess swarm behavior. The approach is also compared with a basic reinforcement learning approach to demonstrate faster convergence. Finally, an ablation study is conducted to validate the design of this approach.

Contributors: Ferraro, Calvin Shores (Author) / Zhang, Yu (Thesis advisor) / Ben Amor, Hani (Committee member) / Berman, Spring (Committee member) / Arizona State University (Publisher)

Created: 2022

Foundations of Human-Aware Explanations for Sequential Decision-Making Problems

Description

Recent breakthroughs in Artificial Intelligence (AI) have brought the dream of developing and deploying complex AI systems that can potentially transform everyday life closer to reality than ever before. However, the growing realization that there might soon be people from all walks of life using and working with these systems has also spurred a lot of interest in ensuring that AI systems can efficiently and effectively work and collaborate with their intended users. Chief among the efforts in this direction has been the pursuit of imbuing these agents with the ability to provide intuitive and useful explanations regarding their decisions and actions to end-users. In this dissertation, I will describe various works that I have done in the area of explaining sequential decision-making problems. Furthermore, I will frame the discussions of my work within a broader framework for understanding and analyzing explainable AI (XAI). My works herein tackle many of the core challenges related to explaining automated decisions to users, including (1) techniques to address asymmetry in knowledge between the user and the system, (2) techniques to address asymmetry in inferential capabilities, and (3) techniques to address vocabulary mismatch. The dissertation will also describe the works I have done in generating interpretable behavior and policy summarization. I will conclude this dissertation by using the framework of human-aware explanation as a lens to analyze and understand the current landscape of explainable planning.

Contributors: Sreedharan, Sarath (Author) / Kambhampati, Subbarao (Thesis advisor) / Kim, Been (Committee member) / Smith, David E (Committee member) / Srivastava, Siddharth (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)

Created: 2022

What Do You Want Me To Do? Addressing Model Differences for Human-Aware Decision-Making from A Learning Perspective

Description

As intelligent agents become pervasive in our lives, they are expected to not only achieve tasks alone but also engage in tasks with humans in the loop. In such cases, the human naturally forms an understanding of the agent, which affects the human’s perception of the agent’s behavior. However, such an understanding inevitably deviates from the ground truth for reasons such as the human’s lack of understanding of the domain or misunderstanding of the agent’s capabilities. Such differences result in a mismatch between the human’s expectation of the agent’s behavior and the agent’s optimal behavior, thereby biasing the human’s assessment of the agent’s performance. In this dissertation, I focus on cases where these differences are due to a biased belief about domain dynamics. In particular, I investigate the impact of such a biased belief on the agent’s decision-making process in two different problem settings from a learning perspective. In the first setting, the agent must accomplish a task alone but must infer the human’s objectives from the human’s feedback on the agent’s behavior in the environment. In such a case, the human’s biased feedback could mislead the agent into learning a reward function that results in a sub-optimal and, potentially, undesired policy. In the second setting, the agent must accomplish a task with a human observer. Given that the agent’s optimal behavior may not match the human’s expectation due to the biased belief, the agent’s optimal behavior may be viewed as inexplicable, leading to degraded performance and loss of trust. Consequently, this dissertation proposes approaches that (1) endow the agent with the ability to be aware of the human’s biased belief while inferring the human’s objectives, (2) thereby neutralize the impact of the model differences in a reinforcement learning framework, and (3) enable the agent to behave explicably by reconciling the human’s expectation and optimality during decision-making.

Contributors: Gong, Ze (Author) / Zhang, Yu (Thesis advisor) / Ben Amor, Hani (Committee member) / Kambhampati, Subbarao (Committee member) / Zhang, Wenlong (Committee member) / Arizona State University (Publisher)

Created: 2022

Adapting Robotic Systems to User Control

Description

In this work, I propose to bridge the gap between human users and adaptive control of robotic systems. The goal is to enable robots to consider user feedback and adjust their behaviors. A critical challenge in designing such systems is that users are often non-experts, with limited knowledge about the robot's hardware and dynamics. In the domain of human-robot interaction, there exist different modalities for conveying information about the desired behavior of the robot, the most commonly used being demonstrations and preferences. While it is challenging for non-experts to provide demonstrations of robot behavior, works that consider preferences expressed as trajectory rankings lead to users providing noisy and possibly conflicting information, leading to slow adaptation or system failures. The end user can be expected to be familiar with the dynamics and how they relate to their desired objectives through repeated interactions with the system. However, due to inadequate knowledge about the system dynamics, it is expected that the user would find it challenging to provide feedback on all dimensions of the system's behavior at all times. Thus, the key innovation of this work is to enable users to provide partial preferences, instead of the completely specified preferences required by traditional methods that learn from user preferences. In particular, I consider partial preferences in the form of preferences over plant dynamic parameters, for which I propose Adaptive User Control (AUC) of robotic systems. I leverage the correlations between the observed and hidden parameter preferences to deal with incompleteness. I use a sparse Gaussian Process Latent Variable Model formulation to learn hidden variables that represent the relationships between the observed and hidden preferences over the system parameters. This model is trained using Stochastic Variational Inference with a distributed loss formulation.
I evaluate AUC in a custom drone-swarm environment and several domains from the DeepMind Control Suite. I compare AUC with state-of-the-art preference-based reinforcement learning methods that are utilized with user preferences. Results show that AUC outperforms the baselines substantially in terms of sample and feedback complexity.

Contributors: Biswas, Upasana (Author) / Zhang, Yu (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Berman, Spring (Committee member) / Liu, Lantao (Committee member) / Arizona State University (Publisher)

Created: 2023

Data-Efficient Paradigms for Personalized Assessment of Taskable AI Systems

Description

Recent advances in Artificial Intelligence (AI) have brought AI closer to laypeople than ever before. This leads to a pervasive problem: how would a user ascertain whether an AI system will be safe, reliable, or useful in a given situation? This problem becomes particularly challenging when it is considered that most autonomous systems are not designed by their users; the internal software of these systems may be unavailable or difficult to understand; and the functionality of these systems may even change from initial specifications as a result of learning. To overcome these challenges, this dissertation proposes a paradigm for third-party autonomous assessment of black-box taskable AI systems. The four main desiderata of such assessment systems are: (i) interpretability: generating a description of the AI system's functionality in a language that the target user can understand; (ii) correctness: ensuring that the description of the AI system's working is accurate; (iii) generalizability: creating a solution approach that works well for different types of AI systems; and (iv) minimal requirements: creating an assessment system that does not place complex requirements on AI systems to support the third-party assessment, since otherwise the manufacturers of AI systems might not support such an assessment. To satisfy these properties, this dissertation presents algorithms and requirements that would enable user-aligned autonomous assessment that helps the user understand the limits of a black-box AI system's safe operability. This dissertation proposes a personalized AI assessment module that discovers the high-level "capabilities" of an AI system with arbitrary internal planning algorithms/policies and learns an accurate symbolic description of these capabilities in terms of concepts that a user understands. Furthermore, the dissertation includes the associated theoretical results and the empirical evaluations.
The results show that (i) a primitive query-response interface can enable the development of autonomous assessment modules that can efficiently derive a causally accurate, user-interpretable model of the system's capabilities, and (ii) such descriptions are easier for users to understand and reason with than the agent's primitive actions.

Contributors: Verma, Pulkit (Author) / Srivastava, Siddharth (Thesis advisor) / Cooke, Nancy (Committee member) / Fainekos, Georgios (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)

Created: 2024

Autonomously Learning World-Model Representations For Efficient Robot Planning

Description

In today's world, robotic technology has become increasingly prevalent across various fields such as manufacturing, warehouses, delivery, and household applications. Planning is crucial for robots to solve various tasks in such difficult domains. However, most robots rely heavily on humans for world models that enable planning. Consequently, not only is it expensive to create such world models, as doing so requires human experts who understand the domain as well as robot limitations, but these models may also be biased by human embodiment, which can be limiting for robots whose kinematics are not human-like. This thesis answers the fundamental question: Can we learn such world models automatically? This research shows that we can learn complex world models directly from unannotated and unlabeled demonstrations containing only the configurations of the robot and the objects in the environment. The core contributions of this thesis are the first known approaches for i) task and motion planning that explicitly handle stochasticity, ii) automatically inventing neuro-symbolic state and action abstractions for deterministic and stochastic motion planning, and iii) automatically inventing relational and interpretable world models in the form of symbolic predicates and actions. This thesis also presents thorough and rigorous empirical experimentation. With experiments in both simulated and real-world settings, this thesis demonstrates the efficacy and robustness of automatically learned world models in overcoming challenges and generalizing beyond situations encountered during training.

Contributors: Shah, Naman (Author) / Srivastava, Siddharth (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Konidaris, George (Committee member) / Speranzon, Alberto (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)

Created: 2024

Foundations of Human-Aware Planning -- A Tale of Three Models

Description

A critical challenge in the design of AI systems that operate with humans in the loop is to be able to model the intentions and capabilities of the humans, as well as their beliefs and expectations of the AI system itself. This allows the AI system to be "human-aware" -- i.e., the human task model enables it to envisage desired roles of the human in joint action, while the human mental model allows it to anticipate how its own actions are perceived from the point of view of the human. In my research, I explore how these concepts of human-awareness manifest themselves in the scope of planning or sequential decision making with humans in the loop. To this end, I will show (1) how the AI agent can leverage the human task model to generate symbiotic behavior; and (2) how the introduction of the human mental model in the deliberative process of the AI agent allows it to generate explanations for a plan or resort to explicable plans when explanations are not desired. The latter goes beyond traditional notions of human-aware planning, which typically use the human task model alone, and thus enables a new suite of capabilities for a human-aware AI agent. Finally, I will explore how the AI agent can leverage emerging mixed-reality interfaces to realize effective channels of communication with the human in the loop.

Contributors: Chakraborti, Tathagata (Author) / Kambhampati, Subbarao (Thesis advisor) / Talamadupula, Kartik (Committee member) / Scheutz, Matthias (Committee member) / Ben Amor, Hani (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)

Created: 2018

Identifying critical regions for robot planning using convolutional neural networks

Description

In this thesis, a new approach to learning-based planning is presented where critical regions of an environment with low probability measure are learned from a given set of motion plans. Critical regions are learned using convolutional neural networks (CNN) to improve sampling processes for motion planning (MP).

In addition to an identification network, a new sampling-based motion planner, Learn and Link, is introduced. This planner leverages critical regions to overcome the limitations of uniform sampling while still maintaining guarantees of correctness inherent to sampling-based algorithms. Learn and Link is evaluated against planners from the Open Motion Planning Library (OMPL) on an extensive suite of challenging navigation planning problems. This work shows that critical areas of an environment are learnable, and can be used by Learn and Link to solve MP problems with far less planning time than existing sampling-based planners.

Contributors: Molina, Daniel, M.S (Author) / Srivastava, Siddharth (Thesis advisor) / Li, Baoxin (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)

Created: 2019