Value and Policy Approximation for Two-player General-sum Differential Games

Description

Human-robot interactions can often be formulated as general-sum differential games where the equilibrial policies are governed by Hamilton-Jacobi-Isaacs (HJI) equations. Solving HJI PDEs faces the curse of dimensionality (CoD). While physics-informed neural networks (PINNs) alleviate CoD in solving PDEs with smooth solutions, they fall short in learning discontinuous solutions due to their sampling nature. This causes PINNs to have poor safety performance when they are applied to approximate values that are discontinuous due to state constraints. This dissertation aims to improve the safety performance of PINN-based value and policy models. The first contribution of the dissertation is to develop learning methods to approximate discontinuous values. Specifically, three solutions are developed: (1) hybrid learning uses both supervisory and PDE losses, (2) value-hardening solves HJIs with increasing Lipschitz constant on the constraint violation penalty, and (3) the epigraphical technique lifts the value to a higher-dimensional state space where it becomes continuous. Evaluations through 5D and 9D vehicle and 13D drone simulations reveal that the hybrid method outperforms others in terms of generalization and safety performance. The second contribution is a learning-theoretical analysis of PINN for value and policy approximation. Specifically, by extending the neural tangent kernel (NTK) framework, this dissertation explores why the choice of activation function significantly affects the PINN generalization performance, and why the inclusion of supervisory costate data improves the safety performance. The last contribution is a series of extensions of the hybrid PINN method to address real-time parameter estimation problems in incomplete-information games. Specifically, a Pontryagin-mode PINN is developed to avoid costly computation for supervisory data. 
The key idea is the introduction of a costate loss, which is cheap to compute yet effectively enables the learning of important value changes and policies in space-time. Building upon this, a Pontryagin-mode neural operator is developed to achieve state-of-the-art (SOTA) safety performance across a set of differential games with parametric state constraints. This dissertation demonstrates the utility of the resultant neural operator in estimating player constraint parameters during incomplete-information games.
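As a rough sketch of the hybrid idea, the training objective combines a supervised regression loss on value data with an HJI PDE-residual loss at collocation points. The weighting lam and the exact loss form below are illustrative assumptions, not the dissertation's formulation:

```python
import numpy as np

def hybrid_loss(v_pred, v_data, pde_residual, lam=1.0):
    """Toy hybrid PINN objective: supervised value regression on labeled
    states plus the mean-squared HJI residual at collocation points."""
    data_loss = np.mean((v_pred - v_data) ** 2)   # supervisory term
    pde_loss = np.mean(pde_residual ** 2)         # physics (PDE) term
    return data_loss + lam * pde_loss
```

In practice v_pred and pde_residual would come from a neural network and automatic differentiation; the balance lam trades fidelity to the supervisory data against fidelity to the PDE.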
Date Created
2024

Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

Description

This dissertation discusses continuous-time reinforcement learning (CT-RL) for control of affine nonlinear systems. Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. Yet as RL control has developed, CT-RL results have greatly lagged their discrete-time RL (DT-RL) counterparts, especially with regard to real-world applications. Current CT-RL algorithms generally fall into two classes: adaptive dynamic programming (ADP), and actor-critic deep RL (DRL). The first school of ADP methods features elegant theoretical results stemming from adaptive and optimal control. Yet, they have not been shown to effectively synthesize meaningful controllers. The second school of DRL has shown impressive learning solutions, yet theoretical guarantees are still to be developed. A substantive analysis uncovering the quantitative causes of the fundamental gap between CT and DT remains to be conducted. Thus, this work develops a first-of-its-kind quantitative evaluation framework to diagnose the performance limitations of the leading CT-RL methods. This dissertation also introduces a suite of new CT-RL algorithms which offer both theoretical and synthesis guarantees. The proposed design approach relies on three important factors. First, for physical systems that feature physically-motivated dynamical partitions into distinct loops, the proposed decentralization method breaks the optimal control problem into smaller subproblems. Second, the work introduces a new excitation framework to improve persistence of excitation (PE) and numerical conditioning via classical input/output insights. 
Third, the method scales the learning problem via design-motivated invertible transformations of the system state variables in order to modulate the algorithm learning regression for further increases in numerical stability. This dissertation introduces a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms implementing these paradigms. It rigorously proves convergence, optimality, and closed-loop stability guarantees of the proposed methods, which are demonstrated in comprehensive comparative studies with the leading methods in ADP on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV). It also conducts comprehensive comparative studies with the leading DRL methods on three state-of-the-art (SOTA) environments, revealing new performance/design insights.
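The conditioning benefit of the third factor can be illustrated with a toy example: scaling the columns of a regression matrix by an invertible transformation sharply reduces its condition number. The matrix and scaling below are fabricated for illustration; EIRL's actual learning regression is more involved:

```python
import numpy as np

# Regression matrix whose columns differ by orders of magnitude,
# mimicking state variables on very different physical scales.
A = np.array([[1.0, 1e4],
              [2.0, 3e4],
              [3.0, 5e4]])

# Invertible diagonal transformation normalizing each column.
T = np.diag(1.0 / np.max(np.abs(A), axis=0))
A_scaled = A @ T

cond_before = np.linalg.cond(A)        # severely ill-conditioned
cond_after = np.linalg.cond(A_scaled)  # far better conditioned
```

The same least-squares solution is recovered by mapping the scaled estimate back through T, so the transformation buys numerical stability for free.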
Date Created
2024

Design of Reinforcement Learning Controllers with Application to Robotic Knee Tuning with Human in the Loop

Description

This dissertation focuses on reinforcement learning (RL) controller design aiming for real-life applications in continuous state and control problems. It involves three major research investigations spanning design, analysis, implementation, and evaluation. The application case addresses automatically configuring robotic prosthesis impedance parameters. Major contributions of the dissertation include the following. 1) An “echo control” using the intact knee profile as target is designed to overcome the limitation of a designer-prescribed robotic knee profile. 2) Collaborative multiagent reinforcement learning (cMARL) is proposed to directly take into account human influence in the robot control design. 3) A phased actor in actor-critic (PAAC) reinforcement learning method is developed to reduce learning variance in RL. The design of an “echo control” is based on a new formulation of direct heuristic dynamic programming (dHDP) for tracking control of a robotic knee prosthesis to mimic the intact knee profile. A systematic simulation of the proposed control is provided using a human-robot system simulation in OpenSim. The tracking controller is then tested on able-bodied and amputee subjects. This is the first real-time human testing of RL tracking control of a robotic knee to mirror the profile of an intact knee. The cMARL is a new solution framework for the human-prosthesis collaboration (HPC) problem. This is the first attempt at considering human influence on human-robot walking with the presence of a reinforcement-learning-controlled lower limb prosthesis. Results show that treating the human and robot as coupled and collaborating agents and using an estimated human adaptation in robot control design help improve human walking performance. The above studies have demonstrated the great potential of RL control in solving continuous problems. 
To solve more complex real-life tasks with multiple control inputs and high-dimensional state spaces, major roadblocks such as high variance, low data efficiency, slow learning, and even instability must be addressed. A novel PAAC method is proposed to improve learning performance in policy gradient RL by accounting for both the Q value and the TD error in actor updates. Systematic and comprehensive demonstrations show its effectiveness through qualitative analysis and quantitative evaluation in the DeepMind Control Suite.
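The two quantities PAAC balances are easy to state: the critic's Q value and the one-step TD error. The blend below is a schematic sketch of using both in the actor update; the actual phased actor and its update schedule are more elaborate than this fixed weighting:

```python
def td_error(r, v_s, v_next, gamma=0.99):
    """One-step temporal-difference error: delta = r + gamma*V(s') - V(s)."""
    return r + gamma * v_next - v_s

def actor_update_weight(q_value, delta, beta=0.5):
    """Hypothetical blend of the Q value and the TD error used to weight
    the actor's policy-gradient step; PAAC's phased schedule for moving
    between the two terms is not reproduced here."""
    return beta * q_value + (1.0 - beta) * delta
```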
Date Created
2023

Dynamic Modeling and Control of Octopus-Inspired Soft Continuum Robots with Distributed Sensing and Actuation

Description

Soft continuum robots with the ability to bend, twist, elongate, and shorten, similar to octopus arms, have many potential applications, such as dexterous manipulation and navigation through unstructured, dynamic environments. Novel soft materials such as smart hydrogels, which change volume and other properties in response to stimuli such as temperature, pH, and chemicals, can potentially be used to construct soft robots that achieve self-regulated adaptive reconfiguration through on-demand dynamic control of local properties. However, the design of controllers for soft continuum robots is challenging due to their high-dimensional configuration space and the complexity of modeling soft actuator dynamics. To address these challenges, this dissertation presents two different model-based control approaches for robots with distributed soft actuators and sensors and validates the approaches in simulations and physical experiments. It is demonstrated that by choosing an appropriate dynamical model and designing a decentralized controller based on this model, such robots can be controlled to achieve diverse types of complex configurations. The first approach consists of approximating the dynamics of the system, including its actuators, as a linear state-space model in order to apply optimal robust control techniques such as H∞ state-feedback and H∞ output-feedback methods. These techniques are designed to utilize the decentralized control structure of the robot and its distributed sensing and actuation to achieve vibration control and trajectory tracking. The approach is validated in simulation on an Euler-Bernoulli dynamic model of a hydrogel-based cantilevered robotic arm and in experiments with a hydrogel-actuated miniature 2-DOF manipulator. The second approach is developed for soft continuum robots with dynamics that can be modeled using Cosserat rod theory. 
An inverse dynamics control approach is implemented on the Cosserat model of the robot for tracking configurations that include bending, torsion, shear, and extension deformations. The decentralized controller structure facilitates its implementation on robot arms composed of independently-controllable segments that have local sensing and actuation. This approach is validated on simulated 3D robot arms and on an actual silicone robot arm with distributed pneumatic actuation, for which the inverse dynamics problem is solved in simulation and the computed control outputs are applied to the robot in real-time.
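For intuition, the same computed-torque (inverse dynamics) structure can be shown on a single rigid link, a drastically simplified stand-in for the Cosserat rod model; the parameters and gains below are arbitrary illustrative choices:

```python
import numpy as np

def inverse_dynamics_torque(q, qd, q_des, qd_des, qdd_des,
                            m=1.0, l=0.5, g=9.81, kp=25.0, kd=10.0):
    """Computed-torque control for one pendulum link with dynamics
    I*qdd + m*g*(l/2)*sin(q) = tau, where I = m*l^2/3."""
    I = m * l ** 2 / 3.0
    e, ed = q_des - q, qd_des - qd
    qdd_cmd = qdd_des + kd * ed + kp * e                # PD at acceleration level
    return I * qdd_cmd + m * g * (l / 2.0) * np.sin(q)  # cancel gravity
```

On the actual robot, each independently-controllable segment runs such a computation from its local sensing, which is what makes the decentralized structure feasible.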
Date Created
2022

Modeling, Design and Control of Power Converters

Description

This dissertation examines modeling, design and control challenges associated with two classes of power converters: a direct current-direct current (DC-DC) step-down (buck) regulator and a 3-phase (3-ϕ) 4-wire direct current-alternating current (DC-AC) inverter. These are widely used for power transfer in a variety of industrial and personal applications. This motivates the precise quantification of conditions under which existing modeling and design methods yield satisfactory designs, and the study of alternatives when they do not. This dissertation describes a method utilizing Fourier components of the input square wave and the inductor-capacitor (LC) filter transfer function, which does not require the small-ripple approximation. Then, trade-offs associated with the choice of the filter order are analyzed for integrated buck converters with a constraint on their chip area. Design specifications that would justify using a fourth- or sixth-order filter instead of the widely used second-order one are examined. Next, sampled-data (SD) control of a buck converter is analyzed. Three methods for the digital controller design are studied: analog design followed by discretization, direct digital design of a discretized plant, and a “lifting” based method wherein the sampling time is incorporated in the design process by lifting the continuous-time design plant before doing the controller design. Specifically, controller performance is quantified by studying the induced-L2 norm of the closed-loop system for a range of switching/sampling frequencies. In the final segment of this dissertation, the inner-outer control loop, employed in inverters with an inductor-capacitor-inductor (LCL) output filter, is studied. Closed-loop sensitivities for the loop broken at the error and at the control are examined, demonstrating that traditional methods only address these properties for one loop-breaking point. New controllers are then provided for improving both sets of properties.
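The Fourier-based analysis can be sketched in a few lines: each harmonic of the switch-node square wave is attenuated by the LC filter's frequency response. The component values in the usage below are arbitrary, and the second-order transfer function with a purely resistive load is a textbook simplification of the dissertation's analysis:

```python
import numpy as np

def output_harmonics(vin, duty, fsw, L, C, R, n_harmonics=10):
    """Magnitudes of the buck output-voltage harmonics: Fourier
    coefficients of a 0-to-vin square wave with duty cycle `duty`,
    filtered by H(jw) = 1 / (1 - w^2*L*C + j*w*L/R)."""
    mags = []
    for n in range(1, n_harmonics + 1):
        w = 2 * np.pi * n * fsw
        # nth Fourier coefficient magnitude of the square wave
        cn = 2 * vin * abs(np.sin(np.pi * n * duty)) / (np.pi * n)
        h = 1.0 / (1.0 - w ** 2 * L * C + 1j * w * L / R)
        mags.append(cn * abs(h))
    return mags

# e.g. 12 V input, 50% duty, 100 kHz switching, 10 uH, 100 uF, 5 ohm load
mags = output_harmonics(12.0, 0.5, 100e3, 10e-6, 100e-6, 5.0)
```

Summing the filtered harmonics (with phase) reconstructs the output ripple waveform without invoking the small-ripple approximation.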
Date Created
2021

Image Processing Techniques for Object Sorting by a Two Degree of Freedom Robotic Manipulator: A Comparative Computer Simulation Study

Description

Object sorting is a very common application, especially in industrial settings, but it is a labor-intensive and time-consuming process that proves challenging when done manually. Thanks to rapid developments in technology, almost all of these object sorting tasks are now partially or completely automated. Image processing techniques are essential for the full operation of such a pick-and-place robot, as they are responsible for perceiving the environment and correctly identifying, classifying, and localizing the different objects in it. This thesis discusses how different deep learning based perception techniques can be used so that robots perform accurate object sorting with efficiency and stability. In the era of artificial intelligence, this sorting problem can be solved more efficiently than with existing techniques. This thesis presents different image processing techniques and algorithms that can be used to perform object sorting efficiently. A comparison between three different deep learning based techniques is presented, and their pros and cons are discussed. Furthermore, this thesis also presents a comprehensive study of the kinematics and dynamics involved in a 2-degree-of-freedom robotic manipulator.
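The kinematics study can be grounded with the standard planar 2-DOF forward kinematics, mapping joint angles to the end-effector position (link lengths below are placeholders, not values from the thesis):

```python
import numpy as np

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) of a planar two-link arm."""
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return x, y
```

A vision pipeline's job in a sorting cell is then to supply the (x, y) targets that an inverse of this map turns into joint commands.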
Date Created
2021

Design, Modeling and Control of an Inverted Pendulum on a Cart

Description

The Inverted Pendulum on a Cart is a classical control theory problem that helps in understanding the importance of feedback control systems for a coupled plant. In this study, a custom-built pendulum system is coupled with a linearly actuated cart, and a control system is designed to show the stability of the pendulum. The three major objectives of this control system are to swing up the pendulum, balance the pendulum in the inverted position (i.e., 180°), and maintain the position of the cart. The input to this system is the translational force applied to the cart using the rotation of the tires. The main objective of this thesis is to design a control system that will help in balancing the pendulum while maintaining the position of the cart, and to implement it in a robot. The pendulum is made free-rotating with the help of ball bearings, and the angle of the pendulum is measured using an Inertial Measurement Unit (IMU) sensor. The cart is actuated by two Direct Current (DC) motors, and the position of the cart is measured using encoders that generate pulse signals based on the wheel rotation. The control is implemented in a cascade format where an inner-loop controller is used to stabilize and balance the pendulum in the inverted position and an outer-loop controller is used to control the position of the cart. Both the inner-loop and outer-loop controllers follow the Proportional-Integral-Derivative (PID) control scheme with some modifications for the inner loop. The system is first mathematically modeled using the Newton-Euler first-principles method, and based on this model, a controller is designed for specific closed-loop parameters. All of this is implemented on hardware with the help of an Arduino Due microcontroller, which serves as the main processing unit for the system.
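The cascade structure described above can be sketched as two nested PID loops; the gains and time step below are placeholders, not the tuned values used on the robot:

```python
class PID:
    """Discrete PID with a backward-difference derivative."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Outer loop: cart-position error -> pendulum-angle setpoint.
# Inner loop: angle error -> force (motor) command.
outer = PID(kp=0.5, ki=0.0, kd=0.2, dt=0.01)
inner = PID(kp=40.0, ki=1.0, kd=2.0, dt=0.01)

def control(cart_pos, cart_ref, pend_angle):
    angle_ref = outer.step(cart_ref - cart_pos)
    return inner.step(angle_ref - pend_angle)
```

On hardware, the inner loop would typically run at a higher rate than the outer loop, and the thesis's inner-loop modifications would be layered on this skeleton.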
Date Created
2021

Dynamical System Design for Control of Single and Multiple Non-holonomic Differential Drive Robots Based on Critical Design Trade Studies

Description

Over the past few decades, there has been an increase in demand for various ground robot applications such as warehouse management, surveillance, mapping, and infrastructure inspection. This steady increase in demand has led to a significant rise in research on nonholonomic differential drive vehicles (DDVs). Although extensive work has been done in developing various control laws for trajectory tracking, point stabilization, formation control, etc., there are still problems and critical questions regarding the design, modeling, and control of DDVs that need to be adequately addressed. In this thesis, three different dynamical models are considered that are formed by varying the input/output parameters of the DDV model. These models are analyzed to understand their stability, bandwidth, input-output coupling, and control design properties. Furthermore, a systematic approach is presented to show the impact of design parameters such as mass, inertia, radius of the wheels, and center of gravity location on the dynamic and inner-loop (speed) control design properties. Subsequently, extensive simulation and hardware trade studies have been conducted to quantify the impact of design parameters and modeling variations on the performance of outer-loop cruise and position control (along a curve). In addition to this, detailed guidelines are provided for when a multi-input multi-output (MIMO) control strategy is advisable over a single-input single-output (SISO) control strategy, and when a less stable plant is preferable over a more stable one in order to accommodate performance specifications. Additionally, a multi-robot trajectory tracking implementation based on a receding horizon optimization approach is presented. In most of the optimization-based trajectory tracking approaches found in the literature, only the constraints imposed by the kinematic model are incorporated into the problem. 
This thesis elaborates on the fundamental problem associated with these methods and presents a systematic approach to understand and quantify when kinematic-model-based constraints are sufficient and when dynamic-model-based constraints are necessary to obtain good tracking properties. Detailed instructions are given for designing and building the DDV based on performance specifications, and an open-source platform capable of handling high-speed multi-robot research is developed in C++.
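The kinematic model underlying this distinction is the standard unicycle model of a differential-drive vehicle; the forward-Euler integration and the wheel-to-body map below are textbook relations, with illustrative parameter values:

```python
import numpy as np

def ddv_step(x, y, theta, v, omega, dt):
    """One forward-Euler step of the unicycle kinematics:
    xdot = v*cos(theta), ydot = v*sin(theta), thetadot = omega."""
    return (x + v * np.cos(theta) * dt,
            y + v * np.sin(theta) * dt,
            theta + omega * dt)

def wheel_speeds_to_body(wr, wl, r, b):
    """Right/left wheel angular speeds -> (v, omega), for wheel
    radius r and track width b."""
    return r * (wr + wl) / 2.0, r * (wr - wl) / b
```

Dynamic-model-based constraints additionally bound wheel torques and account for mass and inertia, which is exactly the distinction the thesis quantifies.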
Date Created
2021

Modeling, Design and Control of a 6 D-O-F Quadcopter Fleet With Platooning Control

Description

Vertical take-off and landing (VTOL) systems have become a crucial component of aeronautical and commercial applications alike. Quadcopter systems are rather convenient to analyze and design controllers for, owing to symmetry in body dynamics. In this work, a quadcopter model at hover equilibrium is derived, using both high and low level control. The low level control system is designed to track reference Euler angles (roll, pitch and yaw) as shown in previous work [1], [2]. The high level control is designed to track reference X, Y, and Z axis states [3]. The objective of this paper is to model, design and simulate platooning (separation) control for a fleet of 6 quadcopter units, each comprising high and low level control systems, using a leader-follower approach. The primary motivation of this research is to examine the “accordion effect”, a phenomenon observed in leader-follower systems in which positioning or spacing errors arise in follower vehicles due to sudden changes in lead vehicle velocity. It is proposed that the accordion effect occurs when lead vehicle information is not directly communicated to the rest of the system [4], [5]. In this paper, the effect of leader acceleration feedback is observed for the quadcopter platoon. This is performed by first designing a classical platoon controller for a nominal case, where communication within the system is purely ad hoc (i.e., from one quadcopter to its immediate successor in the fleet). Steady-state separation/positioning errors for each member of the fleet are observed and documented during simulation. Following this analysis, lead vehicle acceleration is provided to the controller (as a feedforward term) to observe the extent of its effect on steady-state separation, specifically along tight maneuvers. Thus the key contribution of this work is a controller that stabilizes a platoon of quadcopters in the presence of the accordion effect when employing a leader-follower approach. 
The modeling shown in this paper builds on previous research to design a low-cost quadcopter platform, the Mark 3 copter [1]. Prior to each simulation, model nonlinearities and hardware constants are measured or derived from the Mark 3 model, in an effort to observe the working of the system in the presence of realistic hardware constraints. The system is designed in compliance with the Robot Operating System (ROS) and the Micro Air Vehicle Link (MAVLink) communication protocol.
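The accordion effect and the feedforward fix can be reproduced in a one-dimensional toy: two double integrators with PD spacing control, with and without the leader's acceleration fed forward. The gains, time step, and leader maneuver below are arbitrary choices, not the paper's quadcopter model:

```python
def simulate_platoon(leader_accel, kp=2.0, kd=3.0, dt=0.01, t_end=5.0,
                     feedforward=False):
    """Leader-follower pair on a line; returns the peak spacing error."""
    n = int(t_end / dt)
    xl = vl = xf = vf = 0.0
    max_err = 0.0
    for k in range(n):
        al = leader_accel(k * dt)
        err, derr = xl - xf, vl - vf         # spacing and rate errors
        af = kp * err + kd * derr + (al if feedforward else 0.0)
        xl += vl * dt; vl += al * dt         # leader (double integrator)
        xf += vf * dt; vf += af * dt         # follower
        max_err = max(max_err, abs(xl - xf))
    return max_err

accel = lambda t: 2.0 if t < 1.0 else 0.0    # leader speeds up, then cruises
err_adhoc = simulate_platoon(accel)                   # ad hoc communication only
err_ff = simulate_platoon(accel, feedforward=True)    # with leader feedforward
```

With the feedforward term the error dynamics become homogeneous, so a follower starting in formation never develops a spacing error; without it, the follower lags during every leader acceleration, which is the accordion effect in miniature.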
Date Created
2021

Experimental Analysis on Collaborative Human Behavior in a Physical Interaction Environment

Description

Daily collaborative tasks like pushing a table or a couch require haptic communication between the people doing the task. To design collaborative motion planning algorithms for such applications, it is important to understand human behavior. Collaborative tasks involve continuous adaptations and intent recognition between the people involved in the task. This thesis explores the coordination between human partners through a virtual setup involving continuous visual feedback. The interaction and coordination are modeled as a two-step process: 1) collecting data for a collaborative couch-pushing task, where both people doing the task have complete information about the goal but are unaware of each other's cost functions or intentions, and 2) processing the emergent behavior from complete information and fitting a model to this behavior to validate a mathematical model of agent behavior in multi-agent collaborative tasks. The baseline model is updated using different approaches so that the trajectories it generates better resemble human trajectories, and all of these models are compared to each other. The action profiles of both agents and the position and velocity of the manipulated object during a goal-oriented task are recorded and used as expert demonstrations to fit models resembling human behaviors. Analysis through hypothesis testing is also performed to identify the difference in behaviors when there is complete information versus information asymmetry among agents regarding the goal position.
Date Created
2020