Description
Robotic systems are outmatched by the abilities of the human hand to perceive and manipulate the world. Human hands are able to physically interact with the world to perceive, learn, and act to accomplish tasks. Limitations of robotic systems to interact with and manipulate the world diminish their usefulness. In order to advance robot end effectors, specifically artificial hands, rich multimodal tactile sensing is needed. In this work, a multi-articulating, anthropomorphic robot testbed was developed for investigating tactile sensory stimuli during finger-object interactions. The artificial finger is controlled by a tendon-driven remote actuation system that allows for modular control of any tendon-driven end effector and provides capabilities for both speed and strength. The artificial proprioception system enables direct measurement of joint angles and tendon tensions, while temperature, vibration, and skin deformation are provided by a multimodal tactile sensor. Next, attention was focused on real-time artificial perception for decision-making. A robotic system needs to perceive its environment in order to make decisions. Specific actions such as “exploratory procedures” can be employed to classify and characterize object features. Prior work on offline perception was extended to develop an anytime predictive model that returns the probability of having touched a specific feature of an object based on minimally processed sensor data. Developing models for anytime classification of features facilitates real-time action-perception loops. Finally, by combining real-time action-perception with reinforcement learning, a policy was learned to complete a functional contour-following task: closing a deformable ziplock bag. The approach relies only on proprioceptive and localized tactile data. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards within a finite time period by balancing exploration versus exploitation of the action space. Performance of the C-MAB learner was compared to a benchmark Q-learner that eventually returns the optimal policy. To assess robustness and generalizability, the learned policy was tested on variations of the original contour-following task. The work presented contributes to the full range of tools necessary to advance the abilities of artificial hands with respect to dexterity, perception, decision-making, and learning.
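
As a rough illustration of the bandit formulation above, the sketch below implements a generic contextual epsilon-greedy learner over small discrete context and action sets. The context/action sizes, reward model, and exploration rule are illustrative assumptions, not the dissertation's task encoding.

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts, n_actions = 4, 3           # hypothetical: discretized tactile contexts x motion primitives
Q = np.zeros((n_contexts, n_actions))  # running mean reward per (context, action)
N = np.zeros((n_contexts, n_actions))  # visit counts
epsilon = 0.1                          # exploration vs. exploitation trade-off

def choose_action(ctx):
    """Epsilon-greedy: explore uniformly with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[ctx]))

def update(ctx, a, reward):
    """Incremental mean update of the action-value estimate."""
    N[ctx, a] += 1
    Q[ctx, a] += (reward - Q[ctx, a]) / N[ctx, a]

# toy environment: each context has one best action (made-up reward model)
best = rng.integers(n_actions, size=n_contexts)
for t in range(5000):
    ctx = int(rng.integers(n_contexts))
    a = choose_action(ctx)
    r = float(a == best[ctx]) + 0.1 * rng.standard_normal()
    update(ctx, a, r)

print(np.argmax(Q, axis=1), best)      # learned best action per context vs. ground truth
```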
Contributors: Hellman, Randall Blake (Author) / Santos, Veronica J (Thesis advisor) / Artemiadis, Panagiotis K (Committee member) / Berman, Spring (Committee member) / Helms Tillery, Stephen I (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
A key factor in the success of social animals is their organization of work. Mathematical models have been instrumental in unraveling how simple, individual-based rules can generate collective patterns via self-organization. However, existing models offer limited insights into how these patterns are shaped by behavioral differences within groups, in part because they focus on analyzing specific rules rather than general mechanisms that can explain behavior at the individual level. My work argues for a more principled approach that focuses on the question of how individuals make decisions in costly environments.

In Chapters 2 and 3, I demonstrate how this approach provides novel insights into factors that shape the flexibility and robustness of task organization in harvester ant colonies (Pogonomyrmex barbatus). My results show that the degree to which colonies can respond to work in fluctuating environments depends on how individuals weigh the costs of activity and update their behavior in response to social information. In Chapter 4, I introduce a mathematical framework to study the emergence of collective organization in heterogeneous groups. My approach, which is based on the theory of multi-agent systems, focuses on myopic agents whose behavior emerges out of an independent valuation of alternative choices in a given work environment. The product of this dynamic is an equilibrium organization in which agents perform different tasks (or abstain from work) with an analytically defined set of threshold probabilities. The framework is minimally developed, but can be extended to include other factors known to affect task decisions, including individual experience and social facilitation. This research contributes a novel approach to developing (and analyzing) models of task organization that can be applied in a broader range of contexts where animals cooperate.
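
The threshold probabilities described above can be illustrated with a standard response-threshold model from the social-insect modeling literature. The sketch below is a generic formulation with made-up parameters (threshold distributions, stimulus dynamics), not the dissertation's equilibrium analysis: agent i engages in task j with probability s_j^2 / (s_j^2 + theta_ij^2), and task stimuli grow with unmet demand.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_tasks = 50, 2
# heterogeneous group: each agent draws its own response thresholds
theta = rng.uniform(0.2, 1.0, size=(n_agents, n_tasks))
stimulus = np.array([0.5, 0.5])
delta, alpha = 0.1, 0.004   # stimulus growth rate; per-worker stimulus reduction

for step in range(200):
    # engagement probability rises as a task's stimulus exceeds an agent's threshold
    p = stimulus**2 / (stimulus**2 + theta**2)
    engaged = rng.random((n_agents, n_tasks)) < p
    work = np.zeros(n_tasks)
    for i in range(n_agents):
        js = np.flatnonzero(engaged[i])
        if js.size:                 # an agent works on at most one task
            work[js[0]] += 1
    stimulus = np.maximum(stimulus + delta - alpha * work, 0.0)

print("workers per task:", work, "residual stimulus:", stimulus.round(3))
```

Low-threshold agents end up specializing on a task while high-threshold agents abstain, which is the qualitative pattern such threshold models are used to study.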
Contributors: Udiani, Oyita (Author) / Kang, Yun (Thesis advisor) / Fewell, Jennifer H (Thesis advisor) / Janssen, Marcus A (Committee member) / Castillo-Chavez, Carlos (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Efforts to enhance the quality of life and promote better health have led to improved water quality standards. Adequate daily fluid intake, primarily from tap water, is crucial for human health. By improving drinking water quality, negative health effects associated with consuming inadequate water can be mitigated. Although the United States Environmental Protection Agency (EPA) sets and enforces federal water quality limits at water treatment plants, the quality of water reaching end users degrades during delivery, emphasizing the need for proactive control systems in buildings to ensure safe drinking water.

Future commercial and institutional buildings are anticipated to feature real-time water quality sensors, automated flushing and filtration systems, temperature control devices, and chemical boosters. Integrating these technologies with a reliable water quality control system that optimizes the use of chemical additives, filtration, flushing, and temperature adjustments ensures users consistently have access to water of adequate quality. Additionally, existing buildings can be retrofitted with these technologies at a reasonable cost, guaranteeing user safety. In the absence of smart buildings with the required technology, Chapter 2 describes the development of an EPANET-MSX (a multi-species extension of EPA’s water simulation tool) model for a typical 5-story building. Chapter 3 involves creating accurate nonlinear approximation models of EPANET-MSX’s complex fluid dynamics and chemical reactions, and developing an open-loop water quality control system that regulates water quality based on its approximated state. To address potential sudden changes in water quality, improve predictions, and reduce the gap between the approximated and true states of water quality, a feedback control loop is developed in Chapter 4. Lastly, this dissertation includes the development of a reinforcement learning (RL) based water quality control system for cases where the approximation models prove inadequate and cause instability during implementation with a real building water network. The RL-based control system can be implemented in various buildings without the need to develop new hydraulic models and can handle the stochastic nature of water demand, ensuring the proactive control system’s effectiveness in maintaining water quality within safe limits for consumption.
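
As a loose illustration of the Chapter 4 feedback idea, the sketch below closes a textbook proportional-integral (PI) loop around a first-order disinfectant-decay model: a sensed residual is compared with a setpoint and a booster dose is adjusted accordingly. The plant model, setpoint, and gains are illustrative assumptions, and PI control here merely stands in for the dissertation's model-based and RL controllers.

```python
# toy closed loop: keep chlorine residual c near a setpoint with a booster dose
k_decay, dt = 0.5, 0.1    # decay rate (1/h) and time step (h) -- illustrative
setpoint = 1.0            # target residual (mg/L), assumed within safe limits
Kp, Ki = 0.8, 0.2         # PI gains -- illustrative
c, integ = 0.2, 0.0       # initial residual and integral of the error

for k in range(200):
    error = setpoint - c                      # feedback: sensed vs. desired residual
    integ += error * dt
    dose = max(Kp * error + Ki * integ, 0.0)  # a booster can only add disinfectant
    c += (-k_decay * c + dose) * dt           # first-order decay plus dosing

print(f"residual after {200 * dt:.0f} h: {c:.3f} mg/L")
```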
Contributors: Ghasemzadeh, Kiarash (Author) / Mirchandani, Pitu (Thesis advisor) / Boyer, Treavor (Committee member) / Ju, Feng (Committee member) / Pedrielli, Giulia (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Artificial Intelligence (AI) systems have achieved outstanding performance and have been found to be better than humans at various tasks, such as sentiment analysis and face recognition. However, the majority of these state-of-the-art AI systems use complex Deep Learning (DL) methods, which present challenges for human experts to design and evaluate such models with respect to privacy, fairness, and robustness. Recent examination of DL models reveals that representations may include information that could lead to privacy violations, unfairness, and robustness issues. This results in AI systems that are potentially untrustworthy from a socio-technical standpoint. Trustworthiness in AI is defined by a set of model properties such as non-discriminatory bias, protection of users’ sensitive attributes, and lawful decision-making. The characteristics of trustworthy AI can be grouped into three categories: Reliability, Resiliency, and Responsibility. Past research has shown that the successful integration of an AI model depends on its trustworthiness. Thus, it is crucial for organizations and researchers to build trustworthy AI systems to facilitate the seamless integration and adoption of intelligent technologies. The main issue with existing AI systems is that they are primarily trained to improve technical measures such as accuracy on a specific task but are not considerate of socio-technical measures. The aim of this dissertation is to propose methods for improving the trustworthiness of AI systems through representation learning. DL models’ representations contain information about a given input and can be used for tasks such as detecting fake news on social media or predicting the sentiment of a review. The findings of this dissertation significantly expand the scope of trustworthy AI research and establish a new paradigm for modifying data representations to balance between properties of trustworthy AI. Specifically, this research investigates multiple techniques such as reinforcement learning for understanding trustworthiness in users’ privacy, fairness, and robustness in classification tasks like cyberbullying detection and fake news detection. Since most social measures in trustworthy AI cannot be used to fine-tune or train an AI model directly, the main contribution of this dissertation lies in using reinforcement learning to alter an AI system’s behavior based on non-differentiable social measures.
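
The core mechanism, using RL to optimize a measure that cannot be differentiated through, can be sketched with a REINFORCE-style update over Bernoulli "gates" that decide which representation dimensions to keep. Here social_score is a hypothetical black-box placeholder for a social measure such as a fairness or privacy score; it is queried only for a scalar reward, never for gradients.

```python
import numpy as np

rng = np.random.default_rng(2)

def social_score(mask):
    """Hypothetical non-differentiable trustworthiness measure: rewards keeping
    the first half of the dimensions and penalizes keeping the rest."""
    d = mask.size
    return mask[: d // 2].sum() - mask[d // 2 :].sum()

d = 16                # representation dimensionality
logits = np.zeros(d)  # policy: independent Bernoulli gate per dimension
lr = 0.05

for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-logits))
    mask = (rng.random(d) < p).astype(float)  # sample which dimensions to keep
    r = social_score(mask)                    # scalar reward, no gradient needed
    logits += lr * r * (mask - p)             # REINFORCE: grad log-prob of a Bernoulli sample

print((1.0 / (1.0 + np.exp(-logits))).round(2))  # gates open on the "safe" half
```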
Contributors: Mosallanezhad, Ahmadreza (Author) / Liu, Huan (Thesis advisor) / Mancenido, Michelle (Thesis advisor) / Doupe, Adam (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Aviation is a complicated field that involves a wide range of operations, from commercial airline flights to Unmanned Aerial Systems (UAS). Planning and scheduling are essential components in the aviation industry that play a significant role in ensuring safe and efficient operations. Reinforcement Learning (RL) has received increasing attention in recent years due to its capability to enable autonomous decision-making. To investigate the potential advantages and effectiveness of RL in aviation planning and scheduling, three topics are explored in depth: obstacle avoidance, task-oriented path planning, and maintenance scheduling. A dynamic and probabilistic airspace reservation concept, called the Dynamic Anisotropic (DA) bound, is first developed for UAS, which can be added around the UAS as the separation requirement. A model based on Q-learning is proposed to integrate the DA bound with path planning for obstacle avoidance. Moreover, a deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) is proposed to guide the UAS to destinations while avoiding obstacles through continuous control. Results from case studies demonstrate that the proposed model can provide accurate and robust guidance and resolve conflicts with a success rate of over 99%. Next, the single-UAS path planning problem is extended to a multi-agent system where agents aim to accomplish their own complex tasks. These tasks involve non-Markovian reward functions and can be specified using reward machines. Both cooperative and competitive environments are explored. Decentralized Graph-based reinforcement learning using Reward Machines (DGRM) is proposed to improve computational efficiency for maximizing the global reward in a graph-based Markov Decision Process (MDP). Q-learning with Reward Machines for Stochastic Games (QRM-SG) is developed to learn the best-response strategy for each agent in a competitive environment. Furthermore, maintenance scheduling is investigated, with the purpose of minimizing the system maintenance cost while ensuring compliance with reliability requirements. Maintenance scheduling is formulated as an MDP that determines when and what maintenance operations to conduct. A Linear Programming-enhanced RollouT (LPRT) method is developed to solve both constrained deterministic and stochastic maintenance scheduling with an infinite horizon. LPRT categorizes components according to their health condition and makes decisions for each category.
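
A minimal tabular Q-learning sketch of the obstacle-avoidance idea follows, on a toy grid standing in for discretized airspace. The DA bound, UAS dynamics, reward shaping, and the PPO-based continuous control of the dissertation are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6                                   # toy grid standing in for discretized airspace
obstacles = {(2, 2), (2, 3), (3, 2)}    # cells the vehicle must avoid
goal = (N - 1, N - 1)
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
Q = np.zeros((N, N, 4))
alpha, gamma, eps = 0.5, 0.95, 0.2

def step(s, a):
    r, c = s[0] + moves[a][0], s[1] + moves[a][1]
    if not (0 <= r < N and 0 <= c < N) or (r, c) in obstacles:
        return s, -1.0, False           # blocked: penalty, stay in place
    if (r, c) == goal:
        return (r, c), 10.0, True
    return (r, c), -0.04, False         # small step cost favors short paths

for ep in range(2000):
    s = (0, 0)
    for t in range(200):                # cap episode length
        a = int(rng.integers(4)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        target = r + gamma * np.max(Q[s2]) * (not done)
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
        if done:
            break

print(np.argmax(Q, axis=2))             # greedy action index per cell
```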
Contributors: Hu, Jueming (Author) / Liu, Yongming (Thesis advisor) / Yan, Hao (Committee member) / Lee, Hyunglae (Committee member) / Zhang, Wenlong (Committee member) / Xu, Zhe (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
A distributed framework is proposed for addressing resource sharing problems in communications, micro-economics, and various other network systems. The approach uses a hierarchical multi-layer decomposition for network utility maximization. This methodology uses central management and distributed computations to allocate resources, and in dynamic environments, it aims to respond efficiently to network changes. The main contributions include a comprehensive description of an exemplary unifying optimization framework to share resources across different operators and platforms, and a detailed analysis of the generalized methods under the assumption that the network changes are on the same time-scale as the convergence time of the algorithms employed for local computations.

Assuming strong concavity and smoothness of the objective functions, and under some stability conditions for each layer, convergence rates and optimality bounds are presented. The effectiveness of the framework is demonstrated through numerical examples. Furthermore, a novel Federated Edge Network Utility Maximization (FEdg-NUM) architecture is proposed for solving large-scale distributed network utility maximization problems in a fully decentralized way. In FEdg-NUM, clients with private utilities communicate with a peer-to-peer network of edge servers. Convergence properties are examined both through analysis and numerical simulations, and potential applications are highlighted. Finally, problems in a complex stochastic dynamic environment, specifically motivated by resource sharing during disasters occurring in multiple areas, are studied. In a hierarchical management scenario, a method is presented that applies a primal-dual algorithm in the higher layer along with deep reinforcement learning algorithms in the localities. Analytical details as well as case studies such as pandemic and wildfire response are provided.
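
The layered decomposition lends itself to a compact special case: in classic dual decomposition for network utility maximization, each source solves a local utility problem given a link price, and the network adjusts the price from aggregate demand. The sketch below is that textbook case with two log-utility sources on one link, not the dissertation's generalized multi-layer framework.

```python
import numpy as np

# toy NUM: two sources share one link of capacity C; U_i(x) = w_i * log(x)
w = np.array([1.0, 2.0])   # utility weights -- illustrative
C = 1.0                    # shared link capacity
lam, step = 1.0, 0.05      # dual price on the link; dual-ascent step size

for k in range(500):
    x = w / lam            # each source maximizes w_i*log(x) - lam*x locally
    lam = max(lam + step * (x.sum() - C), 1e-6)  # price update from excess demand

print(x.round(3), x.sum().round(3))  # allocation proportional to weights, summing to C
```

The same pattern, local optimization against prices plus a slower price-update layer, is what the hierarchical framework generalizes across operators and platforms.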
Contributors: Karakoc, Nurullah (Author) / Scaglione, Anna (Thesis advisor) / Reisslein, Martin (Thesis advisor) / Nedich, Angelia (Committee member) / Michelusi, Nicolò (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Learning to accomplish complex tasks may require a tight coupling among different levels of cognitive functions or components, like perception, acting, planning, and self-explaining. One may need a coupling between perception and acting components to make decisions automatically, especially in emergent situations. One may need collaboration between perception and planning components to pursue optimal plans in the long run while also driving task-oriented perception. One may also need self-explaining components to monitor and improve the overall learning. In my research, I explore how different cognitive functions or components at different levels, modeled by Deep Neural Networks, can learn and adapt simultaneously. The first question that I address is: Can an intelligent agent leverage recognized plans or human demonstrations to improve its perception so as to allow better acting? To answer this question, I explore novel ways to learn to couple perception-acting or perception-planning. As a cornerstone, I explore how to learn shallow domain models for planning. Beyond these, more advanced cognitive learning agents may also be reflective of what they have experienced so far, either from themselves or from observing others. Likewise, humans frequently monitor their learning and draw lessons from failures and others' successes. To this end, I explore the possibility of motivating cognitive agents to learn how to self-explain experiences, accomplishments, and failures to gain useful insights. By internally making sense of past experiences, an agent could have its learning of other cognitive functions guided and improved.
Contributors: Zha, Yantian (Author) / Kambhampati, Subbarao (Thesis advisor) / Li, Baoxin (Committee member) / Srivastava, Siddharth (Committee member) / Wang, Jianjun (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
In recent years, there has been an increasing need for effective voltage control in power systems due to the growing complexity and dynamic nature of practical power grid operations. Deep reinforcement learning (DRL) techniques have now been widely explored and applied to various electric power operation analyses under different control structures. With massive data available from phasor measurement units (PMUs), it is possible to explore the application of DRL to ensure that electricity is delivered reliably.

For steady-state power system voltage regulation and control, this study proposed a novel DRL-based method to provide voltage control that can quickly remedy voltage violations under different operating conditions. Multiple types of devices, adjustable voltage ratio (AVR) devices and switched shunts, are considered as controlled devices. A modified deep deterministic policy gradient (DDPG) algorithm is applied to accommodate both the continuous and discrete control action spaces of the different devices. A case study conducted on the WECC 240-bus system validates the effectiveness of the proposed method. System dynamic stability and performance after serious disturbances using DRL are further discussed in this study. A real-time voltage control method is proposed based on DRL, which continuously regulates the excitation system in response to system disturbances. Dynamic performance is considered by incorporating historical voltage data, the voltage rate of change, voltage deviation, and regulation amount. A versatile transmission-level power system dynamic training and simulation platform is developed by integrating the simulation software PSS/E and a user-written DRL agent code developed in Python. The developed platform facilitates the training and testing of various power system algorithms and power grids in dynamic simulations with all the modeling capabilities available within PSS/E. The efficacy of the proposed method is evaluated based on the developed platform. To enhance the controller's resilience in addressing communication failures, a dynamic voltage control method employing the multi-agent DDPG algorithm is proposed. The algorithm follows the principle of centralized training and decentralized execution. Each agent has independent actor and critic neural networks. Simulation outcomes underscore the method’s efficacy, showcasing its capability to provide voltage support and handle communication failures among agents.
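
One simple way to let a single continuous DDPG actor drive both device types, in the spirit of the modification described above, is to map part of its output to a continuous setpoint and snap the rest onto admissible discrete steps. The device ranges and step sizes below are illustrative assumptions, not parameters from the study.

```python
import numpy as np

shunt_steps = np.array([0.0, 0.5, 1.0, 1.5])  # hypothetical switched-shunt steps (Mvar)

def to_device_commands(actor_out):
    """Map raw actor outputs to mixed device commands."""
    avr = 1.0 + 0.05 * np.tanh(actor_out[0])      # continuous setpoint in [0.95, 1.05] p.u.
    raw = (np.tanh(actor_out[1]) + 1.0) / 2.0 * shunt_steps[-1]
    shunt = shunt_steps[np.argmin(np.abs(shunt_steps - raw))]  # snap to nearest step
    return avr, shunt

print(to_device_commands(np.array([0.3, 0.9])))
```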
Contributors: Wang, Yuling (Author) / Vittal, Vijay (Thesis advisor) / Ayyanar, Raja (Committee member) / Pal, Anamitra (Committee member) / Hedman, Mojdeh (Committee member) / Arizona State University (Publisher)
Created: 2024
Description
This dissertation focuses on reinforcement learning (RL) controller design aiming for real-life applications in continuous state and control problems. It involves three major research investigations in the aspects of design, analysis, implementation, and evaluation. The application case addresses automatically configuring robotic prosthesis impedance parameters. Major contributions of the dissertation include the following. 1) An “echo control” using the intact knee profile as the target is designed to overcome the limitation of a designer-prescribed robotic knee profile. 2) Collaborative multiagent reinforcement learning (cMARL) is proposed to directly take into account human influence in the robot control design. 3) A phased actor in actor-critic (PAAC) reinforcement learning method is developed to reduce learning variance in RL. The design of the “echo control” is based on a new formulation of direct heuristic dynamic programming (dHDP) for tracking control of a robotic knee prosthesis to mimic the intact knee profile. A systematic simulation of the proposed control is provided using a human-robot system simulation in OpenSim. The tracking controller is then tested on able-bodied and amputee subjects. This is the first real-time human testing of RL tracking control of a robotic knee to mirror the profile of an intact knee. The cMARL is a new solution framework for the human-prosthesis collaboration (HPC) problem. This is the first attempt at considering human influence on human-robot walking in the presence of a reinforcement-learning-controlled lower limb prosthesis. Results show that treating the human and robot as coupled, collaborating agents and using an estimated human adaptation in robot control design help improve human walking performance. The above studies have demonstrated the great potential of RL control in solving continuous problems. To solve more complex real-life tasks with multiple control inputs and high-dimensional state spaces, high variance, low data efficiency, slow learning, and even instability are major roadblocks to be addressed. A novel PAAC method is proposed to improve learning performance in policy gradient RL by accounting for both the Q value and TD error in actor updates. Systematic and comprehensive demonstrations show its effectiveness by qualitative analysis and quantitative evaluation in the DeepMind Control Suite.
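
A schematic, one-state reading of "accounting for both the Q value and TD error in actor updates" is sketched below with a Gaussian policy: the actor's update signal blends the TD error with a crude Q estimate, phased over training. The blend and schedule are illustrative guesses at the flavor of the idea, not the published PAAC rule.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, std = 0.0, 0.5           # Gaussian policy over a 1-D action, fixed std
V = 0.0                      # critic estimate for the single state
lr_a, lr_c, T = 0.02, 0.1, 4000

for t in range(T):
    a = mu + std * rng.standard_normal()
    r = -(a - 2.0) ** 2      # toy reward, maximized at a = 2
    td = r - V               # one-step TD error (single state, episodic)
    V += lr_c * td           # critic update
    beta = t / T             # phase weight -- illustrative schedule
    signal = (1 - beta) * td + beta * r        # blend TD error with a crude Q estimate
    mu += lr_a * signal * (a - mu) / std**2    # policy-gradient step on the mean

print(round(mu, 2))          # approaches 2.0
```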
Contributors: Wu, Ruofan (Author) / Si, Jennie (Thesis advisor) / Huang, He (Committee member) / Santello, Marco (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Robotic lower limb prostheses provide new opportunities to help transfemoral amputees regain mobility. However, their application is impeded by the fact that the impedance control parameters need to be tuned and optimized manually by prosthetists for each individual user in different task environments. Reinforcement learning (RL) is capable of automatically learning from interaction with the environment, making it a natural candidate to replace human prosthetists in customizing the control parameters. However, neither traditional RL approaches nor the popular deep RL approaches are readily suitable for learning with a limited number of samples or with samples that have large variations. This dissertation aims to explore new RL-based adaptive solutions that are data-efficient for controlling robotic prostheses.

This dissertation begins by proposing a new flexible policy iteration (FPI) framework. To improve sample efficiency, FPI can utilize either an on-policy or off-policy learning strategy, can learn from either online or offline data, and can even adopt existing knowledge from an external critic. Approximate convergence to Bellman optimal solutions is guaranteed under mild conditions. Simulation studies validated that FPI was data-efficient compared to several established RL methods. Furthermore, a simplified version of FPI was implemented to learn from offline data, and the learned policy was then successfully tested for tuning the control parameters online on a human subject.
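
For orientation, the skeleton below is plain exact policy iteration on a random finite MDP, the classical core that FPI generalizes. FPI's on-policy/off-policy options, offline data handling, and external-critic knowledge are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = next-state distribution
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # random rewards -- illustrative MDP

pi = np.zeros(nS, dtype=int)
while True:
    # policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly
    P_pi = P[np.arange(nS), pi]
    r_pi = R[np.arange(nS), pi]
    v = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    # policy improvement: greedy with respect to a one-step lookahead
    q = R + gamma * P @ v
    new_pi = np.argmax(q, axis=1)
    if np.array_equal(new_pi, pi):
        break                                  # fixed point: Bellman optimal policy
    pi = new_pi

print("optimal policy:", pi)
```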

Next, the dissertation discusses RL control with information transfer (RL-IT), or knowledge-guided RL (KG-RL), which is motivated by the benefit of transferring knowledge acquired from one subject to another. To explore its feasibility, knowledge was extracted from data measurements of able-bodied (AB) subjects and transferred to guide Q-learning control for an amputee in OpenSim simulations. This result again demonstrated that data and time efficiency were improved by using prior knowledge.

While the present study is new and promising, there are still many open questions to be addressed in future research. To account for human adaptation, the learning control objective function may be designed to incorporate human-prosthesis performance feedback such as symmetry, user comfort level and satisfaction, and user energy consumption. To make RL-based control parameter tuning practical in real life, it should be further developed and tested in different use environments, such as from level-ground walking to stair ascending or descending, and from walking to running.
Contributors: Gao, Xiang (Author) / Si, Jennie (Thesis advisor) / Huang, He Helen (Committee member) / Santello, Marco (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2020