Matching Items (27)
Description

Robotic systems are outmatched by the abilities of the human hand to perceive and manipulate the world. Human hands are able to physically interact with the world to perceive, learn, and act to accomplish tasks. The limited ability of robotic systems to interact with and manipulate the world diminishes their usefulness. In order to advance robot end effectors, specifically artificial hands, rich multimodal tactile sensing is needed. In this work, a multi-articulating, anthropomorphic robot testbed was developed for investigating tactile sensory stimuli during finger-object interactions. The artificial finger is controlled by a tendon-driven remote actuation system that allows for modular control of any tendon-driven end effector and provides both speed and strength. The artificial proprioception system enables direct measurement of joint angles and tendon tensions, while temperature, vibration, and skin deformation measurements are provided by a multimodal tactile sensor.

Next, attention was focused on real-time artificial perception for decision-making. A robotic system needs to perceive its environment in order to make decisions. Specific actions such as “exploratory procedures” can be employed to classify and characterize object features. Prior work on offline perception was extended to develop an anytime predictive model that returns the probability of having touched a specific feature of an object based on minimally processed sensor data. Developing models for anytime classification of features facilitates real-time action-perception loops.

Finally, by combining real-time action-perception with reinforcement learning, a policy was learned to complete a functional contour-following task: closing a deformable ziplock bag. The approach relies only on proprioceptive and localized tactile data. A Contextual Multi-Armed Bandit (C-MAB) reinforcement learning algorithm was implemented to maximize cumulative rewards within a finite time period by balancing exploration and exploitation of the action space. Performance of the C-MAB learner was compared to a benchmark Q-learner that eventually returns the optimal policy. To assess robustness and generalizability, the learned policy was tested on variations of the original contour-following task. The work presented contributes to the full range of tools necessary to advance the abilities of artificial hands with respect to dexterity, perception, decision-making, and learning.
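As a rough, illustrative sketch of the bandit idea described above (not the dissertation's implementation), the following epsilon-greedy contextual bandit keeps a running mean reward estimate per context-action pair and balances exploration against exploitation; the context encoding, action names, and reward signal are hypothetical placeholders for discretized proprioceptive and tactile features.

```python
import random
from collections import defaultdict

class ContextualBandit:
    """Epsilon-greedy contextual multi-armed bandit with running-mean value estimates."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = actions            # e.g., discrete fingertip motions (hypothetical)
        self.epsilon = epsilon            # exploration rate
        self.counts = defaultdict(int)    # visits per (context, action)
        self.values = defaultdict(float)  # mean reward per (context, action)

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known action.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def update(self, context, action, reward):
        # Incremental running-mean update of the estimated reward.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]

# Hypothetical usage: contexts are discretized tactile/proprioceptive features,
# actions are small fingertip motions along or across the bag's contour.
bandit = ContextualBandit(actions=["slide_forward", "press_down", "lift", "pinch"])
context = ("contact", "edge_left")                    # placeholder context encoding
action = bandit.select(context)
reward = 1.0 if action == "slide_forward" else 0.0    # stand-in reward signal
bandit.update(context, action, reward)
```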
Contributors: Hellman, Randall Blake (Author) / Santos, Veronica J (Thesis advisor) / Artemiadis, Panagiotis K (Committee member) / Berman, Spring (Committee member) / Helms Tillery, Stephen I (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

Sports activities have been a cornerstone in the evolution of humankind through the ages, from the ancient Roman Empire to the Olympics of the 21st century, and these activities have long been used as a benchmark to evaluate how humans have progressed over time. In the 21st century, machines, with the help of powerful computing and relatively new computing paradigms, have made a good case for taking up the mantle. Even though machines have been able to perform complex tasks and maneuvers, they have struggled to match the dexterity, coordination, manipulability, and acuteness displayed by humans. Bi-manual tasks are more complex still, because they introduce additional variables, such as coordination between arms, that make them harder to evaluate.

A task capable of demonstrating this skillset would be a good measure of progress in robotic technology. Therefore, a dual-armed robot has been built and taught to handle the ball and make the basket successfully, demonstrating the capability of using both arms. A combination of machine learning techniques, reinforcement learning and imitation learning, has been used along with advanced optimization algorithms to accomplish the task.
Contributors: Kalige, Nikhil (Author) / Amor, Heni Ben (Thesis advisor) / Shrivastava, Aviral (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

Achieving human-level intelligence is a long-term goal for many Artificial Intelligence (AI) researchers. Recent developments in combining deep learning and reinforcement learning have moved the field a step closer to this goal. Reinforcement learning, which uses a delayed reward mechanism, is an approach to machine intelligence that studies decision making and control, and how a decision-making agent can learn to act optimally without prior knowledge of its environment.

Q-learning is a model-free reinforcement learning strategy that uses temporal differences to estimate the values of state-action pairs, called Q-values. A simple implementation of the Q-learning algorithm stores and updates the Q-values in a Q-table held in memory. However, as the state space grows with the complexity of the environment and as the number of possible actions increases, the Q-table reaches its memory limit and scales poorly. Q-learning with neural networks eliminates the Q-table by approximating the Q-function with a neural network.
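For reference, a minimal tabular Q-learning step might look like the sketch below; the environment interface (`env.actions`, `env.step`) and hyperparameters are illustrative assumptions, not part of the FPGA design described here. When the state space grows too large, the `Q` table is replaced by a neural network that approximates the same Q-function.

```python
import random
from collections import defaultdict

def q_learning_step(Q, state, env, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One epsilon-greedy Q-learning step; Q maps (state, action) -> value."""
    actions = env.actions(state)
    if random.random() < epsilon:
        action = random.choice(actions)                      # explore
    else:
        action = max(actions, key=lambda a: Q[(state, a)])   # exploit
    next_state, reward, done = env.step(state, action)
    # Temporal-difference target: bootstrap from the best next action's value.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions(next_state))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return next_state, done

Q = defaultdict(float)  # the Q-table; a neural network replaces this lookup for large state spaces
```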

Autonomous agents need to develop cognitive properties and become self-adaptive to be deployable in any environment. Reinforcement learning with Q-learning has been very effective at solving such problems. However, embedded systems such as space rovers and autonomous robots rarely implement these techniques because of constraints on processing power, chip area, convergence rate, and chip cost. These constraints create a need for a portable, low-power, area-efficient hardware accelerator to speed up such learning.

This problem is addressed by implementing a hardware architecture for Q-learning using artificial neural networks. The architecture exploits the massive parallelism of neural networks together with the dedicated fine-grained parallelism of a Field Programmable Gate Array (FPGA), thereby processing Q-values at high throughput. Mars exploration rovers currently use Xilinx space-grade FPGA devices for image processing, pyrotechnic operation control, and obstacle avoidance. The hardware resource consumption of the architecture has been evaluated by synthesizing it with a Xilinx Virtex-7 FPGA as the target device.
Contributors: Gankidi, Pranay Reddy (Author) / Thangavelautham, Jekanthan (Thesis advisor) / Ren, Fengbo (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

A key factor in the success of social animals is their organization of work. Mathematical models have been instrumental in unraveling how simple, individual-based rules can generate collective patterns via self-organization. However, existing models offer limited insight into how these patterns are shaped by behavioral differences within groups, in part because they focus on analyzing specific rules rather than general mechanisms that can explain behavior at the individual level. My work argues for a more principled approach that focuses on the question of how individuals make decisions in costly environments.

In Chapters 2 and 3, I demonstrate how this approach provides novel insights into factors that shape the flexibility and robustness of task organization in harvester ant colonies (Pogonomyrmex barbatus). My results show that the degree to which colonies can respond to work in fluctuating environments depends on how individuals weigh the costs of activity and update their behavior in response to social information. In Chapter 4, I introduce a mathematical framework to study the emergence of collective organization in heterogeneous groups. My approach, which is based on the theory of multi-agent systems, focuses on myopic agents whose behavior emerges out of an independent valuation of alternative choices in a given work environment. The product of this dynamic is an equilibrium organization in which agents perform different tasks (or abstain from work) with an analytically defined set of threshold probabilities. The framework is minimally developed, but can be extended to include other factors known to affect task decisions, including individual experience and social facilitation. This research contributes a novel approach to developing (and analyzing) models of task organization that can be applied in a broader range of contexts where animals cooperate.
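As background on threshold-based task allocation (the dissertation's own equilibrium formulation may differ), a classic response-threshold rule from the social-insect modeling literature expresses the probability that individual i engages in task j in terms of the task stimulus and an internal threshold:

```latex
% Classic response-threshold rule (illustrative background only):
% the probability of engaging in task j rises as the stimulus s_j
% exceeds the individual's threshold \theta_{ij}; n controls the steepness.
P_{ij}(s_j) \;=\; \frac{s_j^{\,n}}{s_j^{\,n} + \theta_{ij}^{\,n}}
```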
Contributors: Udiani, Oyita (Author) / Kang, Yun (Thesis advisor) / Fewell, Jennifer H (Thesis advisor) / Janssen, Marcus A (Committee member) / Castillo-Chavez, Carlos (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

Efforts to enhance the quality of life and promote better health have led to improved water quality standards. Adequate daily fluid intake, primarily from tap water, is crucial for human health. By improving drinking water quality, negative health effects associated with consuming water of inadequate quality can be mitigated. Although the United States Environmental Protection Agency (EPA) sets and enforces federal water quality limits at water treatment plants, the quality of water reaching end users degrades during the delivery process, emphasizing the need for proactive control systems in buildings to ensure safe drinking water.

Future commercial and institutional buildings are anticipated to feature real-time water quality sensors, automated flushing and filtration systems, temperature control devices, and chemical boosters. Integrating these technologies with a reliable water quality control system that optimizes the use of chemical additives, filtration, flushing, and temperature adjustments ensures users consistently have access to water of adequate quality. Additionally, existing buildings can be retrofitted with these technologies at a reasonable cost, guaranteeing user safety. Because smart buildings equipped with the required technology are not yet available, Chapter 2 describes the development of an EPANET-MSX (a multi-species extension of the EPA’s water simulation tool) model of a typical 5-story building. Chapter 3 involves creating accurate nonlinear approximation models of EPANET-MSX’s complex fluid dynamics and chemical reactions and developing an open-loop water quality control system that regulates water quality based on the approximated water quality state. To address potential sudden changes in water quality, improve predictions, and reduce the gap between the approximated and true water quality states, a feedback control loop is developed in Chapter 4. Lastly, this dissertation includes the development of a reinforcement learning (RL) based water quality control system for cases where the approximation models prove inadequate and cause instability when implemented with a real building water network. The RL-based control system can be implemented in various buildings without the need to develop new hydraulic models and can handle the stochastic nature of water demand, ensuring the proactive control system’s effectiveness in maintaining water quality within safe limits for consumption.
Contributors: Ghasemzadeh, Kiarash (Author) / Mirchandani, Pitu (Thesis advisor) / Boyer, Treavor (Committee member) / Ju, Feng (Committee member) / Pedrielli, Giulia (Committee member) / Arizona State University (Publisher)
Created: 2023
Description

A Graph Neural Network (GNN) is a type of neural network architecture that operates on data consisting of objects and their relationships, which are represented by a graph. Within the graph, nodes represent objects and edges represent associations between those objects. The representation of relationships and correlations between data is unique to graph structures. GNNs exploit this feature of graphs by augmenting both forms of data, individual and relational, and have been designed to allow for communication and sharing of data within each neural network layer. These benefits allow each node to have an enriched perspective, or a better understanding, of its neighbouring nodes and its connections to those nodes. The ability of GNNs to efficiently process high-dimensional node data and multi-faceted relationships among nodes gives them advantages over neural network architectures such as Convolutional Neural Networks (CNNs) that do not implicitly handle relational data. These characteristics of GNN models make them suitable for solving problems in which correspondences among input data are needed to produce an accurate and precise representation of the data.

GNN frameworks may significantly improve existing communication and control techniques for multi-agent tasks by implicitly representing not only information associated with the individual agents, such as agent position, velocity, and camera data, but also their relationships with one another, such as distances between the agents and their ability to communicate with one another. One such task is a multi-agent navigation problem in which the agents must coordinate with one another in a decentralized manner, using proximity sensors only, to navigate safely to their intended goal positions in the environment without collisions or deadlocks.

The contribution of this thesis is the design of an end-to-end decentralized control scheme for multi-agent navigation that utilizes GNNs to prevent inter-agent collisions and deadlocks. The contributions consist of the development, simulation, and performance evaluation of an advantage actor-critic (A2C) reinforcement learning algorithm whose actor and critic networks, trained simultaneously, approximate the policy function and value function, respectively. These networks are implemented using GNN frameworks for navigation by groups of 3, 5, 10, and 15 agents in simulated two-dimensional environments. In 40% to 50% of the simulation trials, between 70% and 80% of the agents reach their goal positions without colliding with other agents or becoming trapped in deadlocks. The model is also compared to a baseline in which actions are chosen randomly for the agents; relative to this baseline, the model performs notably well for smaller groups of agents.
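To make the actor-critic structure concrete, the fragment below sketches the core A2C loss computation in PyTorch. In the thesis the actor and critic are GNNs operating on the inter-agent graph; here they are left as generic modules, and all names and coefficients are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical

def a2c_losses(actor, critic, obs, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """Compute the combined A2C loss for one batch of agent experience.

    actor(obs)  -> action logits   (in the thesis, a GNN over the agent graph)
    critic(obs) -> state values    (likewise a GNN; any nn.Module works here)
    """
    logits = actor(obs)                       # [batch, num_actions]
    values = critic(obs).squeeze(-1)          # [batch]
    dist = Categorical(logits=logits)

    advantages = returns - values.detach()    # how much better the return was than predicted
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)  # critic regression toward observed returns
    entropy_bonus = dist.entropy().mean()     # encourages exploration

    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```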
Contributors: Ayalasomayajula, Manaswini (Author) / Berman, Spring (Thesis advisor) / Mian, Sami (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

This work improves the quality of solutions to the sparse rewards problem by combining reinforcement learning (RL) with knowledge-rich planning. Classical methods for coping with sparse rewards during reinforcement learning modify the reward landscape so as to better guide the learner. In contrast, this work combines RL with a planner in order to utilize other information about the environment. Because the scope for representing environmental information is limited in RL, this work integrates a model-free learning algorithm, temporal difference (TD) learning, with a Hierarchical Task Network (HTN) planner to accommodate rich environmental information in the algorithm. In the perpetual sparse rewards problem, rewards reemerge within a fixed interval of time after being collected, so there is no well-defined goal state to serve as an exit condition. Incorporating planning in the learning algorithm not only improves the quality of the solution; it also avoids the ambiguity of encoding a profit-maximization goal when using a planning algorithm alone to solve this problem. By occasionally invoking the HTN planner, the algorithm is nudged toward the optimal solution. In this work, I demonstrate an on-policy algorithm that improves the quality of the solution over vanilla reinforcement learning. The objective of this work has been to evaluate the capacity of the combined algorithm to find optimal reward-maximizing policies while maintaining awareness of the environment and of other agents in the vicinity.
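A schematic of the idea, assuming an on-policy SARSA-style TD update with an occasional planner override, is sketched below; the planner interface is a stand-in for the HTN planner and none of this is the thesis code.

```python
import random
from collections import defaultdict

def choose_action(Q, state, actions, step, epsilon=0.1, plan_every=50, planner=None):
    """Epsilon-greedy on-policy action selection, occasionally deferring to a planner."""
    if planner is not None and step % plan_every == 0:
        return planner.suggest(state)                       # HTN stand-in: returns one of `actions`
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy temporal-difference (SARSA) update toward the action actually taken next."""
    Q[(s, a)] += alpha * (reward + gamma * Q[(s_next, a_next)] - Q[(s, a)])

Q = defaultdict(float)  # state-action value table for the sketch
```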
Contributors: Nandan, Swastik (Author) / Pavlic, Theodore (Thesis advisor) / Das, Jnaneshwar (Thesis advisor) / Berman, Spring (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

Bicycle stabilization has become a popular research topic because of the bicycle's complex dynamic behavior and the large body of bicycle modeling research. Riding a bicycle requires accurately performing several tasks, such as balancing and navigation, which may be difficult for people with disabilities; their difficulties could be partially reduced by providing steering assistance. Many control techniques have been applied to stabilize these highly maneuverable and efficient machines, achieving interesting results but with limitations that include strict environmental requirements. This thesis expands on the work of Randlov and Alstrom on reinforcement learning for bicycle self-stabilization with robotic steering, applying the deep deterministic policy gradient algorithm, which can handle continuous action spaces, something that is not possible with the Q-learning technique. The research involved training the algorithm in virtual environments, followed by simulations to assess its results. Furthermore, hardware testing was conducted on Arizona State University’s RISE lab Smart bicycle platform to evaluate its self-balancing performance. A detailed analysis of the bicycle trial runs is presented. Testing was validated by plotting the real-time states and actions collected during the outdoor testing, including the roll angle of the bicycle. Further improvements with regard to model training and hardware testing are also presented.
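As a rough sketch of why a deterministic policy gradient suits a continuous steering command, the fragment below shows the two DDPG loss terms; the replay buffer, exploration noise, and target-network updates are omitted, and the network shapes and state dimension are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative networks: a 4-dimensional state and a single continuous steering action in [-1, 1].
actor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(4 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

def ddpg_losses(batch, target_actor, target_critic, gamma=0.99):
    """Critic regresses toward a bootstrapped target; actor ascends the critic's value."""
    s, a, r, s_next, done = batch  # float tensors sampled from a replay buffer (not shown)
    with torch.no_grad():
        q_next = target_critic(torch.cat([s_next, target_actor(s_next)], dim=-1)).squeeze(-1)
        target = r + gamma * (1.0 - done) * q_next
    critic_loss = F.mse_loss(critic(torch.cat([s, a], dim=-1)).squeeze(-1), target)
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()  # maximize Q of the actor's action
    return critic_loss, actor_loss
```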
Contributors: Turakhia, Shubham (Author) / Zhang, Wenlong (Thesis advisor) / Yong, Sze Zheng (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

This work investigates multi-agent reinforcement learning methods that are applicable to real-world scenarios involving stochastic, partially observable, and infinite-horizon problems. These problems are hard due to large state and control spaces and may require some form of intelligent multi-agent behavior to achieve the target objective. The study also introduces novel rollout-based methods that provide reasonable guarantees of cost improvement and obtain a sub-optimal solution to such problems while being amenable to distributed computation and hence a faster runtime. These methods, first introduced and developed for single-agent scenarios, are gradually extended to their multi-agent variants and are referred to as multi-agent rollout methods. The problems studied in this work each target one or more of three major challenges of real-world problems: the Spider and Fly problem deals with stochastic environments, the multi-robot repair problem is an example of a partially observable Markov decision process (POMDP), and the Flatland challenge is an RL benchmark that aims to solve the vehicle rescheduling problem. The study also includes comparisons to existing methods that are widely used for such problems, including POMCP, DESPOT, and MADDPG. The work also delineates and compares the behaviors arising from these rollout methods and from existing methods, thereby supporting the efficacy of the rollout-based methods in solving real-world multi-agent reinforcement learning problems. Additionally, the source code and problem environments have been released for the community to further the research in this field. The source code and the related research can be found at https://sahilbadyal.com/marl.
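The common core of rollout methods is a one-step lookahead that scores each candidate action by simulating a base policy afterward. A single-agent schematic, with the simulator and base policy as stand-ins, is sketched below; the multi-agent variants in this work perform such lookaheads agent by agent.

```python
def rollout_action(state, actions, simulate, base_policy, horizon=20, gamma=1.0):
    """One-step lookahead: try each action, then follow the base policy for `horizon` steps.

    simulate(state, action) -> (next_state, cost)  # model of the environment (stand-in)
    base_policy(state)      -> action              # heuristic policy being improved
    """
    def tail_cost(s):
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            s, c = simulate(s, base_policy(s))
            total += discount * c
            discount *= gamma
        return total

    best_action, best_cost = None, float("inf")
    for a in actions:
        s_next, c = simulate(state, a)
        cost = c + gamma * tail_cost(s_next)
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action
```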
Contributors: Badyal, Sahil (Author) / Gil, Stephanie Dr. (Thesis advisor) / Bertsekas, Dimitri Dr. (Committee member) / Yang, Yingzhen Dr. (Committee member) / Arizona State University (Publisher)
Created: 2021
Description

Artificial Intelligence (AI) systems have achieved outstanding performance and have been found to be better than humans at various tasks, such as sentiment analysis and face recognition. However, the majority of these state-of-the-art AI systems use complex Deep Learning (DL) methods, which make it challenging for human experts to design and evaluate such models with respect to privacy, fairness, and robustness. Recent examination of DL models reveals that their representations may include information that could lead to privacy violations, unfairness, and robustness issues. This results in AI systems that are potentially untrustworthy from a socio-technical standpoint. Trustworthiness in AI is defined by a set of model properties such as freedom from discriminatory bias, protection of users’ sensitive attributes, and lawful decision-making. The characteristics of trustworthy AI can be grouped into three categories: Reliability, Resiliency, and Responsibility. Past research has shown that the successful integration of an AI model depends on its trustworthiness. Thus it is crucial for organizations and researchers to build trustworthy AI systems to facilitate the seamless integration and adoption of intelligent technologies. The main issue with existing AI systems is that they are primarily trained to improve technical measures such as accuracy on a specific task but are not considerate of socio-technical measures. The aim of this dissertation is to propose methods for improving the trustworthiness of AI systems through representation learning. DL models’ representations contain information about a given input and can be used for tasks such as detecting fake news on social media or predicting the sentiment of a review. The findings of this dissertation significantly expand the scope of trustworthy AI research and establish a new paradigm for modifying data representations to balance between properties of trustworthy AI. Specifically, this research investigates multiple techniques, such as reinforcement learning, for understanding trustworthiness with respect to users’ privacy, fairness, and robustness in classification tasks like cyberbullying detection and fake news detection. Since most social measures in trustworthy AI cannot be used to fine-tune or train an AI model directly, the main contribution of this dissertation lies in using reinforcement learning to alter an AI system’s behavior based on non-differentiable social measures.
Contributors: Mosallanezhad, Ahmadreza (Author) / Liu, Huan (Thesis advisor) / Mancenido, Michelle (Thesis advisor) / Doupe, Adam (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2023