Modeling Human Adaptation with Game-theoretic Intention Decoding in Human-Robot Interactions

Description

With the substantial development of intelligent robots, human-robot interaction (HRI) has become ubiquitous in applications such as collaborative manufacturing, surgical robotic operations, and autonomous driving. In all these applications, a human behavior model that predicts human actions is a helpful reference for robots to interact intelligently with humans. This requirement raises the essential problem of how to properly model human behavior, especially when individuals are interacting or cooperating with each other. The major objective of this thesis is to use human intention decoding to help robots enhance their performance while interacting with humans. Preliminary work on integrating human intention estimation with an HRI scenario is shown to demonstrate the benefit. To achieve this goal, the research is divided into three phases. First, a novel online measure of the human's reliance on the robot, which can be estimated through the intention decoding process from human actions, is described. An experiment that required human participants to complete an object-moving task with a robot manipulator was conducted under different distraction conditions. A relationship was discovered between human intention and trust when participants performed a familiar task with no distraction. This finding suggests a relationship between the psychological construct of trust and joint physical coordination, which links the human's actions to their mental states. Second, a novel human collaborative dynamic model based on game theory and bounded rationality is introduced to describe human dyadic behavior. The mutual intention decoding process was also considered to inform this model. Through this model, the connection between the mental states of the individuals and their cooperative actions is indicated. A haptic interface was developed with a virtual environment, and experiments were conducted with 30 human subjects. The results suggest the existence of mutual intention decoding during human dyadic cooperative behaviors. Last, the empirical results show that allowing agents to have empathy in inference, which lets the agents understand that others might have a false understanding of their intentions, can help to achieve correct intention inference. Knowledge about vehicle dynamics was also verified to be important for correctly inferring intentions. A new courteous policy is proposed that bounds the courteous motion using its inferred set of equilibrium motions. A simulation reproducing an intersection-passing case between an autonomous car and a human-driven car is conducted to demonstrate the benefit of the novel courteous control policy.

Contributors: Wang, Yiwei (Author) / Zhang, Wenlong (Thesis advisor) / Berman, Spring (Committee member) / Lee, Hyunglae (Committee member) / Ren, Yi (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2021

Implicitly Supervised Neural Question Answering

Description

How can a machine be taught to understand natural language? This question is a long-standing challenge in Artificial Intelligence. Several tasks have been designed to measure progress on this challenge. Question Answering is one such task: it evaluates a machine's ability to understand natural language by having it read a passage of text or an image and answer comprehension questions. In recent years, the development of transformer-based language models and large-scale human-annotated datasets has led to remarkable progress in the field of question answering. However, fully supervised question answering systems have several observed weaknesses, such as difficulty generalizing to unseen out-of-distribution domains, to linguistic style differences in questions, and to adversarial samples. This thesis proposes implicitly supervised question answering systems trained using knowledge acquisition from external knowledge sources and new learning methods that provide inductive biases for learning question answering. In particular, the following research projects are discussed: (1) Knowledge Acquisition methods: these include semantic and abductive information retrieval for seeking missing knowledge, a method to represent unstructured text corpora as a knowledge graph, and the construction of a knowledge base for implicit commonsense reasoning. (2) Learning methods: these include Knowledge Triplet Learning, a method over knowledge graphs; Test-Time Learning, a method to generalize to an unseen out-of-distribution context; WeaQA, a method to learn visual question answering using image captions without strong supervision; WeaSel, a weakly supervised method for relative spatial reasoning; and a new paradigm for unsupervised natural language inference. These methods potentially provide a new research direction to overcome the pitfalls of direct supervision.

Contributors: Banerjee, Pratyay (Author) / Baral, Chitta (Thesis advisor) / Yang, Yezhou (Committee member) / Blanco, Eduardo (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2022

Weakly-Supervised Visual-Retriever-Reader Pipeline for Knowledge-Based VQA Tasks

Description

Visual question answering (VQA) is the task of answering questions about a given image. It therefore requires both language and vision methods, which makes VQA a frontier interdisciplinary field. In recent years, with the great progress made on simple question tasks (e.g., object recognition), researchers have started to shift their interest to questions that require knowledge and reasoning. Knowledge-based VQA requires answering questions with external knowledge in addition to the content of images. One dataset that is most commonly used to evaluate knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, this paper collects a natural language knowledge base that can be used for any question answering (QA) system. Moreover, a Visual Retriever-Reader pipeline is proposed for knowledge-based VQA, where the visual retriever aims to retrieve relevant knowledge and the visual reader seeks to predict answers based on the given knowledge. The retriever comes in two versions: a term-based retriever that uses Best Matching 25 (BM25), and a neural retriever based on the dense passage retriever (DPR). To encode the visual information, the image and caption are encoded separately in two kinds of neural retriever: Image-DPR and Caption-DPR. There are also two styles of reader: a classification reader and an extraction reader. Both the retriever and the reader are trained with weak supervision. The experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge.

Contributors: Zeng, Yankai (Author) / Baral, Chitta (Thesis advisor) / Yang, Yezhou (Committee member) / Ghayekhloo, Samira (Committee member) / Arizona State University (Publisher)

Created: 2021
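
The term-based retriever mentioned in the Visual Retriever-Reader record above relies on BM25 ranking. As an illustration only, the following is a minimal sketch of standard BM25 scoring over a toy text corpus. It is not the thesis's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the example sentences are all assumptions chosen for this sketch.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b are common default
    values, not settings taken from the thesis.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy knowledge corpus (made-up sentences, whitespace tokenization).
corpus = [
    "a fire hydrant supplies water to firefighters".split(),
    "bananas are rich in potassium".split(),
    "firefighters connect hoses to a hydrant".split(),
]
query = "hydrant water".split()
scores = bm25_scores(query, corpus)
# Index of the highest-scoring passage for the query.
best = max(range(len(corpus)), key=scores.__getitem__)
```

In a retriever-reader setup of the kind the abstract describes, the top-ranked passages from a scorer like this would be handed to the reader as candidate knowledge; how the query is built from the question and image is not specified here and is left out of the sketch.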

Physically Realizable Targeted Adversarial Attacks on Autonomous Driving

Description

Autonomous Driving (AD) systems are being actively researched and developed to solve the task of controlling vehicles safely without human intervention. One approach to this task is deep Reinforcement Learning (RL). In deep RL, the main objective is to find an optimal control behavior, often called a policy, performed by an agent, which is the AD system in this case. This policy is usually learned through Deep Neural Networks (DNNs) based on the observations that the agent perceives along with reward feedback received from the environment. However, recent studies have demonstrated the vulnerability of control policies learned through deep RL to adversarial attacks. This raises concerns about applying such policies to risk-sensitive tasks like AD. Previous adversarial attacks assume that the threats can be broadly realized in two ways: the first is targeted attacks through manipulation of the agent's complete observation in real time, and the other is untargeted attacks through manipulation of objects in the environment. The former assumes full access to the agent's observations at almost all times, while the latter has no control over the outcome of the attack. This research investigates the feasibility of targeted attacks through physical adversarial objects in the environment, a threat that combines effectiveness and practicality. Through simulations on one of the popular AD systems, it is demonstrated that a fixed optimal policy can be made to malfunction over time by an attacker, e.g., by performing an unintended self-parking, when an adversarial object is present. The proposed approach is formulated so that the attacker can learn the dynamics of the environment and also use common knowledge of the agent's dynamics to realize the attack. Further, several experiments are conducted to show empirically the effectiveness of the proposed attack in different driving scenarios. Lastly, this work also studies the robustness of object location and the trade-off between attack strength and attack length based on the proposed evaluation metrics.

Contributors: Buddareddygari, Prasanth (Author) / Yang, Yezhou (Thesis advisor) / Ren, Yi (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)

Created: 2021
