A high level language for human robot interaction

Description

While developing autonomous intelligent robots has been the goal of many research programs, a more practical application of intelligent robots is the formation of teams consisting of both humans and robots. One example is search and rescue, where robots commanded by humans are sent into environments too dangerous for humans. For such human-robot interaction, natural language is considered a good communication medium because it allows humans with little training in the robot's internal language to command and interact with the robot. However, any natural language communication from the human must be translated into a formal language that the robot can understand. Similarly, before the robot can communicate with the human in natural language, it must formulate its message in some formal language, which is then translated into natural language. In this work, I develop a high-level language for communication between humans and robots and demonstrate various aspects of it through a robotics simulation. These language constructs borrow some ideas from action execution languages and are grounded with respect to simulated human-robot interaction transcripts.

Contributors: Lumpkin, Barry Thomas (Author) / Baral, Chitta (Thesis advisor) / Lee, Joohyung (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)

Created: 2012

Multimodal Robot Learning for Grasping and Manipulation

Description

Enabling robots to physically engage with their environment in a safe and efficient manner is an essential step towards human-robot interaction. To date, robots usually operate as pre-programmed workers that blindly execute tasks in highly structured environments crafted by skilled engineers. Changing the robots’ behavior to cover new duties or handle variability is an expensive, complex, and time-consuming process. However, with the advent of more capable sensors and algorithms, overcoming these limitations is coming within reach. This work proposes innovations in artificial intelligence, language understanding, and multimodal integration to enable next-generation grasping and manipulation capabilities in autonomous robots. The underlying thesis is that multimodal observations and instructions can drastically expand the responsiveness and dexterity of robot manipulators. Natural language, in particular, can be used to enable intuitive, bidirectional communication between a human user and the machine. To this end, this work presents a system that learns context-aware robot control policies from multimodal human demonstrations. Among the main contributions are techniques for (a) collecting demonstrations efficiently and intuitively, (b) leveraging physical contact with the environment and objects, (c) incorporating natural language to understand context, and (d) generating robust robot control policies. The presented approach and systems are evaluated in multiple grasping and manipulation settings, ranging from dexterous manipulation to pick-and-place, as well as contact-rich bimanual insertion tasks. Moreover, the usability of these innovations, especially when utilizing human task demonstrations and communication interfaces, is evaluated in several human-subject studies.

Contributors: Stepputtis, Simon (Author) / Ben Amor, Heni (Thesis advisor) / Baral, Chitta (Committee member) / Yang, Yezhou (Committee member) / Lee, Stefan (Committee member) / Arizona State University (Publisher)

Created: 2021

Probabilistic Imitation Learning for Spatiotemporal Human-Robot Interaction

Description

Imitation learning is a promising methodology for teaching robots how to physically interact and collaborate with human partners. However, successful interaction requires complex coordination in time and space, i.e., knowing what to do as well as when to do it. This dissertation introduces Bayesian Interaction Primitives, a probabilistic imitation learning framework which establishes a conceptual and theoretical relationship between human-robot interaction (HRI) and simultaneous localization and mapping. In particular, it is established that HRI can be viewed through the lens of recursive filtering in time and space. In turn, this relationship allows one to leverage techniques from an existing, mature field and develop a powerful new formulation which enables multimodal spatiotemporal inference in collaborative settings involving two or more agents. Through the development of exact and approximate variations of this method, it is shown in this work that it is possible to learn complex real-world interactions in a wide variety of settings, including tasks such as handshaking, cooperative manipulation, catching, hugging, and more.
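The recursive-filtering view can be illustrated with the standard Kalman measurement update. This is a simplified sketch, not Bayesian Interaction Primitives itself: it assumes the phase of the interaction is already known, that each agent's trajectory is represented by weights over Gaussian basis functions, and that demonstrations yield a joint Gaussian prior over the human's and robot's weights. The helper names `basis` and `kalman_update`, the synthetic demonstrations, and all numeric values are hypothetical. Observing only the human at one phase shifts the robot's weights through their learned correlation.

```python
import numpy as np

def basis(phase, n_basis=5, var=0.02):
    """Normalized Gaussian basis functions (variance `var`) evaluated at a phase in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((phase - centers) ** 2) / (2.0 * var))
    return phi / phi.sum()

def kalman_update(mean, cov, H, y, noise_var):
    """Kalman measurement update for a linear-Gaussian observation y = H @ w + noise."""
    S = H @ cov @ H.T + noise_var * np.eye(H.shape[0])  # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)                     # Kalman gain
    new_mean = mean + K @ (y - H @ mean)
    new_cov = (np.eye(len(mean)) - K @ H) @ cov
    return new_mean, new_cov

n = 5
rng = np.random.default_rng(0)
# Hypothetical demonstrations: the robot's weights are a noisy copy of the human's,
# so the joint prior encodes a strong human-robot correlation.
human_w = rng.normal(size=(50, n))
robot_w = human_w + 0.1 * rng.normal(size=(50, n))
demos = np.hstack([human_w, robot_w])        # shape (50, 2n): [human | robot]
mean, cov = demos.mean(axis=0), np.cov(demos, rowvar=False)

# Observe only the human's position at phase 0.3.
phi = basis(0.3, n)
H = np.hstack([phi, np.zeros(n)])[None, :]   # picks out the human half of the state
y = np.array([1.5])
post_mean, post_cov = kalman_update(mean, cov, H, y, noise_var=1e-3)

# Predict the robot's position at the same phase from the updated weights.
robot_pred = float(phi @ post_mean[n:])
print(robot_pred)
```

Running the update again at each new observation is what makes the estimate recursive; the full method additionally has to infer the phase itself, which this sketch deliberately fixes.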

Contributors: Campbell, Joseph (Author) / Ben Amor, Heni (Thesis advisor) / Fainekos, Georgios (Thesis advisor) / Yamane, Katsu (Committee member) / Kambhampati, Subbarao (Committee member) / Arizona State University (Publisher)

Created: 2021

Towards understanding natural language: semantic parsing, commonsense knowledge acquisition, reasoning framework and applications

Description

Reasoning with commonsense knowledge is an integral component of human behavior. It is due to this capability that people know that a weak person may not be able to lift someone. It has been a long-standing goal of the Artificial Intelligence community to simulate such commonsense reasoning abilities in machines. Over the years, many advances have been made and various challenges have been proposed to test these abilities. The Winograd Schema Challenge (WSC) is one such Natural Language Understanding (NLU) task, which was also proposed as an alternative to the Turing Test. It is made up of textual question answering problems that require resolving a pronoun to its correct antecedent.

In this thesis, two approaches to developing NLU systems that solve the Winograd Schema Challenge are demonstrated. To this end, a semantic parser is presented, various kinds of commonsense knowledge are identified, techniques to extract commonsense knowledge are developed, and two commonsense reasoning algorithms are presented. The usefulness of the developed tools and techniques is shown by applying them to the challenge.

Contributors: Sharma, Arpita (Author) / Baral, Chitta (Thesis advisor) / Lee, Joohyung (Committee member) / Papotti, Paolo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2019

Human factors analysis of automated planning technologies for human-robot teaming

Description

Due to the limitations of current robot capabilities, humans and robots need to work together as a team to accomplish certain shared goals. Human assistance is required because human capabilities are often better suited to certain tasks and complement robot capabilities in many situations. Given the necessity of human-robot teams, it has long been assumed that, for a robotic agent to be an effective team member, it must be equipped with automated planning technologies that help it achieve the goals delegated to it by its human teammates, as well as deduce its own goals so that it can proactively support its human counterpart by inferring their goals. However, there has been no systematic evaluation of the accuracy of this claim.

In my thesis, I perform a human factors analysis of the effectiveness of such automated planning technologies for remote human-robot teaming. In the first part of my study, I investigate the effectiveness of automated planning in remote human-robot teaming scenarios. In the second part, I investigate the effectiveness of a proactive robot assistant in remote human-robot teaming scenarios.

Both investigations are conducted in a simulated urban search and rescue (USAR) scenario in which the human-robot teams are deployed during the early phases of an emergency response to explore all areas of the disaster scene. Through both studies, I evaluate how effective automated planning technology is in helping human-robot teams move closer to human-human teams. I use both objective measures (such as accuracy and time spent on primary and secondary tasks, Robot Attention Demand, etc.) and a set of subjective Likert-scale questions (on situation awareness, immediacy, etc.) to investigate the trade-offs between different types of remote human-robot teams. The results of both studies suggest that intelligent robots with automated planning capability and proactive support ability are generally welcomed.

Contributors: Narayanan, Vignesh (Author) / Kambhampati, Subbarao (Thesis advisor) / Zhang, Yu (Thesis advisor) / Cooke, Nancy J. (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)

Created: 2015

Mixture of interaction primitives for multiple agents: a Python framework

Description

In a collaborative environment where multiple robots and humans are expected to work together on a task, it becomes essential for a robot to be aware of the multiple agents in its work environment. A robot must also learn to adapt to the different agents in the workspace and conduct its interaction based on the presence of these agents. A theoretical framework called Interaction Primitives was introduced to perform interaction learning from demonstrations in a two-agent work environment.

This document is an in-depth description of a new state-of-the-art Python framework for Interaction Primitives between two agents in single-task as well as multiple-task work environments, and of an extension of the original framework to work environments with multiple agents performing a single task. The original theory of Interaction Primitives has been extended to create a framework that captures the correlation between more than two agents performing a single task. The new Python framework is an intuitive, generic, easy-to-install, and easy-to-use Python library that can be used to apply the Interaction Primitives framework in a work environment. The library was tested in simulated environments and in a controlled laboratory environment. The results and benchmarks of this library are available in the related sections of this document.

Contributors: Kumar, Ashish, M.S (Author) / Amor, Hani Ben (Thesis advisor) / Zhang, Yu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2017

memeBot: Automatic Image Meme Generation for Online Social Interaction

Description

Internet memes have become a widespread tool for interacting and exchanging ideas over social media, blogs, and open messengers. They most commonly take the form of an image that combines a picture, text, and humor, which makes them a powerful tool for delivering information. Image memes are used in viral marketing and mass advertising to propagate ideas ranging from simple commercials to messages that can drive change and development in social structures, such as countering hate speech.

This work proposes treating automatic image meme generation as a translation process and presents an end-to-end neural and probabilistic approach that generates an image-based meme for any given sentence using an encoder-decoder architecture. For a given input sentence, a meme is generated by combining a meme template image with a text caption: the template image is selected from a set of popular candidates by a selection module, and the caption is generated by an encoder-decoder model. The encoder maps the selected meme template and the input sentence into a meme embedding space, and the decoder then decodes the meme caption from that embedding space. The generated natural language caption is thus conditioned on both the input sentence and the selected meme template.
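To illustrate what a selection module might do, here is a minimal sketch that picks the candidate template whose embedding has the highest cosine similarity to the sentence embedding. This is an assumption for illustration only: the thesis's actual selection module and embedding model may work differently, and `select_template`, the template names, and the three-dimensional toy vectors are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_template(sentence_vec, template_vecs):
    """Return the template whose embedding is most similar to the sentence, plus all scores."""
    scores = {name: cosine(sentence_vec, vec) for name, vec in template_vecs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical precomputed embeddings for three candidate templates.
templates = {
    "template_a": np.array([0.9, 0.1, 0.0]),
    "template_b": np.array([0.1, 0.9, 0.2]),
    "template_c": np.array([0.0, 0.2, 0.9]),
}
sentence = np.array([0.2, 0.1, 0.8])  # toy embedding of an input sentence
best, scores = select_template(sentence, templates)
print(best)  # template_c
```

In the described pipeline, the chosen template would then be passed, together with the sentence, to the encoder that produces the meme embedding.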

The model learns the dependencies between the meme captions and the meme template images and generates new memes using the learned dependencies. The quality of the generated captions and the generated memes is evaluated through both automated metrics and human evaluation. An experiment is designed to score how well the generated memes can represent popular tweets from Twitter conversations. Experiments on Twitter data show the efficacy of the model in generating memes capable of representing a sentence in online social interaction.

Contributors: Sadasivam, Aadhavan (Author) / Yang, Yezhou (Thesis advisor) / Baral, Chitta (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2020

Understanding the importance of entities and roles in natural language inference: a model and datasets

Description

In this thesis, I present two new datasets and a modification to the existing models in the form of a novel attention mechanism for Natural Language Inference (NLI). The new datasets have been carefully synthesized from various existing corpora released for different tasks.

The task of NLI is to determine whether a sentence, referred to as the “Hypothesis”, is likely to be true given that another sentence, referred to as the “Premise”, is true. In other words, the task is to identify whether the “Premise” entails, contradicts, or is neutral with regard to the “Hypothesis”. NLI is a precursor to solving many Natural Language Processing (NLP) tasks such as Question Answering and Semantic Search. For example, in Question Answering systems, the question is paraphrased into a declarative statement, which is treated as the hypothesis, while the options are treated as premises. The option with the maximum entailment score is selected as the answer. Given these applications, the importance of a strong NLI system cannot be overstated.
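The question-answering recipe above can be sketched as a ranking loop. The scorer here, `toy_entailment_score`, is a hypothetical stand-in (plain word overlap) for a trained NLI model's entailment probability; only the "treat each option as a premise and keep the highest-scoring one" logic mirrors the text.

```python
import re

def tokens(text):
    """Lowercased word set of a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_entailment_score(premise, hypothesis):
    """Hypothetical stand-in for an NLI model: share of hypothesis words present in the premise."""
    hyp = tokens(hypothesis)
    return len(hyp & tokens(premise)) / len(hyp) if hyp else 0.0

def answer(hypothesis, options, score_fn=toy_entailment_score):
    """Treat each option as the premise and return the one with the highest entailment score."""
    return max(options, key=lambda option: score_fn(option, hypothesis))

# A question such as "Where did Peter go?" paraphrased into a declarative hypothesis.
hypothesis = "Peter went to the market"
options = ["Peter went to the market", "Peter stayed at home", "John went to the stadium"]
print(answer(hypothesis, options))  # Peter went to the market
```

A surface-level scorer like this one also rates "John went to the market" as 4/5 entailed, which is exactly the kind of entity confusion this thesis targets.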

Many large-scale datasets and models have been released to advance the field of NLI. While all of these models achieve good accuracy on the test sets of the datasets they were trained on, they fail to capture a basic understanding of “Entities” and “Roles”. They often mistakenly infer “John went to the market.” from “Peter went to the market.”, failing to capture the notion of “Entities”. In other cases, these models do not understand the difference in the “Roles” played by the same entities in the “Premise” and “Hypothesis” sentences and wrongly infer “Peter drove John to the stadium.” from “John drove Peter to the stadium.”

The lack of understanding of “Roles” can be attributed to the lack of such examples in the existing datasets. The existing models’ failure to capture the notion of “Entities”, however, is not due solely to the lack of such examples in existing NLI datasets; it can also be attributed to the strict reliance on vector similarity in the “word-to-word” attention mechanism used in existing architectures.

To overcome these issues, I present two new datasets to help NLI systems capture the notions of “Entities” and “Roles”. The “NER Changed” (NC) dataset and the “Role-Switched” (RS) dataset contain Premise-Hypothesis pairs that require an understanding of “Entities” and “Roles”, respectively, to make correct inferences. This work shows that existing architectures perform poorly on the “NER Changed” (NC) dataset even after being trained on the new datasets. To help existing architectures understand the notion of “Entities”, this work proposes a modification to the “word-to-word” attention mechanism. Instead of relying on vector similarity alone, the modified architectures learn to incorporate “Symbolic Similarity” as well, using the Named-Entity features of the Premise and Hypothesis sentences. The modified architectures not only perform significantly better than the unmodified architectures on the “NER Changed” (NC) dataset but also perform as well on the existing datasets.
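A minimal numpy sketch of the idea, under stated assumptions: word-to-word attention logits are the usual dot products between premise and hypothesis token vectors, plus a symbolic term that applies only when both tokens are named entities (+1 for identical surface forms, -1 for different ones), scaled by a weight `lam`. In the thesis the combination is learned, so the fixed weight, the ±1 scheme, and the names `symbolic_similarity` and `entity_aware_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def symbolic_similarity(p_tokens, p_is_ent, h_tokens, h_is_ent):
    """+1 if both tokens are named entities with the same surface form,
    -1 if both are named entities but differ, 0 otherwise (hypothetical scheme)."""
    S = np.zeros((len(p_tokens), len(h_tokens)))
    for i, (pt, pe) in enumerate(zip(p_tokens, p_is_ent)):
        for j, (ht, he) in enumerate(zip(h_tokens, h_is_ent)):
            if pe and he:
                S[i, j] = 1.0 if pt == ht else -1.0
    return S

def entity_aware_attention(P, H, S, lam=2.0):
    """Attention of each premise token over hypothesis tokens: vector plus symbolic similarity."""
    return softmax(P @ H.T + lam * S, axis=1)

premise = ["Peter", "went", "to", "the", "market"]
hypothesis = ["John", "went", "to", "the", "market"]
p_ent = [True, False, False, False, False]
h_ent = [True, False, False, False, False]

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8) for w in sorted(set(premise + hypothesis))}
vocab["John"] = vocab["Peter"] + 0.01 * rng.normal(size=8)  # names with near-identical vectors
P = np.stack([vocab[w] for w in premise])
H = np.stack([vocab[w] for w in hypothesis])

S = symbolic_similarity(premise, p_ent, hypothesis, h_ent)
plain = softmax(P @ H.T, axis=1)
aware = entity_aware_attention(P, H, S)
print(plain[0, 0], aware[0, 0])  # "Peter" attends less to "John" once the symbolic term is added
```

Because the symbolic term only touches entity-entity pairs, attention among ordinary words is left exactly as vector similarity computes it.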

Contributors: Shrivastava, Ishan (Author) / Baral, Chitta (Thesis advisor) / Anwar, Saadat (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2019

Poincare Embeddings for Visualizing Eigenvector Centrality

Description

Hyperbolic geometry, the geometry of hyperbolic space, has recently caught the eye of certain circles in the machine learning community. Lauded for its ability to encapsulate strong clustering as well as latent hierarchies in complex and social networks, hyperbolic geometry has proven to be an enduring presence in the network science community throughout the 2010s, with no signs of fading into obscurity anytime soon. Hyperbolic embeddings, which map a given graph into hyperbolic space, have proven to be a particularly powerful and dynamic tool for studying complex networks. This thesis exploits hyperbolic embeddings to illustrate centrality in a graph. In network science, centrality quantifies the influence of individual nodes in a graph. Eigenvector centrality is one such measure: it assigns an influence weight to each node by solving an eigenvector equation. A procedure is defined to embed a given network in a model of hyperbolic space known as the Poincaré disk, according to the influence weights computed by three eigenvector centrality measures: the PageRank algorithm, the Hyperlink-Induced Topic Search (HITS) algorithm, and the Pinski-Narin algorithm. The resulting embeddings are shown to accurately and meaningfully reflect each node's influence and proximity to influential nodes.
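To make the procedure concrete, here is a minimal sketch that computes PageRank by power iteration and then places each node in the Poincaré disk (complex numbers with modulus below 1). The placement rule is an assumption for illustration, not necessarily the thesis's: the most influential node sits nearest the origin, radius grows as normalized centrality falls, and nodes are spread at evenly spaced angles. `poincare_layout` and the toy graph are hypothetical; the damping factor 0.85 is the conventional PageRank default.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a dense adjacency matrix (adj[i, j] = 1 for edge i -> j)."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Row-stochastic transition matrix; dangling nodes link uniformly to every node.
    T = np.where(out_deg[:, None] > 0, adj / np.maximum(out_deg, 1)[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = (1 - damping) / n + damping * (T.T @ r)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def poincare_layout(scores, max_radius=0.95):
    """Hypothetical placement: higher centrality -> closer to the center of the disk."""
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
    radius = max_radius * (1.0 - s)
    angles = np.linspace(0.0, 2 * np.pi, len(scores), endpoint=False)
    return radius * np.exp(1j * angles)  # points z with |z| < 1

# Star-like graph: nodes 1..4 all point to node 0, and node 0 points to node 1.
adj = np.zeros((5, 5))
adj[1:, 0] = 1
adj[0, 1] = 1
scores = pagerank(adj)
z = poincare_layout(scores)
print(np.round(np.abs(z), 3))
```

HITS or Pinski-Narin weights could be substituted for `scores` without changing the layout step.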

Contributors: Chang, Alena (Author) / Xue, Guoliang (Thesis advisor) / Yang, Dejun (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2020

Language Conditioned Self-Driving Cars Using Environmental Object Descriptions For Controlling Cars

Description

Self-driving cars are a long-standing ambition for many AI scientists and engineers. In the last decade alone, many self-driving cars, such as those from Google's Waymo, Tesla (Autopilot), and Uber, have been roaming the streets of many cities. As the field rapidly expands, researchers all over the world are attempting to develop safer and more efficient AI agents that can navigate our cities. However, driving is a complex task to master even for a human, let alone for a robot. It requires attention to and input from the car's surroundings, and it is nearly impossible to program all the possible factors affecting this task. As a solution, imitation learning was introduced, in which agents learn a policy mapping observations to actions from demonstrations given by humans. Through imitation learning, one can easily teach self-driving cars the expected behavior in many scenarios. Despite their autonomous nature, humans undeniably play a vital role in the development and execution of safe and trustworthy self-driving cars and hence form the strongest link in this application of Human-Robot Interaction. Several approaches have been taken to incorporate this link between humans and self-driving cars, one of which involves communicating humans' navigational instructions to self-driving cars. This communication channel gives humans control over the agent’s decisions as well as the ability to guide it in real time. This work explores the ability of imitation learning to create a self-driving agent that can follow natural language instructions given by humans based on descriptions of environmental objects. The proposed model architecture can handle latent temporal context in these instructions, which makes the agent capable of taking multiple decisions along its course.
The work shows promising results that push the boundaries of natural language instructions and their complexity in navigating self-driving cars through towns.
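Imitation learning in its simplest form, behavior cloning, fits a policy that maps observations to actions using human demonstrations. The sketch below conditions a linear least-squares policy on a one-hot encoded instruction; the instruction set, the two observation features (lateral offset and heading error), the synthetic "demonstrator", and all function names are hypothetical, and the approach is far simpler than the language-conditioned architecture described above.

```python
import numpy as np

INSTRUCTIONS = ["go straight", "turn left", "turn right"]  # hypothetical command set
TURN_BIAS = {"go straight": 0.0, "turn left": 0.3, "turn right": -0.3}

def features(obs, instruction):
    """Concatenate the numeric observation with a one-hot encoding of the instruction."""
    one_hot = np.zeros(len(INSTRUCTIONS))
    one_hot[INSTRUCTIONS.index(instruction)] = 1.0
    return np.concatenate([obs, one_hot])

def demonstrator(obs, instruction):
    """Synthetic 'human' steering: correct offset and heading, plus an instruction-dependent bias."""
    offset, heading = obs
    return -0.5 * offset - 1.0 * heading + TURN_BIAS[instruction]

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):
    obs = rng.uniform(-1, 1, size=2)
    instr = INSTRUCTIONS[rng.integers(len(INSTRUCTIONS))]
    X.append(features(obs, instr))
    y.append(demonstrator(obs, instr) + 0.01 * rng.normal())
X, y = np.array(X), np.array(y)

# Behavior cloning: least-squares fit of a linear policy from features to steering.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

def policy(obs, instruction):
    """Steering command predicted by the cloned policy."""
    return float(features(obs, instruction) @ weights)

print(round(policy(np.array([0.0, 0.0]), "turn left"), 2))
```

The one-hot columns double as per-instruction bias terms, so no separate intercept column is needed; handling latent temporal context across multi-step instructions, as the thesis does, would require a sequential model rather than this memoryless mapping.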

Contributors: Moudhgalya, Nithish B (Author) / Amor, Hani Ben (Thesis advisor) / Baral, Chitta (Committee member) / Yang, Yezhou (Committee member) / Zhang, Wenlong (Committee member) / Arizona State University (Publisher)

Created: 2021
