JEDAI.Ed: An Interactive Explainable AI Platform for Outreach with Robotics Programming

Description

While the growing prevalence of robots in industry and daily life necessitates knowing how to operate them safely and effectively, the steep learning curve of programming languages and formal AI education is a barrier for most beginner users. This thesis presents an interactive platform that leverages a block-based programming interface with natural language instructions to teach robotics programming to novice users. An integrated robot simulator allows users to view the execution of their high-level plan, with the hierarchical low-level planning abstracted away from them. Users are provided human-understandable explanations of their planning failures and hints generated using large language models (LLMs) to enhance the learning process. Results from a user study conducted with students with minimal programming experience show that JEDAI.Ed is successful in teaching robotic planning to users, as well as in increasing their curiosity about AI in general.
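A rough, purely illustrative sketch of the kind of check such a platform performs when validating a user's high-level plan and explaining a failure. The action names, preconditions, and effects below are invented for this example and are not JEDAI.Ed's actual API.

    # Toy plan validation with human-readable failure explanations.
    plan = ["pick(block_a)", "place(block_a, table)"]

    # Hypothetical precondition table for two toy actions.
    preconditions = {
        "pick(block_a)": ["gripper_empty"],
        "place(block_a, table)": ["holding(block_a)"],
    }

    def validate(plan, state):
        """Walk the plan, applying toy effects; return (ok, explanation)."""
        for step in plan:
            missing = [p for p in preconditions[step] if p not in state]
            if missing:
                return False, f"Step '{step}' failed because {missing[0]} does not hold."
            # Toy effects: picking fills the gripper, placing empties it.
            if step.startswith("pick"):
                state.discard("gripper_empty")
                state.add("holding(block_a)")
            else:
                state.discard("holding(block_a)")
                state.add("gripper_empty")
        return True, "Plan is executable."

    ok, explanation = validate(plan, {"gripper_empty"})
    print(ok, explanation)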
Date Created
2024

Applications of Conditional Abstractions for Sample Efficient And Scalable Reinforcement Learning

Description

Reinforcement Learning (RL) presents a diverse and expansive collection of approaches that enable systems to learn and adapt through interaction with their environments. However, the widespread deployment of RL in real-world applications is hindered by challenges related to sample efficiency and the interpretability of decision-making processes. This thesis addresses these critical challenges, which are pivotal for advancing RL applications in complex, real-world scenarios. This work first presents a novel approach for learning dynamic abstract representations for continuous or parameterized state and action spaces. Empirical evaluations show that the proposed approach achieves higher sample efficiency and beats state-of-the-art deep RL methods. Second, it presents HOPL, a new approach to transfer RL for Stochastic Shortest Path (SSP) problems in factored domains with unknown transition functions. This approach continually learns transferable, generalizable knowledge in the form of symbolically represented options and integrates search techniques with RL to solve new problems by efficiently composing the learned options. The empirical results show that the approach achieves superior sample efficiency compared to state-of-the-art methods for transferring learned knowledge.
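As a hedged illustration of the option-composition idea (not the thesis code): if each learned option carries a symbolic precondition and effect over abstract states, a new task can be solved by searching for a chain of options from the start abstraction to the goal abstraction. The option names and abstract states below are invented for this sketch.

    from collections import deque

    # Hypothetical learned options: name -> (abstract precondition, abstract effect).
    options = {
        "open_door":  ("at_door",   "door_open"),
        "go_to_door": ("start",     "at_door"),
        "enter_room": ("door_open", "in_room"),
    }

    def compose(start, goal):
        """Breadth-first search over abstract states to find an option sequence."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, seq = frontier.popleft()
            if state == goal:
                return seq
            for name, (pre, eff) in options.items():
                if pre == state and eff not in visited:
                    visited.add(eff)
                    frontier.append((eff, seq + [name]))
        return None

    print(compose("start", "in_room"))  # ['go_to_door', 'open_door', 'enter_room']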
Date Created
2024

Learning to Grasp Using the Extrinsic Property of the Environment

Description

Grasping objects in a general household setting is a dexterous task: high compliance is needed to generate a grasp that leads to grasp closure. Standard 6 Degree of Freedom (DoF) manipulators with parallel grippers are naturally incapable of showing such dexterity. This renders many objects in household settings difficult to grasp, as the manipulator cannot access readily available antipodal (planar) grasps. In such scenarios, one must either use a high-DoF end effector to learn this compliance or change the initial configuration of the object to find an antipodal grasp. A pipeline that uses the extrinsic forces present in the environment to make up for this lack of compliance is proposed. The proposed method: i) takes the point cloud input from the environment and creates a search space of the object's available poses, which is used to identify the best graspable pose with a grasp score network; ii) learns how to approach an object and generates an appropriate set of motor primitives that convert the current ungraspable pose into a graspable pose; and iii) runs a naive grasp detection network to verify the proposed method and subsequently grasp the initially ungraspable object. By integrating these components, objects that were initially ungraspable with a standard grasp detection model (DexNet) are no longer ungraspable.
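A schematic of the described three-stage pipeline with the learned components stubbed out. The function names, shapes, and primitive labels are assumptions for illustration only, not the actual models.

    import numpy as np

    def grasp_score_network(pose_candidates):
        # Stand-in for the learned grasp score network: one score per candidate pose.
        return np.random.rand(len(pose_candidates))

    def plan_reorientation(current_pose, target_pose):
        # Stand-in for the learned motor primitives that use extrinsic contact with
        # the environment to move the object from an ungraspable to a graspable pose.
        return ["push_against_wall", "slide_to_edge"]

    def grasp_detector(pose):
        # Stand-in for a naive antipodal grasp detector (e.g., a DexNet-style model).
        return True

    point_cloud = np.random.rand(2048, 3)              # raw scene observation
    pose_candidates = [np.eye(4) for _ in range(16)]   # search space of object poses
    best_pose = pose_candidates[int(np.argmax(grasp_score_network(pose_candidates)))]
    primitives = plan_reorientation(current_pose=np.eye(4), target_pose=best_pose)
    print(primitives, grasp_detector(best_pose))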
Date Created
2024

Learning Temporally Composable Task Segmentations with Language

Description

Learning longer-horizon tasks is challenging with techniques such as reinforcement learning and behavior cloning. Previous approaches have split these long tasks into shorter tasks that are easier to learn by using statistical changepoint detection methods. However, classical changepoint detection methods function only with low-dimensional robot trajectory data and not with high-dimensional inputs such as vision. In this thesis, I split long-horizon tasks, represented by trajectories, into short-horizon sub-tasks with the supervision of language. These shorter-horizon tasks can be learned using conventional behavior cloning approaches. I compare techniques from the video moment retrieval problem with changepoint detection on robot trajectory data consisting of high-dimensional observations. The proposed moment retrieval-based approach shows a more than 30% improvement in mean average precision (mAP) for identifying trajectory sub-tasks with language guidance compared to without it. Several ablations are performed to understand the effects of domain randomization, sample complexity, views, and sim-to-real transfer of this method. The data ablation shows that with just 100 labeled trajectories, a mAP of 42.01 can be achieved, demonstrating the sample efficiency of the approach. Further, behavior cloning models trained on the segmented trajectories outperform a single model trained on the whole trajectory by up to 20%.
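A toy illustration of the moment-retrieval framing: given a language query and a per-timestep relevance score (random here; in the thesis this would come from a learned vision-language model), the sub-task segment is taken as the span with the highest mean relevance. All names, lengths, and scores are illustrative assumptions.

    import numpy as np

    def best_span(relevance, min_len=5, max_len=40):
        """Return (start, end) of the span with the highest mean relevance."""
        T = len(relevance)
        best, best_score = (0, min_len), -np.inf
        for s in range(T):
            for e in range(s + min_len, min(T, s + max_len) + 1):
                score = relevance[s:e].mean()
                if score > best_score:
                    best, best_score = (s, e), score
        return best

    relevance = np.random.rand(100)   # stand-in for query-conditioned relevance scores
    print(best_span(relevance))       # predicted (start, end) for e.g. "pick up the mug"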
Date Created
2024

Multi Agent Bayesian Optimization

Description

Efficiently solving global optimization problems remains a pervasive challenge across diverse domains, characterized by complex, high-dimensional search spaces with non-convexity and noise. Much of the Bayesian optimization literature has highlighted the computational complexity involved when scaling to high dimensions. Non-myopic approximations over a finite horizon have been adopted in recent years by modeling the problem as a partially observable Markov decision process (POMDP). Another promising direction is the partitioning of the input domain into sub-regions, facilitating local modeling of the input space. This localized approach helps prioritize regions of interest, which is particularly crucial in high dimensions. However, very little work leverages agent-based modeling of the problem domain along with the aforementioned methodologies. This work explores the synergistic integration of Bayesian optimization and reinforcement learning by proposing a multi-agent rollout formulation of the global optimization problem. Multi-Agent Bayesian Optimization (MABO) partitions the input domain among a finite set of agents, enabling distributed modeling of the input space. In addition to selecting candidate samples from their respective sub-regions, these agents also influence each other in partitioning the sub-regions. Consequently, a portion of the function is optimized by these agents, which prioritize candidate samples that do not undermine exploration in favor of single-step greedy exploitation. This work highlights the efficacy of the algorithm on a range of complex synthetic test functions.
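A minimal sketch of the multi-agent idea, not the MABO implementation: a 1-D domain is split among agents, each agent fits a local Gaussian process on samples from its sub-region, and each proposes the UCB-maximizing candidate within that sub-region. The agent coordination (rollout, repartitioning) described above is omitted; the test function and sample counts are placeholders.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def objective(x):
        return -np.sin(3 * x) - x ** 2 + 0.7 * x   # toy test function

    bounds, n_agents = (-1.0, 2.0), 3
    edges = np.linspace(*bounds, n_agents + 1)      # partition of the input domain

    proposals = []
    for i in range(n_agents):
        lo, hi = edges[i], edges[i + 1]
        X = np.random.uniform(lo, hi, size=(8, 1))  # local samples for this agent
        y = objective(X).ravel()
        gp = GaussianProcessRegressor().fit(X, y)   # local surrogate model
        grid = np.linspace(lo, hi, 200).reshape(-1, 1)
        mu, sigma = gp.predict(grid, return_std=True)
        ucb = mu + 2.0 * sigma                      # upper confidence bound acquisition
        proposals.append(float(grid[np.argmax(ucb)]))

    print("per-agent candidate samples:", proposals)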
Date Created
2024

AnyNMP: Generative Cross-Embodiment Neural Motion Planning

Description

Manipulator motion planning has conventionally been solved using sampling- and optimization-based algorithms that are agnostic to embodiment and environment configurations. However, these algorithms plan on a fixed environment representation approximated using shape primitives, and hence struggle to find solutions for cluttered and dynamic environments. Furthermore, these algorithms fail to produce solutions for complex unstructured environments under real-time bounds. Neural Motion Planners (NMPs) are an appealing alternative to algorithmic approaches, as they can leverage parallel computing for planning while incorporating arbitrary environmental constraints directly from raw sensor observations. Contemporary NMPs successfully transfer to different environment variations but fail to generalize across embodiments. This thesis proposes "AnyNMP", a generalist motion planning policy for zero-shot transfer across different robotic manipulators and environments. The policy is conditioned on a semantically segmented 3D point cloud representation of the workspace, thus enabling implicit sim-to-real transfer. In the proposed approach, templates are formulated for manipulator kinematics, and ground truth motion plans are collected for over 3 million procedurally sampled robots in randomized environments. The planning pipeline consists of a state validation model for differentiable collision detection and a sampling-based planner for motion generation. AnyNMP has been validated on 5 different commercially available manipulators and showcases successful cross-embodiment planning, achieving an 80% average success rate on baseline benchmarks.
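A hedged schematic of how a learned state validation model and a sampling-based planner could fit together, with the learned pieces stubbed out. The shapes, waypoint count, and validity scoring below are assumptions for illustration and do not describe AnyNMP's actual architecture.

    import numpy as np

    def state_validation_model(configs, segmented_pointcloud):
        # Stand-in for the learned validity/collision model: probability that each
        # configuration is collision-free given the segmented scene point cloud.
        return np.random.rand(len(configs))

    def sample_plan(start, goal, segmented_pointcloud, n_samples=256):
        # Sample noisy straight-line interpolations and keep the most valid path.
        alphas = np.linspace(0, 1, 16)[None, :, None]
        paths = start + alphas * (goal - start) + 0.05 * np.random.randn(n_samples, 16, len(start))
        validity = np.stack([state_validation_model(p, segmented_pointcloud) for p in paths])
        scores = validity.min(axis=1)   # a path is as valid as its worst waypoint
        return paths[np.argmax(scores)]

    scene = np.random.rand(4096, 4)     # xyz + semantic label per point
    plan = sample_plan(np.zeros(7), np.ones(7), scene)
    print(plan.shape)                   # (16, 7): 16 waypoints for a 7-DoF arm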
Date Created
2024

Estimating Object Kinematic State Machines Via Human Demonstration

Description

As robots become increasingly integrated into their environments, they need to learn how to interact with the objects around them. Many of these objects are articulated with multiple degrees of freedom (DoF). Multi-DoF objects have complex joints that require specific manipulation orders, but existing methods only consider objects with a single joint. To capture the joint structure and manipulation sequence of any object, I introduce Object Kinematic State Machines (OKSMs), a novel representation that models the kinematic constraints and manipulation sequences of multi-DoF objects. I also present Pokenet, a deep neural network architecture that estimates OKSMs from sequences of point cloud data of human demonstrations. I conduct experiments on both simulated and real-world datasets to validate my approach. First, I evaluate the modeling of multi-DoF objects on a simulated dataset, comparing against the current state-of-the-art method. I then assess Pokenet's real-world usability on a dataset collected in my lab, comprising 5,500 data points across 4 objects. Results show that my method can successfully estimate joint parameters of novel multi-DoF objects with over 25% higher accuracy on average than prior methods.
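An illustrative sketch of what an OKSM captures: nodes are discrete object states, and edges are joint manipulations that are only valid from certain states, encoding the required manipulation order. The concrete representation, joint parameters, and Pokenet's estimation procedure are not shown; the object and action names are invented.

    oksm = {
        "states": ["closed", "lid_unlatched", "lid_open"],
        "transitions": {
            ("closed", "rotate_latch"): "lid_unlatched",   # first joint must move first
            ("lid_unlatched", "lift_lid"): "lid_open",     # second joint only afterwards
        },
    }

    def execute(oksm, start, actions):
        """Follow actions through the OKSM; fail if an action violates the ordering."""
        state = start
        for a in actions:
            key = (state, a)
            if key not in oksm["transitions"]:
                return f"'{a}' is not valid from state '{state}'"
            state = oksm["transitions"][key]
        return f"reached state '{state}'"

    print(execute(oksm, "closed", ["lift_lid"]))                  # violates the order
    print(execute(oksm, "closed", ["rotate_latch", "lift_lid"]))  # valid sequence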
Date Created
2024

Addressing Efficiency and Reliability Challenges in Natural Language Processing

Description

Recently developed large language models (LLMs) have achieved remarkable success on a wide range of natural language tasks. Furthermore, they have been shown to possess an impressive ability to generate fluent and coherent text. Despite all the notable abilities of these models, there exist several efficiency- and reliability-related challenges. For example, they are vulnerable to a phenomenon called 'hallucination', in which they generate text that is not factually correct, and they also have a large number of parameters, which makes their inference slow and computationally expensive. With the objective of taking a step closer towards further enabling the widespread adoption of Natural Language Processing (NLP) systems, this dissertation studies the following question: how can the efficiency- and reliability-related concerns of NLP systems be effectively addressed? Specifically, to improve the reliability of models, this dissertation first presents an approach that actively detects and mitigates the hallucinations of LLMs using a retrieval-augmented methodology. Another strategy to mitigate incorrect predictions is abstention from answering when an error is likely, i.e., selective prediction. To this end, I present selective prediction approaches and conduct extensive experiments to demonstrate their effectiveness. Building on top of selective prediction, I also present post-abstention strategies that focus on reliably increasing the coverage of a selective prediction system without considerably impacting its accuracy. Furthermore, this dissertation covers multiple aspects of improving efficiency, including 'inference efficiency' (making model inferences in a computationally efficient manner without sacrificing prediction accuracy), 'data sample efficiency' (efficiently collecting data instances for training a task-specific system), 'open-domain QA reader efficiency' (leveraging external knowledge efficiently while answering open-domain questions), and 'evaluation efficiency' (comparing the performance of different models efficiently). In summary, this dissertation highlights several challenges pertinent to the efficiency and reliability of NLP systems and provides effective solutions to address them.
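A minimal sketch of the selective prediction idea mentioned above: the system abstains whenever model confidence falls below a threshold, trading coverage for accuracy. The confidences and correctness labels here are synthetic placeholders, not outputs of the dissertation's models.

    import numpy as np

    rng = np.random.default_rng(0)
    confidence = rng.uniform(0.3, 1.0, size=1000)
    # Synthetic correctness: higher-confidence predictions are more likely to be right.
    correct = rng.uniform(size=1000) < confidence

    for tau in (0.0, 0.6, 0.8):
        answered = confidence >= tau          # abstain below the threshold
        coverage = answered.mean()
        accuracy = correct[answered].mean() if answered.any() else float("nan")
        print(f"threshold={tau:.1f}  coverage={coverage:.2f}  accuracy={accuracy:.2f}")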
Date Created
2024

Building And Evaluating A Skin-Like Sensor For Social Touch Gesture Classification

Description

Socially assistive robots (SARs) can act as assistants and caregivers, interacting and communicating with people through touch gestures. There has been ongoing research on using them as companion robots for children with autism, acting as therapy assistants and playmates. Building touch-perception systems for social robots has been a challenge: the sensors must be designed to ensure comfortable and natural user interaction while recording high-quality data, and they must be able to detect touch gestures. Accurate touch gesture classification is challenging because different users perform the same touch gesture in their own unique way. This study builds and evaluates a skin-like sensor by replicating a recent paper introducing a novel silicone-based sensor design, and performs touch gesture classification using deep-learning models. The study focuses on 8 gestures: Fistbump, Hitting, Holding, Poking, Squeezing, Stroking, Tapping, and Tickling. They were chosen based on previous research in which specialists determined which gestures were essential to detect while interacting with children with autism. In this work, a user study was conducted with 20 adult subjects, using the skin-like sensor to record gesture data and a load cell underneath to record force. Three different types of input were used for touch gesture classification: skin-like sensor & load cell data, only skin-like sensor data, and only load cell data. A Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) architecture was developed for inputs containing skin-like sensor data, and an LSTM network for load cell data alone. This work achieved an average accuracy of 94% with skin-like sensor & load cell data, 95% with only skin-like sensor data, and 45% with only load cell data after stratified 10-fold validation. With subject-dependent splitting, it achieved accuracies of 69% with skin-like sensor & load cell data, 66% with only skin-like sensor data, and 31% with only load cell data.
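A hedged sketch, in PyTorch, of a CNN-LSTM classifier for the 8 touch gestures: a small convolutional block extracts per-frame features from the sensor reading, an LSTM summarizes them over time, and a linear head predicts the gesture. The taxel count, layer sizes, and sequence length are placeholders and do not reflect the actual sensor or model configuration used in the study.

    import torch
    import torch.nn as nn

    class CNNLSTM(nn.Module):
        def __init__(self, n_taxels=64, n_classes=8, hidden=128):
            super().__init__()
            # Per-frame spatial features from the skin-like sensor reading.
            self.cnn = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool1d(16), nn.Flatten(),     # -> 16 * 16 features per frame
            )
            self.lstm = nn.LSTM(16 * 16, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):                               # x: (batch, time, taxels)
            b, t, d = x.shape
            frames = self.cnn(x.reshape(b * t, 1, d))       # (b*t, 256)
            _, (h, _) = self.lstm(frames.reshape(b, t, -1)) # temporal summary
            return self.head(h[-1])                         # (batch, n_classes)

    logits = CNNLSTM()(torch.randn(4, 50, 64))  # 4 sequences, 50 frames, 64 taxels
    print(logits.shape)                         # torch.Size([4, 8])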
Date Created
2024

Accurate Speech to Text Program for Emergency Hospital Calls

Description

Speech-to-text models have become a very useful tool for hospitals. Hospitals can use automatic transcriptions to reduce the workload on doctors and clinicians, since they do not have to manually record information. This automation can give them more time to meet with more patients and increase the efficiency of hospital work. However, an unexplored application of speech-to-text is emergency calls. The most common use for automated transcriptions is documenting what doctors are doing, where there is time to proofread for errors. This work focuses on the problem of transcribing emergency call data. It curates this emergency call data and models it as a medical transcription problem, in the hope that the transcriptions can later be used for medical decision making. The heavy background noise and poor audio quality that come with emergency radio make this problem challenging to solve. The results of this experiment show a modest increase in the accuracy of transcribing the emergency hospital recordings.
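A hedged sketch of the kind of transcription setup described above, using an off-the-shelf pretrained ASR model through the Hugging Face transformers pipeline. This is not the thesis pipeline: the model choice and the audio file name are placeholder assumptions, and the curation, denoising, and fine-tuning steps that the noisy emergency-radio audio would require are not shown.

    from transformers import pipeline

    # Off-the-shelf automatic speech recognition model (placeholder choice).
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

    # Hypothetical noisy emergency-radio recording; in practice the audio would be
    # curated and possibly denoised before transcription.
    result = asr("emergency_call_0001.wav")
    print(result["text"])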
Date Created
2024-05