This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Description
The ubiquity of single-camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated by feature-detection algorithms, as opposed to the regular-grid sparse patterns used in previous work. These feature-based sparse patterns are used to generate additional depth information by interpolating regions between clusters of samples that lie in close proximity to one another. The interpolated sparse depths are then used to enforce additional constraints on the network's predictions. In addition to the improved depth prediction performance gained by incorporating the sparse-sample information into the network, compared to pure RGB-based methods, the experiments show that actively retraining a network on a small number of samples that deviate most from the interpolated sparse depths leads to better depth prediction overall.
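As a rough illustration of the interpolation step described above, here is a minimal sketch assuming SciPy's linear interpolation plus a distance cutoff to restrict the densified depths to clusters of nearby samples; the cutoff and masking rule are illustrative assumptions, not the thesis's implementation.

```python
# Densify feature-based sparse depths by interpolating only near actual samples.
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import cKDTree

def interpolate_sparse_depth(points, depths, h, w, max_dist=10.0):
    """points: (N, 2) pixel coords of feature detections; depths: (N,) values."""
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    # Linear interpolation over the convex hull of the sparse samples.
    dense = griddata(points, depths, grid, method="linear")
    # Keep only pixels close to a real sample, i.e. inside a cluster.
    dist, _ = cKDTree(points).query(grid)
    dense[dist > max_dist] = np.nan  # no constraint far from samples
    return dense.reshape(h, w)

# The resulting map can supply extra supervision: a loss term penalizes the
# network's predictions wherever the interpolated depth is finite.
```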

This thesis also introduces a new metric, titled Edge, to quantify model performance in the regions of an image that show the highest change in ground-truth depth values along either the x-axis or the y-axis. Existing metrics in depth estimation, such as Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), quantify model performance across the entire image and do not focus on the specific regions that are hard to predict. To this end, the proposed Edge metric focuses specifically on these hard-to-predict regions. The experiments also show that adding the Edge metric as a small term alongside existing loss functions, such as the L1 loss used in current state-of-the-art methods, leads to vastly improved performance in these hard-to-predict regions, while also improving performance across the board on every other metric.
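A minimal sketch of how such an edge-focused error could be computed, assuming the Edge metric selects the pixels whose ground-truth depth gradient along x or y is largest; the percentile threshold and exact formulation are assumptions, not the thesis's definition.

```python
import numpy as np

def edge_metric(pred, gt, top_percent=5.0):
    """Mean absolute depth error restricted to high-gradient ground-truth pixels."""
    # Depth change along y (rows) and x (columns) of the ground truth.
    gy, gx = np.gradient(gt)
    grad_mag = np.maximum(np.abs(gx), np.abs(gy))
    # Keep only the top `top_percent` of pixels by gradient magnitude.
    thresh = np.percentile(grad_mag, 100.0 - top_percent)
    mask = grad_mag >= thresh
    return np.abs(pred - gt)[mask].mean()

# Usage as a small additional term next to an L1 loss:
# loss = l1_loss + 0.1 * edge_metric(pred_depth, gt_depth)
```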
ContributorsRai, Anshul (Author) / Yang, Yezhou (Thesis advisor) / Zhang, Wenlong (Committee member) / Liang, Jianming (Committee member) / Arizona State University (Publisher)
Created2019
Description
Self-driving cars are a long-standing ambition of many AI scientists and engineers. In the last decade alone, many self-driving cars such as Google Waymo, Tesla Autopilot, and Uber vehicles have been roaming the streets of many cities. As the field rapidly expands, researchers all over the world are attempting to develop safer and more efficient AI agents that can navigate through our cities. However, driving is a very complex task to master even for a human, let alone for a robot. It requires attention to, and input from, the car's surroundings, and it is nearly impossible to program for every factor affecting this complex task. As a solution, imitation learning was introduced, wherein agents learn a policy mapping observations to actions from demonstrations given by humans. Through imitation learning, one can easily teach self-driving cars the expected behavior in many scenarios. Despite their autonomous nature, humans undeniably play a vital role in the development and execution of safe and trustworthy self-driving cars, and hence form the strongest link in this application of Human-Robot Interaction. Several approaches have been taken to strengthen this link between humans and self-driving cars, one of which involves communicating humans' navigational instructions to the car. This communication channel gives humans control over the agent's decisions as well as the ability to guide it in real time. This work explores the ability of imitation learning to create a self-driving agent that can follow natural-language instructions, given by humans, grounded in descriptions of environmental objects. The proposed model architecture handles latent temporal context in these instructions, making the agent capable of taking multiple decisions along its course. The work shows promising results that push the boundaries of natural-language instructions and their complexity in navigating self-driving cars through towns.
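A minimal sketch of an instruction-conditioned imitation-learning policy with temporal context, assuming a GRU over past observations in PyTorch; the layer sizes, fusion scheme, and action space are illustrative assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class InstructionPolicy(nn.Module):
    def __init__(self, obs_dim=512, instr_dim=128, hidden=256, act_dim=3):
        super().__init__()
        self.instr_enc = nn.GRU(instr_dim, hidden, batch_first=True)  # encode instruction tokens
        self.temporal = nn.GRU(obs_dim + hidden, hidden, batch_first=True)  # latent temporal context
        self.head = nn.Linear(hidden, act_dim)  # e.g., steering, throttle, brake

    def forward(self, obs_seq, instr_emb):
        # obs_seq: (B, T, obs_dim) image features; instr_emb: (B, L, instr_dim) token embeddings.
        _, h = self.instr_enc(instr_emb)                       # final hidden state: (1, B, hidden)
        instr = h[-1].unsqueeze(1).expand(-1, obs_seq.size(1), -1)
        out, _ = self.temporal(torch.cat([obs_seq, instr], dim=-1))
        return self.head(out[:, -1])  # action at the latest timestep

# Trained with behavior cloning: minimize the error between predicted actions
# and the actions demonstrated by human drivers.
```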
ContributorsMoudhgalya, Nithish B (Author) / Amor, Hani Ben (Thesis advisor) / Baral, Chitta (Committee member) / Yang, Yezhou (Committee member) / Zhang, Wenlong (Committee member) / Arizona State University (Publisher)
Created2021
Description
In recent years, there have been many attempts, using different approaches, to address human-robot interaction (HRI) problems. In this paper, the multi-agent interaction is formulated as a differential game with incomplete information. To tackle this problem, a parameter estimation method is utilized to obtain an approximate solution in real time. Previous studies in parameter estimation assumed that the human's parameters are known to the robot; this may not be the case, and there is uncertainty both in the modeling of the human's rewards and in the human's model of the robot's rewards. The proposed method, empathetic estimation, is tested and compared with the "non-empathetic" estimation of existing works. The case studies are conducted at an uncontrolled intersection with two agents attempting to pass efficiently. Results show that when both agents hold inconsistent beliefs about the other agent's parameters, the empathetic agent performs better at estimating the parameters and achieves higher reward values. This indicates the scenario in which empathy is essential: when an agent's initial belief is mismatched with the true parameters/intent of the other agent.
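A minimal sketch of the difference between non-empathetic and empathetic estimation, assuming simple gradient updates that fit the other agent's observed actions; the linear action model and learning rates are placeholders for the differential-game solver, not the thesis's method.

```python
import numpy as np

def predicted_action(theta_other, belief_other_about_self, state):
    """Stand-in for the differential-game best response: a linear action model.
    The real method solves the game; this placeholder only illustrates the update."""
    return state @ theta_other + 0.5 * (state @ belief_other_about_self)

def estimate_step(theta_hat, belief_hat, state, observed_action,
                  lr=0.05, empathetic=True):
    # Error between the other agent's predicted and observed action.
    err = predicted_action(theta_hat, belief_hat, state) - observed_action
    # Gradient step on the squared error w.r.t. the other agent's parameters.
    theta_hat = theta_hat - lr * err * state
    if empathetic:
        # Empathetic: also refine the estimate of *their belief about us*,
        # rather than assuming they know our true parameters.
        belief_hat = belief_hat - lr * err * 0.5 * state
    return theta_hat, belief_hat
```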
ContributorsChen, Yi (Author) / Ren, Yi (Thesis advisor) / Zhang, Wenlong (Committee member) / Yong, Sze Zheng (Committee member) / Arizona State University (Publisher)
Created2021
Description
Floating trash objects are commonly seen on water bodies such as lakes, canals, and rivers. With the increase in plastic goods and human activity near water bodies, this trash can pile up and cause great harm to the surrounding environment. Using human workers to clear out this trash is a hazardous and time-consuming task. Employing autonomous robots for these tasks is a better approach, since they are faster and more efficient than humans. However, for a robot to clean up trash objects, a good detection algorithm is required. Real-time object detection on water surfaces is challenging due to the nature of the environment and the volatility of the water surface. In addition, running an object detection algorithm on the on-board processor of a robot limits the CPU resources the algorithm can consume. In this thesis, a computationally low-cost object detection approach for robust detection of trash objects, run on the on-board processor of a multirotor, is presented. To account for specular reflections on the water surface, we use a polarization filter and integrate a specularity removal algorithm into our approach as well. The challenges faced during testing and the means taken to eliminate them are also discussed. The algorithm was compared with two other object detectors using four different metrics. Testing was carried out using videos of five different objects, collected under different illumination conditions over a lake using a multirotor. The results indicate that our algorithm is well suited for real-time deployment: it had the highest processing speed at 21 FPS, the lowest CPU consumption at 37.5%, and considerably high precision and recall in detecting the objects.
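A minimal sketch of a low-cost detection loop with simple specularity suppression, assuming OpenCV inpainting for highlight removal and a contour-based detector; the thresholds are illustrative assumptions rather than the thesis's tuned pipeline.

```python
import cv2
import numpy as np

def remove_specularities(frame):
    """Suppress highlights: very bright, weakly saturated pixels get inpainted."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 0, 220), (180, 60, 255))
    return cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA)

def detect_trash(frame):
    clean = remove_specularities(frame)
    gray = cv2.cvtColor(clean, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Keep contours large enough to be floating objects rather than ripples.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 400]
```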
ContributorsSyed, Danish Faraaz (Author) / Zhang, Wenlong (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2021
Description
The field of unmanned aerial vehicle (UAV) navigation has been moving towards collision-inclusive path planning, yet little work has considered what a UAV is colliding with and whether it should. There is therefore a need for a framework that allows a UAV to consider what is around it and find the best collision candidate. The following work presents a framework that allows UAVs to do so by considering what an object is and the properties associated with it. Specifically, it considers an object's material and monetary value to decide whether the object is good to collide with or not. This information is then published on a binary occupancy map that contains the objects' sizes and locations with respect to the current position of the UAV. The intent is that the generated binary occupancy map can be used with a path planner to decide what the UAV should collide with. The framework was designed to be as modular as possible and to work with conventional UAVs that have some degree of crash resistance incorporated into their design. The framework was tested by using it to classify various objects as collision candidates or not, and then carrying out collisions with some of the objects to test the framework's accuracy. The purpose of this research was to further the field of collision-inclusive path planning by allowing UAVs to know, in a way, what they intend to collide with and to decide whether they should, in order to make safer and more efficient collisions.
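A minimal sketch of marking "good to collide with" objects on a binary occupancy map, assuming a lookup table from object class to material softness and monetary value; the classes, property values, and scoring rule are illustrative assumptions, not the thesis's framework.

```python
import numpy as np

# (material softness in [0, 1], monetary value in USD) per object class.
# These classes and numbers are assumed for illustration only.
PROPERTIES = {
    "bush": (0.9, 20.0),
    "cardboard_box": (0.8, 5.0),
    "window": (0.1, 300.0),
    "person": (0.0, float("inf")),
}

def good_collision_candidate(label, max_value=50.0, min_softness=0.5):
    """An object is collidable if it is soft enough and cheap enough."""
    softness, value = PROPERTIES.get(label, (0.0, float("inf")))
    return softness >= min_softness and value <= max_value

def mark_map(occupancy, detections, resolution=0.1):
    """detections: (label, x, y, size) in meters, in the map frame."""
    for label, x, y, size in detections:
        if good_collision_candidate(label):
            r, c = int(y / resolution), int(x / resolution)
            half = max(1, int(size / (2 * resolution)))
            occupancy[max(0, r - half):r + half, max(0, c - half):c + half] = 1
    return occupancy

# Usage: grid = mark_map(np.zeros((100, 100), dtype=np.uint8),
#                        [("bush", 4.0, 6.0, 0.5), ("window", 2.0, 3.0, 1.0)])
```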
ContributorsMolnar, Madelyn Helena (Author) / Zhang, Wenlong (Thesis advisor) / Sugar, Thomas (Committee member) / Guo, Shenghan (Committee member) / Arizona State University (Publisher)
Created2024
Description
Grasping objects in a general household setting is a dexterous task; high compliance is needed to generate a grasp that leads to grasp closure. Standard 6-degree-of-freedom (DoF) manipulators with parallel grippers are naturally incapable of such dexterity. This renders many objects in household settings difficult to grasp, as the manipulator cannot access readily available antipodal (planar) grasps. In such scenarios, one must either use a high-DoF end effector to learn this compliance or change the initial configuration of the object to find an antipodal grasp. A pipeline that uses the extrinsic forces present in the environment to make up for this lack of compliance is proposed. The proposed method: i) takes the point-cloud input from the environment and creates a search space of all available poses, which a grasp-score network uses to identify the best graspable pose for the object; ii) learns how to approach an object and generates an appropriate set of motor primitives that converts the current ungraspable pose to a graspable pose; and iii) runs a naive grasp detection network to verify the proposed method and subsequently grasp the initially ungraspable object. By integrating these components, objects that were initially ungraspable with a standard grasp detection model (DexNet) are no longer ungraspable.
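A minimal sketch of the three-stage pipeline, with every learned component stubbed out; the interfaces and helper names are illustrative assumptions rather than the actual networks or DexNet.

```python
import numpy as np

def enumerate_poses(cloud, n=32):
    """Stub: sample candidate object poses (x, y, z, roll, pitch, yaw)."""
    rng = np.random.default_rng(0)
    return rng.uniform(-1, 1, size=(n, 6))

def pipeline(cloud, score_net, primitive_policy, grasp_detector, robot):
    # i) Build a search space of poses and pick the best graspable one.
    poses = enumerate_poses(cloud)
    target = poses[int(np.argmax([score_net(cloud, p) for p in poses]))]
    # ii) Generate motor primitives that push the object from its current
    #     (ungraspable) pose toward the target pose using extrinsic forces.
    for primitive in primitive_policy(cloud, target):
        robot(primitive)
    # iii) Verify with a naive grasp detector and execute the grasp.
    robot(grasp_detector(cloud))

# Usage with trivial stand-ins for the learned components:
pipeline(cloud=np.zeros((1024, 3)),
         score_net=lambda c, p: -np.linalg.norm(p[3:]),   # prefer upright poses
         primitive_policy=lambda c, t: [("push", t)],
         grasp_detector=lambda c: ("grasp", np.zeros(6)),
         robot=print)
```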
ContributorsSah, Anant (Author) / Gopalan, Nakul (Thesis advisor) / Zhang, Wenlong (Committee member) / Senanayake, Ransalu (Committee member) / Arizona State University (Publisher)
Created2024