This collection includes both ASU Theses and Dissertations, submitted by graduate students, and the Barrett, Honors College theses submitted by undergraduate students. 

Displaying 1 - 10 of 11
Filtering by

Clear all filters

152234-Thumbnail Image.png
Description
One of the main challenges in planetary robotics is to traverse the shortest path through a set of waypoints. The shortest distance between any two waypoints is a direct linear traversal. Often times, there are physical restrictions that prevent a rover form traversing straight to a waypoint. Thus, knowledge of

One of the main challenges in planetary robotics is to traverse the shortest path through a set of waypoints. The shortest distance between any two waypoints is a direct linear traversal. Often times, there are physical restrictions that prevent a rover form traversing straight to a waypoint. Thus, knowledge of the terrain is needed prior to traversal. The Digital Terrain Model (DTM) provides information about the terrain along with waypoints for the rover to traverse. However, traversing a set of waypoints linearly is burdensome, as the rovers would constantly need to modify their orientation as they successively approach waypoints. Although there are various solutions to this problem, this research paper proposes the smooth traversability of the rover using splines as a quick and easy implementation to traverse a set of waypoints. In addition, a rover was used to compare the smoothness of the linear traversal along with the spline interpolations. The data collected illustrated that spline traversals had a less rate of change in the velocity over time, indicating that the rover performed smoother than with linear paths.
ContributorsKamasamudram, Anurag (Author) / Saripalli, Srikanth (Thesis advisor) / Fainekos, Georgios (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2013
152324-Thumbnail Image.png
Description
With robots being used extensively in various areas, a certain degree of robot autonomy has always been found desirable. In applications like planetary exploration, autonomous path planning and navigation are considered essential. But every now and then, a need to modify the robot's operation arises, a need for a human

With robots being used extensively in various areas, a certain degree of robot autonomy has always been found desirable. In applications like planetary exploration, autonomous path planning and navigation are considered essential. But every now and then, a need to modify the robot's operation arises, a need for a human to provide it some supervisory parameters that modify the degree of autonomy or allocate extra tasks to the robot. In this regard, this thesis presents an approach to include a provision to accept and incorporate such human inputs and modify the navigation functions of the robot accordingly. Concepts such as applying kinematical constraints while planning paths, traversing of unknown areas with an intent of maximizing field of view, performing complex tasks on command etc. have been examined and implemented. The approaches have been tested in Robot Operating System (ROS), using robots such as the iRobot Create, Personal Robotics (PR2) etc. Simulations and experimental demonstrations have proved that this approach is feasible for solving some of the existing problems and that it certainly can pave way to further research for enhancing functionality.
ContributorsVemprala, Sai Hemachandra (Author) / Saripalli, Srikanth (Thesis advisor) / Fainekos, Georgios (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2013
152840-Thumbnail Image.png
Description
Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many

Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many of such sparse learning methods focus on designing or application of some learning techniques for certain feature space without much explicit consideration on possible interaction between the underlying semantics of the visual data and the employed learning technique. Rich semantic information in most visual data, if properly incorporated into algorithm design, should help achieving improved performance while delivering intuitive interpretation of the algorithmic outcomes. My study addresses the problem of how to explicitly consider the semantic information of the visual data in the sparse learning algorithms. In this work, we identify four problems which are of great importance and broad interest to the community. Specifically, a novel approach is proposed to incorporate label information to learn a dictionary which is not only reconstructive but also discriminative; considering the formation process of face images, a novel image decomposition approach for an ensemble of correlated images is proposed, where a subspace is built from the decomposition and applied to face recognition; based on the observation that, the foreground (or salient) objects are sparse in input domain and the background is sparse in frequency domain, a novel and efficient spatio-temporal saliency detection algorithm is proposed to identify the salient regions in video; and a novel hidden Markov model learning approach is proposed by utilizing a sparse set of pairwise comparisons among the data, which is easier to obtain and more meaningful, consistent than tradition labels, in many scenarios, e.g., evaluating motion skills in surgical simulations. In those four problems, different types of semantic information are modeled and incorporated in designing sparse learning algorithms for the corresponding visual computing tasks. Several real world applications are selected to demonstrate the effectiveness of the proposed methods, including, face recognition, spatio-temporal saliency detection, abnormality detection, spatio-temporal interest point detection, motion analysis and emotion recognition. In those applications, data of different modalities are involved, ranging from audio signal, image to video. Experiments on large scale real world data with comparisons to state-of-art methods confirm the proposed approaches deliver salient advantages, showing adding those semantic information dramatically improve the performances of the general sparse learning methods.
ContributorsZhang, Qiang (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Yalin (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2014
153270-Thumbnail Image.png
Description
Fisheye cameras are special cameras that have a much larger field of view compared to

conventional cameras. The large field of view comes at a price of non-linear distortions

introduced near the boundaries of the images captured by such cameras. Despite this

drawback, they are being used increasingly in many applications of computer

Fisheye cameras are special cameras that have a much larger field of view compared to

conventional cameras. The large field of view comes at a price of non-linear distortions

introduced near the boundaries of the images captured by such cameras. Despite this

drawback, they are being used increasingly in many applications of computer vision,

robotics, reconnaissance, astrophotography, surveillance and automotive applications.

The images captured from such cameras can be corrected for their distortion if the

cameras are calibrated and the distortion function is determined. Calibration also allows

fisheye cameras to be used in tasks involving metric scene measurement, metric

scene reconstruction and other simultaneous localization and mapping (SLAM) algorithms.

This thesis presents a calibration toolbox (FisheyeCDC Toolbox) that implements a collection of some of the most widely used techniques for calibration of fisheye cameras under one package. This enables an inexperienced user to calibrate his/her own camera without the need for a theoretical understanding about computer vision and camera calibration. This thesis also explores some of the applications of calibration such as distortion correction and 3D reconstruction.
ContributorsKashyap Takmul Purushothama Raju, Vinay (Author) / Karam, Lina (Thesis advisor) / Turaga, Pavan (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created2014
150353-Thumbnail Image.png
Description
Advancements in computer vision and machine learning have added a new dimension to remote sensing applications with the aid of imagery analysis techniques. Applications such as autonomous navigation and terrain classification which make use of image classification techniques are challenging problems and research is still being carried out to find

Advancements in computer vision and machine learning have added a new dimension to remote sensing applications with the aid of imagery analysis techniques. Applications such as autonomous navigation and terrain classification which make use of image classification techniques are challenging problems and research is still being carried out to find better solutions. In this thesis, a novel method is proposed which uses image registration techniques to provide better image classification. This method reduces the error rate of classification by performing image registration of the images with the previously obtained images before performing classification. The motivation behind this is the fact that images that are obtained in the same region which need to be classified will not differ significantly in characteristics. Hence, registration will provide an image that matches closer to the previously obtained image, thus providing better classification. To illustrate that the proposed method works, naïve Bayes and iterative closest point (ICP) algorithms are used for the image classification and registration stages respectively. This implementation was tested extensively in simulation using synthetic images and using a real life data set called the Defense Advanced Research Project Agency (DARPA) Learning Applied to Ground Robots (LAGR) dataset. The results show that the ICP algorithm does help in better classification with Naïve Bayes by reducing the error rate by an average of about 10% in the synthetic data and by about 7% on the actual datasets used.
ContributorsMuralidhar, Ashwini (Author) / Saripalli, Srikanth (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2011
154464-Thumbnail Image.png
Description
The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized

The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized in multiple visual computing tasks to bridge the so-called "semantic gap" between extractable low-level feature representations and high-level semantic understanding of the visual objects.

Despite years of research, there are still some unsolved problems on semantic attribute learning. First, real-world applications usually involve hundreds of attributes which requires great effort to acquire sufficient amount of labeled data for model learning. Second, existing attribute learning work for visual objects focuses primarily on images, with semantic analysis on videos left largely unexplored.

In this dissertation I conduct innovative research and propose novel approaches to tackling the aforementioned problems. In particular, I propose robust and accurate learning frameworks on both attribute ranking and prediction by exploring the correlation among multiple attributes and utilizing various types of label information. Furthermore, I propose a video-based skill coaching framework by extending attribute learning to the video domain for robust motion skill analysis. Experiments on various types of applications and datasets and comparisons with multiple state-of-the-art baseline approaches confirm that my proposed approaches can achieve significant performance improvements for the general attribute learning problem.
ContributorsChen, Lin (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Yalin (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2016
155059-Thumbnail Image.png
Description
The tradition of building musical robots and automata is thousands of years old. Despite this rich history, even today musical robots do not play with as much nuance and subtlety as human musicians. In particular, most instruments allow the player to manipulate timbre while playing; if a violinist is told

The tradition of building musical robots and automata is thousands of years old. Despite this rich history, even today musical robots do not play with as much nuance and subtlety as human musicians. In particular, most instruments allow the player to manipulate timbre while playing; if a violinist is told to sustain an E, they will select which string to play it on, how much bow pressure and velocity to use, whether to use the entire bow or only the portion near the tip or the frog, how close to the bridge or fingerboard to contact the string, whether or not to use a mute, and so forth. Each one of these choices affects the resulting timbre, and navigating this timbre space is part of the art of playing the instrument. Nonetheless, this type of timbral nuance has been largely ignored in the design of musical robots. Therefore, this dissertation introduces a suite of techniques that deal with timbral nuance in musical robots. Chapter 1 provides the motivating ideas and introduces Kiki, a robot designed by the author to explore timbral nuance. Chapter 2 provides a long history of musical robots, establishing the under-researched nature of timbral nuance. Chapter 3 is a comprehensive treatment of dynamic timbre production in percussion robots and, using Kiki as a case-study, provides a variety of techniques for designing striking mechanisms that produce a range of timbres similar to those produced by human players. Chapter 4 introduces a machine-learning algorithm for recognizing timbres, so that a robot can transcribe timbres played by a human during live performance. Chapter 5 introduces a technique that allows a robot to learn how to produce isolated instances of particular timbres by listening to a human play an examples of those timbres. The 6th and final chapter introduces a method that allows a robot to learn the musical context of different timbres; this is done in realtime during interactive improvisation between a human and robot, wherein the robot builds a statistical model of which timbres the human plays in which contexts, and uses this to inform its own playing.
ContributorsKrzyzaniak, Michael Joseph (Author) / Coleman, Grisha (Thesis advisor) / Turaga, Pavan (Committee member) / Artemiadis, Panagiotis (Committee member) / Arizona State University (Publisher)
Created2016
155083-Thumbnail Image.png
Description
Multi-sensor fusion is a fundamental problem in Robot Perception. For a robot to operate in a real world environment, multiple sensors are often needed. Thus, fusing data from various sensors accurately is vital for robot perception. In the first part of this thesis, the problem of fusing information from a

Multi-sensor fusion is a fundamental problem in Robot Perception. For a robot to operate in a real world environment, multiple sensors are often needed. Thus, fusing data from various sensors accurately is vital for robot perception. In the first part of this thesis, the problem of fusing information from a LIDAR, a color camera and a thermal camera to build RGB-Depth-Thermal (RGBDT) maps is investigated. An algorithm that solves a non-linear optimization problem to compute the relative pose between the cameras and the LIDAR is presented. The relative pose estimate is then used to find the color and thermal texture of each LIDAR point. Next, the various sources of error that can cause the mis-coloring of a LIDAR point after the cross- calibration are identified. Theoretical analyses of these errors reveal that the coloring errors due to noisy LIDAR points, errors in the estimation of the camera matrix, and errors in the estimation of translation between the sensors disappear with distance. But errors in the estimation of the rotation between the sensors causes the coloring error to increase with distance.

On a robot (vehicle) with multiple sensors, sensor fusion algorithms allow us to represent the data in the vehicle frame. But data acquired temporally in the vehicle frame needs to be registered in a global frame to obtain a map of the environment. Mapping techniques involving the Iterative Closest Point (ICP) algorithm and the Normal Distributions Transform (NDT) assume that a good initial estimate of the transformation between the 3D scans is available. This restricts the ability to stitch maps that were acquired at different times. Mapping can become flexible if maps that were acquired temporally can be merged later. To this end, the second part of this thesis focuses on developing an automated algorithm that fuses two maps by finding a congruent set of five points forming a pyramid.

Mapping has various application domains beyond Robot Navigation. The third part of this thesis considers a unique application domain where the surface displace- ments caused by an earthquake are to be recovered using pre- and post-earthquake LIDAR data. A technique to recover the 3D surface displacements is developed and the results are presented on real earthquake datasets: El Mayur Cucupa earthquake, Mexico, 2010 and Fukushima earthquake, Japan, 2011.
ContributorsKrishnan, Aravindhan K (Author) / Saripalli, Srikanth (Thesis advisor) / Klesh, Andrew (Committee member) / Fainekos, Georgios (Committee member) / Thangavelautham, Jekan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2016
155085-Thumbnail Image.png
Description
High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.

Many video feature extraction algorithms have been purposed, such

High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.

Many video feature extraction algorithms have been purposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as “handcrafted” features as they were deliberately designed based on some reasonable considerations. However, these algorithms may fail when dealing with high-level tasks or complex scene videos. Due to the success of using deep convolution neural networks (CNNs) to extract global representations for static images, researchers have been using similar techniques to tackle video contents. Typical techniques first extract spatial features by processing raw images using deep convolution architectures designed for static image classifications. Then simple average, concatenation or classifier-based fusion/pooling methods are applied to the extracted features. I argue that features extracted in such ways do not acquire enough representative information since videos, unlike images, should be characterized as a temporal sequence of semantically coherent visual contents and thus need to be represented in a manner considering both semantic and spatio-temporal information.

In this thesis, I propose a novel architecture to learn semantic spatio-temporal embedding for videos to support high-level video analysis. The proposed method encodes video spatial and temporal information separately by employing a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Fully Connected Gated Recurrent Unit (FC-GRU) encoders for capturing longer-term temporal structure of the CNN features. The resultant spatio-temporal representation (a vector) is used to learn a mapping via a Fully Connected Multilayer Perceptron (FC-MLP) to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. I evaluate the usefulness and effectiveness of this new video representation by conducting experiments on action recognition, zero-shot video classification, and semantic video retrieval (word-to-video) retrieval, using the UCF101 action recognition dataset.
ContributorsHu, Sheng-Hung (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Liang, Jianming (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created2016
187693-Thumbnail Image.png
Description
Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these features-based SLAM methods often struggle with feature-less or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world.

Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these features-based SLAM methods often struggle with feature-less or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world. Advanced information is necessary to address these limitations. Fortunately, recent developments in learning-based 3D reconstruction allow robots to not only detect semantic meanings, but also recognize the 3D structure of objects from a few images. By combining this 3D structural information, SLAM can be improved from a low-level approach to a structure-aware approach. This work propose a novel approach for multi-view 3D reconstruction using recurrent transformer. This approach allows robots to accumulate information from multiple views and encode them into a compact latent space. The resulting latent representations are then decoded to produce 3D structural landmarks, which can be used to improve robot localization and mapping.
ContributorsHuang, Chi-Yao (Author) / Yang, Yezhou (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2023