Matching Items (44)
Filtering by

Clear all filters

193581-Thumbnail Image.png
Description
Grasping objects in a general household setting is a dexterous task, high compliance is needed to generate a grasp that leads to grasp closure. Standard 6 Degree of Freedom (DoF) manipulators with parallel grippers are naturally incapable of showing such dexterity. This renders many objects in household settings difficult

Grasping objects in a general household setting is a dexterous task, high compliance is needed to generate a grasp that leads to grasp closure. Standard 6 Degree of Freedom (DoF) manipulators with parallel grippers are naturally incapable of showing such dexterity. This renders many objects in household settings difficult to grasp, as the manipulator cannot access readily available antipodal (planar) grasps. In such scenarios, one must either use a high DoF end effector to learn this compliance or change the initial configuration of the object to find an antipodal grasp. A pipeline that uses the extrinsic forces present in the environment to make up for this lack of compliance is proposed. The proposed method: i) Takes the point cloud input from the environment, and creates a search space with all its available poses. This search space is used to identify the best graspable position for an object with a grasp score network ii) Learn how to approach an object, and generate an appropriate set of motor primitives that converts the current ungraspable pose to a graspable pose. iii) Run a naive grasp detection network to verify the proposed methods and subsequently grasp the initially ungraspable object. By integrating these components, objects that were initially ungraspable, with a standard grasp detection model DexNet, remain no longer ungraspable.
ContributorsSah, Anant (Author) / Gopalan, Nakul (Thesis advisor) / Zhang, Wenlong (Committee member) / Senanayake, Ransalu (Committee member) / Arizona State University (Publisher)
Created2024
156802-Thumbnail Image.png
Description
Human movement is a complex process influenced by physiological and psychological factors. The execution of movement is varied from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees

Human movement is a complex process influenced by physiological and psychological factors. The execution of movement is varied from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees of quality, and the quality can be defined with regard to aesthetic, athletic, or health-related ratings. It is useful to measure and track the quality of a person's movements, for various applications, especially with the prevalence of low-cost and portable cameras and sensors today. Furthermore, based on such measurements, feedback systems can be designed for people to practice their movements towards certain goals. In this dissertation, I introduce symmetry as a family of measures for movement quality, and utilize recent advances in computer vision and differential geometry to model and analyze different types of symmetry in human movements. Movements are modeled as trajectories on different types of manifolds, according to the representations of movements from sensor data. The benefit of such a universal framework is that it can accommodate different existing and future features that describe human movements. The theory and tools developed in this dissertation will also be useful in other scientific areas to analyze symmetry from high-dimensional signals.
ContributorsWang, Qiao (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Srivastava, Anuj (Committee member) / Sha, Xin Wei (Committee member) / Arizona State University (Publisher)
Created2018
Description
Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame-rates. The content needs to represent a truly immersive experience, where the user can look around in 360-degree view and perceive the depth of the scene. The existing solutions only capture and

Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame-rates. The content needs to represent a truly immersive experience, where the user can look around in 360-degree view and perceive the depth of the scene. The existing solutions only capture and offload the compute load to the server. But offloading large amounts of raw camera feeds takes longer latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving the traditional stitching algorithms to battery constrained device needs at least three orders of magnitude reduction in power. We believe that close integration of capture and compute stages will lead to reduced overall system power.

We approach the problem by building a hardware prototype and characterize the end-to-end system bottlenecks of power and performance. The prototype has 6 IMX274 cameras and uses Nvidia Jetson TX2 development board for capture and computation. We found that capturing is bottlenecked by sensor power and data-rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computations lead to high power, huge memory footprint, and high latency. The existing systems lack hardware-software co-design aspects, leading to excessive data transfers across the interfaces and expensive computations within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-design of different subsystems to reduce and reuse the data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on custom FPGA can achieve real time stitching.
ContributorsGunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2018
157633-Thumbnail Image.png
Description
The ubiquity of single camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated from feature detection based algorithms as opposed to regular grid

The ubiquity of single camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated from feature detection based algorithms as opposed to regular grid sparse patterns used by previous work. This work focuses on using these feature-based sparse patterns to generate additional depth information by interpolating regions between clusters of samples that are in close proximity to each other. These interpolated sparse depths are used to enforce additional constraints on the network’s predictions. In addition to the improved depth prediction performance observed from incorporating the sparse sample information in the network compared to pure RGB-based methods, the experiments show that actively retraining a network on a small number of samples that deviate most from the interpolated sparse depths leads to better depth prediction overall.

This thesis also introduces a new metric, titled Edge, to quantify model performance in regions of an image that show the highest change in ground truth depth values along either the x-axis or the y-axis. Existing metrics in depth estimation like Root Mean Square Error(RMSE) and Mean Absolute Error(MAE) quantify model performance across the entire image and don’t focus on specific regions of an image that are hard to predict. To this end, the proposed Edge metric focuses specifically on these hard to classify regions. The experiments also show that using the Edge metric as a small addition to existing loss functions like L1 loss in current state-of-the-art methods leads to vastly improved performance in these hard to classify regions, while also improving performance across the board in every other metric.
ContributorsRai, Anshul (Author) / Yang, Yezhou (Thesis advisor) / Zhang, Wenlong (Committee member) / Liang, Jianming (Committee member) / Arizona State University (Publisher)
Created2019
157645-Thumbnail Image.png
Description
Disentangling latent spaces is an important research direction in the interpretability of unsupervised machine learning. Several recent works using deep learning are very effective at producing disentangled representations. However, in the unsupervised setting, there is no way to pre-specify which part of the latent space captures specific factors of

Disentangling latent spaces is an important research direction in the interpretability of unsupervised machine learning. Several recent works using deep learning are very effective at producing disentangled representations. However, in the unsupervised setting, there is no way to pre-specify which part of the latent space captures specific factors of variations. While this is generally a hard problem because of the non-existence of analytical expressions to capture these variations, there are certain factors like geometric

transforms that can be expressed analytically. Furthermore, in existing frameworks, the disentangled values are also not interpretable. The focus of this work is to disentangle these geometric factors of variations (which turn out to be nuisance factors for many applications) from the semantic content of the signal in an interpretable manner which in turn makes the features more discriminative. Experiments are designed to show the modularity of the approach with other disentangling strategies as well as on multiple one-dimensional (1D) and two-dimensional (2D) datasets, clearly indicating the efficacy of the proposed approach.
ContributorsKoneripalli Seetharam, Kaushik (Author) / Turaga, Pavan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2019
154630-Thumbnail Image.png
Description
There has been tremendous technological advancement in the past two decades. Faster computers and improved sensing devices have broadened the research scope in computer vision. With these developments, the task of assessing the quality of human actions, is considered an important problem that needs to be tackled. Movement quality assessment

There has been tremendous technological advancement in the past two decades. Faster computers and improved sensing devices have broadened the research scope in computer vision. With these developments, the task of assessing the quality of human actions, is considered an important problem that needs to be tackled. Movement quality assessment finds wide range of application in motor control, health-care, rehabilitation and physical therapy. Home-based interactive physical therapy requires the ability to monitor, inform and assess the quality of everyday movements. Obtaining labeled data from trained therapists/experts is the main limitation, since it is both expensive and time consuming.

Motivated by recent studies in motor control and therapy, in this thesis an existing computational framework is used to assess balance impairment and disease severity in people suffering from Parkinson's disease. The framework uses high-dimensional shape descriptors of the reconstructed phase space, of the subjects' center of pressure (CoP) tracings while performing dynamical postural shifts. The performance of the framework is evaluated using a dataset collected from 43 healthy and 17 Parkinson's disease impaired subjects, and outperforms other methods, such as dynamical shift indices and use of chaotic invariants, in assessment of balance impairment.

In this thesis, an unsupervised method is also proposed that measures movement quality assessment of simple actions like sit-to-stand and dynamic posture shifts by modeling the deviation of a given movement from an ideal movement path in the configuration space, i.e. the quality of movement is directly related to similarity to the ideal trajectory, between the start and end pose. The S^1xS^1 configuration space was used to model the interaction of two joint angles in sit-to-stand actions, and the R^2 space was used to model the subject's CoP while performing dynamic posture shifts for application in movement quality estimation.
ContributorsSom, Anirudh (Author) / Turaga, Pavan (Thesis advisor) / Krishnamurthi, Narayanan (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2016
154633-Thumbnail Image.png
Description
This thesis aims to explore the language of different bodies in the field of dance by analyzing

the habitual patterns of dancers from different backgrounds and vernaculars. Contextually,

the term habitual patterns is defined as the postures or poses that tend to re-appear,

often unintentionally, as the dancer performs improvisational dance. The focus

This thesis aims to explore the language of different bodies in the field of dance by analyzing

the habitual patterns of dancers from different backgrounds and vernaculars. Contextually,

the term habitual patterns is defined as the postures or poses that tend to re-appear,

often unintentionally, as the dancer performs improvisational dance. The focus lies in exposing

the movement vocabulary of a dancer to reveal his/her unique fingerprint.

The proposed approach for uncovering these movement patterns is to use a clustering

technique; mainly k-means. In addition to a static method of analysis, this paper uses

an online method of clustering using a streaming variant of k-means that integrates into

the flow of components that can be used in a real-time interactive dance performance. The

computational system is trained by the dancer to discover identifying patterns and therefore

it enables a feedback loop resulting in a rich exchange between dancer and machine. This

can help break a dancer’s tendency to create similar postures, explore larger kinespheric

space and invent movement beyond their current capabilities.

This paper describes a project that distinguishes itself in that it uses a custom database

that is curated for the purpose of highlighting the similarities and differences between various

movement forms. It puts particular emphasis on the process of choosing source movement

qualitatively, before the technological capture process begins.
ContributorsIyengar, Varsha (Author) / Xin Wei, Sha (Thesis advisor) / Turaga, Pavan (Committee member) / Coleman, Grisha (Committee member) / Arizona State University (Publisher)
Created2016
154464-Thumbnail Image.png
Description
The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized

The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized in multiple visual computing tasks to bridge the so-called "semantic gap" between extractable low-level feature representations and high-level semantic understanding of the visual objects.

Despite years of research, there are still some unsolved problems on semantic attribute learning. First, real-world applications usually involve hundreds of attributes which requires great effort to acquire sufficient amount of labeled data for model learning. Second, existing attribute learning work for visual objects focuses primarily on images, with semantic analysis on videos left largely unexplored.

In this dissertation I conduct innovative research and propose novel approaches to tackling the aforementioned problems. In particular, I propose robust and accurate learning frameworks on both attribute ranking and prediction by exploring the correlation among multiple attributes and utilizing various types of label information. Furthermore, I propose a video-based skill coaching framework by extending attribute learning to the video domain for robust motion skill analysis. Experiments on various types of applications and datasets and comparisons with multiple state-of-the-art baseline approaches confirm that my proposed approaches can achieve significant performance improvements for the general attribute learning problem.
ContributorsChen, Lin (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Wang, Yalin (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2016
154471-Thumbnail Image.png
Description
The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomenon -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon

The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomenon -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L-2 norm. However in many cases this assumption is violated, when the data lies on non Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to \emph{represent, compare, and manipulate} such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the \emph{true} space of such trajectories. In applications such as stroke rehabilitation where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization to the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
ContributorsAnirudh, Rushil (Author) / Turaga, Pavan (Thesis advisor) / Cochran, Douglas (Committee member) / Runger, George C. (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)
Created2016
154384-Thumbnail Image.png
Description
Today's world is seeing a rapid technological advancement in various fields, having access to faster computers and better sensing devices. With such advancements, the task of recognizing human activities has been acknowledged as an important problem, with a wide range of applications such as surveillance, health monitoring and animation. Traditional

Today's world is seeing a rapid technological advancement in various fields, having access to faster computers and better sensing devices. With such advancements, the task of recognizing human activities has been acknowledged as an important problem, with a wide range of applications such as surveillance, health monitoring and animation. Traditional approaches to dynamical modeling have included linear and nonlinear methods with their respective drawbacks. An alternative idea I propose is the use of descriptors of the shape of the dynamical attractor as a feature representation for quantification of nature of dynamics. The framework has two main advantages over traditional approaches: a) representation of the dynamical system is derived directly from the observational data, without any inherent assumptions, and b) the proposed features show stability under different time-series lengths where traditional dynamical invariants fail.

Approximately 1\% of the total world population are stroke survivors, making it the most common neurological disorder. This increasing demand for rehabilitation facilities has been seen as a significant healthcare problem worldwide. The laborious and expensive process of visual monitoring by physical therapists has motivated my research to invent novel strategies to supplement therapy received in hospital in a home-setting. In this direction, I propose a general framework for tuning component-level kinematic features using therapists’ overall impressions of movement quality, in the context of a Home-based Adaptive Mixed Reality Rehabilitation (HAMRR) system.

The rapid technological advancements in computing and sensing has resulted in large amounts of data which requires powerful tools to analyze. In the recent past, topological data analysis methods have been investigated in various communities, and the work by Carlsson establishes that persistent homology can be used as a powerful topological data analysis approach for effectively analyzing large datasets. I have explored suitable topological data analysis methods and propose a framework for human activity analysis utilizing the same for applications such as action recognition.
ContributorsVenkataraman, Vinay (Author) / Turaga, Pavan (Thesis advisor) / Papandreou-Suppappol, Antonia (Committee member) / Krishnamurthi, Narayanan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2016