Matching Items (21)
Description
This thesis discusses quantitative evaluation of the quality of movement during stroke rehabilitation. Previous research has shown hospital-based stroke rehabilitation to be effective. Here, we study various issues that arise when creating a home-based system that can be deployed in a patient's home. Limitations of motion capture due to a reduced number of sensors lead to problems in designing kinematic features for quantitative evaluation. In addition, the hierarchical three-level tasks of rehabilitation require a new design of kinematic features. The design of kinematic features for a home-based stroke rehabilitation system is presented. Results for the most challenging classifier are shown and demonstrate the effectiveness of the design. A comparison between modern classification techniques and low-computational-cost threshold-based classification using the same features is also shown.
Contributors: Cheng, Long (Author) / Turaga, Pavan (Thesis advisor) / Arizona State University (Publisher)
Created: 2012
Description
Motion capture using cost-effective sensing technology is challenging, and the huge success of Microsoft Kinect has attracted researchers to explore the potential of this technology in computer vision applications. In this thesis, upper-body motion analysis in a home-based system for stroke rehabilitation using a novel RGB-D camera, the Kinect, is presented. We address this problem by first conducting a systematic analysis of the usability of Kinect for motion analysis in stroke rehabilitation. Then a hybrid upper-body tracking approach is proposed that combines off-the-shelf skeleton tracking with a novel depth-fused mean shift tracking method. We propose several kinematic features that are reliably extracted from the proposed inexpensive and portable motion capture system, along with classifiers that correlate torso movement to clinical measures of unimpaired and impaired movement. Experimental results show that the proposed sensing and analysis work reliably for measuring torso movement quality and are promising for end-point tracking. The system is currently being deployed for large-scale evaluations.
Contributors: Du, Tingfang (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Rikakis, Thanassis (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
As the application of interactive media systems expands to address broader problems in health, education and creative practice, they fall within a higher dimensional space for which it is inherently more complex to design. In response to this need, an emerging area of interactive system design, referred to as experiential media systems, applies hybrid knowledge synthesized across multiple disciplines to address challenges relevant to daily experience. Interactive neurorehabilitation (INR) aims to enhance functional movement therapy by integrating detailed motion capture with interactive feedback in a manner that facilitates engagement and sensorimotor learning for those who have suffered neurologic injury. While INR shows great promise to advance the current state of therapies, a cohesive media design methodology for INR is missing due to the present lack of substantial evidence within the field. Using an experiential media-based approach to draw knowledge from external disciplines, this dissertation proposes a compositional framework for authoring visual media for INR systems across contexts and applications within upper extremity stroke rehabilitation. The compositional framework is applied across systems for supervised training, unsupervised training, and assisted reflection, which reflect the collective work of the Adaptive Mixed Reality Rehabilitation (AMRR) Team at Arizona State University, of which the author is a member. Formal structures and a methodology for applying them are described in detail for the visual media environments designed by the author. Data collected from studies conducted by the AMRR team to evaluate these systems in both supervised and unsupervised training contexts are also discussed in terms of the extent to which the application of the compositional framework is supported and which aspects require further investigation.
The potential broader implications of the proposed compositional framework and methodology are the dissemination of interdisciplinary information to accelerate the informed development of INR applications and to demonstrate the potential benefit of generalizing integrative approaches, merging arts and science based knowledge, for other complex problems related to embodied learning.
Contributors: Lehrer, Nicole (Author) / Rikakis, Thanassis (Committee member) / Olson, Loren (Committee member) / Wolf, Steven L. (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received a lot of attention in the recent past. Such architectures have the attractive feature of combining the two-step process of acquisition and compression of pixel measurements in a conventional camera into a single step. A popular variant is the single-pixel camera, which obtains measurements of the scene using a pseudo-random measurement matrix. Advances in compressive sensing (CS) theory in the past decade have supplied the tools that, in theory, allow near-perfect reconstruction of an image from these measurements even for sub-Nyquist sampling rates. However, current state-of-the-art reconstruction algorithms suffer from two drawbacks: they are (1) computationally very expensive and (2) incapable of yielding high-fidelity reconstructions at high compression ratios. In computer vision, the final goal is usually to perform an inference task using the acquired images, not signal recovery. With this motivation, this thesis considers the possibility of inference directly from compressed measurements, thereby obviating the need for expensive reconstruction algorithms. Non-linear features are often used for inference tasks in computer vision. However, it is currently unclear how to extract such features from compressed measurements. Instead, using the theoretical basis provided by the Johnson-Lindenstrauss lemma, discriminative features using smashed correlation filters are derived, and it is shown that it is indeed possible to perform reconstruction-free inference at high compression ratios with only a marginal loss in accuracy. As a specific inference problem in computer vision, face recognition is considered, mainly beyond the visible spectrum, such as in the short-wave infrared (SWIR) region, where sensors are expensive.
Contributors: Lohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Stroke is a leading cause of disability with varying effects across stroke survivors, necessitating comprehensive approaches to rehabilitation. Interactive neurorehabilitation (INR) systems represent promising technological solutions that can provide an array of sensing, feedback and analysis tools which hold the potential to maximize clinical therapy as well as extend therapy to the home. Currently, there are a variety of approaches to INR design, which, coupled with minimal large-scale clinical data, has led to a lack of cohesion in INR design. INR design presents an inherently complex space, as these systems have multiple users including stroke survivors, therapists and designers, each with their own user experience needs. This dissertation proposes that comprehensive INR design, which can address this complex user space, requires and benefits from the application of interdisciplinary research that spans motor learning and interactive learning. A methodology for integrated and iterative design approaches to INR task experience, assessment, hardware, software and interactive training protocol design is proposed within the comprehensive example of the design and implementation of a mixed reality rehabilitation system for minimally supervised environments. This system was tested with eight stroke survivors, who showed promising results in both functional and movement quality improvement. The results of testing the system with stroke survivors, as well as observations of user experiences, are presented along with suggested improvements to the proposed design methodology. This integrative design methodology is proposed to benefit not only comprehensive INR design but also complex interactive system design in general.
Contributors: Baran, Michael (Author) / Rikakis, Thanassis (Thesis advisor) / Olson, Loren (Thesis advisor) / Wolf, Steven L. (Committee member) / Ingalls, Todd (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame rates. The content needs to represent a truly immersive experience, where the user can look around in a 360-degree view and perceive the depth of the scene. Existing solutions only capture and then offload the compute load to a server. But offloading large amounts of raw camera feeds incurs high latency and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving traditional stitching algorithms to a battery-constrained device requires at least a three-orders-of-magnitude reduction in power. We believe that close integration of the capture and compute stages will lead to reduced overall system power.

We approach the problem by building a hardware prototype and characterizing the end-to-end system bottlenecks of power and performance. The prototype has 6 IMX274 cameras and uses an Nvidia Jetson TX2 development board for capture and computation. We found that capture is bottlenecked by sensor power and data rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computations lead to high power, a huge memory footprint, and high latency. Existing systems lack hardware-software co-design, leading to excessive data transfers across the interfaces and expensive computations within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-design of different subsystems to reduce and reuse the data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on a custom FPGA can achieve real-time stitching.
Contributors: Gunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Multi-sensor fusion is a fundamental problem in Robot Perception. For a robot to operate in a real-world environment, multiple sensors are often needed. Thus, fusing data from various sensors accurately is vital for robot perception. In the first part of this thesis, the problem of fusing information from a LIDAR, a color camera and a thermal camera to build RGB-Depth-Thermal (RGBDT) maps is investigated. An algorithm that solves a non-linear optimization problem to compute the relative pose between the cameras and the LIDAR is presented. The relative pose estimate is then used to find the color and thermal texture of each LIDAR point. Next, the various sources of error that can cause the mis-coloring of a LIDAR point after the cross-calibration are identified. Theoretical analyses of these errors reveal that the coloring errors due to noisy LIDAR points, errors in the estimation of the camera matrix, and errors in the estimation of translation between the sensors disappear with distance. But errors in the estimation of the rotation between the sensors cause the coloring error to increase with distance.

On a robot (vehicle) with multiple sensors, sensor fusion algorithms allow us to represent the data in the vehicle frame. But data acquired temporally in the vehicle frame needs to be registered in a global frame to obtain a map of the environment. Mapping techniques involving the Iterative Closest Point (ICP) algorithm and the Normal Distributions Transform (NDT) assume that a good initial estimate of the transformation between the 3D scans is available. This restricts the ability to stitch maps that were acquired at different times. Mapping can become flexible if maps that were acquired temporally can be merged later. To this end, the second part of this thesis focuses on developing an automated algorithm that fuses two maps by finding a congruent set of five points forming a pyramid.

Mapping has various application domains beyond Robot Navigation. The third part of this thesis considers a unique application domain where the surface displacements caused by an earthquake are to be recovered using pre- and post-earthquake LIDAR data. A technique to recover the 3D surface displacements is developed and the results are presented on real earthquake datasets: the El Mayor-Cucapah earthquake, Mexico, 2010 and the Fukushima earthquake, Japan, 2011.
Contributors: Krishnan, Aravindhan K (Author) / Saripalli, Srikanth (Thesis advisor) / Klesh, Andrew (Committee member) / Fainekos, Georgios (Committee member) / Thangavelautham, Jekan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.

Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as “handcrafted” features, as they were deliberately designed based on some reasonable considerations. However, these algorithms may fail when dealing with high-level tasks or complex scene videos. Due to the success of using deep convolutional neural networks (CNNs) to extract global representations for static images, researchers have been using similar techniques to tackle video content. Typical techniques first extract spatial features by processing raw images using deep convolutional architectures designed for static image classification. Then simple averaging, concatenation or classifier-based fusion/pooling methods are applied to the extracted features. I argue that features extracted in such ways do not acquire enough representative information since videos, unlike images, should be characterized as a temporal sequence of semantically coherent visual content and thus need to be represented in a manner that considers both semantic and spatio-temporal information.

In this thesis, I propose a novel architecture to learn a semantic spatio-temporal embedding for videos to support high-level video analysis. The proposed method encodes video spatial and temporal information separately by employing a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Fully Connected Gated Recurrent Unit (FC-GRU) encoders for capturing the longer-term temporal structure of the CNN features. The resultant spatio-temporal representation (a vector) is used to learn a mapping via a Fully Connected Multilayer Perceptron (FC-MLP) to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. I evaluate the usefulness and effectiveness of this new video representation by conducting experiments on action recognition, zero-shot video classification, and semantic video retrieval (word-to-video), using the UCF101 action recognition dataset.
Contributors: Hu, Sheng-Hung (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Liang, Jianming (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Light field imaging is limited by the computational processing demands of high sampling in both the spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a light field. We present a deep learning approach using a new, two-branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high-resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes as compared to the dictionary method for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured from a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning allows the potential for real-time light field video acquisition systems in the future.
Contributors: Gupta, Mayank (Author) / Turaga, Pavan (Thesis advisor) / Yang, Yezhou (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
This thesis presents a multi-modal motion tracking system for stroke patient rehabilitation. The system deploys two sensor modules: a marker-based motion capture system and an inertial measurement unit (IMU). The integrated system provides real-time measurement of right arm and trunk movement, even in the presence of marker occlusion. The information from the two sensors is fused through quaternion-based recursive filters to provide robust detection of torso compensation (undesired body motion). Since this algorithm allows flexible sensor configurations, it presents a framework for fusing IMU data and vision data that can adapt to various sensor selection scenarios. The proposed system consequently has the potential to improve both the robustness and flexibility of the sensing process. Through comparison between the complementary filter, the extended Kalman filter (EKF), the unscented Kalman filter (UKF) and the particle filter (PF), the experimental part evaluates the performance of the quaternion-based complementary filter for 10 sensor combination scenarios. Experimental results demonstrate the favorable performance of the proposed system in the case of occlusion. This investigation also provides valuable information for selecting filtering algorithms and strategies in specific sensor applications.
Contributors: Liu, Yangzi (Author) / Qian, Gang (Thesis advisor) / Olson, Loren (Committee member) / Si, Jennie (Committee member) / Arizona State University (Publisher)
Created: 2010