Matching Items (22)
Filtering by

Clear all filters

151383-Thumbnail Image.png
Description
Motion capture using cost-effective sensing technology is challenging and the huge success of Microsoft Kinect has been attracting researchers to uncover the potential of using this technology into computer vision applications. In this thesis, an upper-body motion analysis in a home-based system for stroke rehabilitation using novel RGB-D camera -

Motion capture using cost-effective sensing technology is challenging and the huge success of Microsoft Kinect has been attracting researchers to uncover the potential of using this technology into computer vision applications. In this thesis, an upper-body motion analysis in a home-based system for stroke rehabilitation using novel RGB-D camera - Kinect is presented. We address this problem by first conducting a systematic analysis of the usability of Kinect for motion analysis in stroke rehabilitation. Then a hybrid upper body tracking approach is proposed which combines off-the-shelf skeleton tracking with a novel depth-fused mean shift tracking method. We proposed several kinematic features reliably extracted from the proposed inexpensive and portable motion capture system and classifiers that correlate torso movement to clinical measures of unimpaired and impaired. Experiment results show that the proposed sensing and analysis works reliably on measuring torso movement quality and is promising for end-point tracking. The system is currently being deployed for large-scale evaluations.
ContributorsDu, Tingfang (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Rikakis, Thanassis (Committee member) / Arizona State University (Publisher)
Created2012
151544-Thumbnail Image.png
Description
Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component in image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large scale learning. In sparse

Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component in image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large scale learning. In sparse modeling, data is represented as a sparse linear combination of atoms from a "dictionary" matrix. This dissertation focuses on understanding different aspects of sparse learning, thereby enhancing the use of sparse methods by incorporating tools from machine learning. With the growing need to adapt models for large scale data, it is important to design dictionaries that can model the entire data space and not just the samples considered. By exploiting the relation of dictionary learning to 1-D subspace clustering, a multilevel dictionary learning algorithm is developed, and it is shown to outperform conventional sparse models in compressed recovery, and image denoising. Theoretical aspects of learning such as algorithmic stability and generalization are considered, and ensemble learning is incorporated for effective large scale learning. In addition to building strategies for efficiently implementing 1-D subspace clustering, a discriminative clustering approach is designed to estimate the unknown mixing process in blind source separation. By exploiting the non-linear relation between the image descriptors, and allowing the use of multiple features, sparse methods can be made more effective in recognition problems. The idea of multiple kernel sparse representations is developed, and algorithms for learning dictionaries in the feature space are presented. Using object recognition experiments on standard datasets it is shown that the proposed approaches outperform other sparse coding-based recognition frameworks. Furthermore, a segmentation technique based on multiple kernel sparse representations is developed, and successfully applied for automated brain tumor identification. Using sparse codes to define the relation between data samples can lead to a more robust graph embedding for unsupervised clustering. By performing discriminative embedding using sparse coding-based graphs, an algorithm for measuring the glomerular number in kidney MRI images is developed. Finally, approaches to build dictionaries for local sparse coding of image descriptors are presented, and applied to object recognition and image retrieval.
ContributorsJayaraman Thiagarajan, Jayaraman (Author) / Spanias, Andreas (Thesis advisor) / Frakes, David (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2013
151537-Thumbnail Image.png
Description
Effective modeling of high dimensional data is crucial in information processing and machine learning. Classical subspace methods have been very effective in such applications. However, over the past few decades, there has been considerable research towards the development of new modeling paradigms that go beyond subspace methods. This dissertation focuses

Effective modeling of high dimensional data is crucial in information processing and machine learning. Classical subspace methods have been very effective in such applications. However, over the past few decades, there has been considerable research towards the development of new modeling paradigms that go beyond subspace methods. This dissertation focuses on the study of sparse models and their interplay with modern machine learning techniques such as manifold, ensemble and graph-based methods, along with their applications in image analysis and recovery. By considering graph relations between data samples while learning sparse models, graph-embedded codes can be obtained for use in unsupervised, supervised and semi-supervised problems. Using experiments on standard datasets, it is demonstrated that the codes obtained from the proposed methods outperform several baseline algorithms. In order to facilitate sparse learning with large scale data, the paradigm of ensemble sparse coding is proposed, and different strategies for constructing weak base models are developed. Experiments with image recovery and clustering demonstrate that these ensemble models perform better when compared to conventional sparse coding frameworks. When examples from the data manifold are available, manifold constraints can be incorporated with sparse models and two approaches are proposed to combine sparse coding with manifold projection. The improved performance of the proposed techniques in comparison to sparse coding approaches is demonstrated using several image recovery experiments. In addition to these approaches, it might be required in some applications to combine multiple sparse models with different regularizations. In particular, combining an unconstrained sparse model with non-negative sparse coding is important in image analysis, and it poses several algorithmic and theoretical challenges. A convex and an efficient greedy algorithm for recovering combined representations are proposed. Theoretical guarantees on sparsity thresholds for exact recovery using these algorithms are derived and recovery performance is also demonstrated using simulations on synthetic data. Finally, the problem of non-linear compressive sensing, where the measurement process is carried out in feature space obtained using non-linear transformations, is considered. An optimized non-linear measurement system is proposed, and improvements in recovery performance are demonstrated in comparison to using random measurements as well as optimized linear measurements.
ContributorsNatesan Ramamurthy, Karthikeyan (Author) / Spanias, Andreas (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Karam, Lina (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2013
153488-Thumbnail Image.png
Description
Audio signals, such as speech and ambient sounds convey rich information pertaining to a user’s activity, mood or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to its subjective nature, hence, requiring sophisticated techniques. This dissertation presents

Audio signals, such as speech and ambient sounds convey rich information pertaining to a user’s activity, mood or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to its subjective nature, hence, requiring sophisticated techniques. This dissertation presents a set of computational methods, that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and ambient sounds-based applications such as lifelogging.

The expression and perception of emotions varies across speakers and cultures, thus, determining features and classification methods that generalize well to different conditions is strongly desired. A latent topic models-based method is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech only approach and further improves the recognition performance.

Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach, which constrains a classifier trained over acoustic descriptors, to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation and annotation techniques capable of efficiently handling long duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real world data and accompanied by a prototypical Android-based user interface.

The proposed methods are also assessed in terms of computation and implementation complexity. Software and field programmable gate array based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
ContributorsShah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2015
149977-Thumbnail Image.png
Description
Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, the multifactor analysis techniques, including the multilinear analysis and the multifactor Gaussian process methods, have been exploited to extract such invariant pose features from

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, the multifactor analysis techniques, including the multilinear analysis and the multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing various key contributing factors, such as pose, view angle, and body shape, in the generation of the image observations. Experimental results have shown that the resulting pose features extracted using the proposed methods exhibit excellent invariance properties to changes in view angles and body shapes. Furthermore, using the proposed invariant multifactor pose features, a suite of simple while effective algorithms have been developed to solve the movement recognition and pose estimation problems. Using these proposed algorithms, excellent human movement analysis results have been obtained, and most of them are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed to automatically detect and learn non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multicamera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled. A measure to quantify the complementary strength across cameras has been proposed. Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.
ContributorsPeng, Bo (Author) / Qian, Gang (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2011
151028-Thumbnail Image.png
Description
In this thesis, we consider the problem of fast and efficient indexing techniques for time sequences which evolve on manifold-valued spaces. Using manifolds is a convenient way to work with complex features that often do not live in Euclidean spaces. However, computing standard notions of geodesic distance, mean etc. can

In this thesis, we consider the problem of fast and efficient indexing techniques for time sequences which evolve on manifold-valued spaces. Using manifolds is a convenient way to work with complex features that often do not live in Euclidean spaces. However, computing standard notions of geodesic distance, mean etc. can get very involved due to the underlying non-linearity associated with the space. As a result a complex task such as manifold sequence matching would require very large number of computations making it hard to use in practice. We believe that one can device smart approximation algorithms for several classes of such problems which take into account the geometry of the manifold and maintain the favorable properties of the exact approach. This problem has several applications in areas of human activity discovery and recognition, where several features and representations are naturally studied in a non-Euclidean setting. We propose a novel solution to the problem of indexing manifold-valued sequences by proposing an intrinsic approach to map sequences to a symbolic representation. This is shown to enable the deployment of fast and accurate algorithms for activity recognition, motif discovery, and anomaly detection. Toward this end, we present generalizations of key concepts of piece-wise aggregation and symbolic approximation for the case of non-Euclidean manifolds. Experiments show that one can replace expensive geodesic computations with much faster symbolic computations with little loss of accuracy in activity recognition and discovery applications. The proposed methods are ideally suited for real-time systems and resource constrained scenarios.
ContributorsAnirudh, Rushil (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2012
156587-Thumbnail Image.png
Description
Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is at the core to achieve improved model robustness and inferencing performance.

Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is at the core to achieve improved model robustness and inferencing performance. This dissertation focuses on the representation learning approaches as the fusion strategy. Specifically, the objective is to learn the shared latent representation which jointly exploit the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction.

We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Targeted to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision etc. Utilizing the MKL formulation, next we describe an auto-context algorithm for learning image context via the fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems.

In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently, special design of the learning architecture is needed. In order to improve the temporal modeling for multivariate sequences, we developed two architectures centered around attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model coupled with triplet ranking loss as metric learning framework is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having a lower computational complexity. Finally, in order to perform community detection on multilayer graphs, a fusion algorithm is described to derive node embedding from word embedding techniques and also exploit the complementary relational information contained in each layer of the graph.
ContributorsSong, Huan (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created2018
156802-Thumbnail Image.png
Description
Human movement is a complex process influenced by physiological and psychological factors. The execution of movement is varied from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees

Human movement is a complex process influenced by physiological and psychological factors. The execution of movement is varied from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees of quality, and the quality can be defined with regard to aesthetic, athletic, or health-related ratings. It is useful to measure and track the quality of a person's movements, for various applications, especially with the prevalence of low-cost and portable cameras and sensors today. Furthermore, based on such measurements, feedback systems can be designed for people to practice their movements towards certain goals. In this dissertation, I introduce symmetry as a family of measures for movement quality, and utilize recent advances in computer vision and differential geometry to model and analyze different types of symmetry in human movements. Movements are modeled as trajectories on different types of manifolds, according to the representations of movements from sensor data. The benefit of such a universal framework is that it can accommodate different existing and future features that describe human movements. The theory and tools developed in this dissertation will also be useful in other scientific areas to analyze symmetry from high-dimensional signals.
ContributorsWang, Qiao (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Srivastava, Anuj (Committee member) / Sha, Xin Wei (Committee member) / Arizona State University (Publisher)
Created2018
133887-Thumbnail Image.png
Description
This thesis evaluates the viability of an original design for a cost-effective wheel-mounted dynamometer for road vehicles. The goal is to show whether or not a device that generates torque and horsepower curves by processing accelerometer data collected at the edge of a wheel can yield results that are comparable

This thesis evaluates the viability of an original design for a cost-effective wheel-mounted dynamometer for road vehicles. The goal is to show whether or not a device that generates torque and horsepower curves by processing accelerometer data collected at the edge of a wheel can yield results that are comparable to results obtained using a conventional chassis dynamometer. Torque curves were generated via the experimental method under a variety of circumstances and also obtained professionally by a precision engine testing company. Metrics were created to measure the precision of the experimental device's ability to consistently generate torque curves and also to compare the similarity of these curves to the professionally obtained torque curves. The results revealed that although the test device does not quite provide the same level of precision as the professional chassis dynamometer, it does create torque curves that closely resemble the chassis dynamometer torque curves and exhibit a consistency between trials comparable to the professional results, even on rough road surfaces. The results suggest that the test device provides enough accuracy and precision to satisfy the needs of most consumers interested in measuring their vehicle's engine performance but probably lacks the level of accuracy and precision needed to appeal to professionals.
ContributorsKing, Michael (Author) / Ren, Yi (Thesis director) / Spanias, Andreas (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
154660-Thumbnail Image.png
Description
The research on the topology and dynamics of complex networks is one of the most focused area in complex system science. The goals are to structure our understanding of the real-world social, economical, technological, and biological systems in the aspect of networks consisting a large number of interacting units and

The research on the topology and dynamics of complex networks is one of the most focused area in complex system science. The goals are to structure our understanding of the real-world social, economical, technological, and biological systems in the aspect of networks consisting a large number of interacting units and to develop corresponding detection, prediction, and control strategies. In this highly interdisciplinary field, my research mainly concentrates on universal estimation schemes, physical controllability, as well as mechanisms behind extreme events and cascading failure for complex networked systems.

Revealing the underlying structure and dynamics of complex networked systems from observed data without of any specific prior information is of fundamental importance to science, engineering, and society. We articulate a Markov network based model, the sparse dynamical Boltzmann machine (SDBM), as a universal network structural estimator and dynamics approximator based on techniques including compressive sensing and K-means algorithm. It recovers the network structure of the original system and predicts its short-term or even long-term dynamical behavior for a large variety of representative dynamical processes on model and real-world complex networks.

One of the most challenging problems in complex dynamical systems is to control complex networks.

Upon finding that the energy required to approach a target state with reasonable precision

is often unbearably large, and the energy of controlling a set of networks with similar structural properties follows a fat-tail distribution, we identify fundamental structural ``short boards'' that play a dominant role in the enormous energy and offer a theoretical interpretation for the fat-tail distribution and simple strategies to significantly reduce the energy.

Extreme events and cascading failure, a type of collective behavior in complex networked systems, often have catastrophic consequences. Utilizing transportation and evolutionary game dynamics as prototypical

settings, we investigate the emergence of extreme events in simplex complex networks, mobile ad-hoc networks and multi-layer interdependent networks. A striking resonance-like phenomenon and the emergence of global-scale cascading breakdown are discovered. We derive analytic theories to understand the mechanism of

control at a quantitative level and articulate cost-effective control schemes to significantly suppress extreme events and the cascading process.
ContributorsChen, Yuzhong (Author) / Lai, Ying-Cheng (Thesis advisor) / Spanias, Andreas (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Ying, Lei (Committee member) / Arizona State University (Publisher)
Created2016