Invariant human pose feature extraction for movement recognition and pose estimation

Description

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, multifactor analysis techniques, including multilinear analysis and multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing various key contributing factors, such as pose, view angle, and body shape, in the generation of the image observations. Experimental results have shown that the resulting pose features extracted using the proposed methods exhibit excellent invariance properties to changes in view angles and body shapes. Furthermore, using the proposed invariant multifactor pose features, a suite of simple yet effective algorithms has been developed to solve the movement recognition and pose estimation problems. Using these proposed algorithms, excellent human movement analysis results have been obtained, and most of them are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed to automatically detect and learn non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multi-camera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled. A measure to quantify the complementary strength across cameras has been proposed.
Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.
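
The decision-level product rule mentioned in this abstract can be illustrated with a minimal sketch. It assumes each camera's classifier outputs a posterior over the same gesture classes and that cameras are treated as conditionally independent; the function name `product_rule_fusion` and the example numbers are hypothetical and are not taken from the dissertation.

```python
import math


def product_rule_fusion(camera_posteriors):
    """Fuse per-camera class posteriors with the product rule.

    camera_posteriors[c][k] is the (hypothetical) posterior that camera c
    assigns to gesture class k. Returns (fused_posteriors, predicted_class).
    """
    num_classes = len(camera_posteriors[0])
    eps = 1e-12  # guards against log(0) for classes a camera rules out
    log_scores = [0.0] * num_classes
    for posteriors in camera_posteriors:
        for k, p in enumerate(posteriors):
            # Summing logs is the product rule, computed stably.
            log_scores[k] += math.log(max(p, eps))
    # Subtract the maximum before exponentiating to avoid underflow.
    max_log = max(log_scores)
    unnormalized = [math.exp(s - max_log) for s in log_scores]
    total = sum(unnormalized)
    fused = [u / total for u in unnormalized]
    predicted = max(range(num_classes), key=lambda k: fused[k])
    return fused, predicted


# Three cameras, three gesture classes (made-up posteriors).
cams = [
    [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.7, 0.1],
]
fused, predicted = product_rule_fusion(cams)
```

Here the first camera alone would pick class 0, but the fused decision favors class 1 because the other two cameras agree on it. Working in the log domain is a design choice for numerical stability, not something the abstract specifies.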

Contributors: Peng, Bo (Author) / Qian, Gang (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2011

The detection of reliability prediction cues in manufacturing data from statistically controlled processes

Description

Many products undergo several stages of testing ranging from tests on individual components to end-item tests. Additionally, these products may be further "tested" via customer or field use. The later failure of a delivered product may in some cases be due to circumstances that have no correlation with the product's inherent quality. However, at times, there may be cues in the upstream test data that, if detected, could serve to predict the likelihood of downstream failure or performance degradation induced by product use or environmental stresses. This study explores the use of downstream factory test data or product field reliability data to infer data mining or pattern recognition criteria onto manufacturing process or upstream test data by means of support vector machines (SVM) in order to provide reliability prediction models. In concert with a risk/benefit analysis, these models can be utilized to drive improvement of the product or, at least, via screening to improve the reliability of the product delivered to the customer. Such models can be used to aid in reliability risk assessment based on detectable correlations between the product test performance and the sources of supply, test stands, or other factors related to product manufacture. As an enhancement to the usefulness of the SVM or hyperplane classifier within this context, L-moments and the Western Electric Company (WECO) Rules are used to augment or replace the native process or test data used as inputs to the classifier. As part of this research, a generalizable binary classification methodology was developed that can be used to design and implement predictors of end-item field failure or downstream product performance based on upstream test data that may be composed of single-parameter, time-series, or multivariate real-valued data. 
Additionally, the methodology provides input parameter weighting factors that have proved useful in failure analysis and root cause investigations as indicators of which of several upstream product parameters have the greater influence on the downstream failure outcomes.
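
As a rough illustration of how Western Electric rules could turn a test-data series into classifier inputs, the sketch below computes two binary flags. It assumes the commonly cited forms of Rule 1 (a single point more than three sigma from the center line) and Rule 4 (eight consecutive points on one side of the center line). The function name `weco_flags` and the 0/1 feature encoding are hypothetical; the abstract does not state how the dissertation encodes these rules.

```python
def weco_flags(samples, center, sigma):
    """Return [rule1, rule4] as 0/1 flags for one series of measurements.

    rule1: any single point more than 3 sigma from the center line.
    rule4: eight or more consecutive points on the same side of the center.
    """
    rule1 = any(abs(x - center) > 3 * sigma for x in samples)

    rule4 = False
    run_length = 0
    last_side = 0  # +1 above center, -1 below, 0 exactly on the center line
    for x in samples:
        side = (x > center) - (x < center)
        if side != 0 and side == last_side:
            run_length += 1
        else:
            # A point on the center line or a change of side restarts the run.
            run_length = 1 if side != 0 else 0
        last_side = side
        if run_length >= 8:
            rule4 = True

    return [int(rule1), int(rule4)]


# Made-up series with center 0.0 and sigma 1.0.
outlier_series = weco_flags([0.1, -0.2, 3.5, 0.0], 0.0, 1.0)
drift_series = weco_flags([0.5] * 8, 0.0, 1.0)
broken_run = weco_flags([0.5] * 7 + [-0.5], 0.0, 1.0)
```

Flags like these could be appended to, or used in place of, the raw measurements before training a binary classifier such as an SVM.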

Contributors: Mosley, James (Author) / Morrell, Darryl (Committee member) / Cochran, Douglas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Roberts, Chell (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2011

Incorporating auditory models in speech/audio applications

Description

Following the success in incorporating perceptual models in audio coding algorithms, their application in other speech/audio processing systems is expanding. In general, all perceptual speech/audio processing algorithms involve minimization of an objective function that directly/indirectly incorporates properties of human perception. This dissertation primarily investigates the problems associated with directly embedding an auditory model in the objective function formulation and proposes possible solutions to overcome high complexity issues for use in real-time speech/audio algorithms. Specific problems addressed in this dissertation include: 1) the development of approximate but computationally efficient auditory model implementations that are consistent with the principles of psychoacoustics, and 2) the development of a mapping scheme that allows synthesizing a time/frequency domain representation from its equivalent auditory model output. The first problem is aimed at addressing the high computational complexity involved in solving perceptual objective functions that require repeated application of the auditory model for evaluation of different candidate solutions. In this dissertation, frequency pruning and detector pruning algorithms are developed that efficiently implement the various auditory model stages. The performance of the pruned model is compared to that of the original auditory model for different types of test signals in the SQAM database. Experimental results indicate only a 4-7% relative error in loudness while attaining up to 80-90% reduction in computational complexity. Similarly, a hybrid algorithm is developed specifically for use with sinusoidal signals and employs the proposed auditory pattern combining technique together with a look-up table to store representative auditory patterns.
The second problem obtains an estimate of the auditory representation that minimizes a perceptual objective function and transforms the auditory pattern back to its equivalent time/frequency representation. This avoids the repeated application of auditory model stages to test different candidate time/frequency vectors in minimizing perceptual objective functions. In this dissertation, a constrained mapping scheme is developed by linearizing certain auditory model stages that ensures obtaining a time/frequency mapping corresponding to the estimated auditory representation. This paradigm was successfully incorporated in a perceptual speech enhancement algorithm and a sinusoidal component selection task.

Contributors: Krishnamoorthi, Harish (Author) / Spanias, Andreas (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created: 2011

Field effect modulation of ion transport in silicon-on-insulator nanopores and their application as nanoscale coulter counters

Description

In the last few years, significant advances in nanofabrication have allowed tailoring of structures and materials at a molecular level, enabling nanofabrication with precise control of dimensions and organization at molecular length scales, a development leading to significant advances in nanoscale systems. Although the direction of progress seems to follow the path of microelectronics, the fundamental physics in a nanoscale system changes more rapidly compared to microelectronics as the size scale is decreased. The changes in length, area, and volume ratios due to reduction in size alter the relative influence of various physical effects determining the overall operation of a system in unexpected ways. One such category of nanofluidic structures demonstrating unique ionic and molecular transport characteristics is nanopores. Nanopores derive their unique transport characteristics from the electrostatic interaction of nanopore surface charge with aqueous ionic solutions. In this doctoral research, cylindrical nanopores, in single and array configuration, were fabricated in silicon-on-insulator (SOI) using a combination of electron beam lithography (EBL) and reactive ion etching (RIE). The fabrication method presented is compatible with standard semiconductor foundries and allows fabrication of nanopores with desired geometries and precise dimensional control, providing near ideal and isolated physical modeling systems to study ion transport at the nanometer level. Ion transport through nanopores was characterized by measuring ionic conductances of arrays of nanopores of various diameters for a wide range of concentrations of aqueous hydrochloric acid (HCl) ionic solutions. Measured ionic conductances demonstrated two distinct regimes based on surface charge interactions at low ionic concentrations and nanopore geometry at high ionic concentrations.
Field effect modulation of ion transport through nanopore arrays, in a fashion similar to semiconductor transistors, was also studied. Using ionic conductance measurements, it was shown that the concentration of ions in the nanopore volume was significantly changed when a gate voltage on nanopore arrays was applied, hence controlling their transport. Based on the ion transport results, single nanopores were used to demonstrate their application as nanoscale particle counters by using polystyrene nanobeads, monodispersed in aqueous HCl solutions of different molarities. Effects of field effect modulation on particle transition events were also demonstrated.

Contributors: Joshi, Punarvasu (Author) / Thornton, Trevor J (Thesis advisor) / Goryll, Michael (Thesis advisor) / Spanias, Andreas (Committee member) / Saraniti, Marco (Committee member) / Arizona State University (Publisher)

Created: 2011

Analytical control grid registration for efficient application of optical flow

Description

Image resolution limits the extent to which zooming enhances clarity, restricts the size digital photographs can be printed at, and, in the context of medical images, can prevent a diagnosis. Interpolation is the supplementing of known data with estimated values based on a function or model involving some or all of the known samples. The selection of the contributing data points and the specifics of how they are used to define the interpolated values influence how effectively the interpolation algorithm is able to estimate the underlying, continuous signal. The main contributions of this dissertation are threefold: 1) Reframing edge-directed interpolation of a single image as an intensity-based registration problem. 2) Providing an analytical framework for intensity-based registration using control grid constraints. 3) Quantitative assessment of the new, single-image enlargement algorithm based on analytical intensity-based registration. In addition to single image resizing, the new methods and analytical approaches were extended to address a wide range of applications including volumetric (multi-slice) image interpolation, video deinterlacing, motion detection, and atmospheric distortion correction. Overall, the new approaches generate results that more accurately reflect the underlying signals than less computationally demanding approaches and with lower processing requirements and fewer restrictions than methods with comparable accuracy.

Contributors: Zwart, Christine M. (Author) / Frakes, David H (Thesis advisor) / Karam, Lina (Committee member) / Kodibagkar, Vikram (Committee member) / Spanias, Andreas (Committee member) / Towe, Bruce (Committee member) / Arizona State University (Publisher)

Created: 2013

Multipath mitigating correlation kernels

Description

Autonomous vehicle control systems utilize real-time kinematic Global Navigation Satellite Systems (GNSS) receivers to provide a position within two centimeters of truth. GNSS receivers utilize the satellite signal time of arrival estimates to solve for position, and multipath corrupts the time of arrival estimates with a time-varying bias. Time of arrival estimates are based upon accurate direct sequence spread spectrum (DSSS) code and carrier phase tracking. Current multipath mitigating GNSS solutions include fixed radiation pattern antennas and windowed delay-lock loop code phase discriminators. A new multipath mitigating code tracking algorithm is introduced that utilizes a non-symmetric correlation kernel to reject multipath. Independent parameters provide a means to trade off code tracking discriminant gain against multipath mitigation performance. The algorithm performance is characterized in terms of multipath phase error bias, phase error estimation variance, tracking range, tracking ambiguity and implementation complexity. The algorithm is suitable for modernized GNSS signals including Binary Phase Shift Keyed (BPSK) and a variety of Binary Offset Carrier (BOC) signals. The algorithm compensates for unbalanced code sequences to ensure a code tracking bias does not result from the use of asymmetric correlation kernels. The algorithm does not require explicit knowledge of the propagation channel model. Design recommendations for selecting the algorithm parameters to mitigate precorrelation filter distortion are also provided.

Contributors: Miller, Steven (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created: 2013

New directions in sparse models for image analysis and restoration

Description

Effective modeling of high dimensional data is crucial in information processing and machine learning. Classical subspace methods have been very effective in such applications. However, over the past few decades, there has been considerable research towards the development of new modeling paradigms that go beyond subspace methods. This dissertation focuses on the study of sparse models and their interplay with modern machine learning techniques such as manifold, ensemble and graph-based methods, along with their applications in image analysis and recovery. By considering graph relations between data samples while learning sparse models, graph-embedded codes can be obtained for use in unsupervised, supervised and semi-supervised problems. Using experiments on standard datasets, it is demonstrated that the codes obtained from the proposed methods outperform several baseline algorithms. In order to facilitate sparse learning with large scale data, the paradigm of ensemble sparse coding is proposed, and different strategies for constructing weak base models are developed. Experiments with image recovery and clustering demonstrate that these ensemble models perform better when compared to conventional sparse coding frameworks. When examples from the data manifold are available, manifold constraints can be incorporated with sparse models and two approaches are proposed to combine sparse coding with manifold projection. The improved performance of the proposed techniques in comparison to sparse coding approaches is demonstrated using several image recovery experiments. In addition to these approaches, it might be required in some applications to combine multiple sparse models with different regularizations. In particular, combining an unconstrained sparse model with non-negative sparse coding is important in image analysis, and it poses several algorithmic and theoretical challenges. 
A convex and an efficient greedy algorithm for recovering combined representations are proposed. Theoretical guarantees on sparsity thresholds for exact recovery using these algorithms are derived and recovery performance is also demonstrated using simulations on synthetic data. Finally, the problem of non-linear compressive sensing, where the measurement process is carried out in feature space obtained using non-linear transformations, is considered. An optimized non-linear measurement system is proposed, and improvements in recovery performance are demonstrated in comparison to using random measurements as well as optimized linear measurements.

Contributors: Natesan Ramamurthy, Karthikeyan (Author) / Spanias, Andreas (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Karam, Lina (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2013

Performance characterization of communication channels through asymptotic and partial ordering analysis

Description

Asymptotic comparisons of ergodic channel capacity at high and low signal-to-noise ratios (SNRs) are provided for several adaptive transmission schemes over fading channels with general distributions, including optimal power and rate adaptation, rate adaptation only, channel inversion and its variants. Analysis of the high-SNR pre-log constants of the ergodic capacity reveals the existence of constant capacity difference gaps among the schemes with a pre-log constant of 1. Closed-form expressions for these high-SNR capacity difference gaps are derived, which are proportional to the SNR loss between these schemes in dB scale. The largest one of these gaps is found to be between the optimal power and rate adaptation scheme and the channel inversion scheme. Based on these expressions it is shown that the presence of space diversity or multi-user diversity makes channel inversion arbitrarily close to achieving optimal capacity at high SNR with a sufficiently large number of antennas or users. A low-SNR analysis also reveals that the presence of fading provably always improves capacity at sufficiently low SNR, compared to the additive white Gaussian noise (AWGN) case. Numerical results are shown to corroborate our analytical results. This dissertation derives high-SNR asymptotic average error rates over fading channels by relating them to the outage probability, under mild assumptions. The analysis is based on the Tauberian theorem for Laplace-Stieltjes transforms which is grounded on the notion of regular variation, and applies to a wider range of channel distributions than existing approaches. The theory of regular variation is argued to be the proper mathematical framework for finding sufficient and necessary conditions for outage events to dominate high-SNR error rate performance.
It is proved that the diversity order being d and the cumulative distribution function (CDF) of the channel power gain having variation exponent d at 0 imply each other, provided that the instantaneous error rate is upper-bounded by an exponential function of the instantaneous SNR. High-SNR asymptotic average error rates are derived for specific instantaneous error rates. Compared to existing approaches in the literature, the asymptotic expressions are related to the channel distribution in a much simpler manner herein, and related with outage more intuitively. The high-SNR asymptotic error rate is also characterized under diversity combining schemes with the channel power gain of each branch having a regularly varying CDF. Numerical results are shown to corroborate our theoretical analysis. This dissertation studies several problems concerning channel inclusion, which is a partial ordering between discrete memoryless channels (DMCs) proposed by Shannon. Specifically, majorization-based conditions are derived for channel inclusion between certain DMCs. Furthermore, under general conditions, channel equivalence defined through Shannon ordering is shown to be the same as permutation of input and output symbols. The determination of channel inclusion is considered as a convex optimization problem, and the sparsity of the weights related to the representation of the worse DMC in terms of the better one is revealed when channel inclusion holds between two DMCs. For the exploitation of this sparsity, an effective iterative algorithm is established based on modifying the orthogonal matching pursuit algorithm. The extension of channel inclusion to continuous channels and its application in ordering phase noises are briefly addressed.

Contributors: Zhang, Yuan (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Zhang, Junshan (Committee member) / Reisslein, Martin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2013

Sparse methods in image understanding and computer vision

Description

Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component in image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large scale learning. In sparse modeling, data is represented as a sparse linear combination of atoms from a "dictionary" matrix. This dissertation focuses on understanding different aspects of sparse learning, thereby enhancing the use of sparse methods by incorporating tools from machine learning. With the growing need to adapt models for large scale data, it is important to design dictionaries that can model the entire data space and not just the samples considered. By exploiting the relation of dictionary learning to 1-D subspace clustering, a multilevel dictionary learning algorithm is developed, and it is shown to outperform conventional sparse models in compressed recovery, and image denoising. Theoretical aspects of learning such as algorithmic stability and generalization are considered, and ensemble learning is incorporated for effective large scale learning. In addition to building strategies for efficiently implementing 1-D subspace clustering, a discriminative clustering approach is designed to estimate the unknown mixing process in blind source separation. By exploiting the non-linear relation between the image descriptors, and allowing the use of multiple features, sparse methods can be made more effective in recognition problems. The idea of multiple kernel sparse representations is developed, and algorithms for learning dictionaries in the feature space are presented. Using object recognition experiments on standard datasets it is shown that the proposed approaches outperform other sparse coding-based recognition frameworks. 
Furthermore, a segmentation technique based on multiple kernel sparse representations is developed, and successfully applied for automated brain tumor identification. Using sparse codes to define the relation between data samples can lead to a more robust graph embedding for unsupervised clustering. By performing discriminative embedding using sparse coding-based graphs, an algorithm for measuring the glomerular number in kidney MRI images is developed. Finally, approaches to build dictionaries for local sparse coding of image descriptors are presented, and applied to object recognition and image retrieval.
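
The idea of representing data as a sparse linear combination of dictionary atoms can be sketched with plain matching pursuit, a standard greedy method. This is only an illustration under stated assumptions: the atoms are unit-norm, the dictionary is a made-up toy example, and the function name `matching_pursuit` is hypothetical. It is not the multilevel or kernel dictionary learning developed in the dissertation.

```python
import math


def matching_pursuit(signal, atoms, num_steps):
    """Greedy sparse coding of `signal` over unit-norm `atoms`.

    At each step, pick the atom most correlated with the residual, add that
    correlation to its coefficient, and subtract its contribution.
    Returns (coefficients, residual).
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(num_steps):
        correlations = [
            sum(a * r for a, r in zip(atom, residual)) for atom in atoms
        ]
        best = max(range(len(atoms)), key=lambda k: abs(correlations[k]))
        coeffs[best] += correlations[best]
        residual = [
            r - correlations[best] * a for r, a in zip(residual, atoms[best])
        ]
    return coeffs, residual


# Toy overcomplete dictionary in R^3: three axis atoms plus one diagonal atom.
s = 1.0 / math.sqrt(2.0)
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [s, s, 0.0],
]
# Signal built from two atoms: 2 * atoms[2] + 3 * atoms[3].
signal = [3.0 * s, 3.0 * s, 2.0]
coeffs, residual = matching_pursuit(signal, atoms, num_steps=2)
```

In this toy case two greedy steps recover the two active atoms exactly, so the code uses two nonzero coefficients instead of the three that the axis atoms alone would need.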

Contributors: Jayaraman Thiagarajan, Jayaraman (Author) / Spanias, Andreas (Thesis advisor) / Frakes, David (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2013

On the dynamics of epileptic spikes and focus localization in temporal lobe epilepsy

Description

Interictal spikes, together with seizures, have been recognized as the two hallmarks of epilepsy, a brain disorder that 1% of the world's population suffers from. Even though the presence of spikes in the brain's electromagnetic activity has diagnostic value, their dynamics are still elusive. It was an objective of this dissertation to formulate a mathematical framework within which the dynamics of interictal spikes could be thoroughly investigated. A new epileptic spike detection algorithm was developed by employing data adaptive morphological filters. The performance of the spike detection algorithm was favorably compared with others in the literature. A novel spike spatial synchronization measure was developed and tested on coupled spiking neuron models. Application of this measure to individual epileptic spikes in EEG from patients with temporal lobe epilepsy revealed long-term trends of increase in synchronization between pairs of brain sites before seizures and desynchronization after seizures, in the same patient as well as across patients, thus supporting the hypothesis that seizures may occur to break (reset) the abnormal spike synchronization in the brain network. Furthermore, based on these results, a separate spatial analysis of spike rates was conducted that shed light onto conflicting results in the literature about variability of spike rate before and after seizure. The ability to automatically classify seizures into clinical and subclinical was a result of the above findings. A novel method for epileptogenic focus localization from interictal periods based on spike occurrences was also devised, combining concepts from graph theory, like eigenvector centrality, and the developed spike synchronization measure, and tested very favorably against the gold standard used in clinical practice for focus localization from seizure onset.
Finally, in another application of resetting of brain dynamics at seizures, it was shown that it is possible to differentiate with a high accuracy between patients with epileptic seizures (ES) and patients with psychogenic nonepileptic seizures (PNES). The above studies of spike dynamics have elucidated many unknown aspects of ictogenesis and are expected to contribute significantly to further understanding of the basic mechanisms that lead to seizures, and to the diagnosis and treatment of epilepsy.
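
Eigenvector centrality, one of the graph-theoretic concepts named in this abstract, can be computed by power iteration on a symmetric weight matrix. The sketch below assumes a small made-up matrix of pairwise synchronization strengths between four recording sites. How the dissertation actually builds its graph from the spike synchronization measure is not stated here, so the weights and the function name `eigenvector_centrality` are hypothetical.

```python
def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration for the dominant eigenvector of a nonnegative matrix.

    weights[i][j] is the (hypothetical) synchronization strength between
    sites i and j. Scores are normalized to sum to 1. Convergence assumes a
    connected, non-bipartite graph; otherwise the iteration may oscillate.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            sum(weights[i][j] * scores[j] for j in range(n)) for i in range(n)
        ]
        total = sum(updated)
        if total == 0.0:
            return scores  # graph with no edges: keep the uniform scores
        updated = [v / total for v in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Made-up symmetric synchronization weights between four sites.
sync = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.2],
    [0.1, 0.0, 0.2, 0.0],
]
centrality = eigenvector_centrality(sync)
most_central = max(range(len(centrality)), key=lambda i: centrality[i])
least_central = min(range(len(centrality)), key=lambda i: centrality[i])
```

In this toy network site 0 is the most central and site 3 the least. A site that is strongly synchronized with other well-connected sites receives a high score, which is the property that makes the measure a candidate for ranking sites.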

Contributors: Krishnan, Balu (Author) / Iasemidis, Leonidas (Thesis advisor) / Tsakalis, Kostantinos (Committee member) / Spanias, Andreas (Committee member) / Si, Jennie (Committee member) / Arizona State University (Publisher)

Created: 2012
