Invariant human pose feature extraction for movement recognition and pose estimation

Description

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, multifactor analysis techniques, including multilinear analysis and multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing the key factors that contribute to the generation of the image observations, such as pose, view angle, and body shape. Experimental results have shown that the pose features extracted using the proposed methods exhibit excellent invariance to changes in view angle and body shape. Furthermore, using the proposed invariant multifactor pose features, a suite of simple yet effective algorithms has been developed to solve the movement recognition and pose estimation problems. Using these algorithms, excellent human movement analysis results have been obtained, and most of them are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed to automatically detect and learn non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multi-camera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled. A measure to quantify the complementary strength across cameras has been proposed. Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.
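
The decision-level product rule mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the dissertation: the function name `fuse_product_rule`, the gesture labels, and the probability values are all hypothetical, and the sketch assumes each camera's classifier outputs a posterior probability for every gesture class.

```python
import math


def fuse_product_rule(per_camera_posteriors):
    """Fuse per-camera class posteriors by multiplying them (product rule).

    per_camera_posteriors: list of dicts, one per camera, each mapping a
    gesture label to that camera's posterior probability for the label.
    All names and values here are illustrative assumptions.
    """
    labels = list(per_camera_posteriors[0].keys())
    # Summing logs is equivalent to multiplying probabilities but avoids
    # underflow; the small floor guards against log(0).
    log_scores = {
        label: sum(math.log(max(p[label], 1e-12)) for p in per_camera_posteriors)
        for label in labels
    }
    # Renormalize so the fused scores form a probability distribution.
    peak = max(log_scores.values())
    unnormalized = {label: math.exp(s - peak) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {label: v / total for label, v in unnormalized.items()}


# Two hypothetical cameras that disagree on the most likely gesture.
camera_a = {"wave": 0.6, "point": 0.4}
camera_b = {"wave": 0.3, "point": 0.7}
fused = fuse_product_rule([camera_a, camera_b])
best = max(fused, key=fused.get)
```

Because the fusion operates only on each camera's class scores, it needs no geometric relationship between cameras, which is consistent with the uncalibrated setting described above.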

Contributors: Peng, Bo (Author) / Qian, Gang (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2011

Practical coding schemes for multi-user communications

Description

There are many wireless communication and networking applications that require high transmission rates and reliability with only limited resources in terms of bandwidth, power, hardware complexity, etc. Real-time video streaming, gaming and social networking are a few such examples. Over the years many problems have been addressed towards the goal of enabling such applications; however, significant challenges still remain, particularly in the context of multi-user communications. With the motivation of addressing some of these challenges, the main focus of this dissertation is the design and analysis of capacity-approaching coding schemes for several (wireless) multi-user communication scenarios. Specifically, three main themes are studied: superposition coding over broadcast channels, practical coding for binary-input binary-output broadcast channels, and signalling schemes for two-way relay channels. As the first contribution, we propose an analytical tool that allows for reliable comparison of different practical codes and decoding strategies over degraded broadcast channels, even for very low error rates for which simulations are impractical. The second contribution deals with binary-input binary-output degraded broadcast channels, for which an optimal encoding scheme that achieves the capacity boundary is found, and a practical coding scheme is given by concatenation of an outer low-density parity-check code and an inner (non-linear) mapper that induces the desired distribution of ones in a codeword. The third contribution considers two-way relay channels where the information exchange between two nodes takes place in two transmission phases using a coding scheme called physical-layer network coding. At the relay, a near-optimal decoding strategy is derived using a list decoding algorithm, and an approximation is obtained by a joint decoding approach. For the latter scheme, an analytical approximation of the word error rate based on a union bounding technique is computed under the assumption that linear codes are employed at the two nodes exchanging data. Further, when the wireless channel is frequency selective, two decoding strategies at the relay are developed, namely, a near-optimal decoding scheme implemented using list decoding, and a reduced-complexity detection/decoding scheme utilizing a linear minimum mean squared error based detector followed by a network coded sequence decoder.

Contributors: Bhat, Uttam (Author) / Duman, Tolga M. (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Li, Baoxin (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created: 2011

Mining semantics from low-level features in multimedia computing

Description

Bridging the semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signals with their high-level semantic interpretation is mainly due to the fact that semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. Also, additional domain knowledge may often be indispensable for uncovering the underlying semantics, but in most cases such domain knowledge is not readily available from the acquired media streams. Thus, making use of various types of contextual information and leveraging corresponding domain knowledge are vital for associating high-level semantics with low-level signals more accurately in multimedia computing problems. In this work, novel computational methods are explored and developed for incorporating contextual information/domain knowledge in different forms for multimedia computing and pattern recognition problems. Specifically, a novel Bayesian approach with statistical-sampling-based inference is proposed for incorporating a special type of domain knowledge, a spatial prior for the underlying shapes; cross-modality correlations via Kernel Canonical Correlation Analysis are explored, and the learnt space is then used for associating multimedia contents in different forms; and contextual information modeled as a graph is leveraged for regulating interactions among high-level semantic concepts (e.g., category labels) and low-level input signals (e.g., spatial/temporal structure). Four real-world applications, including visual-to-tactile face conversion, photo tag recommendation, wild web video classification and unconstrained consumer video summarization, are selected to demonstrate the effectiveness of the approaches. These applications range from classic research challenges to emerging tasks in multimedia computing. Results from experiments on large-scale real-world data with comparisons to other state-of-the-art methods and subjective evaluations with end users confirmed that the developed approaches exhibit salient advantages, suggesting that they are promising for leveraging contextual information/domain knowledge for a wide range of multimedia computing and pattern recognition problems.

Contributors: Wang, Zhesheng (Author) / Li, Baoxin (Thesis advisor) / Sundaram, Hari (Committee member) / Qian, Gang (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2011

Finding provenance data in social media

Description

Determining the provenance of a statement appearing in social media poses a significant challenge. Provenance describes the origin, custody, and ownership of something. Most statements appearing in social media are not published with corresponding provenance data. However, the same characteristics that make the social media environment challenging, including the massive amounts of data available, large numbers of users, and a highly dynamic environment, provide unique and untapped opportunities for solving the provenance problem for social media. Current approaches for tracking provenance data do not scale for online social media, and consequently there is a gap in provenance methodologies and technologies that provides exciting research opportunities. The guiding vision is the use of social media information itself to realize a useful amount of provenance data for information in social media. This departs from traditional approaches for data provenance, which rely on a central store of provenance information. The contemporary online social media environment is an enormous and constantly updated "central store" that can be mined for provenance information that is not readily made available to the average social media user. This research introduces an approach and builds a foundation aimed at realizing a provenance data capability for social media users that is not accessible today.

Contributors: Barbier, Geoffrey P (Author) / Liu, Huan (Thesis advisor) / Bell, Herbert (Committee member) / Li, Baoxin (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2011

Multi-label dimensionality reduction

Description

Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed when the data comes sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully in the Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms.

Contributors: Sun, Liang (Author) / Ye, Jieping (Thesis advisor) / Li, Baoxin (Committee member) / Liu, Huan (Committee member) / Mittelmann, Hans D. (Committee member) / Arizona State University (Publisher)

Created: 2011

Techniques for soundscape retrieval and synthesis

Description

The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is that of the soundscape: the acoustic environment as perceived by a listener. This dissertation examines the application of several computational tools in the realms of digital signal processing, multimedia information retrieval, and computer music synthesis to the analysis of the soundscape. These tools include a) an open source software library, Sirens, which can be used to segment long environmental field recordings into individual sonic events and compare these events in terms of acoustic content, b) a graph-based retrieval system that can use these measures of acoustic similarity and measures of semantic similarity using the lexical database WordNet to perform both text-based retrieval and automatic annotation of environmental sounds, and c) new techniques for the dynamic, real-time parametric morphing of multiple field recordings, informed by the geographic paths along which they were recorded.

Contributors: Mechtley, Brandon Michael (Author) / Spanias, Andreas S (Thesis advisor) / Sundaram, Hari (Thesis advisor) / Cook, Perry R. (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2013

Interactions driving the collapse of islet amyloid polypeptide: implications for amyloid aggregation

Description

Human islet amyloid polypeptide (hIAPP), also known as amylin, is a 37-residue intrinsically disordered hormone involved in glucose regulation and gastric emptying. The aggregation of hIAPP into amyloid fibrils is believed to play a causal role in type 2 diabetes. To date, not much is known about the monomeric state of hIAPP or how it undergoes an irreversible transformation from disordered peptide to insoluble aggregate. IAPP contains a highly conserved disulfide bond that restricts hIAPP(1-8) into a short ring-like structure: N_loop. Removal or chemical reduction of N_loop not only prevents cell response upon binding to the CGRP receptor, but also alters the mass per length distribution of hIAPP fibers and the kinetics of fibril formation. The mechanism by which N_loop affects hIAPP aggregation is not yet understood, but is important for rationalizing kinetics and developing potential inhibitors. By measuring end-to-end contact formation rates, Vaiana et al. showed that N_loop induces collapsed states in IAPP monomers, implying attractive interactions between N_loop and other regions of the disordered polypeptide chain. We show that in addition to being involved in intra-protein interactions, the N_loop is involved in inter-protein interactions, which lead to the formation of extremely long and stable β-turn fibers. These non-amyloid fibers are present in the 10 μM concentration range, under the same solution conditions in which hIAPP forms amyloid fibers. We discuss the effect of peptide cyclization on both intra- and inter-protein interactions, and its possible implications for aggregation. Our findings indicate a potential role of N_loop-N_loop interactions in hIAPP aggregation, which has not previously been explored. Though our findings suggest that N_loop plays an important role in the pathway of amyloid formation, other naturally occurring IAPP variants that contain this structural feature are incapable of forming amyloids. For example, hIAPP readily forms amyloid fibrils in vitro, whereas the rat variant (rIAPP), differing by six amino acids, does not. In addition to being highly soluble, rIAPP is an effective inhibitor of hIAPP fibril formation. Both of these properties have been attributed to rIAPP's three proline residues: A25P, S28P and S29P. Single proline mutants of hIAPP have also been shown to kinetically inhibit hIAPP fibril formation. Because of their intrinsic dihedral angle preferences, prolines are expected to affect conformational ensembles of intrinsically disordered proteins. The specific effect of proline substitutions on IAPP structure and dynamics has not yet been explored, as the detection of such properties is experimentally challenging due to the low molecular weight, fast reconfiguration times, and very low solubility of IAPP peptides. High-resolution techniques able to measure tertiary contact formations are needed to address this issue. We employ a nanosecond laser spectroscopy technique to measure end-to-end contact formation rates in IAPP mutants. We explore the proline substitutions in IAPP and quantify their effects in terms of intrinsic chain stiffness. We find that the three proline mutations found in rIAPP increase chain stiffness. Interestingly, we also find that residue R18 plays an important role in rIAPP's unique chain stiffness and, together with the proline residues, is a determinant for its non-amyloidogenic properties. We discuss the implications of our findings on the role of prolines in IDPs.

Contributors: Cope, Stephanie M (Author) / Vaiana, Sara M (Thesis advisor) / Ghirlanda, Giovanna (Committee member) / Ros, Robert (Committee member) / Lindsay, Stuart M (Committee member) / Ozkan, Sefika B (Committee member) / Arizona State University (Publisher)

Created: 2013

Towards single molecule DNA sequencing

Description

Single molecule DNA sequencing technology has been a hot research topic in recent decades because it holds the promise of sequencing a human genome quickly and affordably, which will eventually make personalized medicine possible. Single molecule differentiation and DNA translocation control are the two main challenges in all single molecule DNA sequencing methods. In this thesis, I will first introduce the development and applications of DNA sequencing technology, and then explain the performance and limitations of prior art in detail. Following that, I will show a single molecule DNA base differentiation result obtained in recognition tunneling experiments. Furthermore, I will explain the assembly of a nanofluidic platform for single strand DNA translocation, which holds the promise of being integrated into a single molecule DNA sequencing instrument for DNA translocation control. Taken together, my dissertation research demonstrates the potential of recognition tunneling techniques to serve as a general readout system for single molecule DNA sequencing applications.

Contributors: Liu, Hao (Author) / Lindsay, Stuart M (Committee member) / Yan, Hao (Committee member) / Levitus, Marcia (Committee member) / Arizona State University (Publisher)

Created: 2013

Functional and regulatory biomolecular networks organized by DNA nanostructures

Description

DNA has recently emerged as an extremely promising material to organize molecules on the nanoscale. The reliability of base recognition, self-assembling behavior, and attractive structural properties of DNA are of unparalleled value in systems of this size. DNA scaffolds have already been used to organize a variety of molecules including nanoparticles and proteins. New protein-DNA bio-conjugation chemistries make it possible to precisely position proteins and other biomolecules on underlying DNA scaffolds, generating multi-biomolecule pathways with the ability to modulate inter-molecular interactions and the local environment. This dissertation focuses on the use of DNA nanostructures to direct the self-assembly of other biomolecular networks in order to translate biochemical pathways to non-cellular environments. Presented here is a series of studies toward this application. First, a novel strategy utilized DNA origami as a scaffold to arrange spherical virus capsids into one-dimensional arrays with precise nanoscale positioning. This hierarchical self-assembly allows us to position the virus particles with unprecedented control and allows the future construction of integrated multi-component systems from biological scaffolds using the power of rationally engineered DNA nanostructures. Next, discrete glucose oxidase (GOx)/horseradish peroxidase (HRP) enzyme pairs were organized on DNA origami tiles with controlled interenzyme spacing and position. This study revealed two different distance-dependent kinetic processes associated with the assembled enzyme pairs. Finally, a tweezer-like DNA nanodevice was designed and constructed to actuate the activity of an enzyme/cofactor pair. Using this approach, several cycles of externally controlled enzyme inhibition and activation were successfully demonstrated. This principle of responsive enzyme nanodevices may be used to regulate other types of enzymes and to introduce feedback or feed-forward control loops.

Contributors: Liu, Minghui (Author) / Yan, Hao (Thesis advisor) / Liu, Yan (Thesis advisor) / Chen, Julian (Committee member) / Zhang, Peiming (Committee member) / Arizona State University (Publisher)

Created: 2013

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipment have resulted in the generation of enormous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating it with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications. Four major contributions are proposed in this work: (i) a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, (ii) a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, (iii) batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality, and (iv) an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

Contributors: Chakraborty, Shayok (Author) / Panchanathan, Sethuraman (Thesis advisor) / Balasubramanian, Vineeth N. (Committee member) / Li, Baoxin (Committee member) / Mittelmann, Hans (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013
