Matching Items (13)

Description

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, multifactor analysis techniques, including multilinear analysis and multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing key contributing factors, such as pose, view angle, and body shape, in the generation of the image observations. Experimental results have shown that the pose features extracted using the proposed methods exhibit excellent invariance to changes in view angle and body shape. Furthermore, using the proposed invariant multifactor pose features, a suite of simple yet effective algorithms has been developed to solve the movement recognition and pose estimation problems. Using these algorithms, excellent human movement analysis results have been obtained, most of them superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed to automatically detect and learn non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multi-camera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled, and a measure to quantify the complementary strength across cameras has been proposed. Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.
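
To make the decision-level fusion concrete, here is a minimal sketch of product-rule fusion: each camera outputs a posterior over gesture classes, the posteriors are multiplied elementwise, and the fused decision is the argmax. The three-camera, three-class posteriors are made-up placeholders, not results from the dissertation.

```python
import numpy as np

def product_rule_fusion(per_camera_posteriors):
    """per_camera_posteriors: (num_cameras, num_classes) array of class posteriors."""
    fused = np.prod(per_camera_posteriors, axis=0)   # combine evidence across cameras
    fused /= fused.sum()                             # renormalize to a distribution
    return int(np.argmax(fused)), fused

# Placeholder posteriors from three hypothetical uncalibrated cameras.
posteriors = np.array([
    [0.60, 0.30, 0.10],   # camera 1
    [0.20, 0.50, 0.30],   # camera 2
    [0.55, 0.35, 0.10],   # camera 3
])
label, fused = product_rule_fusion(posteriors)
print(label, fused)       # fused decision and fused class distribution
```
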
Contributors: Peng, Bo (Author) / Qian, Gang (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2011
Description

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications and pervasive, high-speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distraction, delay, and high cognitive load, which in turn can lead to a loss of situational awareness, increasing the chances of life-threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate given that the skin is our largest sensory organ, with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations, including high learning curves, limited applicability, and/or limited expressiveness. This is largely due to the lack of a versatile, comprehensive design theory--specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich, and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified theoretical framework, inspired by natural spoken language, is proposed, called Somatic ABC's, for Articulating (designing), Building (developing) and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation and evaluation theories were applied to create communication languages for two distinct application areas: audio-described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development and evaluation of rich somatic languages with distinct and natural communication units.
Contributors: McDaniel, Troy Lee (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

Many strides have been made in enabling technologies that help individuals with visual impairment live an independent life. The advent of smart devices and the participatory web has especially opened up possibilities for new interactions that aid everyday tasks. Current systems, however, tend to be complex and require multiple cumbersome devices, which invariably come with steep learning curves. Building new cyber-human systems with simple, integrated interfaces, while keeping in mind the specific requirements of the target users, would help alleviate their mundane yet significant daily needs. Navigation is one such significant need: it forms an integral part of everyday life and is one of the areas where individuals with visual impairment face the most discomfort. There is little technology available to help such travelers navigate new routes. A number of research prototypes have been proposed, but none of them are available to the general population. This may be because they require special equipment and expertise before deployment, need trained professionals to calibrate devices, or are simply not scalable. Another area that needs assistance is education: much of the classroom and textbook material is not readily available in alternate formats. A further area that requires attention is information delivery in the age of Web 2.0. Popular websites like Facebook, Amazon, etc., are designed with sighted people as the target audience. While the pared-down mobile editions make it easier to navigate with screen readers, the truth remains that there is still a long way to go in making such websites truly accessible.
Contributors: Paladugu, Devi Archana (Author) / Li, Baoxin (Thesis advisor) / Hedgpeth, Terri (Committee member) / Atkinson, Robert (Committee member) / Walker, Erin (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

This dissertation constructs a new computational processing framework to robustly and precisely quantify retinotopic maps based on their angle distortion properties. More generally, this framework solves the problem of how to robustly and precisely quantify (angle) distortions of noisy or incomplete (boundary-enclosed) 2-dimensional surface-to-surface mappings. The framework builds upon the Beltrami Coefficient (BC) description of quasiconformal mappings, which directly quantifies local (circles-to-ellipses) mapping distortions between diffeomorphisms of boundary-enclosed plane domains homeomorphic to the unit disk. A new map called the Beltrami Coefficient Map (BCM) was constructed to describe distortions in retinotopic maps. The BCM can be used to fully reconstruct the original target surface (retinal visual field) of retinotopic maps. This dissertation also compared retinotopic maps in the visual processing cascade, which is a series of connected retinotopic maps responsible for visual processing of the physical images captured by the eyes. By comparing the BCM results from a large Human Connectome Project (HCP) retinotopic dataset (N=181), a new computational quasiconformal mapping description of the transformed retinal image as it passes through the cascade is proposed, which has not previously appeared in the literature. Applied to the HCP data, the description provided directly visible and quantifiable geometric properties of the cascade in a way that has not been observed before. Because retinotopic maps are generated from noisy in vivo functional magnetic resonance imaging (fMRI), quantifying them comes with a certain degree of uncertainty. To quantify the uncertainties in the quantification results, it is necessary to generate statistical models of retinotopic maps from their BCMs and raw fMRI signals. Considering that estimating retinotopic maps from real noisy fMRI time series data using the population receptive field (pRF) model is a time-consuming process, a convolutional neural network (CNN) was constructed and trained to predict pRF model parameters from real noisy fMRI data.
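
For readers unfamiliar with the Beltrami coefficient, the following is a minimal finite-difference sketch of the quantity for a planar mapping sampled on a regular grid; the dissertation computes the BCM on triangulated cortical surfaces, so this grid version is only an illustration of the same definition. The shear map used in the example is a made-up placeholder.

```python
import numpy as np

# Beltrami coefficient mu = f_zbar / f_z of a planar map f(x, y) = u + i*v,
# estimated by finite differences. |mu| < 1 everywhere characterizes an
# orientation-preserving quasiconformal map (circles map to ellipses).
def beltrami_coefficient(u, v, dx=1.0, dy=1.0):
    u_y, u_x = np.gradient(u, dy, dx)
    v_y, v_x = np.gradient(v, dy, dx)
    f_z = 0.5 * ((u_x + v_y) + 1j * (v_x - u_y))
    f_zbar = 0.5 * ((u_x - v_y) + 1j * (v_x + u_y))
    return f_zbar / f_z

# Example: an affine shear x' = x + 0.3*y, y' = y has constant |mu| < 1.
y, x = np.mgrid[0:64, 0:64].astype(float)
mu = beltrami_coefficient(x + 0.3 * y, y)
print(np.abs(mu).mean())   # ~0.148 for this shear
```
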
Contributors: Ta, Duyan Nguyen (Author) / Wang, Yalin (Thesis advisor) / Lu, Zhong-Lin (Committee member) / Hansford, Dianne (Committee member) / Liu, Huan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

Deep neural networks have been shown to be vulnerable to adversarial attacks. Typical attack strategies alter authentic data subtly so as to obtain adversarial samples that resemble the original but otherwise would cause a network's misbehavior, such as a high misclassification rate. Various attack approaches have been reported, some showing state-of-the-art performance in attacking certain networks. Meanwhile, many defense mechanisms have been proposed in the literature, some of which are quite effective in guarding against typical attacks. Yet most of these attacks fail when the targeted network modifies its architecture or uses another set of parameters, and vice versa. Moreover, the emergence of more advanced deep neural networks, such as generative adversarial networks (GANs), has made the situation more complicated, and the game between attack and defense continues. This dissertation aims to explore the vulnerability of deep neural networks by investigating the mechanisms behind the success/failure of existing attack and defense approaches. To that end, several deep learning-based approaches have been proposed to study the problem from different perspectives. First, I developed an adversarial attack approach by exploring the unlearned region of a typical deep neural network, which is often over-parameterized. Second, I proposed an end-to-end learning framework to analyze the images generated by different GAN models. Third, I developed a defense mechanism that can secure a deep neural network against adversarial attacks with a defense layer consisting of a set of orthogonal kernels. Substantial experiments are conducted to unveil the potential factors that contribute to attack/defense effectiveness. The dissertation concludes with a discussion of possible future work toward achieving a robust deep neural network.
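
As background for the kind of attack described above (not the attack developed in this dissertation), the following sketch shows the standard fast gradient sign method (FGSM), which perturbs an input slightly in the direction of the loss gradient; the toy model, input, and label are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Return an adversarial copy of x that increases the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)          # loss on the clean input
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()  # small step up the loss surface
        x_adv = x_adv.clamp(0.0, 1.0)                # keep a valid image range
    return x_adv.detach()

# Placeholder model and data, just to show the call pattern.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
y = torch.tensor([3])
x_adv = fgsm_attack(model, x, y)
```
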
Contributors: Ding, Yuzhen (Author) / Li, Baoxin (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth Kumar Demakethepalli (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

The advent of inexpensive commercial sensors and the advances in information and communication technology (ICT) have brought forth the era of pervasive Quantified-Self. Automatic dietary monitoring is one of the most important aspects of Quantified-Self because it is vital for ensuring the well-being of patients suffering from chronic diseases, as well as for providing a low-cost means of maintaining health for everyone else. Automatic dietary monitoring consists of: a) determining the type and amount of food intake, and b) monitoring eating behavior, i.e., the time, frequency, and speed of eating. Although there are some existing techniques toward these ends, they suffer from low accuracy and low adherence. To overcome these issues, multiple sensors were utilized, because affordable sensors that capture different aspects of information have the potential to increase the available knowledge for Quantified-Self. For a), I envision an intelligent dietary monitoring system that automatically identifies food items by using knowledge obtained from a visible-spectrum camera and an infrared-spectrum camera. This system is able to outperform the state-of-the-art systems for cooked food recognition by 25% while also minimizing user intervention. For b), I propose a novel methodology, IDEA, that performs accurate eating action identification within eating episodes with an average F1-score of 0.92. This is an improvement of 0.11 in precision and 0.15 in recall for the worst-case users as compared to the state-of-the-art. IDEA uses only a single wrist-band, which includes four sensors, and provides feedback on eating speed every 2 minutes without obtaining any manual input from the user.
Contributors: Lee, Junghyo (Author) / Gupta, Sandeep K.S. (Thesis advisor) / Banerjee, Ayan (Committee member) / Li, Baoxin (Committee member) / Chiou, Erin (Committee member) / Kudva, Yogish C. (Committee member) / Arizona State University (Publisher)
Created: 2019
Description

Due to the advent of easy-to-use, portable, and cost-effective brain signal sensing devices, pervasive Brain-Machine Interface (BMI) applications using Electroencephalogram (EEG) are growing rapidly. The main objectives of these applications are: 1) pervasive collection of brain data from multiple users, 2) processing the collected data to recognize the corresponding mental states, and 3) providing real-time feedback to end users, activating an actuator, or harvesting information for enterprises to offer further services. Developing BMI applications faces several challenges, such as cumbersome setup procedures, low signal-to-noise ratios, insufficient signal samples for analysis, and long processing times. Internet-of-Things (IoT) technologies provide the opportunity to solve these challenges through large-scale data collection, fast data transmission, and computational offloading.

This research proposes an IoT-based framework, called BraiNet, that provides a standard design methodology for fulfilling the requirements of pervasive BMI applications, including accuracy, timeliness, energy efficiency, security, and dependability. BraiNet applies Machine Learning (ML) based solutions (e.g., classifiers and predictive models) to: 1) improve the accuracy of mental state detection on the go, 2) provide real-time feedback to users, and 3) save power on mobile platforms. However, BraiNet inherits the security vulnerabilities of IoT, owing to its use of off-the-shelf software/hardware, high accessibility, and massive network size. ML algorithms, as the core technology for mental state recognition, are among the main targets of cyber attackers. Novel ML security solutions are therefore proposed and added to BraiNet, providing analytical methodologies for tuning the ML hyper-parameters to be secure against attacks.

To implement these solutions, two main optimization problems are solved: 1) maximizing accuracy while minimizing delays and power consumption, and 2) maximizing ML security while keeping accuracy high. Deep learning algorithms and delay and power models are developed to solve the former problem, while gradient-free optimization techniques, such as Bayesian optimization, are applied to the latter. To test the framework, several BMI applications are implemented, such as an EEG-based driver fatigue detector (SafeDrive), an EEG-based identification and authentication system (E-BIAS), and interactive movies that adapt to viewers' mental states (nMovie). The results from experiments on the implemented applications show the successful design of pervasive BMI applications based on the BraiNet framework.
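
A toy sketch of the second optimization problem is given below: a single regularization hyper-parameter (a stand-in) is tuned, gradient-free, to maximize a robustness score while keeping clean accuracy above a floor. The accuracy curves are synthetic placeholders, and plain random search stands in for the Bayesian optimization actually applied in this work; both are gradient-free.

```python
import numpy as np

rng = np.random.default_rng(0)

def clean_accuracy(reg):      # placeholder: accuracy degrades as regularization grows
    return 0.95 - 0.4 * reg

def robust_accuracy(reg):     # placeholder: robustness to attacks improves with regularization
    return 0.50 + 0.35 * np.tanh(3.0 * reg)

def objective(reg, min_clean=0.85):
    # maximize robustness subject to a floor on clean accuracy
    return robust_accuracy(reg) if clean_accuracy(reg) >= min_clean else -np.inf

candidates = rng.uniform(0.0, 1.0, size=200)   # gradient-free search over the hyper-parameter
best = max(candidates, key=objective)
print(f"reg={best:.3f}, clean={clean_accuracy(best):.3f}, robust={robust_accuracy(best):.3f}")
```
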
Contributors: Sadeghi Oskooyee, Seyed Koosha (Author) / Gupta, Sandeep K S (Thesis advisor) / Santello, Marco (Committee member) / Li, Baoxin (Committee member) / Venkatasubramanian, Krishna K (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

This thesis introduces new techniques for clustering distributional data according to their geometric similarities. The work builds upon the optimal transportation (OT) problem, which seeks the globally minimum cost for matching distributional data, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on the variational principle to differentiate hard cluster assignments, a capability missing from the literature. This thesis shows multiple techniques to regularize and generalize OT to cope with various tasks, including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing.
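
To illustrate what minimum-cost matching of distributional data computes, the sketch below solves a small discrete OT problem with entropy-regularized (Sinkhorn) iterations. Note that this is a different, simpler formulation than the variational, power-diagram-based OT developed in the thesis, and the point sets are random placeholders.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=500):
    """a, b: source/target weights (each summing to 1); C: pairwise cost matrix."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]        # transport plan

rng = np.random.default_rng(1)
x = rng.normal(0, 1, (5, 2))                  # source points
y = rng.normal(2, 1, (6, 2))                  # target points
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean costs
C = C / C.max()                               # normalize costs for numerical stability
P = sinkhorn(np.full(5, 1 / 5), np.full(6, 1 / 6), C)
print(P.sum(), (P * C).sum())                 # total mass ~1, approximate OT cost
```
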
Contributors: Mi, Liang (Author) / Wang, Yalin (Thesis advisor) / Chen, Kewei (Committee member) / Karam, Lina (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

A massive volume of data is generated at an unprecedented rate in the information age. The growth of data significantly exceeds the computing and storage capacities of the existing digital infrastructure. In the past decade, many methods have been invented for data compression, compressive sensing and reconstruction, and compressed learning (learning directly upon compressed data) to overcome the data-explosion challenge. While prior works are predominantly model-based, focus on small models, and are not suitable for task-oriented sensing or hardware acceleration, the number of available models for compression-related tasks has escalated by orders of magnitude in the past decade. Motivated by this significant growth and the success of big data, this dissertation proposes to revolutionize both compressive sensing reconstruction (CSR) and compressed learning (CL) methods from the data-driven perspective. A series of topics on data-driven CSR are discussed. Individual data-driven models are proposed for the CSR of bio-signals, images, and videos with an improved trade-off between compression ratio and recovery fidelity. Specifically, a scalable Laplacian pyramid reconstructive adversarial network (LAPRAN) is proposed for single-image CSR. LAPRAN progressively reconstructs images following the concept of the Laplacian pyramid through the concatenation of multiple reconstructive adversarial networks (RANs). For the CSR of videos, CSVideoNet is proposed to improve the spatial-temporal resolution of reconstructed videos. Apart from CSR, data-driven CL is discussed. A CL framework is proposed to extract features directly from compressed data for image classification, object detection, and semantic/instance segmentation. In addition, the spectral bias of neural networks is analyzed from the frequency perspective, leading to a learning-based frequency selection method for identifying the trivial frequency components that can be removed without accuracy loss. Compared with conventional spatial downsampling approaches, the proposed frequency-domain learning method can achieve higher accuracy with a reduced input data size. The methodologies proposed in this dissertation are not restricted to the above-mentioned applications; the dissertation also discusses other potential applications and directions for future research.
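
The following sketch illustrates the general idea of frequency-domain input reduction: transform the image, keep a subset of coefficients, and feed only those to a model. Here the kept set is a fixed low-frequency block, whereas the dissertation selects the frequencies by learning; the image is a random placeholder.

```python
import numpy as np
from scipy.fft import dctn, idctn

def keep_low_frequencies(img, k=16):
    """Keep only the top-left k x k block of 2-D DCT coefficients."""
    coeffs = dctn(img, norm="ortho")
    return coeffs[:k, :k]                     # k*k values instead of H*W

def reconstruct(kept, shape):
    """Zero-fill the discarded coefficients and invert the transform."""
    coeffs = np.zeros(shape)
    k = kept.shape[0]
    coeffs[:k, :k] = kept
    return idctn(coeffs, norm="ortho")

img = np.random.rand(64, 64)                  # placeholder image
kept = keep_low_frequencies(img, k=16)
print(kept.size / img.size)                   # 6.25% of the original data size
approx = reconstruct(kept, img.shape)         # low-frequency approximation
```
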
Contributors: Xu, Kai (Author) / Ren, Fengbo (Thesis advisor) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2021
Description

The Internet-of-Things (IoT) generates a vast amount of streaming data. However, even considering the growth of the cloud computing infrastructure, IoT devices will generate two orders of magnitude more data than centralized data center servers can process or store. This trend inevitably calls for offloading IoT data processing to a decentralized edge computing infrastructure. On the other hand, deep-learning-based applications have made great progress by taking advantage of heavy centralized computing resources to train large models that fit increasingly complicated tasks. Even though large-scale deep learning models perform well in terms of accuracy, their high computational complexity makes it impossible to offload them onto edge devices for real-time inference and timely response. To enable timely IoT services on edge devices, this dissertation addresses the challenge from two perspectives. On the hardware side, a new field-programmable gate array (FPGA)-based framework for binary neural networks and an application-specific integrated circuit (ASIC) accelerator for natural scene text interpretation are proposed, with awareness of the computing resource and power constraints at the edge. On the algorithm side, this work presents both a methodology for building more compact models and one for finding a better computation-accuracy trade-off for existing models.
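
As an illustration of why binary neural networks map well onto FPGA logic, the sketch below shows how a dot product over {-1, +1} values reduces to XNOR plus popcount; it demonstrates the arithmetic only, not the dissertation's accelerator design, and the bit vectors are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
a_bits = rng.integers(0, 2, n)               # activations encoded as bits (0 -> -1, 1 -> +1)
w_bits = rng.integers(0, 2, n)               # weights encoded as bits

# Reference dot product in the {-1, +1} domain (what a float MAC would compute).
a = 2 * a_bits - 1
w = 2 * w_bits - 1
dot_reference = int(np.dot(a, w))

# XNOR-popcount form: count the positions where the bits agree,
# then rescale; this is the operation realized cheaply in FPGA fabric.
matches = int(np.sum(a_bits == w_bits))      # popcount of XNOR
dot_xnor = 2 * matches - n

assert dot_reference == dot_xnor
print(dot_reference, dot_xnor)
```
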
Contributors: Li, Yixing (Author) / Ren, Fengbo (Thesis advisor) / Vrudhula, Sarma (Committee member) / Seo, Jae-Sun (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2021