Matching Items (23)
Filtering by

Clear all filters

152770-Thumbnail Image.png
Description
Texture analysis plays an important role in applications like automated pattern inspection, image and video compression, content-based image retrieval, remote-sensing, medical imaging and document processing, to name a few. Texture Structure Analysis is the process of studying the structure present in the textures. This structure can be expressed in terms

Texture analysis plays an important role in applications like automated pattern inspection, image and video compression, content-based image retrieval, remote-sensing, medical imaging and document processing, to name a few. Texture Structure Analysis is the process of studying the structure present in the textures. This structure can be expressed in terms of perceived regularity. Our human visual system (HVS) uses the perceived regularity as one of the important pre-attentive cues in low-level image understanding. Similar to the HVS, image processing and computer vision systems can make fast and efficient decisions if they can quantify this regularity automatically. In this work, the problem of quantifying the degree of perceived regularity when looking at an arbitrary texture is introduced and addressed. One key contribution of this work is in proposing an objective no-reference perceptual texture regularity metric based on visual saliency. Other key contributions include an adaptive texture synthesis method based on texture regularity, and a low-complexity reduced-reference visual quality metric for assessing the quality of synthesized textures. In order to use the best performing visual attention model on textures, the performance of the most popular visual attention models to predict the visual saliency on textures is evaluated. Since there is no publicly available database with ground-truth saliency maps on images with exclusive texture content, a new eye-tracking database is systematically built. Using the Visual Saliency Map (VSM) generated by the best visual attention model, the proposed texture regularity metric is computed. The proposed metric is based on the observation that VSM characteristics differ between textures of differing regularity. The proposed texture regularity metric is based on two texture regularity scores, namely a textural similarity score and a spatial distribution score. In order to evaluate the performance of the proposed regularity metric, a texture regularity database called RegTEX, is built as a part of this work. It is shown through subjective testing that the proposed metric has a strong correlation with the Mean Opinion Score (MOS) for the perceived regularity of textures. The proposed method is also shown to be robust to geometric and photometric transformations and outperforms some of the popular texture regularity metrics in predicting the perceived regularity. The impact of the proposed metric to improve the performance of many image-processing applications is also presented. The influence of the perceived texture regularity on the perceptual quality of synthesized textures is demonstrated through building a synthesized textures database named SynTEX. It is shown through subjective testing that textures with different degrees of perceived regularities exhibit different degrees of vulnerability to artifacts resulting from different texture synthesis approaches. This work also proposes an algorithm for adaptively selecting the appropriate texture synthesis method based on the perceived regularity of the original texture. A reduced-reference texture quality metric for texture synthesis is also proposed as part of this work. The metric is based on the change in perceived regularity and the change in perceived granularity between the original and the synthesized textures. The perceived granularity is quantified through a new granularity metric that is proposed in this work. It is shown through subjective testing that the proposed quality metric, using just 2 parameters, has a strong correlation with the MOS for the fidelity of synthesized textures and outperforms the state-of-the-art full-reference quality metrics on 3 different texture databases. Finally, the ability of the proposed regularity metric in predicting the perceived degradation of textures due to compression and blur artifacts is also established.
ContributorsVaradarajan, Srenivas (Author) / Karam, Lina J (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Li, Baoxin (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created2014
150362-Thumbnail Image.png
Description
There are many wireless communication and networking applications that require high transmission rates and reliability with only limited resources in terms of bandwidth, power, hardware complexity etc.. Real-time video streaming, gaming and social networking are a few such examples. Over the years many problems have been addressed towards the goal

There are many wireless communication and networking applications that require high transmission rates and reliability with only limited resources in terms of bandwidth, power, hardware complexity etc.. Real-time video streaming, gaming and social networking are a few such examples. Over the years many problems have been addressed towards the goal of enabling such applications; however, significant challenges still remain, particularly, in the context of multi-user communications. With the motivation of addressing some of these challenges, the main focus of this dissertation is the design and analysis of capacity approaching coding schemes for several (wireless) multi-user communication scenarios. Specifically, three main themes are studied: superposition coding over broadcast channels, practical coding for binary-input binary-output broadcast channels, and signalling schemes for two-way relay channels. As the first contribution, we propose an analytical tool that allows for reliable comparison of different practical codes and decoding strategies over degraded broadcast channels, even for very low error rates for which simulations are impractical. The second contribution deals with binary-input binary-output degraded broadcast channels, for which an optimal encoding scheme that achieves the capacity boundary is found, and a practical coding scheme is given by concatenation of an outer low density parity check code and an inner (non-linear) mapper that induces desired distribution of "one" in a codeword. The third contribution considers two-way relay channels where the information exchange between two nodes takes place in two transmission phases using a coding scheme called physical-layer network coding. At the relay, a near optimal decoding strategy is derived using a list decoding algorithm, and an approximation is obtained by a joint decoding approach. For the latter scheme, an analytical approximation of the word error rate based on a union bounding technique is computed under the assumption that linear codes are employed at the two nodes exchanging data. Further, when the wireless channel is frequency selective, two decoding strategies at the relay are developed, namely, a near optimal decoding scheme implemented using list decoding, and a reduced complexity detection/decoding scheme utilizing a linear minimum mean squared error based detector followed by a network coded sequence decoder.
ContributorsBhat, Uttam (Author) / Duman, Tolga M. (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Li, Baoxin (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)
Created2011
149977-Thumbnail Image.png
Description
Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, the multifactor analysis techniques, including the multilinear analysis and the multifactor Gaussian process methods, have been exploited to extract such invariant pose features from

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, the multifactor analysis techniques, including the multilinear analysis and the multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing various key contributing factors, such as pose, view angle, and body shape, in the generation of the image observations. Experimental results have shown that the resulting pose features extracted using the proposed methods exhibit excellent invariance properties to changes in view angles and body shapes. Furthermore, using the proposed invariant multifactor pose features, a suite of simple while effective algorithms have been developed to solve the movement recognition and pose estimation problems. Using these proposed algorithms, excellent human movement analysis results have been obtained, and most of them are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed to automatically detect and learn non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multicamera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled. A measure to quantify the complementary strength across cameras has been proposed. Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.
ContributorsPeng, Bo (Author) / Qian, Gang (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2011
150599-Thumbnail Image.png
Description
Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distractions, delays and high cognitive load; which in turn can lead to a loss of situational awareness, increasing chances for life threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate given that it is our largest sensory organ with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations including high learning curves, limited applicability and/or limited expression. This is largely due to the lack of a versatile, comprehensive design theory--specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified, theoretical framework, inspired by natural, spoken language, is proposed called Somatic ABC's for Articulating (designing), Building (developing) and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation and evaluation theories were applied to create communication languages for two very unique application areas: audio described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development and evaluation of rich somatic languages with distinct and natural communication units.
ContributorsMcDaniel, Troy Lee (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created2012
156384-Thumbnail Image.png
Description
Digital imaging and image processing technologies have revolutionized the way in which

we capture, store, receive, view, utilize, and share images. In image-based applications,

through different processing stages (e.g., acquisition, compression, and transmission), images

are subjected to different types of distortions which degrade their visual quality. Image

Quality Assessment (IQA) attempts to use computational

Digital imaging and image processing technologies have revolutionized the way in which

we capture, store, receive, view, utilize, and share images. In image-based applications,

through different processing stages (e.g., acquisition, compression, and transmission), images

are subjected to different types of distortions which degrade their visual quality. Image

Quality Assessment (IQA) attempts to use computational models to automatically evaluate

and estimate the image quality in accordance with subjective evaluations. Moreover, with

the fast development of computer vision techniques, it is important in practice to extract

and understand the information contained in blurred images or regions.

The work in this dissertation focuses on reduced-reference visual quality assessment of

images and textures, as well as perceptual-based spatially-varying blur detection.

A training-free low-cost Reduced-Reference IQA (RRIQA) method is proposed. The

proposed method requires a very small number of reduced-reference (RR) features. Extensive

experiments performed on different benchmark databases demonstrate that the proposed

RRIQA method, delivers highly competitive performance as compared with the

state-of-the-art RRIQA models for both natural and texture images.

In the context of texture, the effect of texture granularity on the quality of synthesized

textures is studied. Moreover, two RR objective visual quality assessment methods that

quantify the perceived quality of synthesized textures are proposed. Performance evaluations

on two synthesized texture databases demonstrate that the proposed RR metrics outperforms

full-reference (FR), no-reference (NR), and RR state-of-the-art quality metrics in

predicting the perceived visual quality of the synthesized textures.

Last but not least, an effective approach to address the spatially-varying blur detection

problem from a single image without requiring any knowledge about the blur type, level,

or camera settings is proposed. The evaluations of the proposed approach on a diverse

sets of blurry images with different blur types, levels, and content demonstrate that the

proposed algorithm performs favorably against the state-of-the-art methods qualitatively

and quantitatively.
ContributorsGolestaneh, Seyedalireza (Author) / Karam, Lina (Thesis advisor) / Bliss, Daniel W. (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan K. (Committee member) / Arizona State University (Publisher)
Created2018
156219-Thumbnail Image.png
Description
Deep learning architectures have been widely explored in computer vision and have

depicted commendable performance in a variety of applications. A fundamental challenge

in training deep networks is the requirement of large amounts of labeled training

data. While gathering large quantities of unlabeled data is cheap and easy, annotating

the data is an expensive

Deep learning architectures have been widely explored in computer vision and have

depicted commendable performance in a variety of applications. A fundamental challenge

in training deep networks is the requirement of large amounts of labeled training

data. While gathering large quantities of unlabeled data is cheap and easy, annotating

the data is an expensive process in terms of time, labor and human expertise.

Thus, developing algorithms that minimize the human effort in training deep models

is of immense practical importance. Active learning algorithms automatically identify

salient and exemplar samples from large amounts of unlabeled data and can augment

maximal information to supervised learning models, thereby reducing the human annotation

effort in training machine learning models. The goal of this dissertation is to

fuse ideas from deep learning and active learning and design novel deep active learning

algorithms. The proposed learning methodologies explore diverse label spaces to

solve different computer vision applications. Three major contributions have emerged

from this work; (i) a deep active framework for multi-class image classication, (ii)

a deep active model with and without label correlation for multi-label image classi-

cation and (iii) a deep active paradigm for regression. Extensive empirical studies

on a variety of multi-class, multi-label and regression vision datasets corroborate the

potential of the proposed methods for real-world applications. Additional contributions

include: (i) a multimodal emotion database consisting of recordings of facial

expressions, body gestures, vocal expressions and physiological signals of actors enacting

various emotions, (ii) four multimodal deep belief network models and (iii)

an in-depth analysis of the effect of transfer of multimodal emotion features between

source and target networks on classification accuracy and training time. These related

contributions help comprehend the challenges involved in training deep learning

models and motivate the main goal of this dissertation.
ContributorsRanganathan, Hiranmayi (Author) / Sethuraman, Panchanathan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Li, Baoxin (Committee member) / Chakraborty, Shayok (Committee member) / Arizona State University (Publisher)
Created2018
156972-Thumbnail Image.png
Description
Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to

Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to face and ear recognition under unconstrained visual variations, with a main focus on recognition in the presence of blur, occlusion and additive noise distortions.

First, the dissertation addresses the problem of scene variations in the presence of blur, occlusion and additive noise distortions resulting from capture, processing and transmission. Despite their excellent performance, ’deep’ methods are susceptible to visual distortions, which significantly reduce their performance. Sparse representations, on the other hand, have shown huge potential capabilities in handling problems, such as occlusion and corruption. In this work, an augmented SRC (ASRC) framework is presented to improve the performance of the Spare Representation Classifier (SRC) in the presence of blur, additive noise and block occlusion, while preserving its robustness to scene dependent variations. Different feature types are considered in the performance evaluation including image raw pixels, HoG and deep learning VGG-Face. The proposed ASRC framework is shown to outperform the conventional SRC in terms of recognition accuracy, in addition to other existing sparse-based methods and blur invariant methods at medium to high levels of distortion, when particularly used with discriminative features.

In order to assess the quality of features in improving both the sparsity of the representation and the classification accuracy, a feature sparse coding and classification index (FSCCI) is proposed and used for feature ranking and selection within both the SRC and ASRC frameworks.

The second part of the dissertation presents a method for unconstrained ear recognition using deep learning features. The unconstrained ear recognition is performed using transfer learning with deep neural networks (DNNs) as a feature extractor followed by a shallow classifier. Data augmentation is used to improve the recognition performance by augmenting the training dataset with image transformations. The recognition performance of the feature extraction models is compared with an ensemble of fine-tuned networks. The results show that, in the case where long training time is not desirable or a large amount of data is not available, the features from pre-trained DNNs can be used with a shallow classifier to give a comparable recognition accuracy to the fine-tuned networks.
ContributorsMounsef, Jinane (Author) / Karam, Lina (Thesis advisor) / Papandreou-Suppapola, Antonia (Committee member) / Li, Baoxin (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)
Created2018
155774-Thumbnail Image.png
Description
In UAVs and parking lots, it is typical to first collect an enormous number of pixels using conventional imagers. This is followed by employment of expensive methods to compress by throwing away redundant data. Subsequently, the compressed data is transmitted to a ground station. The past decade has seen the

In UAVs and parking lots, it is typical to first collect an enormous number of pixels using conventional imagers. This is followed by employment of expensive methods to compress by throwing away redundant data. Subsequently, the compressed data is transmitted to a ground station. The past decade has seen the emergence of novel imagers called spatial-multiplexing cameras, which offer compression at the sensing level itself by providing an arbitrary linear measurements of the scene instead of pixel-based sampling. In this dissertation, I discuss various approaches for effective information extraction from spatial-multiplexing measurements and present the trade-offs between reliability of the performance and computational/storage load of the system. In the first part, I present a reconstruction-free approach to high-level inference in computer vision, wherein I consider the specific case of activity analysis, and show that using correlation filters, one can perform effective action recognition and localization directly from a class of spatial-multiplexing cameras, called compressive cameras, even at very low measurement rates of 1\%. In the second part, I outline a deep learning based non-iterative and real-time algorithm to reconstruct images from compressively sensed (CS) measurements, which can outperform the traditional iterative CS reconstruction algorithms in terms of reconstruction quality and time complexity, especially at low measurement rates. To overcome the limitations of compressive cameras, which are operated with random measurements and not particularly tuned to any task, in the third part of the dissertation, I propose a method to design spatial-multiplexing measurements, which are tuned to facilitate the easy extraction of features that are useful in computer vision tasks like object tracking. The work presented in the dissertation provides sufficient evidence to high-level inference in computer vision at extremely low measurement rates, and hence allows us to think about the possibility of revamping the current day computer systems.
ContributorsKulkarni, Kuldeep Sharad (Author) / Turaga, Pavan (Thesis advisor) / Li, Baoxin (Committee member) / Chakrabarti, Chaitali (Committee member) / Sankaranarayanan, Aswin (Committee member) / LiKamWa, Robert (Committee member) / Arizona State University (Publisher)
Created2017
155148-Thumbnail Image.png
Description
Visual attention (VA) is the study of mechanisms that allow the human visual system (HVS) to selectively process relevant visual information. This work focuses on the subjective and objective evaluation of computational VA models for the distortion-free case as well as in the presence of image distortions.



Existing VA models are

Visual attention (VA) is the study of mechanisms that allow the human visual system (HVS) to selectively process relevant visual information. This work focuses on the subjective and objective evaluation of computational VA models for the distortion-free case as well as in the presence of image distortions.



Existing VA models are traditionally evaluated by using VA metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though there is a considerable number of objective VA metrics, there exists no study that validates that these metrics are adequate for the evaluation of VA models. This work constructs a VA Quality (VAQ) Database by subjectively assessing the prediction performance of VA models on distortion-free images. Additionally, shortcomings in existing metrics are discussed through illustrative examples and a new metric that uses local weights based on fixation density and that overcomes these flaws, is proposed. The proposed VA metric outperforms all other popular existing metrics in terms of the correlation with subjective ratings.



In practice, the image quality is affected by a host of factors at several stages of the image processing pipeline such as acquisition, compression, and transmission. However, none of the existing studies have discussed the subjective and objective evaluation of visual saliency models in the presence of distortion. In this work, a Distortion-based Visual Attention Quality (DVAQ) subjective database is constructed to evaluate the quality of VA maps for images in the presence of distortions. For creating this database, saliency maps obtained from images subjected to various types of distortions, including blur, noise and compression, and varying levels of distortion severity are rated by human observers in terms of their visual resemblance to corresponding ground-truth fixation density maps. The performance of traditionally used as well as recently proposed VA metrics are evaluated by correlating their scores with the human subjective ratings. In addition, an objective evaluation of 20 state-of-the-art VA models is performed using the top-performing VA metrics together with a study of how the VA models’ prediction performance changes with different types and levels of distortions.
ContributorsGide, Milind Subhash (Author) / Karam, Lina J (Thesis advisor) / Abousleman, Glen (Committee member) / Li, Baoxin (Committee member) / Reisslein, Martin (Committee member) / Arizona State University (Publisher)
Created2016
155174-Thumbnail Image.png
Description
Monitoring vital physiological signals, such as heart rate, blood pressure and breathing pattern, are basic requirements in the diagnosis and management of various diseases. Traditionally, these signals are measured only in hospital and clinical settings. An important recent trend is the development of portable devices for tracking these physiological signals

Monitoring vital physiological signals, such as heart rate, blood pressure and breathing pattern, are basic requirements in the diagnosis and management of various diseases. Traditionally, these signals are measured only in hospital and clinical settings. An important recent trend is the development of portable devices for tracking these physiological signals non-invasively by using optical methods. These portable devices, when combined with cell phones, tablets or other mobile devices, provide a new opportunity for everyone to monitor one’s vital signs out of clinic.

This thesis work develops camera-based systems and algorithms to monitor several physiological waveforms and parameters, without having to bring the sensors in contact with a subject. Based on skin color change, photoplethysmogram (PPG) waveform is recorded, from which heart rate and pulse transit time are obtained. Using a dual-wavelength illumination and triggered camera control system, blood oxygen saturation level is captured. By monitoring shoulder movement using differential imaging processing method, respiratory information is acquired, including breathing rate and breathing volume. Ballistocardiogram (BCG) is obtained based on facial feature detection and motion tracking. Blood pressure is further calculated from simultaneously recorded PPG and BCG, based on the time difference between these two waveforms.

The developed methods have been validated by comparisons against reference devices and through pilot studies. All of the aforementioned measurements are conducted without any physical contact between sensors and subjects. The work presented herein provides alternative solutions to track one’s health and wellness under normal living condition.
ContributorsShao, Dangdang (Author) / Tao, Nongjian (Thesis advisor) / Li, Baoxin (Committee member) / Hekler, Eric (Committee member) / Karam, Lina (Committee member) / Arizona State University (Publisher)
Created2016