Tree-Based Deep Mixture of Experts with Applications to Visual Saliency Prediction and Quality Robust Visual Recognition

Description

Mixture of experts is a machine learning ensemble approach that consists of individual models that are trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture of experts models do not currently see wide use due to difficulty in training diverse experts and high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training, and incorporate parameter sharing among experts to reduce computational requirements.

First, this work presents an application of mixture of experts models to quality robust visual recognition. It is first shown that human subjects outperform deep neural networks at classifying distorted images; a model, MixQualNet, that is more robust to distortions is then proposed. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert outputs, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters and to improve performance.
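The weighted-sum combination described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not MixQualNet itself: the experts and the gate below are hypothetical stand-ins for trained networks, and turning the gating outputs into weights with a softmax is a common choice assumed here, not something the abstract specifies.

```python
import math
from typing import Callable, List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(x, experts: Sequence[Callable], gate: Callable) -> List[float]:
    """Return the gate-weighted sum of the expert outputs for input x.

    Each expert maps x to a list of class scores; the gate maps x to one
    logit per expert (softmax weighting is an assumption of this sketch).
    """
    weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    n_classes = len(outputs[0])
    return [sum(w * out[c] for w, out in zip(weights, outputs)) for c in range(n_classes)]

# Hypothetical toy experts standing in for networks trained on blurred and on noisy images.
def blur_expert(x):
    return [0.9, 0.1]

def noise_expert(x):
    return [0.2, 0.8]

def toy_gate(x):
    # Favor the blur expert for inputs tagged "blurry", the noise expert otherwise.
    return [2.0, 0.0] if x == "blurry" else [0.0, 2.0]

print(mixture_of_experts("blurry", [blur_expert, noise_expert], toy_gate))
```

Because the gate weights sum to one and each toy expert outputs a distribution, the mixture is also a distribution. Weight sharing among experts is not modeled in this sketch.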

Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all, subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.

Contributors: Dodge, Samuel Fuller (Author) / Karam, Lina (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2018

Distortion Robust Biometric Recognition

Description

Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to face and ear recognition under unconstrained visual variations, with a main focus on recognition in the presence of blur, occlusion and additive noise distortions.

First, the dissertation addresses the problem of scene variations in the presence of blur, occlusion, and additive noise distortions resulting from capture, processing, and transmission. Despite their excellent performance, 'deep' methods are susceptible to visual distortions, which significantly reduce their performance. Sparse representations, on the other hand, have shown great potential in handling problems such as occlusion and corruption. In this work, an augmented SRC (ASRC) framework is presented to improve the performance of the Sparse Representation Classifier (SRC) in the presence of blur, additive noise, and block occlusion, while preserving its robustness to scene-dependent variations. Different feature types are considered in the performance evaluation, including raw image pixels, HoG, and deep learning VGG-Face features. The proposed ASRC framework is shown to outperform the conventional SRC in terms of recognition accuracy, as well as other existing sparse-based methods and blur-invariant methods, at medium to high levels of distortion, particularly when used with discriminative features.
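For reference, the conventional SRC decision rule that ASRC builds on can be sketched as follows: code the test sample as a sparse combination of all training samples, then assign the class whose training columns alone reconstruct it with the smallest residual. This sketch assumes ISTA (iterative soft thresholding) for the ℓ1-regularized coding step and uses a hypothetical toy dictionary; the augmentation that ASRC adds for blur, noise, and occlusion is not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # row-major A[i][j]; each column is one training sample

def matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A: Matrix, r: Sequence[float]) -> List[float]:
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft_threshold(v: float, t: float) -> float:
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(A: Matrix, y: Sequence[float], lam: float = 0.01, iters: int = 500) -> List[float]:
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft thresholding."""
    # The squared Frobenius norm upper-bounds ||A||_2^2, so 1/L is a safe step size.
    L = sum(a * a for row in A for a in row)
    step = 1.0 / L
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        neg_grad = rmatvec(A, residual)  # negative gradient of the quadratic term
        x = [soft_threshold(xj + step * gj, lam * step) for xj, gj in zip(x, neg_grad)]
    return x

def src_classify(A: Matrix, labels: Sequence[int], y: Sequence[float], lam: float = 0.01) -> int:
    """Assign y to the class whose coefficients alone give the smallest reconstruction residual."""
    x = ista(A, y, lam)
    best_label, best_residual = labels[0], math.inf
    for c in sorted(set(labels)):
        x_c = [xj if lj == c else 0.0 for xj, lj in zip(x, labels)]  # delta_c(x)
        approx = matvec(A, x_c)
        residual = math.sqrt(sum((yi - ai) ** 2 for yi, ai in zip(y, approx)))
        if residual < best_residual:
            best_label, best_residual = c, residual
    return best_label

# Hypothetical toy dictionary with unit-norm columns: two training samples per class.
toy_A = [[1.0, 0.8, 0.0, 0.0],
         [0.0, 0.6, 0.0, 0.6],
         [0.0, 0.0, 1.0, 0.8]]
toy_labels = [0, 0, 1, 1]
print(src_classify(toy_A, toy_labels, [0.9, 0.1, 0.0]))
```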

In order to assess the quality of features in improving both the sparsity of the representation and the classification accuracy, a feature sparse coding and classification index (FSCCI) is proposed and used for feature ranking and selection within both the SRC and ASRC frameworks.

The second part of the dissertation presents a method for unconstrained ear recognition using deep learning features. The unconstrained ear recognition is performed using transfer learning with deep neural networks (DNNs) as a feature extractor followed by a shallow classifier. Data augmentation is used to improve the recognition performance by augmenting the training dataset with image transformations. The recognition performance of the feature extraction models is compared with an ensemble of fine-tuned networks. The results show that, in the case where long training time is not desirable or a large amount of data is not available, the features from pre-trained DNNs can be used with a shallow classifier to give a comparable recognition accuracy to the fine-tuned networks.

Contributors: Mounsef, Jinane (Author) / Karam, Lina (Thesis advisor) / Papandreou-Suppapola, Antonia (Committee member) / Li, Baoxin (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)

Created: 2018

Low Complexity Optical Flow Using Neighbor-Guided Semi-Global Matching

Description

Many real-time vision applications require accurate estimation of optical flow. This problem is quite challenging due to extremely high computation and memory requirements. This thesis focuses on designing low complexity dense optical flow algorithms.

First, a new optical flow method based on Semi-Global Matching (SGM), a popular dynamic programming algorithm for stereo vision, is presented. In SGM, the disparity of each pixel is calculated by aggregating local matching costs over the entire image to resolve local ambiguity in texture-less and occluded regions. The proposed method, Neighbor-Guided Semi-Global Matching (NG-fSGM), achieves significantly lower complexity than SGM by 1) operating on a subset of the search space that has been aggressively pruned based on neighboring pixels' information, 2) using a simple cost aggregation function, and 3) approximating the aggregated cost array and embedding pixel-wise matching cost computation and flow computation in the aggregation. Evaluation on the Middlebury benchmark suite showed that, compared to a prior SGM extension for optical flow, the proposed basic NG-fSGM provides robust optical flow with a 0.53% accuracy improvement, a 40x reduction in the number of operations, and a 6x reduction in memory size. To further reduce the complexity, a sparse-to-dense flow estimation method is proposed. Compared to the basic NG-fSGM, it reduces the number of operations and the memory size by 68% and 47%, respectively, with only 0.42% accuracy degradation.
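The cost aggregation that SGM performs along each path direction follows a standard recurrence, sketched here for a single left-to-right scanline with a 1D label set. In this sketch the labels stand in for flow candidates, the penalties `p1` and `p2` are illustrative values, full SGM would sum several such paths, and NG-fSGM's neighbor-guided pruning and simplified aggregation are not reproduced.

```python
from typing import List, Sequence

def aggregate_path(cost: Sequence[Sequence[float]], p1: float = 1.0, p2: float = 4.0) -> List[List[float]]:
    """Aggregate matching costs along one left-to-right path.

    cost[x][d] is the matching cost of label d at pixel x. Implements
    L(x, d) = C(x, d) + min(L(x-1, d), L(x-1, d-1) + p1, L(x-1, d+1) + p1,
                            min_k L(x-1, k) + p2) - min_k L(x-1, k).
    """
    n_labels = len(cost[0])
    agg = [list(cost[0])]
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_labels):
            candidates = [prev[d], prev_min + p2]    # same label, or any jump with penalty p2
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # small label change with penalty p1
            if d < n_labels - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg

def winner_take_all(agg: Sequence[Sequence[float]]) -> List[int]:
    """Pick the lowest-cost label at each pixel (ties go to the smaller label)."""
    return [min(range(len(row)), key=row.__getitem__) for row in agg]

# Toy scanline: the middle pixel is texture-less (every label costs the same).
toy_cost = [[5.0, 0.0, 5.0], [3.0, 3.0, 3.0], [5.0, 0.0, 5.0]]
print(winner_take_all(toy_cost))                  # local decision only
print(winner_take_all(aggregate_path(toy_cost)))  # with path aggregation
```

The ambiguous middle pixel inherits the label favored by its neighbor once costs are aggregated, which is the local-ambiguity resolution the paragraph describes.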

A parallel block-based version of NG-fSGM is also proposed. The image is divided into overlapping blocks and the blocks are processed in parallel to improve throughput, latency and power efficiency. To minimize the amount of overlap among blocks with minimal effect on the accuracy, temporal information is used to estimate a flow map that guides flow vector selections for pixels along block boundaries. The proposed block-based NG-fSGM achieves significant reduction in complexity with only 0.51% accuracy degradation compared to the basic NG-fSGM.
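The block partition can be sketched per image axis. Each block has a core region whose results it owns and is padded by `overlap` pixels on each side so that processing near its borders sees some context. The block size and overlap values are hypothetical, and the temporally guided flow selection along block boundaries is not shown.

```python
from typing import List, Tuple

Span = Tuple[int, int, int, int]  # (padded_start, padded_end, core_start, core_end), half-open

def overlapping_blocks(length: int, block: int, overlap: int) -> List[Span]:
    """Split [0, length) into cores of size `block`, each padded by `overlap` on both sides (clipped)."""
    spans = []
    for core_start in range(0, length, block):
        core_end = min(core_start + block, length)
        spans.append((max(0, core_start - overlap), min(length, core_end + overlap),
                      core_start, core_end))
    return spans

def blocks_2d(height: int, width: int, block: int, overlap: int) -> List[Tuple[Span, Span]]:
    """Cartesian product of row and column spans: one entry per 2D block, each processable in parallel."""
    rows = overlapping_blocks(height, block, overlap)
    cols = overlapping_blocks(width, block, overlap)
    return [(r, c) for r in rows for c in cols]

print(overlapping_blocks(10, 4, 1))
```

Only the core of each block's result would be written to the output flow map. A larger overlap gives more context near boundaries at the cost of redundant computation, which is the trade-off the temporal guidance is meant to ease.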

Contributors: Xiang, Jiang (Author) / Chakrabarti, Chaitali (Thesis advisor) / Karam, Lina (Committee member) / Kim, Hun Seok (Committee member) / Arizona State University (Publisher)

Created: 2017

Locally Adaptive Stereo Vision Based 3D Visual Reconstruction

Description

Using stereo vision for 3D reconstruction and depth estimation has become a popular and promising research area as it has a simple setup with passive cameras and relatively efficient processing procedure. The work in this dissertation focuses on locally adaptive stereo vision methods and applications to different imaging setups and image scenes.

Solder ball height and substrate coplanarity inspection is essential to the detection of potential connectivity issues in semiconductor units. Current ball height and substrate coplanarity inspection tools are expensive and slow, which makes them difficult to use in a real-time manufacturing setting. In this dissertation, an automatic, stereo vision based, in-line ball height and coplanarity inspection method is presented. The proposed method includes an imaging setup together with a computer vision algorithm for reliable, in-line ball height measurement. The imaging setup and calibration, ball height estimation, and substrate coplanarity calculation are presented with novel stereo vision methods. The results of the proposed method are evaluated in a measurement capability analysis (MCA) procedure and compared with the ground truth obtained by an existing laser scanning tool and an existing confocal inspection tool. The proposed system outperforms these existing inspection tools in terms of accuracy and stability.
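The geometric core of stereo-based height measurement is triangulation. Assuming an ideal rectified, parallel stereo pair looking down on the substrate (an assumption; the dissertation's actual imaging setup and calibration are not reproduced), depth is Z = f·B/d, so ball height is the depth difference between the substrate and the ball apex. The focal length, baseline, and disparities below are hypothetical.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_mm: float) -> float:
    """Depth (mm) of a point in an ideal rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px

def ball_height_mm(apex_disp: float, substrate_disp: float,
                   focal_px: float, baseline_mm: float) -> float:
    """Height of the ball apex above the substrate along the viewing axis."""
    # The apex is closer to the cameras, so it has the larger disparity and smaller depth.
    z_substrate = depth_from_disparity(substrate_disp, focal_px, baseline_mm)
    z_apex = depth_from_disparity(apex_disp, focal_px, baseline_mm)
    return z_substrate - z_apex

print(round(ball_height_mm(502.0, 500.0, focal_px=1000.0, baseline_mm=50.0), 4))
```

With these toy numbers, a 2-pixel disparity difference corresponds to roughly 0.4 mm of height, which illustrates why sub-pixel disparity accuracy matters for this kind of inspection.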

In a rectified stereo vision system, stereo matching methods can be categorized into global methods and local methods. Local stereo methods are more suitable for real-time processing and offer competitive accuracy compared with global methods. This work proposes a stereo matching method based on sparse locally adaptive cost aggregation. To reduce outlier disparity values that correspond to mismatches, a novel sparse disparity subset selection method is proposed that assigns a significance status to candidate disparity values and adaptively selects the significant ones. An adaptive guided filtering method that uses the disparity subset for refined cost aggregation and disparity calculation is demonstrated. The proposed stereo matching algorithm is tested on the Middlebury and KITTI stereo evaluation benchmark images. A performance analysis of the proposed method in terms of the L0 norm of the disparity subset is presented to demonstrate the achieved efficiency and accuracy.
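Cost aggregation with a guided filter can be sketched in 1D. The filter below is the standard guided filter of He et al. applied to each cost slice, with the image row as the guide. The disparity subset is passed in as a hypothetical precomputed list (the dissertation's significance-based selection rule is not reproduced), and its length equals the L0 norm of the corresponding 0/1 subset mask. The radius and `eps` values are illustrative.

```python
from typing import Dict, List, Sequence

def box_mean(v: Sequence[float], r: int) -> List[float]:
    """Mean over a window of radius r (clipped at the borders), via prefix sums."""
    prefix = [0.0]
    for val in v:
        prefix.append(prefix[-1] + val)
    n = len(v)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def guided_filter_1d(guide: Sequence[float], src: Sequence[float],
                     r: int = 2, eps: float = 1e-3) -> List[float]:
    """Standard guided filter: locally fit src ~ a*guide + b, then average the coefficients."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ip = box_mean([g * p for g, p in zip(guide, src)], r)
    corr_ii = box_mean([g * g for g in guide], r)
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, mi, mp, cii in zip(corr_ip, mean_i, mean_p, corr_ii)]
    b = [mp - ai * mi for mp, ai, mi in zip(mean_p, a, mean_i)]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [ma * g + mb for ma, g, mb in zip(mean_a, guide, mean_b)]

def aggregate_and_select(guide: Sequence[float], cost_slices: Dict[int, Sequence[float]],
                         subset: Sequence[int], r: int = 2, eps: float = 1e-3) -> List[int]:
    """Filter only the cost slices in `subset`, then pick the cheapest disparity per pixel."""
    filtered = {d: guided_filter_1d(guide, cost_slices[d], r, eps) for d in subset}
    return [min(subset, key=lambda d: filtered[d][x]) for x in range(len(guide))]

# Hypothetical toy data: an image row with one edge and three constant cost slices.
toy_row = [0.0] * 5 + [1.0] * 5
toy_slices = {0: [0.0] * 10, 1: [1.0] * 10, 2: [5.0] * 10}
print(aggregate_and_select(toy_row, toy_slices, subset=[1, 2]))
```

Disparities outside the subset are never filtered, so the work scales with the subset size rather than with the full disparity range.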

Contributors: Li, Jinjin (Author) / Karam, Lina (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Patel, Nital (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2017

Monitoring physiological signals using camera

Description

Monitoring vital physiological signals, such as heart rate, blood pressure, and breathing pattern, is a basic requirement in the diagnosis and management of various diseases. Traditionally, these signals are measured only in hospital and clinical settings. An important recent trend is the development of portable devices that track these physiological signals non-invasively using optical methods. These portable devices, when combined with cell phones, tablets, or other mobile devices, provide a new opportunity for everyone to monitor their vital signs outside the clinic.

This thesis work develops camera-based systems and algorithms to monitor several physiological waveforms and parameters without bringing sensors into contact with the subject. Based on skin color change, the photoplethysmogram (PPG) waveform is recorded, from which heart rate and pulse transit time are obtained. Using a dual-wavelength illumination and triggered camera control system, the blood oxygen saturation level is captured. By monitoring shoulder movement with a differential image processing method, respiratory information is acquired, including breathing rate and breathing volume. The ballistocardiogram (BCG) is obtained based on facial feature detection and motion tracking. Blood pressure is further calculated from the simultaneously recorded PPG and BCG, based on the time difference between these two waveforms.
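Once a PPG waveform has been extracted from skin color change, heart rate follows from the spacing of its peaks, and a transit-time-style delay follows from the offset between the peaks of two simultaneously recorded waveforms. This sketch assumes clean, already filtered signals and a naive local-maximum peak detector; the synthetic sinusoids stand in for real camera traces, and the calibration that would map a PPG-BCG delay to blood pressure is not shown.

```python
import math
from typing import List, Sequence

def find_peaks(signal: Sequence[float]) -> List[int]:
    """Indices of simple local maxima (strictly above the left neighbor, not below the right)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]

def heart_rate_bpm(ppg: Sequence[float], fs: float) -> float:
    """Heart rate from the mean interval between successive PPG peaks."""
    peaks = find_peaks(ppg)
    if len(peaks) < 2:
        raise ValueError("need at least two detected beats")
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_interval_s = sum(intervals) / len(intervals) / fs
    return 60.0 / mean_interval_s

def mean_delay_s(ref_peaks: Sequence[int], other_peaks: Sequence[int], fs: float) -> float:
    """Mean delay from each reference peak to the next peak of the other waveform."""
    delays = []
    for a in ref_peaks:
        later = [b for b in other_peaks if b >= a]
        if later:
            delays.append(later[0] - a)
    if not delays:
        raise ValueError("no matching peaks")
    return sum(delays) / len(delays) / fs

# Synthetic 10 s traces at 30 frames/s: a 1.2 Hz "PPG" and a copy delayed by 3 frames.
fs = 30.0
ppg = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(300)]
delayed = [math.sin(2 * math.pi * 1.2 * (n - 3) / fs) for n in range(300)]
print(heart_rate_bpm(ppg, fs))
print(mean_delay_s(find_peaks(ppg), find_peaks(delayed), fs))
```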

The developed methods have been validated by comparison against reference devices and through pilot studies. All of the aforementioned measurements are conducted without any physical contact between sensors and subjects. The work presented herein provides alternative solutions for tracking one's health and wellness under normal living conditions.

Contributors: Shao, Dangdang (Author) / Tao, Nongjian (Thesis advisor) / Li, Baoxin (Committee member) / Hekler, Eric (Committee member) / Karam, Lina (Committee member) / Arizona State University (Publisher)

Created: 2016

Traffic characterization and modeling of H.264 scalable & multi view encoded video

Description

Present-day Internet Protocol (IP) based video transport and dissemination systems are heterogeneous in that they differ in network bandwidth, display resolution, and processing capability. One important objective in such an environment is the flexible adaptation of once-encoded content; a popular method for achieving this is scalable video coding (SVC). The SVC extension of the H.264/AVC standard has higher compression efficiency than previous scalable video standards. The network transport of 3D video, which is obtained by superimposing two views of a video scene, poses significant challenges due to the increased amount of video data compared to conventional single-view video. Addressing these challenges requires a thorough understanding of the traffic and multiplexing characteristics of the different representation formats of 3D video. In this study, H.264 quality scalability and multiview representation formats are examined. As H.264/AVC and its SVC and multiview extensions are expected to become widely adopted for the network transport of video, it is important to thoroughly study their network traffic characteristics, including the bit rate variability. The primary focus is on the SVC amendment of the H.264/AVC standard, in particular Coarse-Grain Scalability (CGS) and Medium-Grain Scalability (MGS). We report on a large-scale study of the rate-distortion (RD) and rate variability-distortion (VD) characteristics of CGS and MGS. We also examine the RD and VD characteristics of three main multiview (3D) representation formats. Specifically, we compare multiview video (MV) representation and encoding, frame sequential (FS) representation, and side-by-side (SBS) representation, where conventional single-view encoding is employed for the FS and SBS representations. As a last step, we examine video traffic modeling, which plays a major part in network traffic analysis.
Traffic modeling is essential to network design and simulation and to providing Quality of Service (QoS) to network applications, and it also provides insights into the coding process and structure of video sequences. We build our models on top of the recent unified traffic model developed by Dai et al. [1] for modeling MPEG-4 and H.264 VBR video traffic. We exploit the hierarchical prediction structure inherent in H.264 for intra-GoP (group of pictures) analysis.
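Rate variability is commonly summarized in video traffic studies by the coefficient of variation (CoV) of the frame sizes, that is, the standard deviation divided by the mean; a VD curve then plots this CoV against a quality measure such as PSNR. The sketch below assumes that definition and also aggregates frames into non-overlapping groups (for example, one GoP), since aggregation smooths out the periodic large I frames. The frame sizes are synthetic.

```python
import statistics
from typing import List, Sequence

def coefficient_of_variation(sizes: Sequence[float]) -> float:
    """Population standard deviation of the frame sizes divided by their mean."""
    mean = statistics.fmean(sizes)
    return statistics.pstdev(sizes) / mean

def aggregate(sizes: Sequence[float], n: int) -> List[float]:
    """Sum frame sizes over non-overlapping groups of n frames; a trailing partial group is dropped."""
    return [sum(sizes[i:i + n]) for i in range(0, len(sizes) - n + 1, n)]

# Synthetic trace: a large I frame followed by three smaller frames, with a GoP length of 4.
frame_sizes = [30, 10, 10, 10, 30, 10, 10, 10]
print(coefficient_of_variation(frame_sizes))
print(coefficient_of_variation(aggregate(frame_sizes, 4)))
```

In this toy trace the frame-level CoV is large because of the I frames, while the GoP-aggregated CoV drops to zero; real traces would show a smaller but nonzero reduction.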

Contributors: Pulipaka, Venkata Sai Akshay (Author) / Reisslein, Martin (Thesis advisor) / Karam, Lina (Thesis advisor) / Li, Baoxin (Committee member) / Seeling, Patrick (Committee member) / Arizona State University (Publisher)

Created: 2012

A single-phase current source solar inverter with constant instantaneous power, improved reliability, and reduced-size DC-link filter

Description

This dissertation presents a novel current source converter topology that is primarily intended for single-phase photovoltaic (PV) applications. In comparison with existing PV inverter technology, the salient features of the proposed topology are: a) the low-frequency ripple (at twice the line frequency) that is common to single-phase inverters is greatly reduced; b) the absence of low-frequency ripple enables significantly smaller passive components to achieve the necessary DC-link stiffness; and c) improved maximum power point tracking (MPPT) performance is readily achieved due to the tightened current ripple, even with reduced-size passive components. The proposed topology does not use any electrolytic capacitors. Instead, an inductor is used as the DC-link filter, and reliable AC film capacitors are used for the filter and auxiliary capacitor. The proposed topology has a life expectancy on par with PV panels. The proposed modulation technique can be used for any current source inverter where unbalanced three-phase operation is desired, such as active filters and power controllers. The proposed topology is ready for the next phase of microgrid and power system controllers in that it accepts reactive power commands. This work presents the proposed topology and its working principle, supported by numerical verification and hardware results. Conclusions and future work are also presented.
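The twice-line-frequency ripple mentioned above follows from the instantaneous power of any single-phase source. Assuming sinusoidal grid voltage and current with RMS values V and I and a phase angle φ (standard textbook notation, not taken from the dissertation), the product-to-sum identity gives:

```latex
\begin{aligned}
v(t) &= \sqrt{2}\,V\sin(\omega t), \qquad i(t) = \sqrt{2}\,I\sin(\omega t - \varphi),\\
p(t) &= v(t)\,i(t) = VI\cos\varphi \;-\; VI\cos(2\omega t - \varphi).
\end{aligned}
```

The first term is the average power delivered; the second oscillates at 2ω with amplitude VI. In a conventional single-phase inverter this pulsating component must be buffered by DC-link energy storage, which is why large (often electrolytic) capacitors are typically used. The mechanism by which the proposed topology reduces this ripple is not reproduced here.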

Contributors: Bush, Craig R (Author) / Ayyanar, Raja (Thesis advisor) / Karam, Lina (Committee member) / Heydt, Gerald (Committee member) / Karady, George G. (Committee member) / Arizona State University (Publisher)

Created: 2013

Power system mode estimation using associate hermite expansion

Description

Many methods have been proposed to estimate power system small signal stability, for either analysis or control, through identification of modal frequencies and their damping levels. Generally, estimation methods have been employed to assess small signal stability from collected field measurements. However, the challenge in using these methods to assess field measurements lies in their ability to accurately estimate stability in the presence of noise. In this thesis, a new method is developed that estimates the modal content of simulated and actual field measurements using orthogonal polynomials, and the results are compared to other commonly used estimators. This new method estimates oscillatory performance by fitting an associate Hermite polynomial to time-domain data and extrapolating its spectrum to identify small signal power system frequencies. Once the frequencies are identified, damping assessment is performed using a modified sliding window technique with linear prediction (LP). Once the entire assessment is complete, the measurements can be judged to be stable or unstable. Collectively, this new technique is known as the associate Hermite expansion (AHE) algorithm. Validation of the AHE method against results from four other spectral estimators demonstrates the method's accuracy and modal estimation ability with and without the presence of noise. A Prony analysis, a Yule-Walker autoregressive algorithm, a second sliding window estimator, and the Hilbert-Huang Transform method are used in comparative assessments in support of this thesis. Results from simulated and actual field measurements are used in the comparisons, as well as artificially generated simple signals. A search was undertaken for actual field testing results performed by a utility, and a request was made to obtain the measurements of a brake insertion test.
Comparison results show that the AHE method is accurate compared with the other commonly used spectral estimators, and its predictive capability exceeded that of the other estimators in the presence of Gaussian noise. As a result, the AHE method could be employed in areas including operations and planning analysis, post-mortem analysis, power system damping scheme design, and other analysis areas.
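The damping step described above relies on linear prediction. As a minimal sketch of that idea (not the AHE algorithm, and assuming a single dominant mode in a noise-free window), a second-order linear predictor can be fitted by least squares; its complex pole gives the modal frequency and damping ratio. The 0.5 Hz test mode and 10 Hz sampling rate are synthetic.

```python
import cmath
import math
from typing import Sequence, Tuple

def lp_mode_estimate(x: Sequence[float], fs: float) -> Tuple[float, float]:
    """Estimate (frequency in Hz, damping ratio) of one dominant mode.

    Fits the linear predictor x[n] ~ a1*x[n-1] + a2*x[n-2] by least squares and
    maps its discrete-time pole z to the continuous-time pole s = fs*ln(z).
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(x)):
        u, v, target = x[n - 1], x[n - 2], x[n]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        r1 += u * target
        r2 += v * target
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (s11 * r2 - s12 * r1) / det
    # One root of z^2 - a1*z - a2 = 0 (the other is its conjugate for an oscillatory mode).
    z = (a1 + cmath.sqrt(a1 * a1 + 4 * a2)) / 2
    s = cmath.log(z) * fs
    sigma, omega = s.real, abs(s.imag)
    return omega / (2 * math.pi), -sigma / math.hypot(sigma, omega)

# Synthetic mode: 0.5 Hz oscillation decaying as exp(-0.1 t), sampled at 10 Hz for 20 s.
fs = 10.0
mode_samples = [math.exp(-0.1 * n / fs) * math.cos(2 * math.pi * 0.5 * n / fs) for n in range(200)]
freq_hz, zeta = lp_mode_estimate(mode_samples, fs)
print(freq_hz, zeta)
```

In a sliding-window scheme this fit would be repeated on successive windows; a negative damping ratio would indicate a growing, unstable oscillation.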

Contributors: Kokanos, Barrie Lee (Author) / Karady, George G. (Thesis advisor) / Heydt, Gerald (Committee member) / Farmer, Richard G (Committee member) / Ayyanar, Raja (Committee member) / Karam, Lina (Committee member) / Arizona State University (Publisher)

Created: 2010

Noise resilient image segmentation and classification methods with applications in biomedical and semiconductor images

Description

Thousands of high-resolution images are generated each day. Segmenting, classifying, and analyzing the contents of these images are key steps in image understanding. This thesis focuses on image segmentation and classification and their applications to synthetic, texture, natural, biomedical, and industrial images. A robust level-set-based multi-region and texture image segmentation approach is proposed to tackle most of the challenges of existing multi-region segmentation methods, including computational complexity and sensitivity to initialization. Medical image analysis helps in understanding biological processes and disease pathologies. In this thesis, two cell evolution analysis schemes are proposed for cell cluster extraction in order to analyze cell migration, cell proliferation, and cell dispersion in different cancer cell images. The proposed schemes accurately segment both the cell cluster area and the individual cells inside and outside the cell cluster area. The method is currently used by different cell biology labs to study the behavior of cancer cells, which helps in drug discovery. Defects can cause failures in motherboards, processors, and semiconductor units. Automatic defect detection and classification is highly desirable in many industrial applications, as it produces consistent results, facilitates and speeds up processing, and reduces cost. In this thesis, three defect detection and classification schemes are proposed to automatically detect and classify different defects in semiconductor unit images. The first proposed scheme detects the solder balls in processor sockets and classifies them as either defective (Non-Wet) or non-defective. The method achieves a 96% classification rate and saves 89% of the operator's time.
The second proposed scheme detects and measures voids inside solder balls of different boards and products. The third proposed scheme detects different defects in the die area of semiconductor unit images, such as cracks, scratches, foreign materials, fingerprints, and stains. The three proposed defect detection schemes give high accuracy and are inexpensive to implement compared to existing high-cost state-of-the-art machines.

Contributors: Said, Asaad F (Author) / Karam, Lina (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Patel, Nital (Committee member) / Arizona State University (Publisher)

Created: 2010

Visual quality with a focus on 3D blur discrimination and texture granularity

Description

Blur is an important attribute in the study and modeling of the human visual system. In this work, 3D blur discrimination experiments are conducted to measure the just noticeable additional blur required to differentiate a target blur from the reference blur level. Past studies on blur discrimination have measured the sensitivity of the human visual system to blur using 2D test patterns. In this dissertation, subjective tests are performed to measure blur discrimination thresholds using stereoscopic 3D test patterns. The results indicate that, in the symmetric stereo viewing case, binocular disparity does not affect the blur discrimination thresholds for the selected 3D test patterns. In the asymmetric viewing case, the blur discrimination thresholds decrease, and the decrease is found to be dominated by the eye observing the higher blur.

The second part of the dissertation focuses on texture granularity in the context of 2D images. A texture granularity database, referred to as GranTEX, consisting of textures with varying granularity levels is constructed. A subjective study is conducted to measure the perceived granularity level of textures present in the GranTEX database. An objective index that automatically measures the perceived granularity level of textures is also presented. It is shown that the proposed granularity metric correlates well with the subjective granularity scores and outperforms other methods in the literature.

A subjective study is conducted to assess the effect of compression on textures with varying degrees of granularity. A logarithmic function model is proposed as a fit to the subjective test data. It is demonstrated that the proposed model can be used for rate-distortion control by allowing the automatic selection of the needed compression ratio for a target visual quality. The proposed model can also be used for visual quality assessment by providing a measure of the visual quality for a target compression ratio.
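The abstract does not give the model's exact form, so this sketch assumes Q = a + b·ln(CR), where Q is the subjective quality score and CR the compression ratio, fitted by ordinary least squares in ln(CR). Inverting the fit gives the compression ratio for a target quality (rate-distortion control), and evaluating it gives a quality estimate for a given ratio (quality assessment). The data below are synthetic.

```python
import math
from typing import Sequence, Tuple

def fit_log_model(ratios: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of score = a + b*ln(ratio); returns (a, b)."""
    xs = [math.log(r) for r in ratios]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def predict_quality(a: float, b: float, ratio: float) -> float:
    """Quality assessment: estimated score at a given compression ratio."""
    return a + b * math.log(ratio)

def ratio_for_quality(a: float, b: float, target: float) -> float:
    """Rate control: compression ratio expected to give the target score."""
    if b == 0:
        raise ValueError("model is flat; no unique ratio")
    return math.exp((target - a) / b)

# Synthetic scores generated from a = 5, b = -1 (quality falls as compression grows).
toy_ratios = [2.0, 4.0, 8.0, 16.0, 32.0]
toy_scores = [5.0 - math.log(r) for r in toy_ratios]
a, b = fit_log_model(toy_ratios, toy_scores)
print(a, b, ratio_for_quality(a, b, target=3.0))
```

In practice the fitted coefficients would likely differ per granularity level, which is what would let the choice of compression ratio adapt to texture content.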

The effect of texture granularity on the quality of synthesized textures is studied. A subjective study is presented to assess the quality of synthesized textures with varying levels of texture granularity using different types of texture synthesis methods. This work also proposes a reduced-reference visual quality index, referred to as the delta texture granularity index, for assessing the visual quality of synthesized textures.

Contributors: Subedar, Mahesh M (Author) / Karam, Lina (Thesis advisor) / Abousleman, Glen (Committee member) / Li, Baoxin (Committee member) / Reisslein, Martin (Committee member) / Arizona State University (Publisher)

Created: 2015