Matching Items (25)

The Effect of Applying 2D Enhancement Algorithms on 3D Video Content

Description

Enhancement algorithms are typically applied to video content to increase its appeal to viewers. Such algorithms are readily available in the literature and are already widely applied in, for example, commercially available TVs. In contrast, not much research has been done on enhancing stereoscopic 3D video content. In this paper, we present research focused on the effect of applying enhancement algorithms used for 2D content on 3D side-by-side content. We evaluate both offline enhancement of video content based on proprietary enhancement algorithms and real-time enhancement in the TVs. This is done using stereoscopic TVs with active shutter glasses, viewed in both their 2D and 3D viewing modes. The results of this research show that 2D enhancement algorithms are a viable first approach to enhancing 3D content. In addition to video quality degradation due to the loss of spatial resolution as a consequence of the 3D video format, brightness reduction inherent to polarized or shutter glasses similarly degrades video quality. We illustrate the benefit of providing brightness enhancement for stereoscopic displays.

Contributors

Agent

Created

Date Created
  • 2014-06-19

A single-phase current source solar inverter with constant instantaneous power, improved reliability, and reduced-size DC-link filter

Description

This dissertation presents a novel current source converter topology that is primarily intended for single-phase photovoltaic (PV) applications. In comparison with existing PV inverter technology, the salient features of the proposed topology are: a) the low-frequency ripple (at twice the line frequency) that is common to single-phase inverters is greatly reduced; b) the absence of low-frequency ripple enables significantly smaller passive components to achieve the necessary DC-link stiffness; and c) improved maximum power point tracking (MPPT) performance is readily achieved due to the tightened current ripple, even with reduced-size passive components. The proposed topology does not utilize any electrolytic capacitors. Instead, an inductor is used as the DC-link filter, and reliable AC film capacitors are utilized for the filter and auxiliary capacitor. The proposed topology has a life expectancy on par with PV panels. The proposed modulation technique can be used for any current source inverter where unbalanced three-phase operation is desired, such as active filters and power controllers. The proposed topology is ready for the next phase of microgrid and power system controllers in that it accepts reactive power commands. This work presents the proposed topology and its working principle, supported by numerical verifications and hardware results. Conclusions and future work are also presented.

Contributors

Agent

Created

Date Created
  • 2013

Low Complexity Optical Flow Using Neighbor-Guided Semi-Global Matching

Description

Many real-time vision applications require accurate estimation of optical flow. This problem is quite challenging due to extremely high computation and memory requirements. This thesis focuses on designing low complexity dense optical flow algorithms.

First, a new method for optical flow that is based on Semi-Global Matching (SGM), a popular dynamic programming algorithm for stereo vision, is presented. In SGM, the disparity of each pixel is calculated by aggregating local matching costs over the entire image to resolve local ambiguity in texture-less and occluded regions. The proposed method, Neighbor-Guided Semi-Global Matching (NG-fSGM), achieves significantly lower complexity than SGM by 1) operating on a subset of the search space that has been aggressively pruned based on neighboring pixels’ information, 2) using a simple cost aggregation function, and 3) approximating the aggregated cost array and embedding pixel-wise matching cost computation and flow computation in aggregation. Evaluation on the Middlebury benchmark suite showed that, compared to a prior SGM extension for optical flow, the proposed basic NG-fSGM provides robust optical flow with 0.53% accuracy improvement, 40x reduction in number of operations and 6x reduction in memory size. To further reduce the complexity, a sparse-to-dense flow estimation method is proposed. The number of operations and memory size are reduced by 68% and 47%, respectively, with only 0.42% accuracy degradation, compared to the basic NG-fSGM.
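
As background for the cost aggregation described above, the following is a minimal sketch of the standard SGM recurrence along a single left-to-right path for stereo disparity. It is not the NG-fSGM method of the thesis: the function names, the penalty values `p1` and `p2`, and the toy cost array are assumptions chosen for illustration only.

```python
def aggregate_path(costs, p1=1.0, p2=4.0):
    """Aggregate matching costs along one left-to-right path.

    costs[x][d] is the pixel-wise matching cost of disparity d at position x.
    p1 penalises a disparity change of one level, p2 any larger change
    (both values are illustrative assumptions).
    """
    n_disp = len(costs[0])
    agg = [list(costs[0])]  # the first pixel has no predecessor on the path
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < n_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(costs[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg


def winner_take_all(agg):
    """Pick, for every position, the disparity with the lowest aggregated cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in agg]


# Toy example: the middle pixel is ambiguous on its own (all costs equal),
# and the path aggregation resolves it using its neighbour.
toy_costs = [[9, 0, 9], [3, 3, 3], [9, 0, 9]]
toy_agg = aggregate_path(toy_costs)
toy_disparities = winner_take_all(toy_agg)
```

A full SGM implementation sums such path costs over several directions before the winner-take-all step; only one direction is shown here to keep the recurrence visible.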

A parallel block-based version of NG-fSGM is also proposed. The image is divided into overlapping blocks and the blocks are processed in parallel to improve throughput, latency and power efficiency. To minimize the amount of overlap among blocks with minimal effect on the accuracy, temporal information is used to estimate a flow map that guides flow vector selections for pixels along block boundaries. The proposed block-based NG-fSGM achieves significant reduction in complexity with only 0.51% accuracy degradation compared to the basic NG-fSGM.

Contributors

Agent

Created

Date Created
  • 2017

Tree-Based Deep Mixture of Experts with Applications to Visual Saliency Prediction and Quality Robust Visual Recognition

Description

Mixture of experts is a machine learning ensemble approach that consists of individual models that are trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture of experts models do not currently see wide use due to difficulty in training diverse experts and high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training, and incorporate parameter sharing among experts to reduce computational requirements.
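
The weighted combination described above can be sketched as follows. This is a generic illustration, not the models proposed in the dissertation: the softmax normalisation of the gate scores and the names `softmax` and `mixture_output` are assumptions, and the expert outputs are hard-coded stand-ins for real network predictions.

```python
import math


def softmax(scores):
    """Turn raw gate scores into non-negative weights that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_output(expert_outputs, gate_scores):
    """Combine expert predictions as a gate-weighted sum.

    expert_outputs[k] is the prediction vector of expert k;
    gate_scores[k] is the raw score the gating network gives that expert.
    """
    weights = softmax(gate_scores)
    n_out = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(n_out)
    ]


# Two hypothetical experts with opposite predictions and equal gate scores.
combined = mixture_output([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal gate scores each expert contributes equally; raising one score shifts the combined prediction toward that expert.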

First, this work presents an application of mixture of experts models for quality robust visual recognition. It is first shown that human subjects outperform deep neural networks on classification of distorted images; a model, MixQualNet, that is more robust to distortions is then proposed. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models' outputs, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters, as well as increase performance.

Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.

Contributors

Agent

Created

Date Created
  • 2018

Distortion Robust Biometric Recognition

Description

Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to face and ear recognition under unconstrained visual variations, with a main focus on recognition in the presence of blur, occlusion and additive noise distortions.

First, the dissertation addresses the problem of scene variations in the presence of blur, occlusion and additive noise distortions resulting from capture, processing and transmission. Despite their excellent performance, 'deep' methods are susceptible to visual distortions, which significantly reduce their performance. Sparse representations, on the other hand, have shown great potential for handling problems such as occlusion and corruption. In this work, an augmented SRC (ASRC) framework is presented to improve the performance of the Sparse Representation Classifier (SRC) in the presence of blur, additive noise and block occlusion, while preserving its robustness to scene dependent variations. Different feature types are considered in the performance evaluation, including raw image pixels, HoG and deep learning VGG-Face features. The proposed ASRC framework is shown to outperform the conventional SRC in terms of recognition accuracy, as well as other existing sparse-based methods and blur invariant methods at medium to high levels of distortion, particularly when used with discriminative features.

In order to assess the quality of features in improving both the sparsity of the representation and the classification accuracy, a feature sparse coding and classification index (FSCCI) is proposed and used for feature ranking and selection within both the SRC and ASRC frameworks.

The second part of the dissertation presents a method for unconstrained ear recognition using deep learning features. The unconstrained ear recognition is performed using transfer learning with deep neural networks (DNNs) as a feature extractor followed by a shallow classifier. Data augmentation is used to improve the recognition performance by augmenting the training dataset with image transformations. The recognition performance of the feature extraction models is compared with an ensemble of fine-tuned networks. The results show that, in the case where long training time is not desirable or a large amount of data is not available, the features from pre-trained DNNs can be used with a shallow classifier to give a comparable recognition accuracy to the fine-tuned networks.

Contributors

Agent

Created

Date Created
  • 2018

Transportation Techniques for Geometric Clustering

Description

This thesis introduces new techniques for clustering distributional data according to their geometric similarities. This work builds upon the optimal transportation (OT) problem, which seeks the global minimum cost for matching distributional data, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on the variational principle to differentiate hard cluster assignments, which was missing in the literature. This thesis shows multiple techniques to regularize and generalize OT to cope with various tasks including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing.

Contributors

Agent

Created

Date Created
  • 2020

Multidimensional DFT IP generators for FPGA platforms

Description

Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input data, which have to be stored in external memories based on Synchronous Dynamic RAM (SDRAM). In this dissertation, first an efficient architecture to implement 2-D DFT for large-sized input data is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than an RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance is measured and analyzed. The results show that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding the matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision data can be processed in less than 27 ms. Finally, transpose-free imaging flows for the range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100 MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323 ms and 162 ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.
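
For readers unfamiliar with the traditional Row-Column decomposition mentioned above, here is a minimal software sketch. It illustrates only the baseline RC approach, not the dissertation's novel decomposition or its FPGA architecture; the function names are assumptions, and a direct O(N^2) 1-D DFT is used for clarity rather than an FFT.

```python
import cmath


def dft_1d(x):
    """Direct 1-D DFT of a sequence (O(N^2), chosen for clarity, not speed)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_2d_row_column(image):
    """2-D DFT by Row-Column decomposition: 1-D DFTs on rows, then on columns."""
    rows = [dft_1d(row) for row in image]
    # zip(*rows) walks the intermediate result column by column; this is the
    # transpose-like access that is costly when data live in external memory.
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # Transpose back so that result[u][v] is indexed as (row freq, column freq).
    return [list(r) for r in zip(*cols)]


spectrum = dft_2d_row_column([[1, 2], [3, 4]])
```

The column pass reads data with a stride of one full row, which hints at why this decomposition maps poorly onto burst-oriented SDRAM access.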

Contributors

Agent

Created

Date Created
  • 2012

Traffic characterization and modeling of H.264 scalable & multi view encoded video

Description

Present day Internet Protocol (IP) based video transport and dissemination systems are heterogeneous in that they differ in network bandwidth, display resolutions and processing capabilities. One important objective in such an environment is the flexible adaptation of once-encoded content; one popular method for achieving this is the scalable video coding (SVC) technique. The SVC extension of the H.264/AVC standard has higher compression efficiency when compared to the previous scalable video standards. The network transport of 3D video, which is obtained by superimposing two views of a video scene, poses significant challenges due to the increased video data compared to conventional single-view video. Addressing these challenges requires a thorough understanding of the traffic and multiplexing characteristics of the different representation formats of 3D video. In this study, H.264 quality scalability and multiview representation formats are examined. As H.264/AVC and its SVC and multiview extensions are expected to become widely adopted for the network transport of video, it is important to thoroughly study their network traffic characteristics, including the bit rate variability. The primary focus is on the SVC amendment of the H.264/AVC standard, with particular attention to Coarse-Grain Scalability (CGS) and Medium-Grain Scalability (MGS). In this study, we report on a large-scale study of the rate-distortion (RD) and rate variability-distortion (VD) characteristics of CGS and MGS. We also examine the RD and VD characteristics of three main multiview (3D) representation formats. Specifically, we compare multiview video (MV) representation and encoding, frame sequential (FS) representation, and side-by-side (SBS) representation, whereby conventional single-view encoding is employed for the FS and SBS representations. As a last step, we also examine video traffic modeling, which plays a major part in network traffic analysis. It is imperative to network design and simulation, providing Quality of Service (QoS) to network applications, besides providing insights into the coding process and structure of video sequences. We propose our models on top of the recent unified traffic model developed by Dai et al. [1], for modeling MPEG-4 and H.264 VBR video traffic. We exploit the hierarchical prediction structure inherent in H.264 for intra-GoP (group of pictures) analysis.

Contributors

Agent

Created

Date Created
  • 2012

Monitoring physiological signals using camera

Description

Monitoring vital physiological signals, such as heart rate, blood pressure and breathing pattern, is a basic requirement in the diagnosis and management of various diseases. Traditionally, these signals are measured only in hospital and clinical settings. An important recent trend is the development of portable devices for tracking these physiological signals non-invasively by using optical methods. These portable devices, when combined with cell phones, tablets or other mobile devices, provide a new opportunity for everyone to monitor their vital signs outside the clinic.

This thesis work develops camera-based systems and algorithms to monitor several physiological waveforms and parameters, without having to bring the sensors in contact with a subject. Based on skin color change, a photoplethysmogram (PPG) waveform is recorded, from which heart rate and pulse transit time are obtained. Using a dual-wavelength illumination and triggered camera control system, the blood oxygen saturation level is captured. By monitoring shoulder movement using a differential image processing method, respiratory information is acquired, including breathing rate and breathing volume. A ballistocardiogram (BCG) is obtained based on facial feature detection and motion tracking. Blood pressure is further calculated from simultaneously recorded PPG and BCG, based on the time difference between these two waveforms.
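
As a rough illustration of how a heart rate can be read from a PPG-like waveform, the sketch below counts local maxima and converts the mean peak-to-peak interval into beats per minute. It is a simplified assumption, not the thesis's algorithm: the function name `heart_rate_bpm`, the peak rule, and the noise-free synthetic signal are all hypothetical, and real camera signals would need filtering first.

```python
import math


def heart_rate_bpm(signal, fps):
    """Estimate heart rate from the mean spacing of local maxima.

    signal is a list of samples (for example, mean skin-region intensity per
    frame) and fps is the sampling rate in frames per second.
    Returns None when fewer than two peaks are found.
    """
    peaks = [
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    if len(peaks) < 2:
        return None
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fps
    return 60.0 / mean_interval_s


# Synthetic 10-second waveform at 30 frames per second with a 1.2 Hz pulse.
fps = 30
waveform = [math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
estimate = heart_rate_bpm(waveform, fps)
```

On this clean synthetic input the peaks fall exactly one period apart, so the estimate matches the 1.2 Hz (72 beats per minute) pulse used to generate it.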

The developed methods have been validated by comparisons against reference devices and through pilot studies. All of the aforementioned measurements are conducted without any physical contact between sensors and subjects. The work presented herein provides alternative solutions to track one’s health and wellness under normal living conditions.

Contributors

Agent

Created

Date Created
  • 2016

Locally Adaptive Stereo Vision Based 3D Visual Reconstruction

Description

Using stereo vision for 3D reconstruction and depth estimation has become a popular and promising research area, as it has a simple setup with passive cameras and a relatively efficient processing procedure. The work in this dissertation focuses on locally adaptive stereo vision methods and applications to different imaging setups and image scenes.

Solder ball height and substrate coplanarity inspection is essential to the detection of potential connectivity issues in semiconductor units. Current ball height and substrate coplanarity inspection tools are expensive and slow, which makes them difficult to use in a real-time manufacturing setting. In this dissertation, an automatic, stereo vision based, in-line ball height and coplanarity inspection method is presented. The proposed method includes an imaging setup together with a computer vision algorithm for reliable, in-line ball height measurement. The imaging setup and calibration, ball height estimation and substrate coplanarity calculation are presented with novel stereo vision methods. The results of the proposed method are evaluated in a measurement capability analysis (MCA) procedure and compared with the ground truth obtained by an existing laser scanning tool and an existing confocal inspection tool. The proposed system outperforms existing inspection tools in terms of accuracy and stability.

In a rectified stereo vision system, stereo matching methods can be categorized into global methods and local methods. Local stereo methods are more suitable for real-time processing while offering accuracy competitive with global methods. This work proposes a stereo matching method based on sparse locally adaptive cost aggregation. In order to reduce outlier disparity values that correspond to mismatches, a novel sparse disparity subset selection method is proposed by assigning a significance status to candidate disparity values, and selecting the significant disparity values adaptively. An adaptive guided filtering method using the disparity subset for refined cost aggregation and disparity calculation is demonstrated. The proposed stereo matching algorithm is tested on the Middlebury and the KITTI stereo evaluation benchmark images. A performance analysis of the proposed method in terms of the ℓ0 norm of the disparity subset is presented to demonstrate the achieved efficiency and accuracy.
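
To make the notion of a local stereo method concrete, here is a minimal sketch of fixed-window matching on one rectified scanline, using a sum of absolute differences and a winner-take-all choice. It is a generic baseline, not the sparse locally adaptive method proposed in the dissertation; the function name, window radius, disparity range and border handling are assumptions.

```python
def local_disparity(left, right, max_disp=3, radius=1):
    """Winner-take-all disparity per pixel on one rectified scanline.

    For each left-image position x and candidate disparity d, the cost is the
    sum of absolute differences between a window around left[x] and the
    window around right[x - d]. Indices are clamped at the borders.
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0
            for k in range(-radius, radius + 1):
                xl = min(max(x + k, 0), n - 1)
                xr = min(max(x + k - d, 0), n - 1)
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:  # strict comparison keeps the smallest d on ties
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Toy scanlines: the intensity bump in the left view sits two pixels to the
# right of the same bump in the right view, i.e. a true disparity of 2.
right_line = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
left_line = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
disparity_line = local_disparity(left_line, right_line)
```

In flat regions every candidate has the same cost, so the result there is arbitrary; this ambiguity is the kind of outlier source that adaptive aggregation schemes aim to reduce.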

Contributors

Agent

Created

Date Created
  • 2017