Matching Items (29)

Saliency cut: an automatic approach for video object segmentation based on saliency energy minimization

Description

Video object segmentation (VOS) is an important task in computer vision with many applications, e.g., video editing, object tracking, and object-based encoding. Unlike image object segmentation, video object segmentation must consider both spatial and temporal coherence of the object. Despite extensive previous work, the problem remains challenging. The foreground object in a video usually draws more human attention, i.e., it is salient. In this thesis we tackle the problem from the perspective of saliency, where saliency denotes the subset of visual information selected by a visual system (human or machine). We present a novel unsupervised method for video object segmentation that considers both low-level vision cues and high-level motion cues. In our model, video object segmentation is formulated as a unified energy minimization problem and solved in polynomial time using the min-cut algorithm. Specifically, our energy function comprises a unary term and a pairwise interaction term, where the unary term measures region saliency and the interaction term smooths the mutual effects between object saliency and motion saliency. Object saliency is computed in the spatial domain from each discrete frame using multi-scale context features, e.g., color histogram, gradient, and graph-based manifold ranking. Meanwhile, motion saliency is calculated in the temporal domain by extracting phase information from the video. In the experimental section of this thesis, the proposed method is evaluated on several benchmark datasets. On the MSRA 1000 dataset, the results demonstrate that our spatial object saliency detection is superior to state-of-the-art methods. Moreover, our temporal motion saliency detector achieves better performance than existing motion detection approaches on the UCF Sports action analysis dataset and the Weizmann dataset. Finally, we present compelling qualitative results and quantitative evaluations of our approach on two benchmark video object segmentation datasets.
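
The energy formulation sketched above can be written in its generic binary-labeling form; the concrete potentials are the thesis's own and are not reproduced here, so U and V below are placeholders:

```latex
% Generic saliency energy of the kind described: a unary region-saliency
% term plus a pairwise term smoothing object and motion saliency.
% U and V are placeholders, not the thesis's exact definitions.
E(\mathbf{x}) \;=\; \sum_{i \in \mathcal{V}} U_i(x_i)
\;+\; \lambda \sum_{(i,j) \in \mathcal{E}} V_{ij}(x_i, x_j),
\qquad x_i \in \{0, 1\}
```

When the pairwise potentials are submodular, an energy of this form is minimized exactly by a single min-cut/max-flow computation on an s-t graph, which is what makes the polynomial-time solution mentioned above possible.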

Date Created
  • 2013

Exploring video denoising using matrix completion

Description

Video denoising has been an important task in many multimedia and computer vision applications. Recent developments in matrix completion theory and the emergence of new numerical methods that can efficiently solve the matrix completion problem have paved the way for exploring new techniques for some classical image processing tasks. Recent literature shows that many computer vision and image processing problems can be solved using matrix completion theory. This thesis explores the application of matrix completion to video denoising. A state-of-the-art video denoising algorithm, in which the denoising task is modeled as a matrix completion problem, is chosen for detailed study. The contribution of this thesis lies both in providing extensive analysis to bridge the gap in the existing literature on the matrix completion framework for video denoising and in proposing novel techniques to improve the performance of the chosen denoising algorithm. The chosen algorithm is implemented for thorough analysis, and experiments and discussions are presented to enable a better understanding of the problem. An instability exhibited by the algorithm at certain parameter values, in the particular case of low levels of pure Gaussian noise, is identified, and the artifacts introduced in such cases are analyzed. A novel way of grouping structurally relevant patches is proposed to improve the algorithm. Experiments show that this technique is useful, especially in videos containing large amounts of motion. Based on the observation that matrix completion is not suitable for denoising patches containing relatively little image detail, a framework is designed to separate patches corresponding to low-structured regions from a noisy image. Experiments are conducted in which such patches are not subjected to matrix completion but are instead denoised in a different way. The resulting improvement in performance suggests that denoising low-structured patches does not require a method as complex as matrix completion and that, in fact, it is counter-productive to subject such patches to matrix completion. These results also indicate the inherent limitation of matrix completion in dealing with cases where noise dominates the structural properties of an image. A novel method for assigning priorities to the ranked patches in matrix completion is also presented. Results show that this method yields improved performance in general. It is observed that the artifacts in the presence of low levels of pure Gaussian noise appear differently after introducing priorities to the patches, and that the artifacts occur over a wider range of parameter values. Results and discussion suggesting future directions for exploring this problem are also presented.
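
For readers unfamiliar with the underlying machinery, the following is a minimal sketch of singular value thresholding (SVT), one standard numerical method for matrix completion, applied to a generic matrix; it illustrates the kind of solver the studied algorithm builds on, not the thesis's specific algorithm, and the parameter defaults are common heuristics rather than values from this work:

```python
# Minimal singular value thresholding (SVT) sketch for matrix completion.
# Generic illustration only -- not the specific denoising algorithm studied
# in this thesis. In the denoising setting, M would hold vectorized similar
# patches as columns, with unreliable entries marked unobserved in `mask`.
import numpy as np

def svt_complete(M, mask, tau=None, delta=1.2, max_iter=200, tol=1e-4):
    """Complete M (2-D array) observed only where boolean `mask` is True."""
    m, n = M.shape
    tau = tau if tau is not None else 5 * np.sqrt(m * n)  # common heuristic
    Y = np.zeros_like(M, dtype=float)
    for _ in range(max_iter):
        # Shrinkage step: soft-threshold the singular values of Y.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Dual update on the observed entries only.
        residual = np.where(mask, M - X, 0.0)
        Y += delta * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(np.where(mask, M, 0.0)):
            break
    return X
```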

Date Created
  • 2013

An analysis of the memory bottleneck and cache performance of most apparent distortion image quality assessment algorithm on GPU

Description

As digital images are transmitted over networks or stored on disk, they are processed according to standards designed for efficient storage and bandwidth use. This introduces some amount of distortion or artifacts into the image, which creates the need for quality assessment. Subjective image quality assessment is expensive, time consuming, and influenced by the subject's perception. Hence, there is a need for mathematical models capable of predicting human quality evaluation. With the advent of the information era and exponential growth in image/video generation and consumption, automated quality assessment has become mandatory for assessing degradation. The last few decades have seen research on automated image quality assessment (IQA) algorithms gain prominence. However, the focus has been on achieving better prediction accuracy, not on improving computational performance. As a result, existing serial implementations require considerable time to process a single frame. In the last five years, research on general-purpose graphics processing unit (GPGPU) based IQA algorithm implementations has shown promising results for single images. Still, the implementations are not efficient enough for deployment in real-world applications, especially for live high-resolution video. Hence, this thesis proposes that microarchitecture-conscious coding on a graphics processing unit (GPU), combined with a detailed understanding of the IQA algorithm, can yield non-trivial speedups without compromising quality prediction accuracy. This document focuses on the microarchitectural analysis of the most apparent distortion (MAD) algorithm. The results are analyzed in depth and one of the major bottlenecks is identified. With knowledge of the underlying microarchitecture, the implementation is restructured, resolving the bottleneck and improving performance.

Date Created
  • 2016

High speed camera chip

Description

The market for high speed camera chips, or image sensors, has experienced rapid growth over the past decades owing to its broad application space in security, biomedical equipment, and mobile devices. CMOS (complementary metal-oxide-semiconductor) technology has significantly improved the performance of high speed camera chips by enabling the monolithic integration of pixel circuits and on-chip analog-to-digital conversion. However, for low light intensity applications, many CMOS image sensors have a sub-optimal dynamic range, particularly in high speed operation. Thus the requirement for a sensor with both a high frame rate and a high fill factor is attracting more attention. Another drawback of the high speed camera chip is its high power demand due to its high operating frequency. Therefore, a CMOS image sensor with a high frame rate, high fill factor, high voltage range, and low power is difficult to realize.

This thesis presents the design of the pixel circuit, the pixel array, and the column readout chain for a high speed camera chip. An integrated PN (positive-negative) junction photodiode and an accompanying ten-transistor pixel circuit are implemented in a 0.18 µm CMOS technology. Multiple methods are applied to minimize the subthreshold currents, which is critical for low light detection. A layout-sharing technique is used to increase the fill factor to 64.63%. Four programmable gain amplifiers (PGAs) and 10-bit pipeline analog-to-digital converters (ADCs) are added to complete the on-chip analog-to-digital conversion. Simulation results of the extracted circuit indicate that the ENOB (effective number of bits) is greater than 8 bits, with a figure of merit (FoM) of 0.789. The minimum detectable voltage level is determined to be 470 µV based on noise analysis. The total power consumption of the PGA and ADC is 8.2 mW for each conversion channel. The whole camera chip reaches 10508 frames per second (fps) at full resolution within a 3.1 mm × 3.4 mm area.
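
The abstract does not state which FoM definition it uses; assuming the conventional Walden figure of merit for ADCs, the quoted value would correspond to energy per effective conversion step:

```latex
% Walden ADC figure of merit (assumed definition; typically quoted in
% pJ/conversion-step), with P the power and f_s the sampling rate.
\mathrm{FoM} \;=\; \frac{P}{2^{\,\mathrm{ENOB}} \cdot f_s}
```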

Date Created
  • 2017

Perceptual-based locally adaptive noise and blur detection

Description

The quality of real-world visual content is typically impaired by many factors including image noise and blur. Detecting and analyzing these impairments are important steps for multiple computer vision tasks. This work focuses on perceptual-based locally adaptive noise and blur detection and their application to image restoration.

In the context of noise detection, this work proposes perceptual-based full-reference and no-reference objective image quality metrics by integrating perceptually weighted local noise into a probability summation model. Results are reported on both the LIVE and TID2008 databases. The proposed metrics consistently achieve good performance across noise types and across databases compared with many of the best recent quality metrics. The proposed metrics are able to predict with high accuracy the relative amount of perceived noise in images of different content.

In the context of blur detection, existing approaches are either computationally costly or cannot perform reliably when dealing with the spatially-varying nature of defocus blur. In addition, many existing approaches do not take human perception into account. This work proposes a blur detection algorithm that is capable of detecting and quantifying the level of spatially-varying blur by integrating directional edge spread calculation, probability of blur detection, and local probability summation. The proposed method generates a blur map indicating the relative amount of perceived local blurriness. In order to detect the flat and near-flat regions that do not contribute to perceivable blur, a perceptual model based on the Just Noticeable Difference (JND) is further integrated into the proposed blur detection algorithm to generate perceptually significant blur maps. We compare the proposed method with six other state-of-the-art blur detection methods. Experimental results show that the proposed method performs best both visually and quantitatively.
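
The thesis's detector combines directional edge spread, blur detection probability, and local probability summation, none of which are spelled out in this abstract. Purely as an illustration of what a spatially-varying blur map is, the sketch below scores local blurriness with a variance-of-Laplacian measure, a common stand-in, over sliding windows; the window size is an arbitrary choice:

```python
# Crude spatially-varying blur map: low local variance of the Laplacian
# means weak local high-frequency content, i.e., likely blur. A generic
# stand-in illustration, NOT the thesis's algorithm.
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def blur_map(gray, win=15):
    """gray: 2-D float array in [0, 1]; returns per-pixel blurriness in [0, 1]."""
    lap = laplace(gray)
    # Local variance of the Laplacian over a win x win neighborhood.
    local_mean = uniform_filter(lap, size=win)
    local_var = uniform_filter(lap * lap, size=win) - local_mean ** 2
    sharpness = np.sqrt(np.maximum(local_var, 0.0))
    return 1.0 - sharpness / max(float(sharpness.max()), 1e-12)
```

Note that a measure this crude also flags flat regions as blurred, which is precisely the failure mode the JND-based perceptual model above is introduced to suppress.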

This work further investigates the application of the proposed blur detection methods to image deblurring. Two selective perceptual-based image deblurring frameworks are proposed to improve deblurring results and to reduce restoration artifacts. In addition, an edge-enhanced super-resolution algorithm is proposed and is shown to achieve better reconstruction results in edge regions.

Date Created
  • 2016

Quantification of solar photovoltaic encapsulant browning level using image processing tool

Description

In recent years, the solar photovoltaic (PV) industry has seen significant improvements in technology and growth in the market, with crystalline silicon PV modules being the most widely used technology. Plant inspections are gaining importance as a means to identify visual defects and quantitatively determine their impact on performance. About 86 different types of defects are found in PV modules installed in various climates, and most of them can be observed visually. However, quantitatively determining the impact or risk of each identified defect on performance is challenging. Thus, it is of utmost importance to quantify the risk of each visual defect without human subjectivity. The most direct way to quantify the risk of each defect is to perform current-voltage (I-V) measurements of the defective modules installed in the plant, but this requires disruption of plant operation, expensive measuring equipment, and intensive human resources. One of the riskiest and most dominant visual defects is encapsulant browning, which affects PV module performance in the form of current degradation. The present study develops an automated image processing tool that removes human subjectivity from assessing how browning level impacts performance. The image processing tool developed in this work can be used directly to quantify the impact of browning on performance without intrusively disconnecting modules from the plant. The quantified impact of browning level on performance has also been experimentally validated through a correlation study using short-circuit current and reflectance/transmittance measurements of browned PV modules retrieved from aged plants/systems installed in diverse climatic conditions. The primary goal of the tool is to determine the performance impact of encapsulant browning without interrupting plant operation for I-V measurements. The tool provides a single numerical value, called the browning index (BI), which can accurately quantify browning levels on modules and also correlate with the performance and reflectance/transmittance parameters of the modules.
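
The BI formula itself is not given in this abstract, so the following is a hypothetical sketch of how a browning index might be computed from a module photograph: browning appears as yellow-brown discoloration, i.e., a depressed blue channel relative to red, so one simple index is the mean normalized red-blue imbalance. The function name and definition are illustrative assumptions, not the thesis's actual BI.

```python
# Hypothetical browning index (BI) from an RGB photo of a PV module.
# Browning depresses blue relative to red, so we average the normalized
# red-blue imbalance. Illustrative only; the thesis's BI definition is
# not specified in this abstract.
import numpy as np

def browning_index(rgb):
    """rgb: H x W x 3 float array in [0, 1]; returns a BI in [0, 1]."""
    r, b = rgb[..., 0], rgb[..., 2]
    imbalance = (r - b) / (r + b + 1e-6)   # per-pixel yellowing score
    return float(np.clip(imbalance, 0.0, 1.0).mean())
```

In practice the module area would first be segmented out and the score calibrated against a clear-encapsulant reference, which is where the reported correlation with short-circuit current and reflectance/transmittance measurements would come in.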

Date Created
  • 2016

Spatial and multi-temporal visual change detection with application to SAR image analysis

Description

Thousands of high-resolution images are generated each day. Detecting and analyzing variations in these images are key steps in image understanding. This work focuses on spatial and multi-temporal visual change detection and its applications in multi-temporal synthetic aperture radar (SAR) images.

The Canny edge detector is one of the most widely used edge detection algorithms, owing to its superior performance in terms of SNR, edge localization, and its single response to a single edge. In this work, we propose a mechanism to implement the Canny algorithm at the block level without any loss in edge detection performance compared to the original frame-level Canny algorithm. The resulting block-based algorithm has significantly reduced memory requirements and can achieve significantly reduced latency. Furthermore, the proposed algorithm can be easily integrated with other block-based image processing systems. In addition, quantitative evaluations and subjective tests show that the edge detection performance of the proposed algorithm is better than that of the original frame-based algorithm, especially when noise is present in the images.
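
The block-level mechanism itself is not detailed in this abstract, and a naive tile-by-tile Canny produces seams at block borders. The sketch below shows the generic block-with-halo pattern using OpenCV's cv2.Canny; block size, halo width, and thresholds are arbitrary, and this simple scheme only approximates frame-level output (hysteresis can still differ across halos), which is exactly the gap the thesis's mechanism closes:

```python
# Generic block-based edge detection with a halo (apron) so block borders
# do not introduce artifacts. Illustrative pattern only; the thesis's
# block-level Canny mechanism is more involved.
import cv2
import numpy as np

def block_canny(gray, block=64, halo=8, t1=50, t2=150):
    """gray: uint8 grayscale image; returns a uint8 edge map."""
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Expand the block by `halo` pixels on each side, clipped at edges.
            y0, y1 = max(y - halo, 0), min(y + block + halo, h)
            x0, x1 = max(x - halo, 0), min(x + block + halo, w)
            edges = cv2.Canny(gray[y0:y1, x0:x1], t1, t2)
            # Keep only the interior of the block from the haloed result.
            bh, bw = min(block, h - y), min(block, w - x)
            out[y:y + bh, x:x + bw] = edges[y - y0:y - y0 + bh,
                                            x - x0:x - x0 + bw]
    return out
```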

In the context of multi-temporal SAR images for earth monitoring applications, one critical issue is the detection of changes occurring after a natural or anthropic disaster. In this work, we propose a novel similarity measure for automatic change detection using a pair of SAR images acquired at different times, and apply it in both the spatial and wavelet domains. This measure is based on the evolution of the local statistics of the image between the two dates. The local statistics are modeled as a Gaussian mixture model (GMM), which is more suitable and flexible for approximating the local distribution of a SAR image with distinct land-cover typologies. Tests on real datasets show that the proposed detectors outperform existing methods both in terms of the quality of the similarity maps, assessed using receiver operating characteristic (ROC) curves, and in terms of the total error rates of the final change detection maps. Furthermore, we propose a new similarity measure for automatic change detection based on a divisive normalization transform (DNT) in order to reduce computational complexity. Tests show that the proposed DNT-based change detector exhibits competitive detection performance while achieving lower computational complexity than previously suggested methods.
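
As a rough illustration of the local-statistics idea (not the thesis's similarity measure), one can fit a small GMM to co-located patches from the two dates and score change by a symmetric cross log-likelihood drop, here with scikit-learn's GaussianMixture; the patch handling, two components, and the scoring rule are all assumptions:

```python
# Illustrative local-statistics change score for co-registered SAR patches:
# fit a GMM per date, then measure how much worse each model explains the
# other date's pixels. Component count and scoring rule are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def patch_change_score(patch_t1, patch_t2, n_components=2):
    x1 = patch_t1.reshape(-1, 1).astype(float)
    x2 = patch_t2.reshape(-1, 1).astype(float)
    g1 = GaussianMixture(n_components, random_state=0).fit(x1)
    g2 = GaussianMixture(n_components, random_state=0).fit(x2)
    # Mean log-likelihoods: each model on its own data vs. the other date's data.
    self_ll = g1.score(x1) + g2.score(x2)
    cross_ll = g1.score(x2) + g2.score(x1)
    return self_ll - cross_ll   # ~0 if statistics are unchanged, grows with change
```

Sliding such a score over the image pair yields a similarity map that can be thresholded into the final change detection map, the two outputs the abstract evaluates with ROC curves and total error rates.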

Date Created
  • 2014

Video2Vec: learning semantic spatio-temporal embedding for video representations

Description

High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos.

Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as "handcrafted" features, as they were deliberately designed based on reasonable considerations. However, they may fail when dealing with high-level tasks or complex-scene videos. Following the success of deep convolutional neural networks (CNNs) in extracting global representations for static images, researchers have applied similar techniques to video content. Typical techniques first extract spatial features by processing raw frames with deep convolutional architectures designed for static image classification. Then simple averaging, concatenation, or classifier-based fusion/pooling methods are applied to the extracted features. I argue that features extracted in such ways do not capture enough representative information, since videos, unlike images, should be characterized as temporal sequences of semantically coherent visual content and thus need to be represented in a manner that considers both semantic and spatio-temporal information.

In this thesis, I propose a novel architecture for learning a semantic spatio-temporal embedding for videos to support high-level video analysis. The proposed method encodes video spatial and temporal information separately, employing a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion), followed by their corresponding Fully Connected Gated Recurrent Unit (FC-GRU) encoders, which capture the longer-term temporal structure of the CNN features. The resulting spatio-temporal representation (a vector) is used to learn a mapping via a Fully Connected Multilayer Perceptron (FC-MLP) to the word2vec semantic embedding space, leading to a semantic interpretation of the video vector that supports high-level analysis. I evaluate the usefulness and effectiveness of this new video representation through experiments on action recognition, zero-shot video classification, and semantic (word-to-video) video retrieval, using the UCF101 action recognition dataset.
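
A minimal PyTorch sketch of the described pipeline, two CNN-feature channels, per-channel GRU encoders, and an MLP into the word2vec space, follows; the feature and hidden dimensions and the cosine embedding loss are illustrative assumptions (the abstract fixes only the overall architecture):

```python
# Sketch of the described architecture: appearance and motion CNN feature
# sequences -> two GRU encoders -> fused video vector -> MLP into a
# word2vec-sized (300-d) space. Dimensions and loss are assumptions.
import torch
import torch.nn as nn

class Video2Vec(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, embed_dim=300):
        super().__init__()
        self.gru_app = nn.GRU(feat_dim, hidden, batch_first=True)  # appearance
        self.gru_mot = nn.GRU(feat_dim, hidden, batch_first=True)  # local motion
        # FC-MLP mapping the fused video vector into the semantic space.
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, app_seq, mot_seq):
        # app_seq, mot_seq: (batch, time, feat_dim) precomputed CNN features.
        _, h_app = self.gru_app(app_seq)
        _, h_mot = self.gru_mot(mot_seq)
        video_vec = torch.cat([h_app[-1], h_mot[-1]], dim=1)
        return self.mlp(video_vec)

# One plausible training signal: pull each video embedding toward the
# word2vec vector of its label.
loss_fn = nn.CosineEmbeddingLoss()
```

Zero-shot classification then falls out naturally: an unseen class is recognized by finding the class name whose word2vec vector lies nearest to the video embedding.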

Date Created
  • 2016

GPGPU based implementation of BLIINDS-II NR-IQA

Description

The technological advances of the past few decades have made possible the creation and consumption of digital visual content at an explosive rate. Consequently, there is a need for efficient quality monitoring systems to ensure minimal degradation of images and videos during processing operations such as compression, transmission, and storage. Objective Image Quality Assessment (IQA) algorithms have been developed that predict quality scores matching well with human subjective quality assessment. However, much research remains to be done before IQA algorithms can be deployed in real-world systems; long runtimes for a single image frame are a major hurdle. Graphics Processing Units (GPUs), equipped with massive numbers of computational cores, provide an opportunity to accelerate IQA algorithms by performing computations in parallel. Indeed, General Purpose Graphics Processing Unit (GPGPU) techniques have been applied to a few Full Reference IQA algorithms. We present a GPGPU implementation of the Blind Image Integrity Notator using DCT Statistics (BLIINDS-II), which falls under the No Reference IQA paradigm. We achieve a speedup of over 30x over the previous CPU version of this algorithm. We test our implementation using various distorted images from the CSIQ database and present the observed performance trends. We achieve a very consistent runtime of around 9 milliseconds per distorted image, which makes it possible to process over 100 images per second (100 fps).

Date Created
  • 2016

High speed CMOS image sensor

Description

High speed image sensors are used as a diagnostic tool to analyze high speed processes in industrial, automotive, defense, and biomedical applications. The high frame rate of these sensors captures a series of images that enables the viewer to understand and analyze high speed phenomena. However, the pixel readout circuits designed for sensors with high frame rates (100 fps to 1 Mfps) have very low fill factors, less than 58%. For high speed operation, the exposure time is short and (or) the light intensity incident on the image sensor is low. This makes it difficult for the sensor to detect faint light signals and sets a lower limit on the signal levels the sensor can detect. Moreover, the leakage paths in the pixel readout circuit also set a limit on the detectable signal level. Therefore, the fill factor of the pixel should be maximized and the leakage currents in the readout circuit should be minimized.

This thesis work presents the design of a pixel readout circuit suitable for high speed and low light imaging applications. The circuit is an improvement on the 6T pixel readout architecture. The designed readout circuit minimizes the leakage currents in the circuit and detects light producing a signal level of 350 µV at the cathode of the photodiode. A novel layout technique is used for the pixel, which improves its fill factor to 64.625%. The readout circuit is an integral part of a high speed image sensor fabricated using a 0.18 µm CMOS technology, with a die size of 3.1 mm × 3.4 mm, a pixel size of 20 µm × 20 µm, a 96 × 96 pixel array, and four 10-bit pipelined ADCs. The image sensor achieves a high frame rate of 10508 fps and a readout speed of 96 Mpixels/s.
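
As a consistency check, the quoted readout speed follows directly from the array size and frame rate:

```latex
% Readout throughput = pixels per frame x frame rate.
96 \times 96 \times 10508\ \text{fps} \;=\; 96{,}841{,}728\ \text{pixels/s} \;\approx\; 96\ \text{Mpixels/s}
```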

Date Created
  • 2016