Matching Items (41)
Description
As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received a lot of attention in the recent past. Such architectures have the attractive feature of combining the two-step process of acquisition and compression of pixel measurements in a conventional camera into a single step. A popular variant is the single-pixel camera, which obtains measurements of the scene using a pseudo-random measurement matrix. Advances in compressive sensing (CS) theory in the past decade have supplied the tools that, in theory, allow near-perfect reconstruction of an image from these measurements even at sub-Nyquist sampling rates. However, current state-of-the-art reconstruction algorithms suffer from two drawbacks: they are (1) computationally very expensive and (2) incapable of yielding high-fidelity reconstructions at high compression ratios. In computer vision, the final goal is usually to perform an inference task using the acquired images, not signal recovery. With this motivation, this thesis considers the possibility of inference directly from compressed measurements, thereby obviating the need for expensive reconstruction algorithms. Non-linear features are often used for inference tasks in computer vision. However, it is currently unclear how to extract such features from compressed measurements. Instead, using the theoretical basis provided by the Johnson-Lindenstrauss lemma, discriminative features using smashed correlation filters are derived, and it is shown that it is indeed possible to perform reconstruction-free inference at high compression ratios with only a marginal loss in accuracy. As a specific inference problem in computer vision, face recognition is considered, mainly beyond the visible spectrum, such as in the short-wave infrared (SWIR) region, where sensors are expensive.
Contributors: Lohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2015
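The abstract above appeals to the Johnson-Lindenstrauss lemma to justify correlating directly in the compressed domain. The following is a minimal sketch of that idea only, not the thesis's implementation: it assumes a dense Gaussian measurement matrix, and the function names (`gaussian_measurement_matrix`, `compressed_correlation`) are hypothetical. It shows that the inner product between a compressed signal and a compressed filter approximates the inner product of the originals.

```python
import numpy as np


def gaussian_measurement_matrix(m, n, rng):
    """Random m x n matrix with i.i.d. N(0, 1/m) entries (an assumed choice)."""
    return rng.standard_normal((m, n)) / np.sqrt(m)


def compressed_correlation(phi, x, h):
    """Correlate in the compressed domain: <phi x, phi h>.

    For a random phi with enough rows, this approximates <x, h>,
    so the full signal x never has to be reconstructed.
    """
    return float((phi @ x) @ (phi @ h))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 2048, 512  # signal length and number of measurements (25%)

    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)

    # A template that is similar, but not identical, to the signal.
    noise = rng.standard_normal(n)
    h = x + 0.5 * noise / np.linalg.norm(noise)
    h /= np.linalg.norm(h)

    phi = gaussian_measurement_matrix(m, n, rng)
    print("true correlation:      ", float(x @ h))
    print("compressed correlation:", compressed_correlation(phi, x, h))
```

The approximation error shrinks as the number of measurements `m` grows, which is the trade-off between compression ratio and accuracy that the abstract describes.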
Description
Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user’s activity, mood or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to its subjective nature and hence requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and ambient sounds-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures; thus, features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves the recognition performance.

Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach that constrains a classifier trained over acoustic descriptors to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real-world data and accompanied by a prototypical Android-based user interface.

The proposed methods are also assessed in terms of computation and implementation complexity. Software and field-programmable gate array (FPGA) based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Continuous monitoring of sensor data from smart phones to identify human activities and gestures puts a heavy load on the smart phone's power consumption. In this research study, the non-Euclidean geometry of the rich sensor data obtained from the user's smart phone is utilized to perform compressive analysis and efficient classification of human activities by employing machine learning techniques. We are interested in the generalization of classical tools for signal approximation to newer spaces, such as rotation data, which is best studied in a non-Euclidean setting, and its application to activity analysis. Owing to the non-linear nature of the rotation data space, which imposes a heavier load on the smart phone's processor and memory than feature extraction in Euclidean space, indexing and compaction of the acquired sensor data are performed prior to feature extraction, to reduce CPU overhead and thereby increase the lifetime of the battery with little loss in recognition accuracy of the activities. Representing the sensor data as unit quaternions gives a more intrinsic representation of the orientation of the smart phone than Euler angles (which suffer from the gimbal lock problem) or the computationally intensive rotation matrices. Classification algorithms are employed to classify these manifold sequences in the non-Euclidean space. By performing customized indexing (using the K-means algorithm) of the evolved manifold sequences before feature extraction, considerable energy savings are achieved in terms of the smart phone's battery life.
Contributors: Sivakumar, Aswin (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2014
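To illustrate why the abstract above treats orientation data as non-Euclidean, here is a small sketch of the standard geodesic (angular) distance between two unit quaternions. This is a textbook formula offered as an assumption about what a suitable distance could look like; the abstract does not state which metric the thesis uses, and the function name `quaternion_geodesic_distance` is hypothetical.

```python
import numpy as np


def quaternion_geodesic_distance(q1, q2):
    """Rotation angle (radians) between two orientations given as unit quaternions.

    Quaternions are stored as (w, x, y, z). The absolute value of the dot
    product accounts for q and -q representing the same rotation.
    """
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    q1 = q1 / np.linalg.norm(q1)
    q2 = q2 / np.linalg.norm(q2)
    dot = np.clip(abs(float(np.dot(q1, q2))), 0.0, 1.0)
    return 2.0 * np.arccos(dot)


if __name__ == "__main__":
    identity = [1.0, 0.0, 0.0, 0.0]
    # 90-degree rotation about the z axis.
    quarter_turn_z = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
    print(quaternion_geodesic_distance(identity, quarter_turn_z))
```

A plain Euclidean distance between the four quaternion components would treat `q` and `-q` as far apart even though they describe the same orientation, which is one reason a manifold-aware distance is preferred for this kind of data.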
Description
In this thesis we consider the problem of facial expression recognition (FER) from video sequences. Our method is based on subspace representations and Grassmann manifold based learning. We use Local Binary Patterns (LBP) at the frame level to represent the facial features. Next, we develop a model to represent the video sequence in a lower dimensional expression subspace and also as a linear dynamical system using an Autoregressive Moving Average (ARMA) model. As these subspaces lie on a Grassmann manifold, we use Grassmann manifold based learning techniques such as kernel Fisher Discriminant Analysis (kernel-FDA) with Grassmann kernels for classification. We consider six expressions for classification, namely Angry (An), Disgust (Di), Fear (Fe), Happy (Ha), Sadness (Sa) and Surprise (Su). We perform experiments on the extended Cohn-Kanade (CK+) facial expression database to evaluate the expression recognition performance. Our method demonstrates good expression recognition performance, outperforming other state-of-the-art FER algorithms. We achieve an average recognition accuracy of 97.41% using a method based on the expression subspace, kernel-FDA and a Support Vector Machine (SVM) classifier. By using a simpler classifier, 1-Nearest Neighbor (1-NN), along with kernel-FDA, we achieve a recognition accuracy of 97.09%. We find that to process a group of 19 frames in a video sequence, LBP feature extraction requires the majority of the computation time (97%), which is about 1.662 seconds on a dual-core Intel Core i3 platform. However, when only 3 frames (onset, middle and peak) of a video sequence are used, the computational complexity is reduced by about 83.75% to 260 milliseconds, at the expense of a drop in recognition accuracy to 92.88%.
Contributors: Yellamraju, Anirudh (Author) / Chakrabarti, Chaitali (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Karam, Lina (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
In UAVs and parking lots, it is typical to first collect an enormous number of pixels using conventional imagers. This is followed by the use of expensive methods that compress the data by throwing away redundant information. Subsequently, the compressed data is transmitted to a ground station. The past decade has seen the emergence of novel imagers called spatial-multiplexing cameras, which offer compression at the sensing level itself by providing arbitrary linear measurements of the scene instead of pixel-based sampling. In this dissertation, I discuss various approaches for effective information extraction from spatial-multiplexing measurements and present the trade-offs between reliability of the performance and computational/storage load of the system. In the first part, I present a reconstruction-free approach to high-level inference in computer vision, wherein I consider the specific case of activity analysis, and show that using correlation filters, one can perform effective action recognition and localization directly from a class of spatial-multiplexing cameras, called compressive cameras, even at very low measurement rates of 1%. In the second part, I outline a deep learning based non-iterative and real-time algorithm to reconstruct images from compressively sensed (CS) measurements, which can outperform the traditional iterative CS reconstruction algorithms in terms of reconstruction quality and time complexity, especially at low measurement rates. To overcome the limitations of compressive cameras, which are operated with random measurements and not particularly tuned to any task, in the third part of the dissertation, I propose a method to design spatial-multiplexing measurements, which are tuned to facilitate the easy extraction of features that are useful in computer vision tasks like object tracking.
The work presented in the dissertation provides sufficient evidence for high-level inference in computer vision at extremely low measurement rates, and hence allows us to think about the possibility of revamping current-day computer systems.
Contributors: Kulkarni, Kuldeep Sharad (Author) / Turaga, Pavan (Thesis advisor) / Li, Baoxin (Committee member) / Chakrabarti, Chaitali (Committee member) / Sankaranarayanan, Aswin (Committee member) / LiKamWa, Robert (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Light field imaging is limited by the computational processing demands of high sampling in both the spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a light field. We present a deep learning approach using a new, two-branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high-resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes as compared to the dictionary method for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured from a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning allows the potential for real-time light field video acquisition systems in the future.
Contributors: Gupta, Mayank (Author) / Turaga, Pavan (Thesis advisor) / Yang, Yezhou (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Semantic image segmentation has been a key topic in applications involving image processing and computer vision. Owing to the success and continuous research in the field of deep learning, plenty of deep learning-based segmentation architectures have been designed for various tasks. In this thesis, deep-learning architectures have been developed for a specific application in materials science, namely the segmentation process for the non-destructive study of the microstructure of Aluminum Alloy AA 7075. This process requires the use of various imaging tools and methodologies to obtain the ground-truth information. The image dataset obtained using Transmission X-ray microscopy (TXM) consists of raw 2D image specimens captured from the projections at every beam scan. The segmented 2D ground-truth images are obtained by applying reconstruction and filtering algorithms before using a scientific visualization tool for segmentation. These images represent the corrosive behavior caused by the precipitate and inclusion particles on the Aluminum AA 7075 alloy. The study of the tools that work best for X-ray microscopy-based imaging is still in its early stages.

In this thesis, the underlying concepts behind Convolutional Neural Networks (CNNs) and state-of-the-art semantic segmentation architectures are discussed in detail. The data generation and pre-processing applied to the AA 7075 data are also described, along with the experiments performed on the baseline and four other state-of-the-art segmentation architectures that predict the segmented boundaries from the raw 2D images. A performance analysis based on various factors was also conducted to decide the best techniques and tools for applying semantic image segmentation to X-ray microscopy-based imaging.
Contributors: Barboza, Daniel (Author) / Turaga, Pavan (Thesis advisor) / Chawla, Nikhilesh (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
The depth richness of a scene translates into a spatially variable defocus blur in the acquired image. Blurring can mislead computational image understanding; therefore, blur detection can be used for selective image enhancement of blurred regions and the application of image understanding algorithms to sharp regions. This work focuses on blur detection and its application to image enhancement.

This work proposes a spatially varying defocus blur detection method based on the quotient of spectral bands; additionally, to avoid the use of computationally intensive algorithms for the segmentation of foreground and background regions, a global threshold defined using weakly textured regions of the input image is proposed. Quantitative results expressed in the precision-recall space, as well as qualitative results, outperform current state-of-the-art algorithms while keeping the computational requirements at competitive levels.

Imperfections in the curvature of lenses can lead to image radial distortion (IRD). Computer vision applications can be drastically affected by IRD. This work proposes a novel robust radial distortion correction algorithm based on alternate optimization using two cost functions tailored for the estimation of the center of distortion and radial distortion coefficients. Qualitative and quantitative results show the competitiveness of the proposed algorithm.

Blur is one of the causes of visual discomfort in stereopsis. Sharpening with traditional algorithms can produce an interdifference which causes eyestrain and visual fatigue for the viewer. A sharpness enhancement method for stereo images that incorporates binocular vision cues and depth information is presented. Perceptual evaluation and quantitative results based on the metric of interdifference deviation are reported; results of the proposed algorithm are competitive with state-of-the-art stereo algorithms.

Digital images and videos are produced every day in astonishing amounts. Consequently, the market-driven demand for higher quality content is constantly increasing, which leads to the need for image quality assessment (IQA) methods. A training-free, no-reference image sharpness assessment method based on the singular value decomposition of perceptually-weighted normalized-gradients of relevant pixels in the input image is proposed. Results over six subject-rated publicly available databases show competitive performance when compared with state-of-the-art algorithms.
Contributors: Andrade Rodas, Juan Manuel (Author) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Abousleman, Glen (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Over the last decade, deep neural networks, also known as deep learning, combined with large databases and specialized hardware for computation, have made major strides in important areas such as computer vision, computational imaging and natural language processing. However, such frameworks currently suffer from some drawbacks. For example, it is generally not clear how the architectures are to be designed for different applications, or how the neural networks behave under different input perturbations, and it is not easy to make the internal representations and parameters more interpretable. In this dissertation, I propose building constraints into the feature maps, parameters, and design of algorithms involving neural networks for applications in low-level vision problems such as compressive imaging and multi-spectral image fusion, and high-level inference problems including activity and face recognition. Depending on the application, such constraints can be used to design architectures which are invariant/robust to certain nuisance factors, more efficient and, in some cases, more interpretable. Through extensive experiments on real-world datasets, I demonstrate these advantages of the proposed methods over conventional frameworks.
Contributors: Lohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Infants born before 37 weeks of pregnancy are considered to be preterm. Typically, preterm infants have to be strictly monitored since they are highly susceptible to health problems like hypoxemia (low blood oxygen level), apnea, respiratory issues, cardiac problems, and neurological problems, as well as an increased chance of long-term health issues such as cerebral palsy, asthma and sudden infant death syndrome. One of the leading health complications in preterm infants is bradycardia, which is defined as a slower than expected heart rate, generally lower than 60 beats per minute. Bradycardia is often accompanied by low oxygen levels and can cause additional long-term health problems in the premature infant.

The implementation of a non-parametric method to predict the onset of bradycardia is presented. This method assumes no prior knowledge of the data and uses kernel density estimation to predict the future onset of bradycardia events. The data is preprocessed and then analyzed to detect the peaks in the ECG signals, following which different kernels are implemented to estimate the shared underlying distribution of the data. The performance of the algorithm is evaluated using various metrics, and the computational challenges and methods to overcome them are also discussed.
It is observed that the performance of the algorithm with regard to the kernels used is consistent with the theoretical performance of the kernels as presented in a previous work. The theoretical approach has also been automated in this work and the various implementation challenges have been addressed.
Contributors: Mitra, Sinjini (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Moraffah, Bahman (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2020
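The abstract above relies on kernel density estimation with several kernels. As a rough illustration of the general technique only (not the thesis's algorithm), the sketch below evaluates the textbook estimator f(x) = (1 / (n h)) * sum_i K((x - x_i) / h) with a Gaussian and an Epanechnikov kernel. The sample values, the bandwidth, and the function names are all hypothetical and chosen purely for demonstration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, a common smooth kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def epanechnikov_kernel(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kde(samples, grid, bandwidth, kernel=gaussian_kernel):
    """Evaluate the kernel density estimate of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Pairwise scaled differences, shape (len(grid), len(samples)).
    u = (grid[:, None] - samples[None, :]) / bandwidth
    return kernel(u).sum(axis=1) / (samples.size * bandwidth)


if __name__ == "__main__":
    # Hypothetical intervals between detected ECG peaks, in seconds.
    intervals = [0.40, 0.42, 0.41, 0.45, 0.43, 0.60, 0.62, 0.44]
    grid = np.linspace(0.2, 0.8, 1201)
    for kernel in (gaussian_kernel, epanechnikov_kernel):
        density = kde(intervals, grid, bandwidth=0.02, kernel=kernel)
        print(kernel.__name__, "mode near", grid[np.argmax(density)])
```

The bandwidth controls how much each sample is smoothed; in practice it would be selected from the data rather than fixed by hand as it is here.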