This collection includes both ASU Theses and Dissertations, submitted by graduate students, and Barrett, The Honors College theses, submitted by undergraduate students.

Description
Audio signals, such as speech and ambient sounds, convey rich information about a user’s activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. The task is challenging because of its subjective nature and therefore requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, both for speech-based applications involving emotion recognition and keyword detection and for ambient-sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures, so features and classification methods that generalize well to different conditions are highly desirable. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches on multiple databases. Cross-corpus studies are conducted to determine how well these features generalize across different databases. The proposed method is also applied to derive features from facial expressions; multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves recognition performance.
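
The abstract does not spell out the topic model. The sketch below assumes one plausible pipeline: frame-level low-level descriptors are vector-quantized into "audio words" against a codebook, a standard latent Dirichlet allocation (LDA) model is fit over each utterance's words by collapsed Gibbs sampling, and the smoothed per-utterance topic proportions serve as a fixed-length supra-segmental feature vector. The function names, the toy codebook, and the hyperparameters `alpha`, `beta`, and `iters` are illustrative assumptions, not the dissertation's settings.

```python
import random


def quantize(frames, codebook):
    """Map each frame-level descriptor vector to the index of its nearest codeword."""
    def nearest(x):
        return min(range(len(codebook)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))
    return [nearest(f) for f in frames]


def lda_topic_features(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-utterance topic proportions.

    docs: list of utterances, each a list of audio-word indices in [0, vocab_size).
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # utterance-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # words assigned to each topic
    z = []                                             # topic assignment of every word
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed topic proportions: one fixed-length feature vector per utterance.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


# Toy example: 2-D "descriptors" for two utterances and a 3-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
utterances = [[[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]],
              [[5.1, 4.9], [4.8, 5.2], [1.0, 0.9]]]
docs = [quantize(u, codebook) for u in utterances]  # [[0, 1, 0], [2, 2, 1]]
features = lda_topic_features(docs, n_topics=2, vocab_size=len(codebook))
```

Whatever the utterance length, each feature vector has `n_topics` entries summing to one, which is what makes topic proportions usable as utterance-level (supra-segmental) input to a standard classifier.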

Besides affecting the acoustic properties of speech, emotions strongly influence speech articulation kinematics. A learning approach is proposed that constrains a classifier trained on acoustic descriptors to also model articulatory data. Because articulatory information is required only during training, the method sidesteps the challenges of large-scale articulatory data collection while exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.
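
One common way to realize "articulatory data at training time only" is multi-task learning with an auxiliary output. The sketch below assumes a single shared tanh layer over acoustic features, a binary emotion head (for example, high versus low arousal), and an auxiliary linear head that regresses articulatory targets with loss weight `lam`. The class name, the architecture, and the toy data are hypothetical; they only illustrate how an auxiliary loss can shape the shared representation during training while inference needs acoustic input alone.

```python
import math
import random


def _sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class AcousticArticulatoryClassifier:
    """Shared tanh layer with two heads: a binary emotion head (used at test time)
    and an articulatory regression head (used only as an auxiliary training loss)."""

    def __init__(self, n_acoustic, n_hidden, n_artic, lam=0.5, seed=0):
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.5, 0.5)

        self.W = [[init() for _ in range(n_acoustic)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.u = [init() for _ in range(n_hidden)]                            # emotion head
        self.c = 0.0
        self.V = [[init() for _ in range(n_hidden)] for _ in range(n_artic)]  # articulatory head
        self.lam = lam

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def predict_proba(self, x):
        """Inference needs acoustic features only; the articulatory head is unused."""
        h = self._hidden(x)
        return _sigmoid(sum(ui * hi for ui, hi in zip(self.u, h)) + self.c)

    def train_step(self, x, y, artic, lr=0.1):
        """One SGD step on cross-entropy(y) + lam/2 * ||V h - artic||^2."""
        h = self._hidden(x)
        g_logit = _sigmoid(sum(ui * hi for ui, hi in zip(self.u, h)) + self.c) - y
        a_hat = [sum(v * hj for v, hj in zip(row, h)) for row in self.V]
        g_a = [self.lam * (ah - a) for ah, a in zip(a_hat, artic)]
        # Both losses back-propagate into the shared layer (pre-update weights).
        g_h = [g_logit * self.u[j] + sum(g_a[k] * self.V[k][j] for k in range(len(self.V)))
               for j in range(len(h))]
        g_pre = [g * (1.0 - hj * hj) for g, hj in zip(g_h, h)]  # tanh derivative
        for j in range(len(h)):
            self.u[j] -= lr * g_logit * h[j]
            self.b[j] -= lr * g_pre[j]
            for i in range(len(x)):
                self.W[j][i] -= lr * g_pre[j] * x[i]
            for k in range(len(self.V)):
                self.V[k][j] -= lr * g_a[k] * h[j]
        self.c -= lr * g_logit


model = AcousticArticulatoryClassifier(n_acoustic=2, n_hidden=3, n_artic=1)
# Toy training set: (acoustic features, emotion label, articulatory target).
train = [([1.0, 0.0], 1, [0.8]), ([0.0, 1.0], 0, [-0.8])]
for _ in range(500):
    for x, y, artic in train:
        model.train_step(x, y, artic)
p_pos = model.predict_proba([1.0, 0.0])
p_neg = model.predict_proba([0.0, 1.0])
```

Note that `predict_proba` never touches `V`: the articulatory head can be discarded after training, which is the property the abstract highlights.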

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques that can efficiently handle long-duration audio recordings; a complete framework for such applications is presented. Its performance is evaluated on real-world data, and a prototype Android-based user interface is provided.
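
The abstract does not describe the segmentation algorithm itself. Purely as an illustration of segmenting a long recording efficiently, the sketch below applies a generic change-detection scheme to one scalar feature per frame (log energy, for example): it compares the mean of the `win` frames before and after each frame, uses prefix sums so the cost stays linear in recording length, and keeps local peaks above `threshold`. The function names, the scalar feature, and both parameters are assumptions, not the dissertation's framework.

```python
from itertools import accumulate


def segment_by_change(features, win=5, threshold=1.0):
    """Return frame indices where the local mean of `features` changes abruptly."""
    n = len(features)
    prefix = list(accumulate(features, initial=0.0))  # prefix[i] = sum(features[:i])
    scores = [0.0] * n
    for t in range(win, n - win + 1):
        left = (prefix[t] - prefix[t - win]) / win    # mean of the win frames before t
        right = (prefix[t + win] - prefix[t]) / win   # mean of the win frames from t on
        scores[t] = abs(right - left)

    boundaries = []
    for t, s in enumerate(scores):
        lo, hi = max(0, t - win), min(n, t + win + 1)
        is_peak = s == max(scores[lo:hi])
        # Keep strong local peaks that are more than `win` frames apart.
        if s > threshold and is_peak and (not boundaries or t - boundaries[-1] > win):
            boundaries.append(t)
    return boundaries


def to_segments(boundaries, n_frames):
    """Convert boundary indices into (start, end) frame ranges."""
    edges = [0] + boundaries + [n_frames]
    return list(zip(edges[:-1], edges[1:]))


# Toy "recording": quiet, loud, quiet (20 frames each).
frames = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
cuts = segment_by_change(frames)             # [20, 40]
segments = to_segments(cuts, len(frames))    # [(0, 20), (20, 40), (40, 60)]
```

Each resulting segment could then be summarized and annotated as a unit, which is far cheaper than labeling hours of audio frame by frame.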

The proposed methods are also assessed in terms of computational and implementation complexity. Software and field-programmable gate array (FPGA) implementations are considered for emotion recognition, while virtual platforms are used to model the complexity of lifelogging. The resulting metrics are used to determine whether these methods are feasible for applications requiring real-time capability and low power consumption.
Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2015

Description
The depth richness of a scene translates into spatially variable defocus blur in the acquired image. Because blur can mislead computational image understanding, blur detection can be used to selectively enhance blurred regions and to apply image understanding algorithms to sharp regions. This work focuses on blur detection and its application to image enhancement.

This work proposes a spatially varying defocus blur detection method based on the quotient of spectral bands. Additionally, to avoid computationally intensive algorithms for segmenting foreground and background regions, it proposes a global threshold defined using weakly textured regions of the input image. Quantitative results in precision-recall space, as well as qualitative results, outperform current state-of-the-art algorithms while keeping computational requirements at competitive levels.
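
A minimal sketch of the band-quotient idea, assuming it means the ratio of high-frequency to low-frequency spectral power in each local patch (sharp patches keep more high-frequency energy). The band split `cutoff`, the patch size, the naive DFT, and the fixed `threshold` are illustrative assumptions; in particular, the dissertation derives its global threshold from weakly textured regions, which this sketch does not reproduce.

```python
import cmath
import math


def power_spectrum(patch):
    """Naive 2-D DFT power spectrum of a small square patch (list of lists)."""
    n = len(patch)
    return [[abs(sum(patch[x][y] * cmath.exp(-2j * math.pi * (u * x + v * y) / n)
                     for x in range(n) for y in range(n))) ** 2
             for v in range(n)] for u in range(n)]


def band_quotient(patch, cutoff=0.25):
    """High-band / low-band spectral power (DC excluded); sharp patches score higher."""
    n = len(patch)
    power = power_spectrum(patch)
    low = high = 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # skip DC: it carries mean brightness, not sharpness
            fu, fv = min(u, n - u) / n, min(v, n - v) / n  # folded frequencies in [0, 0.5]
            if math.hypot(fu, fv) < cutoff:
                low += power[u][v]
            else:
                high += power[u][v]
    return high / (low + 1e-12)


def blur_map(image, patch=8, threshold=1.0):
    """Label each non-overlapping patch 1 (sharp) or 0 (blurred) with one global threshold."""
    return [[1 if band_quotient([r[j:j + patch] for r in image[i:i + patch]]) > threshold else 0
             for j in range(0, len(image[0]) - patch + 1, patch)]
            for i in range(0, len(image) - patch + 1, patch)]


# Toy 8x16 image: fine checkerboard texture (sharp) next to a slow cosine (smooth).
image = [[(-1.0) ** (x + y) if y < 8 else math.cos(2 * math.pi * x / 8) for y in range(16)]
         for x in range(8)]
labels = blur_map(image)  # [[1, 0]]
```

Using one global threshold, rather than a per-image segmentation step, keeps the classification of every patch to a single comparison, which matches the abstract's stated motivation of avoiding computationally intensive foreground/background segmentation.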

Imperfections in lens curvature can cause image radial distortion (IRD), which can drastically affect computer vision applications. This work proposes a novel, robust radial distortion correction algorithm based on alternating optimization of two cost functions, tailored respectively to estimating the center of distortion and the radial distortion coefficients. Qualitative and quantitative results show that the proposed algorithm is competitive.
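
To illustrate the alternating structure only, the sketch below assumes a one-coefficient polynomial model, p_d = c + (p_u − c)(1 + k·r²), and known correspondences between undistorted and distorted points. It alternates between a closed-form least-squares update of `k` with the center fixed and a local grid search over the center with `k` fixed. Unlike the dissertation's method, it uses a single reprojection cost for both blocks and requires correspondences; all function names and the synthetic data are hypothetical.

```python
def distort(points, center, k):
    """Polynomial radial model with one coefficient: p_d = c + (p_u - c) * (1 + k * r^2)."""
    cx, cy = center
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        s = 1.0 + k * (dx * dx + dy * dy)
        out.append((cx + dx * s, cy + dy * s))
    return out


def cost(undist, dist, center, k):
    """Sum of squared errors between observed and predicted distorted points."""
    pred = distort(undist, center, k)
    return sum((px - qx) ** 2 + (py - qy) ** 2 for (px, py), (qx, qy) in zip(pred, dist))


def fit_k(undist, dist, center):
    """Closed-form least-squares k for a fixed center (the model is linear in k)."""
    cx, cy = center
    num = den = 0.0
    for (x, y), (xd, yd) in zip(undist, dist):
        dx, dy = x - cx, y - cy
        r2 = dx * dx + dy * dy
        bx, by = dx * r2, dy * r2   # derivative of the prediction w.r.t. k
        tx, ty = xd - x, yd - y     # displacement that k must explain
        num += tx * bx + ty * by
        den += bx * bx + by * by
    return num / den if den else 0.0


def refine_center(undist, dist, center, k, step):
    """Best center on a 3x3 grid around the current one (k fixed); includes the current center."""
    cx, cy = center
    candidates = [(cx + i * step, cy + j * step) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    return min(candidates, key=lambda c: cost(undist, dist, c, k))


def alternate(undist, dist, center=(0.0, 0.0), step=1.0, rounds=60):
    """Alternate the two blocks; each step cannot increase the cost."""
    k = 0.0
    history = [cost(undist, dist, center, k)]
    for _ in range(rounds):
        k = fit_k(undist, dist, center)                            # block 1: coefficient
        new_center = refine_center(undist, dist, center, k, step)  # block 2: center
        if new_center == center:
            step /= 2.0  # no improving move on this grid: search more finely
        center = new_center
        history.append(cost(undist, dist, center, k))
    return center, k, history


# Synthetic correspondences from a known center and coefficient.
undist = [(float(x), float(y)) for x in range(-40, 41, 10) for y in range(-40, 41, 10)]
dist = distort(undist, (3.0, -2.0), 1e-4)
center, k, history = alternate(undist, dist)
```

Because each block is solved exactly (least squares) or by a search that includes the current estimate, the cost sequence in `history` is non-increasing, which is the basic appeal of alternating optimization when a joint search over center and coefficients would be expensive.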

Blur is one of the causes of visual discomfort in stereopsis. Sharpening with traditional algorithms can introduce an interdifference between the two views, which causes eyestrain and visual fatigue for the viewer. A sharpness enhancement method for stereo images that incorporates binocular vision cues and depth information is presented. Perceptual evaluations and quantitative results based on the interdifference deviation metric are reported; the proposed algorithm is competitive with state-of-the-art stereo algorithms.

Digital images and videos are produced every day in astonishing amounts. Consequently, market-driven demand for higher-quality content is constantly increasing, which creates a need for image quality assessment (IQA) methods. A training-free, no-reference image sharpness assessment method is proposed, based on the singular value decomposition of perceptually weighted, normalized gradients of relevant pixels in the input image. Results on six publicly available, subject-rated databases show competitive performance compared with state-of-the-art algorithms.
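
A simplified sketch of a gradient-SVD sharpness score, under several assumptions not stated in the abstract: forward-difference gradients, normalization by the image's intensity range, "relevant" pixels taken as those whose gradient magnitude is at least `rel` times the strongest one, and a score equal to the dominant singular value of the N×2 gradient matrix divided by √N. The perceptual weighting is omitted entirely, and the function name is hypothetical.

```python
import math


def gradient_sharpness(image, rel=0.5):
    """Hypothetical no-reference sharpness score from the singular values of a gradient matrix."""
    lo = min(min(r) for r in image)
    hi = max(max(r) for r in image)
    scale = (hi - lo) or 1.0  # normalize gradients by the intensity range (assumption)
    grads = []
    for x in range(len(image) - 1):
        for y in range(len(image[0]) - 1):
            gx = (image[x][y + 1] - image[x][y]) / scale  # forward differences
            gy = (image[x + 1][y] - image[x][y]) / scale
            grads.append((gx, gy))
    peak = max(math.hypot(gx, gy) for gx, gy in grads)
    if peak == 0.0:
        return 0.0  # flat image: no edges to assess
    # "Relevant" pixels: gradient magnitude at least `rel` times the strongest gradient.
    relevant = [g for g in grads if math.hypot(*g) >= rel * peak]
    # Singular values of the N x 2 matrix G are square roots of the eigenvalues of G^T G.
    a = sum(gx * gx for gx, _ in relevant)
    b = sum(gx * gy for gx, gy in relevant)
    c = sum(gy * gy for _, gy in relevant)
    mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    s1 = math.sqrt(max(mid + rad, 0.0))
    return s1 / math.sqrt(len(relevant))  # dominant singular value, size-normalized


sharp_img = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]                      # step edge
blur_img = [[0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0] for _ in range(8)]  # ramp edge
sharp = gradient_sharpness(sharp_img)    # 1.0
blurred = gradient_sharpness(blur_img)   # 0.25
```

The 2×2 Gram matrix trick avoids a general SVD routine: for a matrix with only two columns, its eigenvalues give the squared singular values directly. The score needs no training data, in keeping with the "training-free" property the abstract describes.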
Contributors: Andrade Rodas, Juan Manuel (Author) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Abousleman, Glen (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2019