This collection includes both ASU theses and dissertations, submitted by graduate students, and Barrett, The Honors College theses, submitted by undergraduate students.

Description
Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user's activity, mood, or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging because such information is subjective, and it therefore requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and for ambient sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures; thus, features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves the recognition performance.

Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach that constrains a classifier trained over acoustic descriptors to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real-world data and accompanied by a prototypical Android-based user interface.

The proposed methods are also assessed in terms of computation and implementation complexity. Software and field-programmable gate array (FPGA) implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e., the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated because the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
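As a rough illustration of why the Euclidean assumption can fail, the sketch below compares the straight-line distance with the geodesic (great-circle) distance between two points on the unit sphere, one of the simplest Riemannian manifolds. The sphere, the function names, and the sample points are assumptions chosen for illustration; they are not taken from the dissertation.

```python
import numpy as np

def euclidean_distance(p, q):
    """Straight-line (L2) distance, which cuts through the sphere."""
    return float(np.linalg.norm(p - q))

def sphere_geodesic_distance(p, q):
    """Arc length along the unit sphere between unit vectors p and q."""
    # Clip guards against round-off pushing the dot product outside [-1, 1].
    cos_angle = np.clip(np.dot(p, q), -1.0, 1.0)
    return float(np.arccos(cos_angle))

# Hypothetical sample points: the north pole and a point on the equator.
north = np.array([0.0, 0.0, 1.0])
equator = np.array([1.0, 0.0, 0.0])

d_euclid = euclidean_distance(north, equator)          # sqrt(2)
d_geodesic = sphere_geodesic_distance(north, equator)  # pi / 2
```

The geodesic distance is the longer of the two because it measures a path that stays on the manifold, which is the kind of geometric constraint the abstract refers to.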
Contributors: Anirudh, Rushil (Author) / Turaga, Pavan (Thesis advisor) / Cochran, Douglas (Committee member) / Runger, George C. (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Compressive sensing theory allows signals and images to be sensed and reconstructed at a sampling rate lower than the Nyquist rate. Applications in resource-constrained environments stand to benefit from this theory, which also opens up many possibilities for new applications. The traditional inference pipeline for computer vision first reconstructs the image from compressive measurements. However, the reconstruction process is a computationally expensive step that also provides poor results at high compression rates. There have been several successful attempts to perform inference tasks, such as activity recognition, directly on compressive measurements. In this thesis, I am interested in tackling a more challenging vision problem, visual question answering (VQA), without reconstructing the compressive images. I investigate the feasibility of this problem with a series of experiments, evaluate the proposed methods on a VQA dataset, and discuss promising results and directions for future work.
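A minimal sketch of the sensing step alone may help make "compressive measurements" concrete: a length-n signal is projected onto m < n random vectors, giving y = Φx. The Gaussian measurement matrix, the sizes, and all names below are assumptions for illustration; the abstract does not specify the sensing operator or the inference model used in the thesis.

```python
import numpy as np

def compressive_measure(x, num_measurements, seed=0):
    """Return (y, phi) where y = phi @ x has fewer entries than x."""
    rng = np.random.default_rng(seed)
    n = x.size
    # Assumed sensing operator: i.i.d. Gaussian rows, scaled by 1/sqrt(m).
    phi = rng.standard_normal((num_measurements, n)) / np.sqrt(num_measurements)
    return phi @ x, phi

# Hypothetical sparse signal of length 256 with three nonzero entries.
signal = np.zeros(256)
signal[[10, 50, 200]] = [1.0, -2.0, 0.5]

# 64 measurements instead of 256 samples (a 4x compression).
measurements, phi = compressive_measure(signal, num_measurements=64)
```

Inference that works directly on `measurements` would skip the reconstruction of `signal` entirely, which is the step the abstract describes as expensive.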
Contributors: Huang, Li-Chin (Author) / Turaga, Pavan (Thesis advisor) / Yang, Yezhou (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
This thesis develops geometrically and statistically rigorous foundations for multivariate analysis and Bayesian inference posed on Grassmannian manifolds. Requisite to the development of key elements of statistical theory in a geometric realm are closed-form, analytic expressions for many differential geometric objects, e.g., tangent vectors, metrics, geodesics, volume forms. The first part of this thesis is devoted to a mathematical exposition of these. In particular, it leverages the classical work of Alan James to derive the exterior calculus of differential forms on special Grassmannians for invariant measures with respect to which integration is permissible. Motivated by various multi-sensor remote sensing applications, the second part of this thesis describes the problem of recursively estimating the state of a dynamical system propagating on the Grassmann manifold. Fundamental to the Bayesian treatment of this problem is the choice of a suitable probability distribution to a priori model the state. Using the Method of Maximum Entropy together with the developed geometric theory, maximum-entropy probability distributions on the state space are derived and characterized. Statistical analyses of these distributions, including parameter estimation, are also presented. These probability distributions and the statistical analysis thereof are original contributions. Using the Bayesian framework, two recursive estimation algorithms, both of which rely on noisy measurements on (special cases of) the Grassmann manifold, are then devised and implemented numerically. The first is applied to an idealized scenario, the second to a more practically motivated scenario. The novelty of both of these algorithms lies in the use of the derived maximum-entropy probability measures as models for the priors. Numerical simulations demonstrate that, under mild assumptions, both estimation algorithms produce accurate and statistically meaningful outputs.
This thesis aims to chart the interface between differential geometry and statistical signal processing. It is my deepest hope that the geometric-statistical approach underlying this work facilitates and encourages the development of new theories and new computational methods in geometry. Application of these, in turn, will bring new insights and better solutions to a number of extant and emerging problems in signal processing.
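For readers unfamiliar with the Grassmann manifold, its points are subspaces rather than vectors, so distances must not depend on the basis chosen for a subspace. The sketch below computes the standard geodesic distance from principal angles; it is a generic textbook construction under that assumption, not the thesis's estimation algorithms, and the function names and example subspaces are hypothetical.

```python
import numpy as np

def principal_angles(a, b):
    """Principal angles between the column spaces of a and b."""
    # Orthonormalize each basis; only the spanned subspace matters.
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    # Singular values of qa^T qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def grassmann_distance(a, b):
    """Geodesic distance: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(a, b)))

# Hypothetical 2-D subspaces of R^3: span{e1, e2} and span{e1, e3}.
plane_xy = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
plane_xz = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

# Same subspace as plane_xy, expressed in a different (non-orthonormal) basis.
plane_xy_rebased = plane_xy @ np.array([[2.0, 1.0], [0.0, 3.0]])

d_different = grassmann_distance(plane_xy, plane_xz)
d_same = grassmann_distance(plane_xy, plane_xy_rebased)
```

Here the two planes share one direction and differ by a right angle in the other, while re-expressing a plane in a new basis leaves the distance at (numerically) zero.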
Contributors: Crider, Lauren N (Author) / Cochran, Douglas (Thesis advisor) / Kotschwar, Brett (Committee member) / Scharf, Louis (Committee member) / Taylor, Thomas (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2021