ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and any supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- All Subjects: Computer Science
- Creators: Spanias, Andreas
The expression and perception of emotions vary across speakers and cultures; thus, determining features and classification methods that generalize well to different conditions is strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves the recognition performance.
Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach that constrains a classifier trained over acoustic descriptors to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.
Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation, and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real-world data and accompanied by a prototypical Android-based user interface.
The proposed methods are also assessed in terms of computation and implementation complexity. Software and field programmable gate array based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Designed to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision and other fields. Utilizing the MKL formulation, we next describe an auto-context algorithm for learning image context via fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems.
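The kernel-combination idea behind MKL can be illustrated with a minimal sketch. It assumes the common convex-combination form K = sum_m w_m K_m with non-negative weights that sum to one; the abstract does not state which MKL formulation the thesis uses. The function names, the choice of base kernels, and the fixed weights are all hypothetical: a real MKL solver would learn the weights jointly with the classifier rather than take them as input.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def linear_kernel(x, y):
    """Plain inner-product kernel."""
    return sum(a * b for a, b in zip(x, y))


def gram(kernel, samples):
    """Gram matrix of one descriptor under one base kernel."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]


def combine_kernels(grams, weights):
    """Convex combination of Gram matrices (weights are normalized here).

    Assumption: the weights are given. MKL would optimize them instead.
    """
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one kernel weight must be positive")
    norm = [w / total for w in weights]
    n = len(grams[0])
    return [
        [sum(w * K[i][j] for w, K in zip(norm, grams)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: two descriptors of the same three samples.
accel = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
gyro = [[0.5], [0.4], [0.9]]
K = combine_kernels(
    [gram(lambda a, b: rbf_kernel(a, b, gamma=0.5), accel),
     gram(linear_kernel, gyro)],
    weights=[0.7, 0.3],
)
```

The combined matrix K can be passed to any kernel machine that accepts a precomputed Gram matrix; each descriptor keeps its own kernel, and the weights express how much each one contributes.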
In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently, special design of the learning architecture is needed. In order to improve the temporal modeling for multivariate sequences, we developed two architectures centered around attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model, coupled with a triplet ranking loss as a metric learning framework, is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having a lower computational complexity. Finally, in order to perform community detection on multilayer graphs, a fusion algorithm is described to derive node embeddings from word embedding techniques and also exploit the complementary relational information contained in each layer of the graph.
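The triplet ranking loss mentioned above can be written down in a few lines. This is a generic sketch of the standard hinge form, max(0, d(a, p) - d(a, n) + margin), using Euclidean distance; the distance function, the margin value, and the function names are assumptions, and the abstract does not say which variant (for example, squared distances) the thesis adopts.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet ranking loss.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(
        0.0,
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin,
    )
```

In a speaker diarization setting, the anchor and positive would be embeddings of segments from the same speaker and the negative an embedding from a different speaker, so minimizing the loss pulls same-speaker segments together.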
Revealing the underlying structure and dynamics of complex networked systems from observed data without any specific prior information is of fundamental importance to science, engineering, and society. We articulate a Markov network based model, the sparse dynamical Boltzmann machine (SDBM), as a universal network structural estimator and dynamics approximator based on techniques including compressive sensing and the K-means algorithm. It recovers the network structure of the original system and predicts its short-term or even long-term dynamical behavior for a large variety of representative dynamical processes on model and real-world complex networks.
One of the most challenging problems in complex dynamical systems is to control complex networks. Upon finding that the energy required to approach a target state with reasonable precision is often unbearably large, and the energy of controlling a set of networks with similar structural properties follows a fat-tail distribution, we identify fundamental structural "short boards" that play a dominant role in the enormous energy and offer a theoretical interpretation for the fat-tail distribution and simple strategies to significantly reduce the energy.
Extreme events and cascading failure, a type of collective behavior in complex networked systems, often have catastrophic consequences. Utilizing transportation and evolutionary game dynamics as prototypical settings, we investigate the emergence of extreme events in simplex complex networks, mobile ad-hoc networks and multi-layer interdependent networks. A striking resonance-like phenomenon and the emergence of global-scale cascading breakdown are discovered. We derive analytic theories to understand the mechanism of control at a quantitative level and articulate cost-effective control schemes to significantly suppress extreme events and the cascading process.
Motivated by recent studies in motor control and therapy, in this thesis an existing computational framework is used to assess balance impairment and disease severity in people suffering from Parkinson's disease. The framework uses high-dimensional shape descriptors of the reconstructed phase space of the subjects' center of pressure (CoP) tracings, recorded while they perform dynamical postural shifts. The framework is evaluated on a dataset collected from 43 healthy subjects and 17 subjects impaired by Parkinson's disease, and it outperforms other methods, such as dynamical shift indices and the use of chaotic invariants, in the assessment of balance impairment.
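A reconstructed phase space is commonly obtained from a scalar time series by time-delay embedding, and the sketch below shows that step only. It is an assumption that the framework uses this standard construction; the function name, the embedding dimension, and the delay are hypothetical, and the shape descriptors computed on the resulting point cloud are not shown.

```python
def delay_embed(series, dim, tau):
    """Time-delay embedding of a scalar series.

    Each output point is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_points = len(series) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [
        tuple(series[i + k * tau] for k in range(dim))
        for i in range(n_points)
    ]


# Hypothetical one-axis CoP samples, embedded in three dimensions.
cop_x = [0.0, 0.2, 0.5, 0.4, 0.1, -0.2, -0.4, -0.3]
points = delay_embed(cop_x, dim=3, tau=2)
```

In practice the delay and dimension are usually chosen from the data (for example, by mutual information and false nearest neighbors) rather than fixed by hand as they are here.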
In this thesis, an unsupervised method is also proposed that assesses the movement quality of simple actions, such as sit-to-stand and dynamic posture shifts, by modeling the deviation of a given movement from an ideal movement path in the configuration space; that is, the quality of movement is directly related to its similarity to the ideal trajectory between the start and end poses. The S^1 x S^1 configuration space was used to model the interaction of two joint angles in sit-to-stand actions, and the R^2 space was used to model the subject's CoP while performing dynamic posture shifts for application in movement quality estimation.
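To make the idea of deviation from an ideal path on S^1 x S^1 concrete, the following sketch treats a pair of joint angles as a point on a flat torus and measures the mean distance to a reference path. It assumes that the ideal path is the shortest wrapped path between the start and end poses, sampled uniformly in time, and that the score is a simple mean of pointwise distances; the thesis may define both differently. All function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def angle_diff(a, b):
    """Signed difference a - b, wrapped to the interval [-pi, pi)."""
    return (a - b + math.pi) % TWO_PI - math.pi


def torus_distance(p, q):
    """Geodesic distance between two points on the flat torus S^1 x S^1."""
    return math.hypot(angle_diff(p[0], q[0]), angle_diff(p[1], q[1]))


def ideal_path(start, end, n):
    """n evenly spaced points on the shortest path from start to end."""
    if n < 2:
        raise ValueError("need at least two points")
    d0 = angle_diff(end[0], start[0])
    d1 = angle_diff(end[1], start[1])
    return [
        ((start[0] + d0 * i / (n - 1)) % TWO_PI,
         (start[1] + d1 * i / (n - 1)) % TWO_PI)
        for i in range(n)
    ]


def mean_deviation(trajectory, start, end):
    """Mean torus distance between a trajectory and the assumed ideal path."""
    reference = ideal_path(start, end, len(trajectory))
    total = sum(torus_distance(p, r) for p, r in zip(trajectory, reference))
    return total / len(trajectory)
```

A lower value of `mean_deviation` indicates a movement that stays closer to the assumed ideal trajectory; the same scheme carries over to R^2 for CoP data by replacing `torus_distance` with the ordinary Euclidean distance.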