Theses and Dissertations
Filtering by
- All Subjects: Spatial Audio
- All Subjects: Feature Fusion
We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Targeted to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision etc. Utilizing the MKL formulation, next we describe an auto-context algorithm for learning image context via the fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems.
In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently, special design of the learning architecture is needed. In order to improve the temporal modeling for multivariate sequences, we developed two architectures centered around attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model coupled with triplet ranking loss as metric learning framework is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having a lower computational complexity. Finally, in order to perform community detection on multilayer graphs, a fusion algorithm is described to derive node embedding from word embedding techniques and also exploit the complementary relational information contained in each layer of the graph.
Spatial audio can be especially useful for directing human attention. However, delivering spatial audio through speakers, rather than headphones that deliver audio directly to the ears, produces the issue of crosstalk, where sounds from each of the two speakers reach the opposite ear, inhibiting the spatialized effect. A research team at Meteor Studio has developed an algorithm called Xblock that solves this issue using a crosstalk cancellation technique. This thesis project expands upon the existing Xblock IoT system by providing a way to test the accuracy of the directionality of sounds generated with spatial audio. More specifically, the objective is to determine whether the usage of Xblock with smart speakers can provide generalized audio localization, which refers to the ability to detect a general direction of where a sound might be coming from. This project also expands upon the existing Xblock technique to integrate voice commands, where users can verbalize the name of a lost item using the phrase, “Find [item]”, and the IoT system will use spatial audio to guide them to it.
Spatial audio can be especially useful for directing human attention. However, delivering spatial audio through speakers, rather than headphones that deliver audio directly to the ears, produces the issue of crosstalk, where sounds from each of the two speakers reach the opposite ear, inhibiting the spatialized effect. A research team at Meteor Studio has developed an algorithm called Xblock that solves this issue using a crosstalk cancellation technique. This thesis project expands upon the existing Xblock IoT system by providing a way to test the accuracy of the directionality of sounds generated with spatial audio. More specifically, the objective is to determine whether the usage of Xblock with smart speakers can provide generalized audio localization, which refers to the ability to detect a general direction of where a sound might be coming from. This project also expands upon the existing Xblock technique to integrate voice commands, where users can verbalize the name of a lost item using the phrase, “Find [item]”, and the IoT system will use spatial audio to guide them to it.