Matching Items (3)
Filtering by

Clear all filters

156430-Thumbnail Image.png
Description
Machine learning models convert raw data in the form of video, images, audio,

text, etc. into feature representations that are convenient for computational process-

ing. Deep neural networks have proven to be very efficient feature extractors for a

variety of machine learning tasks. Generative models based on deep neural networks

introduce constraints on the

Machine learning models convert raw data in the form of video, images, audio,

text, etc. into feature representations that are convenient for computational process-

ing. Deep neural networks have proven to be very efficient feature extractors for a

variety of machine learning tasks. Generative models based on deep neural networks

introduce constraints on the feature space to learn transferable and disentangled rep-

resentations. Transferable feature representations help in training machine learning

models that are robust across different distributions of data. For example, with the

application of transferable features in domain adaptation, models trained on a source

distribution can be applied to a data from a target distribution even though the dis-

tributions may be different. In style transfer and image-to-image translation, disen-

tangled representations allow for the separation of style and content when translating

images.

This thesis examines learning transferable data representations in novel deep gen-

erative models. The Semi-Supervised Adversarial Translator (SAT) utilizes adversar-

ial methods and cross-domain weight sharing in a neural network to extract trans-

ferable representations. These transferable interpretations can then be decoded into

the original image or a similar image in another domain. The Explicit Disentangling

Network (EDN) utilizes generative methods to disentangle images into their core at-

tributes and then segments sets of related attributes. The EDN can separate these

attributes by controlling the ow of information using a novel combination of losses

and network architecture. This separation of attributes allows precise modi_cations

to speci_c components of the data representation, boosting the performance of ma-

chine learning tasks. The effectiveness of these models is evaluated across domain

adaptation, style transfer, and image-to-image translation tasks.
ContributorsEusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2018
158117-Thumbnail Image.png
Description
Visual object recognition has achieved great success with advancements in deep learning technologies. Notably, the existing recognition models have gained human-level performance on many of the recognition tasks. However, these models are data hungry, and their performance is constrained by the amount of training data. Inspired by the human ability

Visual object recognition has achieved great success with advancements in deep learning technologies. Notably, the existing recognition models have gained human-level performance on many of the recognition tasks. However, these models are data hungry, and their performance is constrained by the amount of training data. Inspired by the human ability to recognize object categories based on textual descriptions of objects and previous visual knowledge, the research community has extensively pursued the area of zero-shot learning. In this area of research, machine vision models are trained to recognize object categories that are not observed during the training process. Zero-shot learning models leverage textual information to transfer visual knowledge from seen object categories in order to recognize unseen object categories.

Generative models have recently gained popularity as they synthesize unseen visual features and convert zero-shot learning into a classical supervised learning problem. These generative models are trained using seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting towards seen classes, which leads to substandard performance in generalized zero-shot learning. To address this concern, this dissertation proposes a novel generative model that leverages the semantic relationship between seen and unseen categories and explicitly performs knowledge transfer from seen categories to unseen categories. Experiments were conducted on several benchmark datasets to demonstrate the efficacy of the proposed model for both zero-shot learning and generalized zero-shot learning. The dissertation also provides a unique Student-Teacher based generative model for zero-shot learning and concludes with future research directions in this area.
ContributorsVyas, Maunil Rohitbhai (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020
158233-Thumbnail Image.png
Description
Individuals with voice disorders experience challenges communicating daily. These challenges lead to a significant decrease in the quality of life for individuals with dysphonia. While voice amplification systems are often employed as a voice-assistive technology, individuals with voice disorders generally still experience difficulties being understood while using voice amplification systems.

Individuals with voice disorders experience challenges communicating daily. These challenges lead to a significant decrease in the quality of life for individuals with dysphonia. While voice amplification systems are often employed as a voice-assistive technology, individuals with voice disorders generally still experience difficulties being understood while using voice amplification systems. With the goal of developing systems that help improve the quality of life of individuals with dysphonia, this work outlines the landscape of voice-assistive technology, the inaccessibility of state-of-the-art voice-based technology and the need for the development of intelligibility improving voice-assistive technologies designed both with and for individuals with voice disorders. With the rise of voice-based technologies in society, in order for everyone to participate in the use of voice-based technologies individuals with voice disorders must be included in both the data that is used to train these systems and the design process. An important and necessary step towards the development of better voice assistive technology as well as more inclusive voice-based systems is the creation of a large, publicly available dataset of dysphonic speech. To this end, a web-based platform to crowdsource voice disorder speech was developed to create such a dataset. This dataset will be released so that it is freely and publicly available to stimulate research in the field of voice-assistive technologies. Future work includes building a robust intelligibility estimation model, as well as employing that model to measure, and therefore enhance, the intelligibility of a given utterance. The hope is that this model will lead to the development of voice-assistive technology using state-of-the-art machine learning models to help individuals with voice disorders be better understood.
ContributorsMoore, Meredith Kay (Author) / Panchanathan, Sethuraman (Thesis advisor) / Berisha, Visar (Committee member) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created2020