Matching Items (7)

Description
Communicating with computers through thought has been a remarkable achievement of recent years, made possible by electroencephalography (EEG). Brain-computer interfaces (BCIs) rely heavily on EEG signals for communication between humans and computers. With the advent of deep learning, many recent studies have applied these techniques to EEG data for tasks such as emotion recognition, motor imagery classification, and sleep analysis. Despite the rising interest in EEG signal classification, very few studies have explored the MindBigData dataset, which collects EEG signals recorded while a subject sees a digit and thinks about it. This dataset takes us closer to realizing the idea of mind-reading, or communication via thought, but classifying these signals into the digit the user is thinking about is a challenging task. This serves as the motivation to study the dataset with existing deep learning techniques. Given the recent success of the transformer architecture in domains such as Computer Vision and Natural Language Processing, this thesis studies the transformer architecture for EEG signal classification and also explores other deep learning techniques for the task. The proposed classification pipeline achieves performance comparable to existing methods.
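As an illustration of the direction this abstract describes, below is a minimal sketch of a transformer encoder classifying windows of multi-channel EEG into the ten digit classes. All layer sizes, the channel count, and the mean-pooling choice are assumptions for illustration, not the thesis's actual pipeline.

```python
import torch
import torch.nn as nn

class EEGTransformerClassifier(nn.Module):
    def __init__(self, n_channels=14, d_model=64, n_heads=4,
                 n_layers=2, n_classes=10):
        super().__init__()
        # Project each timestep's channel vector into the model dimension.
        self.input_proj = nn.Linear(n_channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (batch, time, channels)
        h = self.encoder(self.input_proj(x))
        return self.head(h.mean(dim=1))    # mean-pool over time, then classify

model = EEGTransformerClassifier()
logits = model(torch.randn(8, 256, 14))    # 8 windows, 256 samples, 14 channels
```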
Contributors: Muglikar, Omkar Dushyant (Author) / Wang, Yalin (Thesis advisor) / Liang, Jianming (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Visual object recognition has achieved great success with advancements in deep learning technologies. Notably, existing recognition models have attained human-level performance on many recognition tasks. However, these models are data-hungry, and their performance is constrained by the amount of training data. Inspired by the human ability to recognize object categories from textual descriptions and prior visual knowledge, the research community has extensively pursued the area of zero-shot learning. In this area of research, machine vision models are trained to recognize object categories that are not observed during training. Zero-shot learning models leverage textual information to transfer visual knowledge from seen object categories in order to recognize unseen object categories.

Generative models have recently gained popularity as they synthesize unseen visual features and convert zero-shot learning into a classical supervised learning problem. These generative models are trained on seen classes and are expected to implicitly transfer knowledge from seen to unseen classes. However, their performance is stymied by overfitting toward seen classes, which leads to substandard performance in generalized zero-shot learning. To address this concern, this dissertation proposes a novel generative model that leverages the semantic relationship between seen and unseen categories and explicitly transfers knowledge from seen to unseen categories. Experiments were conducted on several benchmark datasets to demonstrate the efficacy of the proposed model for both zero-shot learning and generalized zero-shot learning. The dissertation also provides a Student-Teacher-based generative model for zero-shot learning and concludes with future research directions in this area.
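A hedged sketch of the general recipe this family of methods follows: a conditional generator maps class attribute vectors plus noise to visual features, and synthetic unseen-class features turn zero-shot learning into ordinary supervised classification. The dimensions and architecture below are illustrative assumptions, not the dissertation's actual model.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Maps a class attribute vector plus noise to a synthetic visual feature."""
    def __init__(self, attr_dim=85, noise_dim=32, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(attr_dim + noise_dim, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim))

    def forward(self, attrs, noise):
        return self.net(torch.cat([attrs, noise], dim=1))

G = FeatureGenerator()
unseen_attrs = torch.rand(50, 85)       # one attribute vector per unseen class
fake_feats = G(unseen_attrs, torch.randn(50, 32))
# fake_feats, paired with their class labels, train a standard softmax classifier.
```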
Contributors: Vyas, Maunil Rohitbhai (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Over the past decade, advancements in neural networks have been instrumental in achieving remarkable breakthroughs in the field of computer vision. One application is assistive technology that improves the lives of visually impaired people by making the world around them more accessible. Research in convolutional neural networks has led to human-level performance on many vision tasks, including image classification, object detection, instance segmentation, semantic segmentation, panoptic segmentation, and scene-text recognition. All the aforementioned tasks, individually or in combination, have been used to create assistive technologies that improve accessibility for the blind.

This dissertation outlines various applications that improve accessibility and independence for visually impaired people during shopping by helping them identify products in retail stores. The dissertation includes the following contributions: (i) a dataset containing images of breakfast-cereal products and a classifier using a deep neural network (ResNet); (ii) a dataset for training a text-detection and scene-text-recognition model; (iii) a model for text detection and scene-text recognition that identifies products in images from a user-controlled camera; (iv) a dataset of twenty thousand products, with product information and related images, that can be used to train and test a system designed to identify products.
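As a concrete, hypothetical example of contribution (i), the sketch below fine-tunes an ImageNet-pretrained ResNet for cereal-product classification; the class count is a placeholder, since the abstract does not state one.

```python
import torch.nn as nn
from torchvision import models

NUM_CEREAL_CLASSES = 100                 # hypothetical; the abstract gives no count

# Start from an ImageNet-pretrained backbone and replace its classification head.
net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
net.fc = nn.Linear(net.fc.in_features, NUM_CEREAL_CLASSES)
# Fine-tune with cross-entropy on the cereal-product images, as in standard
# transfer learning.
```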
Contributors: Patel, Akshar (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Feature embeddings differ from raw features in that the former obey certain properties, such as a notion of similarity/dissimilarity in the embedding space. word2vec is a preeminent example in this direction, where similarity in the embedding space is measured by cosine similarity. Such language embedding models have seen numerous applications in both the language and vision communities, as they efficiently capture the information in their modality (the English language). Inspired by these language models, this work focuses on learning embedding spaces for two visual computing tasks: (1) image hashing and (2) zero-shot learning. The training set was used to learn embedding spaces over which similarity/dissimilarity is measured using distance metrics such as Hamming, Euclidean, and cosine distances. While the above-mentioned language models learn generic word embeddings, in this work task-specific embeddings were learned, which can be used for Image Retrieval and Classification separately.
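A small illustration of the three distance metrics named above, applied to placeholder vectors; in the thesis they would operate on the learned task-specific embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.random(128), rng.random(128)                 # placeholder embeddings
b1, b2 = (u > 0.5).astype(int), (v > 0.5).astype(int)   # placeholder binary codes

euclidean = np.linalg.norm(u - v)
cosine = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
hamming = int(np.count_nonzero(b1 != b2))               # defined for binary codes
print(euclidean, cosine, hamming)
```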

Image Hashing is the task of mapping images to binary codes such that some notion of user-defined similarity is preserved. The first part of this work focuses on designing a new framework that uses the hash-tags associated with web images to learn the binary codes. Such codes can be used in applications like Image Retrieval and Image Classification. Further, this framework requires no labelled data, making it very inexpensive. Results show that the proposed approach surpasses state-of-the-art approaches by a significant margin.
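To make the hashing idea concrete, here is a hedged sketch in which a projection maps image features to binary codes by sign thresholding, and neighbors are retrieved by Hamming distance. The projection is random for illustration, whereas the thesis learns it from hash-tag supervision; the sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, CODE_BITS = 2048, 64                  # assumed sizes, not from the thesis
W = rng.standard_normal((FEAT_DIM, CODE_BITS))  # stand-in for the learned projection

def hash_code(feat):
    # Sign-threshold the projected feature to get a binary code.
    return (feat @ W > 0).astype(np.uint8)

def hamming(a, b):
    # Similar images should end up a small Hamming distance apart.
    return int(np.count_nonzero(a != b))

query = rng.standard_normal(FEAT_DIM)
database = rng.standard_normal((1000, FEAT_DIM))
codes = (database @ W > 0).astype(np.uint8)
nearest = min(range(len(codes)), key=lambda i: hamming(hash_code(query), codes[i]))
```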

Zero-shot classification is the task of classifying a test sample into a new class that was not seen during training. This is possible by establishing a relationship between the training and testing classes using auxiliary information. In the second part of this thesis, a framework is designed that trains using hand-crafted attribute vectors and word vectors but does not require the expensive attribute vectors at test time. More specifically, an intermediate space is learned between the word-vector space and the image-feature space using the hand-crafted attribute vectors. Preliminary results on two zero-shot classification datasets show that this is a promising direction to explore.
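A minimal sketch of that idea under assumed dimensions: two learned maps project word vectors and image features into a shared intermediate (attribute-like) space, so test-time classification needs only word vectors and image features. The layers below are illustrative, not the thesis's architecture.

```python
import torch
import torch.nn as nn

WORD_DIM, ATTR_DIM, FEAT_DIM = 300, 85, 2048   # illustrative sizes

word_to_attr = nn.Linear(WORD_DIM, ATTR_DIM)   # word vectors -> intermediate space
feat_to_attr = nn.Linear(FEAT_DIM, ATTR_DIM)   # image features -> intermediate space

img_feat = torch.randn(1, FEAT_DIM)            # feature of the test image
class_words = torch.randn(10, WORD_DIM)        # word vectors of candidate classes
scores = feat_to_attr(img_feat) @ word_to_attr(class_words).T
pred = scores.argmax(dim=1)                    # nearest class in the shared space
```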
Contributors: Gattupalli, Jaya Vijetha (Author) / Li, Baoxin (Thesis advisor) / Yang, Yezhou (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
In the last decade, deep-learning-based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry, and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of catastrophic forgetting, where the model forgets previously learned knowledge as it overfits to the newly available data. Incremental learning algorithms enable deep neural networks to prevent catastrophic forgetting by retaining knowledge of previously observed data while also learning from newly available data.

This thesis presents three models for incremental learning: (i) the design of an algorithm for generative incremental learning using a pre-trained deep neural network classifier; (ii) the development of a hashing-based clustering algorithm for efficient incremental learning; (iii) the design of a student-teacher coupled neural network to distill knowledge for incremental learning. The proposed algorithms were evaluated on popular vision datasets for classification tasks. The thesis concludes with a discussion of the feasibility of using these techniques to transfer information between networks and of their use in incremental learning applications.
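For contribution (iii), here is a hedged sketch of the standard student-teacher distillation objective such a model might use: the student matches the temperature-softened outputs of a frozen teacher on old classes while also fitting the new labels, mitigating catastrophic forgetting. The temperature and weighting are illustrative assumptions; the thesis's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard targets: ordinary cross-entropy on the new data's labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10),
                         torch.randint(0, 10, (4,)))
```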
Contributors: Patil, Rishabh (Author) / Venkateswara, Hemanth (Thesis advisor) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Individuals with voice disorders experience challenges communicating daily, and these challenges lead to a significant decrease in quality of life for individuals with dysphonia. While voice amplification systems are often employed as a voice-assistive technology, individuals with voice disorders generally still experience difficulties being understood while using them. With the goal of developing systems that help improve the quality of life of individuals with dysphonia, this work outlines the landscape of voice-assistive technology, the inaccessibility of state-of-the-art voice-based technology, and the need for intelligibility-improving voice-assistive technologies designed both with and for individuals with voice disorders. As voice-based technologies become pervasive in society, individuals with voice disorders must be included both in the data used to train these systems and in the design process if everyone is to participate in their use. An important and necessary step towards better voice-assistive technology, as well as more inclusive voice-based systems, is the creation of a large, publicly available dataset of dysphonic speech. To this end, a web-based platform was developed to crowdsource voice-disorder speech and build such a dataset. The dataset will be released freely and publicly to stimulate research in the field of voice-assistive technologies. Future work includes building a robust intelligibility-estimation model, as well as employing that model to measure, and therefore enhance, the intelligibility of a given utterance. The hope is that this model will lead to voice-assistive technology that uses state-of-the-art machine learning to help individuals with voice disorders be better understood.
Contributors: Moore, Meredith Kay (Author) / Panchanathan, Sethuraman (Thesis advisor) / Berisha, Visar (Committee member) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Recently, Generative Adversarial Networks (GANs) have been applied to the problem of Cold-Start Recommendation, but the training performance of these models is hampered by the extreme sparsity of warm users' purchase behavior. This thesis introduces a novel representation for user vectors that combines user demographics and user preferences, making the model a hybrid of Collaborative Filtering and Content-Based Recommendation. The system models user purchase behavior using weighted user-product preferences (explicit feedback) rather than binary user-product interactions (implicit feedback). Using this representation, a novel sparse adversarial model, the Sparse ReguLarized Generative Adversarial Network (SRLGAN), is developed for Cold-Start Recommendation. SRLGAN leverages the sparse user-purchase behavior, which ensures training stability and avoids over-fitting on warm users. The performance of SRLGAN is evaluated on two popular datasets and demonstrates state-of-the-art results.
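A hedged sketch of the sparse-regularization idea the name suggests: alongside a standard adversarial term, the generator's synthetic preference vectors are penalized toward sparsity, mirroring real purchase behavior. An L1 penalty is used here purely for illustration; the exact SRLGAN objective may differ.

```python
import torch

def sparse_generator_loss(d_fake_scores, fake_prefs, lam=0.1):
    # Standard non-saturating adversarial term: fool the discriminator.
    adv = -torch.log(torch.sigmoid(d_fake_scores) + 1e-8).mean()
    # Sparsity term: keep synthetic preference vectors as sparse as real ones.
    sparsity = fake_prefs.abs().mean()
    return adv + lam * sparsity

loss = sparse_generator_loss(torch.randn(16), torch.rand(16, 500))
```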
Contributors: Shah, Aksheshkumar Ajaykumar (Author) / Venkateswara, Hemanth (Thesis advisor) / Berman, Spring (Thesis advisor) / Ladani, Leila J (Committee member) / Arizona State University (Publisher)
Created: 2022