Matching Items (44)

Description
Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is central to achieving improved model robustness and inference performance. This dissertation focuses on representation learning approaches as the fusion strategy. Specifically, the objective is to learn a shared latent representation that jointly exploits the structural information encoded in all modalities, so that a straightforward learning model can be adopted to obtain predictions.

We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Designed to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been applied successfully to numerous fusion problems in computer vision and related fields. Building on the MKL formulation, we next describe an auto-context algorithm for learning image context via fusion with low-level descriptors. Furthermore, a principled fusion algorithm that uses deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems.
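To make the MKL formulation concrete, below is a minimal sketch of its convex-combination form, in which a fused kernel is built as a weighted sum of per-modality base kernels. The kernel choices, weights, and data shapes are illustrative assumptions, not the settings used in the dissertation; in full MKL the weights would be learned jointly with the classifier rather than fixed.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination of precomputed kernel matrices:
    K = sum_m w_m * K_m, with w_m >= 0 and sum_m w_m = 1."""
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0), "MKL weights must be non-negative"
    w = w / w.sum()                                  # project onto the simplex
    return sum(wi * K for wi, K in zip(w, kernels))

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel on one modality's feature matrix."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

# Hypothetical fusion of two modalities observed for the same 20 samples:
# an RBF kernel on modality 1 and a linear kernel on modality 2.
X1, X2 = np.random.randn(20, 8), np.random.randn(20, 5)
K_fused = combine_kernels([rbf_kernel(X1), X2 @ X2.T], weights=[0.7, 0.3])
```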

In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently a special design of the learning architecture is needed. To improve temporal modeling for multivariate sequences, we developed two architectures centered on attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare. Another model, coupled with a triplet ranking loss in a metric learning framework, is described to better address speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance at a lower computational complexity. Finally, to perform community detection on multilayer graphs, a fusion algorithm is described that derives node embeddings from word embedding techniques and exploits the complementary relational information contained in each layer of the graph.
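The metric learning component mentioned above can be illustrated with the standard triplet ranking loss; a minimal sketch follows, assuming fixed-size speaker embeddings. The embedding dimension and margin below are illustrative, and the thesis's exact loss and architecture may differ.

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: pull same-speaker embeddings together and push
    different-speaker embeddings at least `margin` farther apart:
    mean(max(0, ||a - p||^2 - ||a - n||^2 + margin))."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)   # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)   # anchor-negative distance
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Hypothetical batch of 32 triplets of 128-dimensional speaker embeddings.
a, p, n = (np.random.randn(32, 128) for _ in range(3))
loss = triplet_ranking_loss(a, p, n)
```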
Contributors: Song, Huan (Author) / Spanias, Andreas (Thesis advisor) / Thiagarajan, Jayaraman (Committee member) / Berisha, Visar (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their use on resource-constrained edge devices has been limited by their high computation and large memory requirements.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works apply the compression techniques in isolation, there have been very few studies on applying quantization and structured sparsity together to a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by jointly exploring quantization and structured compression settings. The optimal DNN model achieves a 50X weight memory reduction compared to an uncompressed floating-point DNN. This saving is significant, since applying only structured sparsity constraints yields 2X memory savings and applying only quantization constraints yields 16X. The algorithm has been validated on both high- and low-capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrate that a deep-sparse DNN outperforms a shallow-dense DNN, with the level of memory savings varying with DNN precision and sparsity. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a huge set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes both activation and weight memory. For low-sparsity DNNs there is only a small change in the memory footprint of the optimal designs; for high-sparsity DNNs, however, activation memory cannot be ignored.
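As a rough illustration of how these two constraints compose, the sketch below applies an illustrative 2-bit uniform weight quantizer after block-structured row pruning that keeps one row in four (4X structured compression along that dimension). The quantization scheme, block size, and pruning criterion are assumptions made for illustration, not the training-time constraints used in the thesis.

```python
import numpy as np

def quantize_2bit(w):
    """Illustrative uniform symmetric 2-bit quantizer: weights are snapped to
    4 integer levels {-2, -1, 0, 1} scaled by a step derived from the range."""
    delta = np.max(np.abs(w)) / 1.5 + 1e-12
    return np.clip(np.round(w / delta), -2, 1) * delta

def structured_prune_4x(w, block=4):
    """Illustrative block-structured sparsity: in each group of `block` output
    rows, keep only the row with the largest L2 norm (75% structured sparsity,
    i.e. 4X compression along the row dimension)."""
    w = w.copy()
    norms = np.linalg.norm(w, axis=1)
    for start in range(0, w.shape[0] - block + 1, block):
        rows = np.arange(start, start + block)
        keep = rows[np.argmax(norms[rows])]          # strongest row per block
        w[[r for r in rows if r != keep], :] = 0.0
    return w

W = np.random.randn(16, 8)
W_compressed = quantize_2bit(structured_prune_4x(W))  # sparse, 2-bit weights
```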
Contributors: Srivastava, Gaurav (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
In recent years, conventional convolutional neural networks (CNNs) have achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNNs discards spatial information, which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves on the capabilities of traditional CNNs. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.

In this work, we first apply the capsule network to the overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence, and training time per epoch. We show that the capsule network achieves higher accuracy when the training set is small; when the training set is larger, the capsule network and a conventional CNN have comparable recognition accuracy. The training time per epoch for the capsule network is longer than for a conventional CNN because of the dynamic routing algorithm. An analysis of the GPU timing shows that adjusting the capsule structure can significantly decrease the time complexity of the dynamic routing algorithm.
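For reference, here is a minimal NumPy sketch of routing by agreement, the dynamic routing procedure from the original capsule network paper that dominates the timing discussed above. The capsule counts, dimensions, and iteration count below are illustrative assumptions, not the configurations timed in this work.

```python
import numpy as np

def squash(s, axis=-1):
    """Capsule non-linearity: scale vector length into [0, 1), keep direction."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + 1e-9)

def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement over prediction vectors u_hat with shape
    (n_lower, n_upper, dim): lower capsules vote for upper capsules, and
    couplings are re-weighted toward votes that agree with the consensus."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                          # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = np.einsum('ij,ijd->jd', c, u_hat)                 # weighted vote sum
        v = squash(s)                                         # upper capsule outputs
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # agreement update
    return v

# Illustrative shapes (e.g. 1152 primary capsules routing to 10 class capsules).
v = dynamic_routing(np.random.randn(1152, 10, 16))
```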

Next, we design a capsule network for speech recognition, specifically overlapping word recognition. We use both a capsule network and a conventional CNN to recognize two overlapping words in speech files created from 5 word classes. We show that the capsule network achieves considerably higher recognition accuracy (96.92%) than the conventional CNN (85.19%). Our results show that the capsule network recognizes overlapping words by recognizing each individual word in the speech. We also verify the scalability of the capsule network by increasing the number of word classes from 5 to 10. The capsule network still shows a high recognition accuracy of 95.42% with 10 words, while the accuracy of the conventional CNN drops sharply to 73.18%.
Contributors: Xiong, Yan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Thesis advisor) / Weng, Yang (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
This paper presents work that was done to create a system capable of facial expression recognition (FER) using deep convolutional neural networks (CNNs) and to test multiple configurations and methods. CNNs are able to extract powerful information about an image using multiple layers of generic feature detectors, and the extracted information can be used to understand the image better by recognizing the different features present within it. Deep CNNs, however, can require training sets of more than a million pictures to fine-tune their feature detectors, and no datasets of that scale exist for facial expressions. Given this limited availability of training data, the idea of naïve domain adaptation is explored: instead of creating and training a new CNN specifically to extract FER-related features, a CNN previously trained for another computer vision task is reused. The work involved building a system that can run a CNN, extract feature vectors from it, and classify the extracted features. Once this system was built, its different aspects were tested and tuned: the pre-trained CNN used, the layer from which features were extracted, the normalization applied to input images, and the training data for the classifier. Once properly tuned, the system returned more accurate results than previous attempts at facial expression recognition. Based on these positive results, naïve domain adaptation is shown to successfully leverage the advantages of deep CNNs for facial expression recognition.
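A minimal sketch of this fixed-feature-extractor pipeline is given below, assuming a VGG16 pretrained on ImageNet feeding a linear SVM classifier. The paper compared several pretrained networks, layers, and normalizations, so this particular network, layer, and classifier choice is only an assumed example (torchvision >= 0.13 weights API).

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import LinearSVC

# Assumed choice: VGG16 pretrained on ImageNet as a frozen feature extractor.
cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
feature_extractor = torch.nn.Sequential(cnn.features, cnn.avgpool,
                                        torch.nn.Flatten())

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(pil_images):
    """Run images through the frozen CNN and return feature vectors."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    with torch.no_grad():                 # no fine-tuning; features only
        return feature_extractor(batch).numpy()

# Hypothetical usage: train a simple classifier on the extracted features.
# clf = LinearSVC().fit(extract_features(train_faces), train_labels)
# preds = clf.predict(extract_features(test_faces))
```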
Contributors: Eusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description
While not officially recognized as an addictive activity by the Diagnostic and Statistical Manual of Mental Disorders, video game addiction has well-documented resources pointing to its effects on the physiological and mental health of both the addict and those close to the addict. With the rise of eSports, treating video game addiction has become trickier, as a passionate and growing fan base begins to act as a culture not unlike that of traditional sports. These concerns call for a better understanding of what constitutes a harmful addiction to video games as heavy play becomes more financially viable and accepted into mainstream culture.
Contributors: Gohil, Abhishek Bhagirathsinh (Author) / Kashiwagi, Dean (Thesis director) / Kashiwagi, Jacob (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2015-05
Description
This project is a game engine for 2D fighting games that uses Simple DirectMedia Layer and C++. The engine's goal is to model the genre's conventions for dynamically handling combat between two characters. Characters can be in a variety of states that drive their animations while also responding to the environment based on key statuses. A playable test game is the subject of a user study; the engine's capabilities are demonstrated through the test game, and its limitations and missing features are discussed.
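A minimal sketch of the kind of per-character state handling described is shown below. The state names, transitions, and hit-stun handling are hypothetical, not the engine's actual design, and the engine itself is written in C++; Python is used here only for brevity.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    WALKING = auto()
    ATTACKING = auto()
    BLOCKING = auto()
    HIT_STUN = auto()

class Character:
    """Hypothetical character state machine: each state selects an animation
    and gates which inputs or status changes may interrupt it."""
    def __init__(self):
        self.state = State.IDLE
        self.stun_frames = 0

    def on_hit(self, stun=12):
        if self.state is not State.BLOCKING:    # blocking negates hit-stun
            self.state, self.stun_frames = State.HIT_STUN, stun

    def update(self, input_cmd):
        if self.state is State.HIT_STUN:        # status overrides input
            self.stun_frames -= 1
            if self.stun_frames <= 0:
                self.state = State.IDLE
        elif input_cmd == "attack":
            self.state = State.ATTACKING
        elif input_cmd == "block":
            self.state = State.BLOCKING
        elif input_cmd == "move":
            self.state = State.WALKING
        else:
            self.state = State.IDLE
        return self.state.name.lower()          # animation clip to play
```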
Contributors: Stanton, Nicholas Scott (Author) / Kobayashi, Yoshihiro (Thesis director) / Hansford, Dianne (Committee member) / Computer Science and Engineering Program (Contributor) / Sanford School of Social and Family Dynamics (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
This paper details the process of designing both a simulation of the board game Jaipur and an artificial intelligence (AI) agent that can play the game against a human player. When designing an AI for a card game, two major problems arise. The first is the difficulty of using a search space to analyze every possible set of future moves: because the deck is randomized, looking more than one turn ahead leads to an exponentially growing set of potential game states. The second is the uncertainty introduced by opponent feedback: certain moves are weak to specific opponent reactions, which are difficult to predict due to hidden information. To circumvent these problems, the AI takes a greedy approach to decision making, attempting to maximize the value of its plays immediately rather than playing for future turns. The agent uses conditional statements to evaluate the game state, applies a heuristic to assign an expected value (EV) to the goods it can choose from, and selects the action it deems optimal based on this evaluation. The simulation was initially implemented in C++ as a terminal application and then ported to a graphical interface using Unity and C#.
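A minimal sketch of such a greedy EV policy follows. The goods values and candidate actions are hypothetical stand-ins loosely inspired by Jaipur's card types, not the thesis's actual heuristic, and the real agent was written in C++/C#.

```python
# Hypothetical per-good values (illustrative, not the thesis's heuristic).
GOODS_VALUE = {"diamond": 7, "gold": 6, "silver": 5,
               "cloth": 3, "spice": 3, "leather": 1}

def expected_value(action):
    """Heuristic EV: sum of the values of the goods an action would gain."""
    return sum(GOODS_VALUE.get(g, 0) for g in action["goods"])

def choose_action(legal_actions):
    """Greedy policy: maximize immediate EV, with no lookahead to future turns."""
    return max(legal_actions, key=expected_value)

actions = [
    {"name": "take", "goods": ["cloth", "spice"]},
    {"name": "sell", "goods": ["diamond", "diamond"]},
]
best = choose_action(actions)   # -> the diamond sale (EV 14 vs 6)
```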
Contributors: Orr, James Christopher (Author) / Kobayashi, Yoshihiro (Thesis director) / Selgrad, Justin (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
Speech recognition is rarely seen in games. This work presents a project, a 2D computer game named "The Emblems," which uses speech recognition as input. The game itself is a two-player strategy game whose goal is to defeat the opposing player's army. This report focuses on the speech-recognition aspect of the project. The players interact on a turn-by-turn basis by speaking commands into the computer's microphone. When the computer recognizes a command, it responds by having the player's unit perform the corresponding action on screen.
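One way such a pipeline can work is sketched below: recognized utterances are parsed against a fixed command grammar and dispatched to on-screen unit actions. The grammar, action names, and dispatch logic are hypothetical illustrations; the report does not specify the actual command set here.

```python
# Hypothetical command grammar: "<unit> <verb> <target>", e.g. "knight move north".
ACTIONS = {"move", "attack", "defend"}

def parse_command(transcript):
    """Turn a recognized utterance into a (unit, action, target) triple,
    or None if it does not match the expected grammar."""
    words = transcript.lower().split()
    if len(words) == 3 and words[1] in ACTIONS:
        unit, action, target = words
        return unit, action, target
    return None

def dispatch(transcript, units):
    """Queue the parsed action on the named unit, or ask the player to retry."""
    cmd = parse_command(transcript)
    if cmd is None:
        return "Command not recognized; please repeat."
    unit, action, target = cmd
    units.setdefault(unit, []).append((action, target))   # on-screen action queue
    return f"{unit} will {action} {target}"

units = {}
print(dispatch("knight move north", units))
```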
Contributors: Nguyen, Jordan Ngoc (Author) / Kobayashi, Yoshihiro (Thesis director) / Maciejewski, Ross (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2014-05
Description
The project "The Emblems: OpenGL" is a 2D strategy game that incorporates speech recognition for control and OpenGL for computer graphics. Players control their own army by voice commands and try to eliminate the opponent's army. This report focuses on the 2D art and visual aspects of the project. There are different sprites for the player's army units and icons within the game, and the game also provides a grid for easy unit placement.
Contributors: Hsia, Allen (Author) / Kobayashi, Yoshihiro (Thesis director) / Maciejewski, Ross (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2014-05
Description
The objective of this creative project was to gain experience in digital modeling, animation, coding, shader development and implementation, model integration techniques, and the application of gaming principles and design through developing a professional educational game. The team collaborated with Glendale Community College (GCC) to produce an interactive product intended to supplement educational instruction on nutrition. The educational game developed, "Nutribots," features the player acting as a nutrition-based nanobot sent to the small intestine to help the body. Throughout the game the player is asked nutrition questions to test their knowledge of proteins, carbohydrates, and lipids. If the player is unable to answer a question, they must use game mechanics to progress and receive the information as a reward; the level is completed as soon as the question is answered correctly. If the player answers questions incorrectly twenty times over the course of the game, the team loses faith in the player, who must then restart from the title screen. This limits guessing and, through repetition, helps the player retain the information they have shown they do not yet know. The team was split into two groups for the development of this game. The first group developed models, animations, and textures using Autodesk Maya 2016 and Marvelous Designer. The second group developed code and shaders and implemented the first group's assets using Unity and Visual Studio. Once a prototype of the game was developed, it was showcased among peers to gain feedback, and the team implemented the desired changes accordingly. Development for this project began in November 2015 and ended in April 2017. Special thanks to Laura Avila, Department Chair, and Jennifer Nolz from the Glendale Community College Technology and Consumer Sciences, Food and Nutrition Department.
Contributors: Nolz, Daisy (Co-author) / Martin, Austin (Co-author) / Quinio, Santiago (Co-author) / Armstrong, Jessica (Co-author) / Kobayashi, Yoshihiro (Thesis director) / Valderrama, Jamie (Committee member) / School of Arts, Media and Engineering (Contributor) / School of Film, Dance and Theatre (Contributor) / Department of English (Contributor) / Computer Science and Engineering Program (Contributor) / Computing and Informatics Program (Contributor) / Herberger Institute for Design and the Arts (Contributor) / School of Sustainability (Contributor) / Barrett, The Honors College (Contributor)
Created: 2017-05