Search Content

Matching Items (3)

Using Capsule Networks for Image and Speech Recognition Problems

Description

In recent years, conventional convolutional neural network (CNN) has achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNN ignores important spatial information which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves the capabilities of traditional CNN. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.

In this work, we first use capsule network for overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence and training time per epoch. We show that capsule network achieves higher accuracy when training set size is small. When training set size is larger, capsule network and conventional CNN have comparable recognition accuracy. The training time per epoch for capsule network is longer than conventional CNN because of the dynamic routing algorithm. An analysis of the GPU timing shows that adjusting the capsule structure can help decrease the time complexity of the dynamic routing algorithm significantly.

Next, we design a capsule network for speech recognition, specifically, overlapping word recognition. We use both capsule network and conventional CNN to recognize 2 overlapping words in speech files created from 5 word classes. We show that capsule network achieves a considerably higher recognition accuracy (96.92%) compared to conventional CNN (85.19%). Our results show that capsule network recognizes overlapping word by recognizing each individual word in the speech. We also verify the scalability of capsule network by increasing the number of word classes from 5 to 10. Capsule network still shows a high recognition accuracy of 95.42% in case of 10 words while the accuracy of conventional CNN decreases sharply to 73.18%.

ContributorsXiong, Yan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Thesis advisor) / Weng, Yang (Committee member) / Arizona State University (Publisher)

Created2018

Energy-Efficient Electoral System for Underwater Sea Turtle Image Recognition

Description

This paper presents work that was done to develop an energy-efficient electoral and frame count system for underwater sea turtle image and video recognition using convolutional neural networks, deep learning framework, and the Python programming language. An underwater sea turtle image recognition program is essential to protect turtles from the threat of bycatch - sea turtles are accidentally caught when fishermen aim for a different type of underwater species. This underwater image recognition system is used to detect the presence of sea turtles, then different kinds of acoustic and light stimuli are used to warn the turtles of approaching danger to reduce bycatch. This image detection system will be placed on a fishing boat to run on a machine at all times (24 hours and 7 days a week). A live video capture from a low-power underwater camera that is attached to the boat will be sent to the image detection system on the machine to analyze the presence of sea turtles in each frame of the video. To lower the computational time and energy of the machine, an energy-efficient electoral and frame count system is implemented on this image detection system.

ContributorsDeng, Enhong (Author) / Ozev, Sule (Thesis director) / Blain Christen, Jennifer (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

An Empirical Study of View Construction for Multi-View Learning

Description

Multi-view learning, a subfield of machine learning that aims to improve model performance by training on multiple views of the data, has been studied extensively in the past decades. It is typically applied in contexts where the input features naturally form multiple groups or views. An example of a naturally multi-view context is a data set of websites, where each website is described not only by the text on the page, but also by the text of hyperlinks pointing to the page. More recently, various studies have demonstrated the initial success of applying multi-view learning on single-view data with multiple artificially constructed views. However, there lacks a systematic study regarding the effectiveness of such artificially constructed views. To bridge this gap, this thesis begins by providing a high-level overview of multi-view learning with the co-training algorithm. Co-training is a classic semi-supervised learning algorithm that takes advantage of both labelled and unlabelled examples in the data set for training. Then, the thesis presents a web-based tool developed in Python allowing users to experiment with and compare the performance of multiple view construction approaches on various data sets. The supported view construction approaches in the web-based tool include subsampling, Optimal Feature Set Partitioning, and the genetic algorithm. Finally, the thesis presents an empirical comparison of the performance of these approaches, not only against one another, but also against traditional single-view models. The findings show that a simple subsampling approach combined with co-training often outperforms both the other view construction approaches, as well as traditional single-view methods.

ContributorsAksoy, Kaan (Author) / Maciejewski, Ross (Thesis director) / He, Jingrui (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2019-12