Using Capsule Networks for Image and Speech Recognition Problems

Description

In recent years, conventional convolutional neural networks (CNNs) have achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNNs discards spatial information, which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves on the capabilities of traditional CNNs. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.
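The routing step can be illustrated with a short pure-Python sketch of routing-by-agreement in the form published by Sabour et al. (2017). It is not the thesis's implementation; the function names, the three routing iterations, and the toy prediction vectors are assumptions for illustration. The nested loops also make the cost visible: every iteration touches each (input capsule, output capsule, dimension) triple, which is why the capsule structure affects routing time.

```python
import math

def squash(s):
    """Capsule non-linearity: short vectors shrink toward 0, long ones toward unit length."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + 1e-9)
    return [scale * x for x in s]

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement (sketch).

    u_hat[i][j] is the prediction vector from input capsule i for output capsule j.
    Returns the output capsule vectors v[j] and the coupling coefficients c[i][j].
    """
    n_in, n_out = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v, c = [], []
    for _ in range(iterations):
        # Coupling coefficients: softmax of each input capsule's logits over outputs
        c = []
        for i in range(n_in):
            m = max(b[i])
            exps = [math.exp(x - m) for x in b[i]]
            total = sum(exps)
            c.append([e / total for e in exps])
        # Weighted sum of predictions for each output capsule, then squash
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Raise logits where a prediction agrees (dot product) with the output
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c
```

Input capsules whose predictions agree on an output capsule increase their coupling to it, so that capsule's squashed output grows longer, while conflicting predictions cancel out.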

In this work, we first apply a capsule network to the overlapping digit recognition problem. We evaluate its performance with respect to recognition accuracy, convergence, and training time per epoch. We show that the capsule network achieves higher accuracy when the training set is small. When the training set is larger, the capsule network and a conventional CNN have comparable recognition accuracy. The training time per epoch of the capsule network is longer than that of the conventional CNN because of the dynamic routing algorithm. An analysis of GPU timing shows that adjusting the capsule structure can significantly reduce the time complexity of the dynamic routing algorithm.

Next, we design a capsule network for speech recognition, specifically overlapping word recognition. We use both a capsule network and a conventional CNN to recognize two overlapping words in speech files created from 5 word classes. The capsule network achieves considerably higher recognition accuracy (96.92%) than the conventional CNN (85.19%). Our results show that the capsule network recognizes overlapping words by recognizing each individual word in the speech. We also verify the scalability of the capsule network by increasing the number of word classes from 5 to 10. The capsule network still achieves a high recognition accuracy of 95.42% with 10 words, while the accuracy of the conventional CNN drops sharply to 73.18%.
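One common way to train and read out a capsule network on multi-label inputs such as two overlapping words is sketched below. It assumes the margin loss of Sabour et al. (2017) with its usual constants (m+ = 0.9, m- = 0.1, λ = 0.5) and a top-2 decision on output capsule lengths; whether the thesis uses exactly this loss and decision rule is an assumption.

```python
def margin_loss(lengths, present, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Capsule margin loss summed over classes.

    lengths[k] is the length of output capsule k; present is the set of
    classes that occur in the input (two words for an overlapping-word clip).
    """
    loss = 0.0
    for k, length in enumerate(lengths):
        if k in present:
            # Present class: penalize a capsule shorter than m_plus
            loss += max(0.0, m_plus - length) ** 2
        else:
            # Absent class: penalize (down-weighted) a capsule longer than m_minus
            loss += lam * max(0.0, length - m_minus) ** 2
    return loss

def predict_top_k(lengths, k=2):
    """Report the k classes with the longest capsules, in ascending class order."""
    ranked = sorted(range(len(lengths)), key=lambda cls: lengths[cls], reverse=True)
    return sorted(ranked[:k])
```

Because each class has its own capsule and its own loss term, two words can be active at once without competing through a single softmax.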

Contributors: Xiong, Yan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Thesis advisor) / Weng, Yang (Committee member) / Arizona State University (Publisher)

Created: 2018

Use of Bayesian filtering and adaptive learning methods to improve the detection and estimation of pathological and neurological disorders

Description

Biological and biomedical measurements, when adequately analyzed and processed, can be used to impart quantitative diagnosis during primary health care consultation to improve patient adherence to recommended treatments. For example, analyzing neural recordings from neurostimulators implanted in patients with neurological disorders can be used by a physician to adjust detrimental stimulation parameters to improve treatment. As another example, biosequences, such as sequences from peptide microarrays obtained from a biological sample, can potentially provide pre-symptomatic diagnosis for infectious diseases when processed to associate antibodies to specific pathogens or infectious agents. This work proposes advanced statistical signal processing and machine learning methodologies to assess neurostimulation from neural recordings and to extract diagnostic information from biosequences.

For locating specific cognitive and behavioral information in different regions of the brain, neural recordings are processed using sequential Bayesian filtering methods to detect and estimate both the number of neural sources and their corresponding parameters. Time-frequency based feature selection algorithms are combined with adaptive machine learning approaches to suppress physiological and non-physiological artifacts present in neural recordings. Adaptive processing and unsupervised clustering methods applied to neural recordings are also used to suppress neurostimulation artifacts and to classify various behavioral tasks in order to assess the level of neurostimulation in patients.
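As a rough illustration of sequential Bayesian filtering, here is a minimal bootstrap particle filter that tracks one scalar parameter. It is a simplification under stated assumptions (a single source, a random-walk state model, Gaussian measurement noise, and hypothetical parameter values); the thesis's filters also estimate the number of sources, which this sketch does not attempt.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500, process_std=0.1,
                              obs_std=0.5, init_range=(0.0, 10.0), seed=0):
    """Track a scalar state with x_k = x_{k-1} + w_k and z_k = x_k + v_k.

    All noise levels and the initial range are illustrative assumptions.
    Returns the posterior-mean estimate after each observation.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # 1) Predict: propagate every particle through the random-walk model
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # 2) Update: weight each particle by the Gaussian likelihood of z
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # every particle far from z: fall back to uniform weights
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 3) Resample (multinomial) to counter weight degeneracy
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```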

For pathogen detection and identification, random peptide sequences and their properties are first uniquely mapped to highly localized signals and their corresponding parameters in the time-frequency plane. Time-frequency signal processing methods are then applied to estimate antigenic determinants or epitope candidates for detecting and identifying potential pathogens.

Contributors: Maurer, Alexander Joseph (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Committee member) / Chakrabarti, Chaitali (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)

Created: 2016

Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures

Description

With their massive multithreading capability, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPUs). However, using GPUs to accelerate computation does not always yield good performance improvements. This is mainly due to three inefficiencies in modern GPU and system architectures.

First, parallel threads do not always have uniform workloads, so the GPU's computational capacity is not fully utilized; this leads to a sub-optimal performance problem called warp criticality. To mitigate warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts the critical warp and accelerates its execution by allocating it larger execution time slices and additional cache resources. Evaluation results show that with CAWA, GPUs achieve an average speedup of 1.23x.

Second, the shared cache storage in GPUs is often insufficient to accommodate the demands of the large number of concurrent threads. As a result, cache thrashing is common in GPU cache memories, particularly in the L1 data caches. To alleviate cache contention and thrashing, I develop an instruction-aware Control Loop Based Adaptive Bypassing algorithm, called Ctrl-C. Ctrl-C learns the cache reuse behavior and bypasses a portion of memory requests with the help of feedback control loops. Evaluation results show that Ctrl-C effectively improves cache utilization in GPUs and achieves an average speedup of 1.42x for cache-sensitive GPGPU workloads.
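The abstract does not spell out Ctrl-C's control law, so the sketch below only illustrates the general idea of feedback-controlled bypassing. The proportional controller, the per-instruction reuse signal, the reuse target of 0.5, and the gain of 0.5 are all hypothetical: when lines brought in by a load instruction are rarely reused before eviction, the controller raises that instruction's bypass ratio, and it lowers the ratio when reuse recovers.

```python
import random

class BypassController:
    """Hypothetical per-instruction proportional controller for L1 bypassing.

    observed_reuse is the fraction of lines inserted by one load instruction
    that were re-referenced before eviction during the last sampling window.
    """

    def __init__(self, target_reuse=0.5, gain=0.5):
        self.target_reuse = target_reuse
        self.gain = gain
        self.bypass_ratio = 0.0  # fraction of this instruction's requests sent around L1

    def update(self, observed_reuse):
        # Positive error means too little reuse, so bypass more of the requests
        error = self.target_reuse - observed_reuse
        self.bypass_ratio = min(1.0, max(0.0, self.bypass_ratio + self.gain * error))
        return self.bypass_ratio

    def should_bypass(self, rng=random):
        # Bypass a request with probability equal to the current bypass ratio
        return rng.random() < self.bypass_ratio
```

In a simulator, one controller would be kept per load instruction (for example in a dictionary keyed by program counter) and updated once per sampling window.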

Finally, in a heterogeneous system, GPU workloads and the co-located processes running on the host chip multiprocessor (CMP) can contend for memory resources at multiple levels, resulting in significant performance degradation. To maximize system throughput and balance the performance degradation of all co-located applications, I design a scalable performance degradation predictor specifically for heterogeneous systems, called HeteroPDP. HeteroPDP predicts application execution time and schedules OpenCL workloads to run on different devices based on the optimization goal. Evaluation results show that HeteroPDP can improve system fairness from 24% to 65% when an OpenCL application is co-located with other processes, and can achieve an additional 50% speedup compared with always offloading the OpenCL workload to GPUs.

In summary, this dissertation aims to provide insights for the future microarchitecture and system architecture designs by identifying, analyzing, and addressing three critical performance problems in modern GPUs.

Contributors: Lee, Shin-Ying (Author) / Wu, Carole-Jean (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Ren, Fengbo (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2017

Development of Multiple Protocols in Novel Simulation Environment

Description

When one considers the current state of wireless communications, it becomes clear that it is both absolutely amazing and something of a mess. Present communications standards are the result of local optimizations over time that have led to a confusing set of suboptimal and fragile wireless standards. Starting from a clean sheet of paper, Bliss Laboratory for Information, Signals, and Systems (BLISS) is considering a fluid set of communications standards co-optimized with flexible but power-efficient computational implementations that will enable the next revolution of wireless communications. The main aim is to enable both much higher data rates and much lower data rates, with correspondingly lower power consumption, as the needs of users vary.

This thesis mainly examines the different parts of the work done to prime the development of the protocol development engine. It discusses channel modeling and the system integration of receiver and channel noise. It also proposes a Carrier-Sense Multiple Access (CSMA) Media Access Control (MAC) layer protocol implementation for the Wireless Fidelity (Wi-Fi) protocol. This work also covers the Graphical User Interface (GUI), which is part of the Protocol Development Kit (PDK), a combination of the Protocol Recommendation Engine (PRE) and a simulation package that aids protocol development. Finally, it sheds light on the Automatic Dependent Surveillance-Broadcast (ADS-B) radio protocol, which will eventually replace radar as Air Traffic Control's (ATC) primary tool for separating aircraft.
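For readers unfamiliar with CSMA, the sketch below shows the binary exponential backoff at the heart of the Wi-Fi (IEEE 802.11) distributed coordination function, using the contention-window bounds of the OFDM physical layers (15 and 1023 slots). It is not the thesis's MAC implementation: interframe spaces are omitted, the retry limit of 7 is an illustrative assumption, and `channel_idle` and `tx_succeeds` are hypothetical simulator hooks.

```python
import random

CW_MIN, CW_MAX = 15, 1023  # contention-window bounds used by 802.11 OFDM PHYs
RETRY_LIMIT = 7            # illustrative assumption

def next_contention_window(cw):
    """Binary exponential backoff: 15 -> 31 -> 63 -> ... capped at CW_MAX."""
    return min(2 * (cw + 1) - 1, CW_MAX)

def transmit_with_csma(channel_idle, tx_succeeds, rng=random):
    """Simplified CSMA/CA sender (no DIFS/SIFS timing).

    channel_idle() -> bool: carrier sense for one slot (hypothetical hook).
    tx_succeeds() -> bool: whether the ACK arrives (hypothetical hook).
    Returns (success, attempts, slots_waited).
    """
    cw = CW_MIN
    slots_waited = 0
    for attempt in range(1, RETRY_LIMIT + 1):
        backoff = rng.randint(0, cw)  # uniform over [0, CW] slots
        while backoff > 0:
            slots_waited += 1
            if channel_idle():        # the counter freezes while the medium is busy
                backoff -= 1
        if tx_succeeds():
            return True, attempt, slots_waited
        cw = next_contention_window(cw)  # collision or missing ACK: widen the window
    return False, RETRY_LIMIT, slots_waited
```

Written this way, the sender is a small state machine (backoff, transmit, wait for ACK) that a simulator can drive slot by slot.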

All the algorithms used in this thesis to define radio operation were, in principle, specified by mathematical descriptions; however, to test and implement them, they had to be converted into a computer language. This conversion took place in multiple phases. In the first phase, the algorithms were implemented in Matrix Laboratory (MATLAB). To aid this development, basic radio finite state machines and radio algorithmic tools were provided.

Contributors: Rupakula, Venkata Sai Karteek (Author) / Bliss, Daniel W (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / McGiffen, Tom (Committee member) / Arizona State University (Publisher)

Created: 2017

Low Complexity Wireless Communication Digital Baseband Design

Description

This thesis addresses two problems in digital baseband design of wireless communication systems, namely, those in Internet of Things (IoT) terminals that support long range communications and those in full-duplex systems that are designed for high spectral efficiency.

IoT terminals for long-range communications are typically based on Orthogonal Frequency-Division Multiple Access (OFDMA) and spread spectrum technologies. To design an efficient baseband architecture for such terminals, the workload profiles of both systems are analyzed. Since the frame detection unit has by far the highest computational load, a simple architecture that uses only a scalar datapath is proposed. To minimize energy consumption, application-specific instructions that minimize register accesses and address generation units for streamlined memory access are introduced. Two parameters, namely the correlation window size and the threshold value, affect the detection probability, the false alarm probability, and hence the energy consumption. Energy-optimal settings for the correlation window size and threshold value are then derived for different channel conditions. For both good and bad channel conditions, if the target signal detection probability is greater than 0.9, the baseband processor consumes the least energy when the frame detection algorithm uses the longest correlation window and the highest threshold value.
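The role of the two parameters can be seen in a minimal frame detector that slides a normalized cross-correlation over the received samples and declares a frame when the metric exceeds a threshold. The function name, the correlation against the first `window` samples of a known preamble, and the normalization are assumptions for illustration; the thesis's detector may differ. A longer window costs more multiply-accumulates per sample but averages out more noise, and a higher threshold lowers the false alarm probability at the expense of detection probability.

```python
def detect_frame(samples, preamble, window, threshold):
    """Return the first index whose normalized correlation exceeds threshold, else None.

    samples: received complex baseband samples; preamble: known training sequence.
    Only the first `window` preamble samples are correlated (assumption).
    """
    ref = preamble[:window]
    ref_energy = sum(abs(p) ** 2 for p in ref)
    for n in range(len(samples) - window + 1):
        seg = samples[n:n + window]
        corr = sum(s * p.conjugate() for s, p in zip(seg, ref))
        seg_energy = sum(abs(s) ** 2 for s in seg)
        if seg_energy == 0:
            continue  # silent segment: nothing to correlate
        # Normalized metric lies in [0, 1] by the Cauchy-Schwarz inequality
        metric = abs(corr) ** 2 / (seg_energy * ref_energy)
        if metric > threshold:
            return n
    return None
```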

A full-duplex system has high spectral efficiency but suffers from self-interference. Part of the interference can be cancelled digitally using equalization techniques. The cancellation performance and computational complexity of competing equalization algorithms, namely Least Mean Squares (LMS), Normalized LMS (NLMS), Recursive Least Squares (RLS), and decision feedback equalizers based on LMS, NLMS, and RLS, are analyzed, and a trade-off between performance and complexity is established. The NLMS linear equalizer is found to be suitable for resource-constrained mobile devices, while the NLMS decision feedback equalizer is more appropriate for base stations that are not energy constrained.
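A minimal real-valued NLMS canceller is sketched below, assuming the known transmitted samples serve as the filter input and the received samples as the desired signal, so the error output is the received signal with the estimated self-interference removed. The tap count, step size `mu`, and regularizer `eps` are illustrative values. Removing the division by the input power gives plain LMS.

```python
def nlms_cancel(tx, rx, taps=4, mu=0.5, eps=1e-6):
    """Digitally cancel self-interference with an NLMS linear filter (sketch).

    tx: known transmitted baseband samples (the interference source)
    rx: received samples = SI channel applied to tx + signal of interest + noise
    Returns (residual, weights), where residual = rx minus the SI estimate.
    """
    w = [0.0] * taps
    residual = []
    for n in range(len(rx)):
        # Tap-delay line x[n], x[n-1], ..., zero-padded before the first sample
        x_vec = [tx[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y_hat = sum(wk * xk for wk, xk in zip(w, x_vec))
        e = rx[n] - y_hat
        norm = eps + sum(xk * xk for xk in x_vec)
        # NLMS update: LMS step normalized by the instantaneous input power
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x_vec)]
        residual.append(e)
    return residual, w
```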

Contributors: Wu, Shunyao (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Lee, Hyunseok (Committee member) / Arizona State University (Publisher)

Created: 2017

Algorithm and Hardware Co-design for Learning On-a-chip

Description

Machine learning technology has achieved remarkable results in recent years. It has rivalled or exceeded human performance in many intellectual tasks, including image recognition, face detection, and the game of Go. Many machine learning algorithms require huge amounts of computation, such as the multiplication of large matrices. As silicon technology has scaled into the sub-14 nm regime, simply scaling down devices no longer provides enough speed-up. New device technologies and system architectures are needed to improve computing capacity, and specialized hardware for machine learning is in high demand. Efforts must therefore be made to jointly design and optimize both hardware and algorithms.

For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates that provide low standby power, high data density, fast access, and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability.

Emerging memory devices such as RRAM can be organized as a 2-D crosspoint array to speed up the multiply-and-accumulate operations in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology, showing a 900x speedup on the dictionary learning task compared to a CPU.
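The speed-up comes from the array computing a whole matrix-vector product in one read: each cell contributes a current by Ohm's law, and each column sums those currents by Kirchhoff's current law. The ideal-device sketch below ignores wire resistance, device nonlinearity, and ADC quantization; the differential (G+, G-) mapping of signed weights and the `g_max` value are common choices assumed here, not necessarily those used in the dissertation.

```python
def crossbar_mvm(conductances, voltages):
    """Ideal crosspoint read: column current I_j = sum_i G[i][j] * V[i].

    conductances[i][j]: cell conductance (siemens) at row i, column j
    voltages[i]: read voltage (volts) applied to row i
    """
    n_rows, n_cols = len(conductances), len(conductances[0])
    return [sum(conductances[i][j] * voltages[i] for i in range(n_rows))
            for j in range(n_cols)]

def weights_to_conductance_pair(weights, g_max):
    """Map signed weights onto two non-negative conductance arrays (assumed scheme)."""
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    g_plus = [[g_max * max(w, 0.0) / w_max for w in row] for row in weights]
    g_minus = [[g_max * max(-w, 0.0) / w_max for w in row] for row in weights]
    return g_plus, g_minus, w_max

def signed_mvm(weights, voltages, g_max=1e-4):
    """Compute W^T V with a differential pair of ideal crossbars."""
    g_plus, g_minus, w_max = weights_to_conductance_pair(weights, g_max)
    i_plus = crossbar_mvm(g_plus, voltages)
    i_minus = crossbar_mvm(g_minus, voltages)
    # Difference of column currents, rescaled back to weight units
    return [(ip - im) * w_max / g_max for ip, im in zip(i_plus, i_minus)]
```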

From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve hardware efficiency in learning.

Contributors: Xu, Zihan (Author) / Cao, Yu (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Seo, Jae-Sun (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)

Created: 2017

Computer Vision from Spatial-Multiplexing Cameras at Low Measurement Rates

Description

In imaging systems on UAVs and in parking lots, it is typical to first collect an enormous number of pixels using conventional imagers. This is followed by expensive compression methods that throw away redundant data. The compressed data is then transmitted to a ground station. The past decade has seen the emergence of novel imagers called spatial-multiplexing cameras, which offer compression at the sensing level itself by acquiring arbitrary linear measurements of the scene instead of pixel-based samples. In this dissertation, I discuss various approaches for effective information extraction from spatial-multiplexing measurements and present the trade-offs between the reliability of performance and the computational/storage load of the system.

In the first part, I present a reconstruction-free approach to high-level inference in computer vision. Considering the specific case of activity analysis, I show that correlation filters can perform effective action recognition and localization directly on measurements from a class of spatial-multiplexing cameras called compressive cameras, even at measurement rates as low as 1%.

In the second part, I outline a deep-learning-based, non-iterative, real-time algorithm to reconstruct images from compressively sensed (CS) measurements, which can outperform traditional iterative CS reconstruction algorithms in both reconstruction quality and time complexity, especially at low measurement rates.

Compressive cameras operate with random measurements that are not tuned to any particular task. To overcome this limitation, in the third part of the dissertation I propose a method to design spatial-multiplexing measurements that are tuned to facilitate the extraction of features useful in computer vision tasks such as object tracking.

The work presented in this dissertation provides sufficient evidence that high-level inference in computer vision is possible at extremely low measurement rates, and hence allows us to consider revamping present-day computer systems.
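A spatial-multiplexing camera can be modeled as computing y = Φx, where x is the vectorized scene (or image block) and each row of Φ is one measurement pattern. The sketch below builds a random Gaussian Φ for a given measurement rate. The Gaussian ensemble, the N(0, 1/M) scaling, the ceiling rounding, and the 32 x 32 block used in the usage note are assumptions for illustration; physical cameras typically realize each row as a programmable mask pattern.

```python
import math
import random

def make_measurement_matrix(n_pixels, rate, seed=0):
    """Random Gaussian sensing matrix with M = ceil(rate * N) rows.

    Entries are drawn from N(0, 1/M), a common choice in compressive sensing.
    """
    m = max(1, math.ceil(rate * n_pixels))
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n_pixels)]
            for _ in range(m)]

def measure(phi, x):
    """y = Phi x: each measurement is the inner product of the scene with one row."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]
```

For a 32 x 32 block (N = 1024) at a 1% measurement rate this yields 11 measurements; a reconstruction-free pipeline would run its inference on `y` directly instead of first recovering `x`.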

Contributors: Kulkarni, Kuldeep Sharad (Author) / Turaga, Pavan (Thesis advisor) / Li, Baoxin (Committee member) / Chakrabarti, Chaitali (Committee member) / Sankaranarayanan, Aswin (Committee member) / LiKamWa, Robert (Committee member) / Arizona State University (Publisher)

Created: 2017

Locally Adaptive Stereo Vision Based 3D Visual Reconstruction

Description

Using stereo vision for 3D reconstruction and depth estimation has become a popular and promising research area as it has a simple setup with passive cameras and relatively efficient processing procedure. The work in this dissertation focuses on locally adaptive stereo vision methods and applications to different imaging setups and image scenes.

Solder ball height and substrate coplanarity inspection is essential for detecting potential connectivity issues in semiconductor units. Current ball height and substrate coplanarity inspection tools are expensive and slow, which makes them difficult to use in a real-time manufacturing setting. In this dissertation, an automatic, stereo-vision-based, in-line ball height and coplanarity inspection method is presented. The proposed method includes an imaging setup together with a computer vision algorithm for reliable, in-line ball height measurement. The imaging setup and calibration, ball height estimation, and substrate coplanarity calculation are presented with novel stereo vision methods. The results of the proposed method are evaluated in a measurement capability analysis (MCA) procedure and compared with ground truth obtained by an existing laser scanning tool and an existing confocal inspection tool. The proposed system outperforms the existing inspection tools in terms of accuracy and stability.

In a rectified stereo vision system, stereo matching methods can be categorized into global methods and local methods. Local stereo methods are better suited to real-time processing while achieving accuracy competitive with global methods. This work proposes a stereo matching method based on sparse, locally adaptive cost aggregation. To reduce outlier disparity values that correspond to mismatches, a novel sparse disparity subset selection method is proposed that assigns a significance status to candidate disparity values and selects the significant disparity values adaptively. An adaptive guided filtering method that uses the disparity subset for refined cost aggregation and disparity calculation is demonstrated. The proposed stereo matching algorithm is tested on the Middlebury and KITTI stereo evaluation benchmark images. A performance analysis of the proposed method in terms of the ℓ0 norm of the disparity subset is presented to demonstrate the achieved efficiency and accuracy.
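For context, the sketch below is the plain local baseline that such methods refine: absolute differences aggregated over a fixed square window for each candidate disparity, followed by winner-take-all selection. It is not the proposed sparse, adaptive method; the window radius, the function name, and the disparity convention (left column x matches right column x - d) are assumptions for illustration.

```python
def local_stereo_sad(left, right, max_disp, radius=1):
    """Winner-take-all disparity from box-aggregated absolute differences.

    left, right: rectified grayscale images as lists of rows of equal size.
    Pixels too close to the border for a full window keep disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius + max_disp, w - radius):
            best_d, best_cost = 0, float("inf")
            for d in range(max_disp + 1):
                # Aggregate the matching cost over a (2r+1) x (2r+1) support window
                cost = sum(abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                           for dy in range(-radius, radius + 1)
                           for dx in range(-radius, radius + 1))
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp
```

Evaluating every disparity at every pixel is what the sparse disparity subset described above aims to avoid.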

Contributors: Li, Jinjin (Author) / Karam, Lina (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Patel, Nital (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2017

Target discrimination against clutter based on unsupervised clustering and sequential Monte Carlo tracking

Description

Radar performance in detecting a target and estimating its parameters can deteriorate rapidly in the presence of heavy clutter, because measurements due to clutter returns can be falsely detected as originating from the actual target. Various data association methods and multiple hypothesis filtering approaches have been considered to solve this problem. Such methods, however, can be computationally intensive for real-time radar processing. This work proposes a new approach based on the unsupervised clustering of target and clutter detections before target tracking with particle filtering. In particular, Gaussian mixture modeling is first used to separate the detections into two distinct Gaussian mixtures. Using eigenvector analysis, the eccentricity of the covariance matrix of each Gaussian mixture is computed and compared to threshold values obtained a priori. This thresholding allows only target detections to be used for target tracking. Simulations demonstrate the performance of the new algorithm and compare it with an alternative that uses k-means clustering instead of Gaussian mixture modeling.
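The eccentricity test can be sketched for 2-D detections as below: the covariance of a cluster is computed, its eigenvalues are found in closed form, and the eccentricity of the corresponding ellipse, sqrt(1 - λmin/λmax), lies between 0 (circular spread) and 1 (points on a line). The 2-D measurement space and the function names are assumptions, and the abstract does not say which side of the a priori threshold corresponds to the target, so the comparison itself is left to the caller.

```python
import math

def covariance_2d(points):
    """Population covariance (sxx, sxy, syy) of a list of (x, y) detections."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy

def eccentricity(sxx, sxy, syy):
    """Eccentricity of the covariance ellipse from the 2x2 eigenvalues."""
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_max, lam_min = half_trace + disc, half_trace - disc
    if lam_max <= 0.0:
        return 0.0  # degenerate cluster (all points identical)
    ratio = max(0.0, lam_min) / lam_max  # clamp tiny negative rounding error
    return math.sqrt(1.0 - ratio)
```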

Contributors: Freeman, Matthew Gregory (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created: 2016

Designing Low Cost Error Correction Schemes for Improving Memory Reliability

Description

Memory systems are becoming increasingly error-prone, and thus guaranteeing their reliability is a major challenge. In this dissertation, new techniques to improve the reliability of both 2D and 3D dynamic random access memory (DRAM) systems are presented. The proposed schemes have higher reliability than current systems but with lower power, better performance and lower hardware cost.

First, a low overhead solution that improves the reliability of commodity DRAM systems with no change in the existing memory architecture is presented. Specifically, five erasure and error correction (E-ECC) schemes are proposed that provide at least Chipkill-Correct protection for x4 (Schemes 1, 2 and 3), x8 (Scheme 4) and x16 (Scheme 5) DRAM systems. All schemes have superior error correction performance due to the use of strong symbol-based codes. In addition, the use of erasure codes extends the lifetime of the 2D DRAM systems.

Next, two error correction schemes are presented for 3D DRAM memory systems. The first is a rate-adaptive, two-tiered error correction scheme (RATT-ECC) that provides strong reliability (a 10^10x reduction in raw FIT rate) for an HBM-like 3D DRAM system that services CPU applications. The rate-adaptive feature of RATT-ECC enables permanent bank failures to be handled through sparing. It can also be used to significantly reduce refresh power consumption without decreasing reliability or timing performance.

The second scheme is a two-tiered error correction scheme (Config-ECC) that supports accesses of different sizes in GPU applications with strong reliability. It addresses the mismatch between data access size and fixed-size ECC schemes by using a flexible scheme based on product codes. Config-ECC is built around a core unit designed for 32B accesses, with a simple extension to support 64B and 128B accesses. Compared to fixed 32B and 64B ECC schemes, Config-ECC reduces the failure-in-time (FIT) rate by 200x and 20x, respectively. It also reduces memory energy by 17% (in dynamic mode) and 21% (in static mode) compared to a state-of-the-art fixed 64B ECC scheme.
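The product-code idea can be illustrated with a toy binary example: data bits are arranged in a grid protected by row and column parities, and a single flipped bit is located at the intersection of the failing row and the failing column. This only illustrates the product-code structure; Config-ECC's actual component codes, symbol sizes, and 32B/64B/128B layouts are not modeled, and the function names are hypothetical.

```python
def encode(data):
    """data: list of rows of bits. Returns (row_parity, col_parity)."""
    row_par = [sum(row) % 2 for row in data]
    col_par = [sum(row[j] for row in data) % 2 for j in range(len(data[0]))]
    return row_par, col_par

def correct_single_error(data, row_par, col_par):
    """Locate and flip a single erroneous data bit.

    Returns (corrected copy, (row, col)) or (copy, None) when there is no error
    or the syndrome pattern is beyond this toy code's correction capability.
    """
    bad_rows = [i for i, row in enumerate(data) if sum(row) % 2 != row_par[i]]
    bad_cols = [j for j in range(len(data[0]))
                if sum(row[j] for row in data) % 2 != col_par[j]]
    fixed = [row[:] for row in data]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        fixed[i][j] ^= 1  # the failing row and column intersect at the bad bit
        return fixed, (i, j)
    return fixed, None
```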

Contributors: Chen, Hsing-Min (Author) / Chakrabarti, Chaitali (Thesis advisor) / Mudge, Trevor (Committee member) / Wu, Carole-Jean (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)

Created: 2017
