
Joint Optimization of Quantization and Structured Sparsity for Compressed Deep Neural Networks

Description

Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks, such as image classification and speech recognition. However, their use on resource-constrained edge devices has been limited by their high computational cost and large memory requirements.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity and quantization. While most of these works have applied these compression techniques in isolation, very few studies have applied quantization and structured sparsity together to a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, by jointly exploring quantization and structured compression settings, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression. The optimal DNN model achieves a 50X weight memory reduction compared to a floating-point uncompressed DNN. This saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X memory savings. The algorithm has been validated on both high- and low-capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that deep-sparse DNNs outperform shallow-dense DNNs, with memory savings that vary with DNN precision and sparsity level.

This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a large set of sparse and dense DNN models. The resulting 11 optimal designs were further evaluated by considering overall DNN memory, which includes both activation memory and weight memory. For the optimal designs corresponding to low-sparsity DNNs, accounting for activation memory changes the memory footprint only slightly; for high-sparsity DNNs, however, activation memory cannot be ignored.
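Extracting Pareto-optimal designs from a large pool of candidates can be illustrated with a minimal sketch. The abstract does not state the objectives or the extraction procedure, so this assumes two objectives to minimize (weight memory and classification error). The `Design` type, the function names, and all numbers are hypothetical and made up for illustration; they are not results from the thesis.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """One candidate DNN configuration (hypothetical fields)."""
    name: str
    memory_kb: float  # weight memory; lower is better
    error_pct: float  # classification error; lower is better


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.memory_kb <= b.memory_kb and a.error_pct <= b.error_pct
    strictly_better = a.memory_kb < b.memory_kb or a.error_pct < b.error_pct
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates, sorted by memory."""
    front = [d for d in designs if not any(dominates(o, d) for o in designs)]
    return sorted(front, key=lambda d: d.memory_kb)


if __name__ == "__main__":
    # Made-up candidates purely for illustration.
    candidates = [
        Design("fp32-dense", 1000.0, 1.2),
        Design("2b-dense", 62.5, 1.6),
        Design("2b-4x-sparse", 20.0, 2.1),
        Design("4b-4x-sparse", 40.0, 2.3),  # dominated by 2b-4x-sparse
        Design("1b-8x-sparse", 5.0, 6.0),
    ]
    for d in pareto_front(candidates):
        print(d.name, d.memory_kb, d.error_pct)
```

The quadratic pairwise check is the simplest correct formulation; for thousands of candidates a sort-and-sweep over one objective would be faster, but the resulting front is the same.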

Contributors: Srivastava, Gaurav (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2018

Approximate neural networks for speech applications in resource-constrained environments

Description

Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementations of these systems perform very well, they have large memory and compute resource requirements, which makes implementing them on a mobile device quite challenging. This thesis presents techniques to reduce the memory and computation cost of keyword detection and speech recognition networks (DNNs).

The first technique represents all weights and biases with a small number of bits and maps all nodal computations to fixed-point operations with minimal degradation in accuracy. Experiments conducted on the Resource Management (RM) database show that, for the keyword detection network, representing the weights with 5 bits results in a 6-fold reduction in memory compared to a floating-point implementation, with very little loss in performance. Similarly, for the speech recognition network, representing the weights with 6 bits results in a 5-fold reduction in memory while maintaining an error rate similar to that of a floating-point implementation. Additional memory reduction is achieved by a technique called weight pruning, in which weights are classified as sensitive or insensitive and the sensitive weights are represented with higher precision. Combining these two techniques reduces the memory footprint by 81% and 84% for the speech recognition and keyword detection networks, respectively.
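A minimal sketch of representing weights with a small number of bits, assuming symmetric uniform fixed-point quantization and a 32-bit floating-point baseline. The abstract does not give the exact quantizer or baseline width, but 32/5 = 6.4 and 32/6 ≈ 5.3 would be consistent with the reported 6-fold and 5-fold reductions. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_fixed_point(weights: List[float], n_bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to signed n-bit integer codes.

    Returns the codes and one shared scale factor so that w ~= code * scale.
    """
    if n_bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric code")
    q_max = 2 ** (n_bits - 1) - 1  # e.g. 15 for 5 bits
    w_max = max(abs(w) for w in weights)
    scale = w_max / q_max if w_max > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


def memory_ratio(n_bits: int, baseline_bits: int = 32) -> float:
    """Weight-memory reduction factor vs. the baseline word size (ignores the scale)."""
    return baseline_bits / n_bits


if __name__ == "__main__":
    w = [0.8, -0.31, 0.05, -0.9, 0.42]
    codes, scale = quantize_fixed_point(w, n_bits=5)
    print(codes)
    print([round(x, 3) for x in dequantize(codes, scale)])
    print(f"5-bit vs 32-bit float: {memory_ratio(5):.1f}x")  # 6.4x
    print(f"6-bit vs 32-bit float: {memory_ratio(6):.1f}x")  # 5.3x
```

Once weights are integer codes with a shared scale, each multiply-accumulate can run in integer arithmetic and be rescaled once per output, which is the general idea behind mapping nodal computations to fixed point.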

Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. This technique, termed coarse-grain sparsification, introduces hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and a significant reduction in the number of computations during classification without loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision reduced the weight memory requirement by ~95% compared to a fully-connected network with double precision, while showing similar keyword detection accuracy and word error rate.
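The block-dropping idea behind coarse-grain sparsification can be sketched as a mask over fixed-size weight tiles. The abstract does not say how blocks are selected, so this sketch ranks blocks by L2 norm purely as an illustrative assumption; the block size, function names, and example matrix are hypothetical. The memory estimate compares kept 6-bit weights against 64-bit (double-precision) dense weights and ignores block-index or metadata overhead, so it is an idealized upper bound, not a reproduction of the reported ~95%.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def coarse_grain_mask(w: Matrix, block: int, keep_frac: float) -> List[List[bool]]:
    """Keep/drop decision per block x block tile; each tile is kept or dropped as a unit.

    Tiles are ranked by L2 norm (an assumption for illustration) and the
    top keep_frac of them are kept.
    """
    rows, cols = len(w), len(w[0])
    assert rows % block == 0 and cols % block == 0, "dims must be multiples of block"
    br, bc = rows // block, cols // block
    norms = []
    for i in range(br):
        for j in range(bc):
            s = sum(w[r][c] ** 2
                    for r in range(i * block, (i + 1) * block)
                    for c in range(j * block, (j + 1) * block))
            norms.append((math.sqrt(s), i, j))
    n_keep = round(keep_frac * br * bc)
    kept = {(i, j) for _, i, j in sorted(norms, reverse=True)[:n_keep]}
    return [[(i, j) in kept for j in range(bc)] for i in range(br)]


def apply_mask(w: Matrix, mask: List[List[bool]], block: int) -> Matrix:
    """Zero every weight that falls in a dropped tile."""
    return [[w[r][c] if mask[r // block][c // block] else 0.0
             for c in range(len(w[0]))] for r in range(len(w))]


def weight_memory_reduction(keep_frac: float, bits: int, baseline_bits: int = 64) -> float:
    """Fractional weight-memory reduction vs. a dense baseline; ignores index overhead."""
    return 1.0 - keep_frac * bits / baseline_bits


if __name__ == "__main__":
    random.seed(0)
    w = [[random.gauss(0, 1) for _ in range(8)] for _ in range(8)]
    mask = coarse_grain_mask(w, block=2, keep_frac=0.25)
    kept = sum(v for row in mask for v in row)
    print(f"kept {kept} of {len(mask) * len(mask[0])} blocks")  # kept 4 of 16 blocks
    print(f"ideal reduction, 6-bit: {weight_memory_reduction(0.25, 6):.1%}")  # 97.7%
```

Dropping whole tiles rather than individual weights means a dropped tile's multiply-accumulates can be skipped entirely, which is why block-level sparsity is friendlier to hardware than scattered element-wise zeros.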

Contributors: Arunachalam, Sairam (Author) / Chakrabarti, Chaitali (Thesis advisor) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Arizona State University (Publisher)

Created: 2016

Enhancing Stress Detection Systems Using Real-World Data and Deep Neural Networks

Description

As threats emerge and change, the life of a police officer continues to intensify. To better support police training curricula and police cadets through this critical career juncture, this thesis proposes a state-of-the-art framework for stress detection using real-world data and deep neural networks. As an integral step of a larger study, this thesis investigates data processing techniques to handle the ambiguity of data collected in naturalistic contexts and leverages data structuring approaches to train deep neural networks.

The analysis used data collected from 37 police training cadets in five different training cohorts at the Phoenix Police Regional Training Academy. The data were collected at different intervals during the cadets’ rigorous six-month training course; in total, data were collected over 11 months across all cohorts. All cadets were equipped with a Fitbit wearable device running a custom-built application to collect biometric data, including heart rate and self-reported stress levels. Throughout the data collection period, the cadets were asked to wear the Fitbit device and respond to stress level prompts to capture real-time responses.

To manage this naturalistic data, this thesis applied heart rate filtering algorithms, including Hampel, median, Savitzky-Golay, and Wiener filters, to remove potentially noisy data. After data processing and noise removal, the heart rate data and corresponding stress level labels were processed into datasets of two different sizes. The data were then fed into a Deep ECGNet (created by Prajod et al.), a simple feed-forward network (created by Sim et al.), and a multilayer perceptron (MLP) network for binary classification. Experimental results show that the feed-forward network achieves the highest accuracy (90.66%) on data from a single cohort, while the MLP model performs best on data across cohorts, achieving 85.92% accuracy. These findings suggest that stress detection is feasible on a varied set of real-world data using deep neural networks.
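Of the four filters named above, the Hampel filter is the least commonly packaged, so a minimal pure-Python sketch of it may help. The abstract does not give the window length or threshold used; the defaults here (half-window of 3 samples, 3 robust standard deviations) are assumptions, and the heart rate values are made up.

```python
import statistics
from typing import List


def hampel_filter(x: List[float], half_window: int = 3, n_sigmas: float = 3.0) -> List[float]:
    """Replace outliers with the local median (Hampel identifier).

    A sample is flagged when it lies more than n_sigmas robust standard
    deviations (1.4826 * MAD) from the median of its window.
    """
    k = 1.4826  # scales the MAD to a standard deviation for Gaussian data
    y = list(x)
    n = len(x)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = x[lo:hi]  # windows shrink at the edges of the series
        med = statistics.median(window)
        mad = statistics.median(abs(v - med) for v in window)
        if abs(x[i] - med) > n_sigmas * k * mad:
            y[i] = med
    return y


if __name__ == "__main__":
    hr = [72, 73, 71, 180, 74, 72, 73]  # made-up beats-per-minute samples with one spike
    print(hampel_filter(hr))  # [72, 73, 71, 73, 74, 72, 73]
```

Note that when a window's MAD is zero, any deviation from the median is flagged, which is harmless for flat signals but worth guarding against on heavily quantized sensor data. For the other three filters, SciPy's `scipy.signal` module provides `medfilt`, `savgol_filter`, and `wiener`.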

Contributors: Paranjpe, Tara Anand (Author) / Zhao, Ming (Thesis advisor) / Roberts, Nicole (Thesis advisor) / Duran, Nicholas (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2023

Novel Learning-Based Task Schedulers for Domain-Specific SoCs

Description

This Master’s thesis covers the design, on-chip integration, and evaluation of a set of imitation learning (IL)-based scheduling policies: a deep neural network (DNN) and a decision tree (DT). We first developed IL-based scheduling policies for heterogeneous systems-on-chip (SoCs). We then tested these policies using a system-level domain-specific system-on-chip simulation framework [11]. Finally, we transformed them into efficient code using a cloud engine [1] and implemented them on a user-space emulation framework [61] on a Unix-based SoC. IL is an area of machine learning (ML) and a useful method for training artificial intelligence (AI) models by imitating the decisions of an expert or Oracle that knows the optimal solution. The primary focus of this thesis is to adapt an ML model to work on-chip and to optimize resource allocation for a set of domain-specific wireless and radar applications. Evaluation results with four streaming applications from the wireless communications and radar domains show that the proposed IL-based scheduler approximates an offline Oracle expert with more than 97% accuracy and 1.20× faster execution time. The models have been implemented as an add-on, making them easy to port to other SoCs.
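The imitation-learning idea (train a policy to copy an Oracle's scheduling decisions, then measure how often it agrees) can be sketched in a few lines. This is not the thesis's DNN or DT policy: to stay dependency-free it swaps in a 1-nearest-neighbour learner, and the execution-time table, the three processing elements (PEs), and the Oracle rule (earliest finish time) are all hypothetical assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical execution times: EXEC_TIME[task_type][pe], arbitrary units.
EXEC_TIME: List[List[float]] = [
    [10.0, 4.0, 12.0],
    [6.0, 9.0, 3.0],
    [5.0, 5.0, 8.0],
]

State = Tuple[int, Tuple[float, ...]]  # (task type, per-PE ready times)


def oracle(state: State) -> int:
    """Assumed expert: assign the task to the PE with the earliest finish time."""
    task, ready = state
    finish = [ready[p] + EXEC_TIME[task][p] for p in range(len(ready))]
    return min(range(len(finish)), key=finish.__getitem__)


class ClonedPolicy:
    """Behaviour cloning with 1-nearest-neighbour: copy the Oracle's action from
    the closest demonstrated state of the same task type."""

    def __init__(self, demos: Sequence[Tuple[State, int]]):
        self.demos = list(demos)

    def __call__(self, state: State) -> int:
        task, ready = state
        candidates = [(s, a) for s, a in self.demos if s[0] == task] or self.demos

        def dist(s: State) -> float:
            return sum((x - y) ** 2 for x, y in zip(s[1], ready))

        _, action = min(candidates, key=lambda sa: dist(sa[0]))
        return action


def random_state(rng: random.Random, n_pe: int = 3) -> State:
    """Sample a random task type and random PE ready times."""
    return rng.randrange(len(EXEC_TIME)), tuple(rng.uniform(0, 20) for _ in range(n_pe))


def agreement(policy: Callable[[State], int], states: Sequence[State]) -> float:
    """Fraction of states where the policy matches the Oracle (imitation accuracy)."""
    return sum(policy(s) == oracle(s) for s in states) / len(states)


if __name__ == "__main__":
    rng = random.Random(0)
    train = [random_state(rng) for _ in range(2000)]
    test = [random_state(rng) for _ in range(500)]
    policy = ClonedPolicy([(s, oracle(s)) for s in train])
    print(f"agreement with Oracle on unseen states: {agreement(policy, test):.1%}")
```

The agreement score plays the role of the "accuracy relative to the Oracle" reported in the abstract; an on-chip policy would replace the nearest-neighbour lookup with a compact model such as a small decision tree so that inference cost stays bounded.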

Contributors: Holt, Conrad Mestres (Author) / Ogras, Umit Y. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Akoglu, Ali (Committee member) / Arizona State University (Publisher)

Created: 2020
