Matching Items (16)
156610-Thumbnail Image.png
Description
Deep neural networks (DNNs) have shown tremendous success in various cognitive tasks such as image classification and speech recognition. However, their use on resource-constrained edge devices has been limited by high computation and large memory requirements.

To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity, and quantization. While most of these works apply the compression techniques in isolation, there have been very few studies on applying quantization and structured sparsity together to a DNN model.

This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. Specifically, it obtains an optimal setting of 2-bit weights and 2-bit activations coupled with 4X structured compression by jointly exploring quantization and structured-compression settings. The optimal DNN model achieves a 50X weight-memory reduction compared to an uncompressed floating-point DNN. This saving is significant, since applying only structured sparsity constraints achieves 2X memory savings and applying only quantization constraints achieves 16X. The algorithm was validated on both high- and low-capacity DNNs and on wide-sparse and deep-sparse DNN models. Experiments demonstrated that a deep-sparse DNN outperforms a shallow-dense DNN, with the level of memory savings varying with DNN precision and sparsity. This work further proposes a Pareto-optimal approach to systematically extract optimal DNN models from a large set of sparse and dense DNN models. The resulting 11 optimal designs were then evaluated in terms of overall DNN memory, which includes both activation memory and weight memory. The memory footprint of the optimal designs corresponding to low-sparsity DNNs changes only slightly, but activation memory cannot be ignored for high-sparsity DNNs.
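
As a rough illustration of the co-optimization described in this abstract, the sketch below fake-quantizes a weight matrix to 2 bits and applies a 4X structured (row-wise) sparsity mask. The function names and the magnitude-based row-pruning criterion are illustrative assumptions, not the thesis's actual training procedure.

```python
import numpy as np

def quantize_2bit(w):
    """Uniform fake quantization of weights to 2 bits (4 levels spanning [-w_max, w_max])."""
    w_max = np.max(np.abs(w)) + 1e-12
    codes = np.round((w / w_max + 1.0) / 2.0 * 3.0)    # integer codes in {0, 1, 2, 3}
    return (codes / 3.0 * 2.0 - 1.0) * w_max           # dequantized 2-bit values

def structured_prune_rows(w, keep_ratio=0.25):
    """Keep only the top keep_ratio of rows by L2 norm (4X structured compression)."""
    norms = np.linalg.norm(w, axis=1)
    k = max(1, int(len(norms) * keep_ratio))
    keep = np.argsort(norms)[-k:]
    mask = np.zeros_like(w)
    mask[keep, :] = 1.0
    return w * mask, mask

# Toy example: apply both constraints to one random weight matrix.
w = np.random.randn(64, 128).astype(np.float32)
w_sparse, mask = structured_prune_rows(w, keep_ratio=0.25)   # 4X structured sparsity
w_q = quantize_2bit(w_sparse)                                # snapped to 2-bit levels
# In training, both constraints would be enforced jointly (e.g. a fixed pruning mask
# plus quantization-aware updates), which is what enables the combined weight-memory
# reduction reported above.
```
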
ContributorsSrivastava, Gaurav (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)
Created2018
157015-Thumbnail Image.png
Description
Deep learning (DL) has proved itself to be one of the most important developments to date, with far-reaching impacts in numerous fields such as robotics, computer vision, surveillance, speech processing, machine translation, and finance. Deep learning models are now widely used for countless applications because of their ability to generalize to real-world data, their robustness to noise in previously unseen data, and their high inference accuracy. With the ability to learn useful features from raw sensor data, deep learning algorithms have outperformed traditional AI algorithms and pushed the boundaries of what can be achieved with AI. In this work, we demonstrate the power of deep learning by developing a neural network to automatically detect cough instances from audio recorded in unconstrained environments. For this, 24-hour-long recordings from 9 different patients were collected and carefully labeled by medical personnel. A pre-processing algorithm is proposed to convert the event-based cough dataset into a more informative dataset with cough start and end times, and data augmentation is introduced to regularize the training procedure. The proposed neural network achieves 92.3% leave-one-out accuracy on data captured in the real world.
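
A minimal sketch, under assumed details, of the preprocessing this abstract describes: converting event-level cough labels into start/end windows and augmenting the audio with small time shifts. The window length and shift range are illustrative, not the thesis's exact pipeline.

```python
import numpy as np

def events_to_windows(event_times, window_s=1.0):
    """Turn point-labeled cough events into (start, end) windows centered on each event."""
    return [(max(0.0, t - window_s / 2), t + window_s / 2) for t in event_times]

def augment_shift(audio, sr, max_shift_s=0.1, n_copies=3, rng=None):
    """Create augmented training copies by randomly shifting the waveform in time."""
    rng = rng or np.random.default_rng(0)
    copies = []
    for _ in range(n_copies):
        shift = int(rng.uniform(-max_shift_s, max_shift_s) * sr)
        copies.append(np.roll(audio, shift))
    return copies

# Toy usage: one minute of audio at 16 kHz with two labeled cough events.
sr = 16000
audio = np.random.randn(60 * sr).astype(np.float32)
windows = events_to_windows([12.4, 41.7])     # [(11.9, 12.9), (41.2, 42.2)]
augmented = augment_shift(audio, sr)          # 3 time-shifted copies for training
```
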

Deep neural networks are composed of multiple layers that are compute- and memory-intensive. This makes it difficult to execute these algorithms in real time with low power consumption on existing general-purpose computers. In this work, we propose hardware accelerators for a traditional AI algorithm based on random forests and for two representative deep convolutional neural networks (AlexNet and VGG). With the proposed acceleration techniques, a ~30x performance improvement over a CPU was achieved for random forests. For deep CNNs, we demonstrate that much higher performance can be achieved through architecture space exploration, using an optimization algorithm driven by system-level performance and area models of the hardware primitives, with the goal of minimizing latency under given resource constraints. With this method, ~30 GOPS performance was achieved on Stratix V FPGA boards.
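
The design-space exploration described above can be sketched as a search over hardware configurations driven by simple analytical latency and resource models. The models, parameter names, and budgets below are illustrative assumptions rather than the thesis's actual performance and area models.

```python
def latency_cycles(layer_macs, n_pes):
    """Rough latency model: MAC operations divided over parallel processing elements."""
    return layer_macs / n_pes

def resources(n_pes, dsp_per_pe=2, bram_per_pe=4):
    """Rough resource model: DSPs and BRAMs consumed by the PE array."""
    return {"dsp": n_pes * dsp_per_pe, "bram": n_pes * bram_per_pe}

def explore(layer_macs_list, budget, pe_options=(64, 128, 256, 512, 1024)):
    """Pick the PE count that minimizes total latency while fitting the resource budget."""
    best = None
    for n_pes in pe_options:
        used = resources(n_pes)
        if used["dsp"] > budget["dsp"] or used["bram"] > budget["bram"]:
            continue
        total = sum(latency_cycles(m, n_pes) for m in layer_macs_list)
        if best is None or total < best[1]:
            best = (n_pes, total)
    return best

# Toy usage: three conv layers and a hypothetical FPGA resource budget.
layers = [105e6, 224e6, 150e6]                       # MACs per layer
print(explore(layers, budget={"dsp": 1500, "bram": 2500}))
```
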

Hardware acceleration of DL algorithms alone is not always the most efficient approach, nor sufficient to achieve the desired performance. There is considerable headroom for performance improvement when algorithms are designed with hardware limitations and bottlenecks in mind. This work achieves hardware-software co-optimization of the Non-Maximal Suppression (NMS) algorithm through the proposed algorithmic changes and hardware architecture.

With CMOS scaling coming to an end and memory bandwidth bottlenecks increasing, CMOS-based systems might not scale well enough to accommodate the requirements of more complex and deeper neural networks in the future. In this work, we explore RRAM crossbars and arrays as a compact, high-performance, and energy-efficient alternative to CMOS accelerators for deep learning training and inference. We propose and implement RRAM peripheral read and write circuits and achieve a ~3000x performance improvement in online dictionary learning compared to a CPU.

This work also examines realistic RRAM devices and their non-idealities. We perform an in-depth study of the effects of RRAM non-idealities on inference accuracy when a pretrained model is mapped to RRAM-based accelerators. To mitigate this issue, we propose Random Sparse Adaptation (RSA), a novel scheme that tunes the model to compensate for the faults of the RRAM array onto which it is mapped. The proposed method achieves inference accuracy much higher than the traditional Read-Verify-Write (R-V-W) method and recovers lost inference accuracy 100x to 1000x faster than R-V-W. Using 32-bit high-precision RSA cells, we achieved ~10% higher accuracy with faulty RRAM arrays than what can be achieved by mapping a deep network to a 32-level RRAM array with no variations.
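
A minimal NumPy sketch of the idea behind Random Sparse Adaptation as described above: a small, randomly chosen subset of weight positions is mapped to reliable high-precision cells that can be tuned, while the bulk of the weights remain in the variation-affected RRAM array. The fraction size and variation model are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained weights and their RRAM copy corrupted by device variations.
w_ideal = rng.standard_normal((256, 256)).astype(np.float32)
w_rram = w_ideal + 0.1 * rng.standard_normal(w_ideal.shape).astype(np.float32)

# Randomly select a small fraction of positions to map to high-precision cells.
fraction = 0.01
mask = rng.random(w_ideal.shape) < fraction

# Effective weights: RRAM values everywhere except on the masked positions,
# which are served from adaptable high-precision memory.
w_hp = w_rram.copy()                      # high-precision cells, initialized from RRAM readout
w_eff = np.where(mask, w_hp, w_rram)

# During adaptation only the masked (high-precision) entries would be updated,
# e.g. by gradient steps on the task loss; the RRAM array itself is left untouched.
```
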
ContributorsMohanty, Abinash (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2018
156936-Thumbnail Image.png
Description
In recent years, conventional convolutional neural networks (CNNs) have achieved outstanding performance in image and speech processing applications. Unfortunately, the pooling operation in CNNs discards spatial information, which is an important attribute in many applications. The recently proposed capsule network retains spatial information and improves upon the capabilities of traditional CNNs. It uses capsules to describe features in multiple dimensions and dynamic routing to increase the statistical stability of the network.
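
For reference, a minimal NumPy sketch of the dynamic routing-by-agreement procedure used in capsule networks (following Sabour et al.); the dimensions are arbitrary and this is not the exact implementation evaluated in this thesis.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Non-linear squashing: keeps vector orientation, shrinks length into [0, 1)."""
    norm2 = np.sum(s * s, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement between lower capsules i and higher capsules j.

    u_hat: prediction vectors, shape (n_lower, n_higher, dim).
    """
    n_lower, n_higher, _ = u_hat.shape
    b = np.zeros((n_lower, n_higher))                            # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)     # coupling coefficients
        s = np.einsum("ij,ijd->jd", c, u_hat)                    # weighted sum per higher capsule
        v = squash(s)                                            # higher-capsule outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)                # agreement update
    return v

# Toy usage: 6 lower capsules routing to 3 higher capsules of dimension 8.
v = dynamic_routing(np.random.randn(6, 3, 8))
print(v.shape)   # (3, 8)
```
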

In this work, we first use a capsule network for the overlapping digit recognition problem. We evaluate the performance of the network with respect to recognition accuracy, convergence, and training time per epoch. We show that the capsule network achieves higher accuracy when the training set is small; when the training set is larger, the capsule network and a conventional CNN have comparable recognition accuracy. The training time per epoch for the capsule network is longer than that of a conventional CNN because of the dynamic routing algorithm. An analysis of GPU timing shows that adjusting the capsule structure can significantly decrease the time complexity of the dynamic routing algorithm.

Next, we design a capsule network for speech recognition, specifically overlapping word recognition. We use both a capsule network and a conventional CNN to recognize 2 overlapping words in speech files created from 5 word classes. We show that the capsule network achieves a considerably higher recognition accuracy (96.92%) than the conventional CNN (85.19%). Our results show that the capsule network recognizes overlapping words by recognizing each individual word in the speech. We also verify the scalability of the capsule network by increasing the number of word classes from 5 to 10. The capsule network still shows a high recognition accuracy of 95.42% for 10 words, while the accuracy of the conventional CNN drops sharply to 73.18%.
ContributorsXiong, Yan (Author) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Thesis advisor) / Weng, Yang (Committee member) / Arizona State University (Publisher)
Created2018
135660-Thumbnail Image.png
Description
This paper presents work that was done to create a system capable of facial expression recognition (FER) using deep convolutional neural networks (CNNs) and to test multiple configurations and methods. CNNs are able to extract powerful information about an image using multiple layers of generic feature detectors; the extracted information can be used to better understand the image by recognizing the different features present within it. Deep CNNs, however, require training sets that can be larger than a million pictures in order to fine-tune their feature detectors, and no such large datasets are available for facial expressions. Due to this limited availability of data for training a new CNN, the idea of naïve domain adaptation is explored: instead of creating a new CNN trained specifically to extract features for FER, a CNN previously trained for another computer vision task is reused. Work for this research involved creating a system that can run a CNN, extract feature vectors from it, and classify these extracted features. Once this system was built, different aspects of it were tested and tuned, including the pre-trained CNN that was used, the layer from which features were extracted, the normalization applied to input images, and the training data for the classifier. Once properly tuned, the system returned results more accurate than previous attempts at facial expression recognition. Based on these positive results, naïve domain adaptation is shown to successfully leverage the advantages of deep CNNs for facial expression recognition.
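
A minimal sketch of the transfer-learning setup this abstract describes: feature vectors taken from a pre-trained CNN are fed to a separate classifier. The sketch substitutes random placeholder features for real CNN activations and uses a linear SVM; the actual pre-trained network, extraction layer, normalization, and classifier were the aspects tuned in this work.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder for CNN features: in the actual system these would be activations
# extracted from a chosen layer of a network pre-trained on another vision task.
rng = np.random.default_rng(0)
n_train, n_test, feat_dim, n_classes = 200, 50, 4096, 7    # e.g. 7 basic expressions
X_train = rng.standard_normal((n_train, feat_dim))
y_train = rng.integers(0, n_classes, n_train)
X_test = rng.standard_normal((n_test, feat_dim))

# L2-normalize feature vectors (one of the normalization choices that can be tuned).
X_train /= np.linalg.norm(X_train, axis=1, keepdims=True)
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)

# Train a linear classifier on the fixed features; the CNN itself is never fine-tuned.
clf = LinearSVC(C=1.0).fit(X_train, y_train)
pred = clf.predict(X_test)
```
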
ContributorsEusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2016-05
136785-Thumbnail Image.png
Description
This paper presents the design and evaluation of a haptic interface for augmenting human-human interpersonal interactions by delivering facial expressions of an interaction partner to an individual who is blind, using a visual-to-tactile mapping of facial action units and emotions. Pancake shaftless vibration motors are mounted on the back of a chair to provide vibrotactile stimulation in the context of a dyadic (one-on-one) interaction across a table. This work explores the design of spatiotemporal vibration patterns that can be used to convey the basic building blocks of facial movements according to the Facial Action Coding System. A behavioral study was conducted to explore the factors that influence the naturalness of conveying affect using vibrotactile cues.
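
A small sketch of how a visual-to-tactile mapping like the one described above might be organized in software: each facial action unit is associated with a spatiotemporal pattern over the motor array. The motor layout, action-unit subset, and timings below are hypothetical, not the system's actual patterns.

```python
import time

# Hypothetical 4x4 grid of vibration motor indices mounted on the chair back.
# Each pattern is a list of (motor_ids, duration_s) steps played in sequence.
AU_PATTERNS = {
    "AU12_lip_corner_puller": [([12, 13], 0.2), ([14, 15], 0.2)],   # sweep along bottom row
    "AU4_brow_lowerer":       [([0, 1, 2, 3], 0.4)],                # pulse across top row
}

def play_pattern(au_code, drive_motor):
    """Play one spatiotemporal pattern by switching motors on and off step by step."""
    for motor_ids, duration in AU_PATTERNS[au_code]:
        for m in motor_ids:
            drive_motor(m, on=True)
        time.sleep(duration)
        for m in motor_ids:
            drive_motor(m, on=False)

# Stand-in driver that just logs; on hardware this would toggle a motor driver pin.
play_pattern("AU12_lip_corner_puller", lambda m, on: print(f"motor {m} -> {on}"))
```
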
ContributorsBala, Shantanu (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Department of Psychology (Contributor)
Created2014-05
133398-Thumbnail Image.png
Description
Skin and muscle receptors in the leg and foot provide able-bodied humans with force and position information that is crucial for balance and movement control. In lower-limb amputees, however, this vital information is either missing or incomplete. Amputees typically compensate for the loss of sensory information by relying on haptic feedback from the stump-socket interface. Unfortunately, this is not an adequate substitute, and areas of the stump that directly interface with the socket are also prone to painful irritation, which further degrades haptic feedback. The lack of somatosensory feedback from prosthetic legs causes several problems for lower-limb amputees. Previous studies have established that inadequate sensory feedback from prosthetic limbs contributes to poor balance and abnormal gait kinematics, and these improper gait kinematics can, in turn, lead to the development of musculoskeletal diseases. Finally, the absence of sensory information has been shown to produce steeper learning curves and increased rehabilitation times, which hampers amputees' recovery from the trauma. In this study, a novel haptic feedback system for lower-limb amputees was developed, and studies were performed to verify that the information it presented was sufficiently accurate and precise in comparison to a Bertec 4060-NC force plate. The prototype device consisted of a sensorized insole, a belt-mounted microcontroller, and a linear array of four vibrotactile motors worn on the thigh. The prototype worked by calculating the center of pressure in the anteroposterior plane and applying a time-discrete vibrotactile stimulus based on the location of the center of pressure.
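
A worked sketch of the anteroposterior center-of-pressure computation and its discretization onto four vibrotactile motors, as described above; the sensor positions, insole length, and force values are illustrative assumptions, not the prototype's calibrated parameters.

```python
import numpy as np

# Hypothetical anteroposterior positions (cm from the heel) of insole pressure sensors.
sensor_pos = np.array([2.0, 9.0, 16.0, 23.0])

def cop_ap(forces):
    """Anteroposterior center of pressure: force-weighted mean of sensor positions."""
    forces = np.asarray(forces, dtype=float)
    return float(np.dot(forces, sensor_pos) / (forces.sum() + 1e-9))

def motor_for_cop(cop_cm, insole_len_cm=26.0, n_motors=4):
    """Map the CoP location to one of four thigh-mounted motors (heel -> 0, toe -> 3)."""
    idx = int(cop_cm / insole_len_cm * n_motors)
    return min(max(idx, 0), n_motors - 1)

forces = [5.0, 20.0, 40.0, 10.0]          # Newtons read from the sensorized insole
cop = cop_ap(forces)                      # ~14.1 cm from the heel
print(cop, motor_for_cop(cop))            # activates motor index 2
```
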
ContributorsKaplan, Gabriel Benjamin (Author) / Abbas, James (Thesis director) / McDaniel, Troy (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
133624-Thumbnail Image.png
Description
This paper presents a system to deliver automated, noninvasive, and effective fine motor rehabilitation through a rhythm-based game using a Leap Motion Controller. The system is a rhythm game in which hand gestures are used as input and must match the rhythm and gestures shown on screen, allowing a physical therapist to represent an exercise session involving the user's hand and finger joints as a series of patterns. Fine motor rehabilitation plays an important role in recovery from stroke, Parkinson's disease, multiple sclerosis, and other conditions, and individuals with these conditions possess a wide range of impairment in fine motor movement. The serious game developed here takes this into account and is designed to work with individuals at different levels of impairment. In a pilot study, conducted in partnership with South West Advanced Neurological Rehabilitation (SWAN Rehab) in Phoenix, Arizona, we compared the performance of individuals with fine motor impairment to individuals without this impairment to determine whether a human-centered approach and adaptation to a user's range of motion can allow an individual with fine motor impairment to perform at a level similar to that of a non-impaired user.
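
A minimal sketch of one way the adaptation to a user's range of motion might work: gesture targets are scored against the user's own calibrated range rather than an absolute one. The joint-angle representation and tolerance below are hypothetical, not the game's actual scoring logic.

```python
def calibrate_range(samples):
    """Record a user's comfortable range of motion for one joint angle (degrees)."""
    return min(samples), max(samples)

def normalized_progress(angle, user_range):
    """Map the current joint angle into [0, 1] relative to the user's own range."""
    lo, hi = user_range
    return max(0.0, min(1.0, (angle - lo) / (hi - lo + 1e-9)))

def gesture_hit(angle, target_progress, user_range, tolerance=0.15):
    """A gesture 'hits' the rhythm cue if the user reaches the target within tolerance."""
    return abs(normalized_progress(angle, user_range) - target_progress) <= tolerance

# An impaired user with a small usable range and a non-impaired user with a large
# range are scored against the same normalized targets.
impaired = calibrate_range([5.0, 8.0, 12.0, 15.0])                    # 10-degree range
print(gesture_hit(14.0, target_progress=0.9, user_range=impaired))   # True
```
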
ContributorsShah, Vatsal Nimishkumar (Author) / McDaniel, Troy (Thesis director) / Tadayon, Ramin (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
171505-Thumbnail Image.png
Description
The impact of Artificial Intelligence (AI) on daily life has increased significantly. AI is making big strides into critical areas of life such as healthcare, as well as into areas such as entertainment and leisure. Deep neural networks have been pivotal in making all these advancements possible, but a well-known problem with deep neural networks is the lack of explanations for the choices they make. To combat this, several methods have been explored in the research literature. One example is assigning rankings to individual features according to how influential they are in the decision-making process. In contrast, a newer class of methods focuses on Concept Activation Vectors (CAVs), which extract higher-level concepts from the trained model so as to capture information as a mixture of several features rather than just one. The goal of this thesis is to employ concepts in a novel domain: to explain how a deep learning model uses computer vision to classify music into different genres. Owing to advances in deep learning for computer vision classification tasks, it is now standard practice to convert an audio clip into a corresponding spectrogram and use that spectrogram as the image input to a deep learning model; a pre-trained model can thus classify spectrogram images (representing songs) into musical genres. The proposed explanation system, called “Why Pop?”, tries to answer questions about the classification process such as which parts of the spectrogram influence the model the most, which concepts were extracted, and how they differ across classes. These explanations help the user gain insight into the model’s learnings, biases, and decision-making process.
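
A minimal sketch of the Concept Activation Vector idea that underlies the approach described above: a CAV is the normal of a linear boundary separating activations of concept examples from random examples, and concept sensitivity is the directional derivative of a class score along that vector. The placeholder activations and gradient below stand in for values that would come from the genre classifier; the concept name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder layer activations: in "Why Pop?" these would come from spectrogram
# inputs passed through an intermediate layer of the genre classifier.
concept_acts = rng.standard_normal((100, 512)) + 0.5    # e.g. "strong beat" examples
random_acts = rng.standard_normal((100, 512))           # random counterexamples

# The CAV is the normal of a linear boundary separating concept from random examples.
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 100 + [0] * 100)
cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
cav /= np.linalg.norm(cav)

# Concept sensitivity for one input: directional derivative of a class score along
# the CAV, approximated here with a placeholder gradient of the "pop" logit.
grad_pop_logit = rng.standard_normal(512)
sensitivity = float(np.dot(grad_pop_logit, cav))
print(sensitivity > 0)    # positive => the concept pushes this input toward "pop"
```
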
ContributorsSharma, Shubham (Author) / Bryan, Chris (Thesis advisor) / McDaniel, Troy (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created2022
171649-Thumbnail Image.png
Description
One of the long-standing issues in the sports medicine field is identifying the ideal methodology to optimize recovery following anterior cruciate ligament reconstruction (ACLR). The perioperative period for ACLR is notoriously heterogeneous, as it involves many variables that can impact surgical outcomes. While extensive literature has been published on the efficacy of various recovery and rehabilitation topics, it is widely acknowledged that certain modalities within ACLR rehabilitation, such as blood flow restriction (BFR) training, need further high-quality evidence to support their use in clinical practice. BFR training involves the application of a tourniquet-like cuff to the proximal aspect of a limb prior to exercise; the cuff is inflated so that it occludes venous flow but allows arterial inflow. BFR is usually combined with low-intensity (LI) resistance training, with resistance as low as 20% of the one-repetition maximum (1RM). LI-BFR has been used as an emerging clinical modality to combat postoperative atrophy of the quadriceps muscles in those who have undergone ACLR, as these individuals cannot safely tolerate high-muscular-tension exercise after surgery. Impairments of the quadriceps are the major cause of poor functional status following an otherwise successful ACLR procedure; however, these impairments can be mitigated with preoperative rehabilitation performed before surgery. It was hypothesized that a preoperative LI-BFR training protocol could improve postoperative outcomes following ACLR, primarily strength and hypertrophy of the quadriceps. Compared with a SHAM control group, subjects randomized to the BFR intervention group made greater preoperative strength gains in the quadriceps and recovered quadriceps mass at an earlier timepoint after surgery than the SHAM group; however, the strength gains were not maintained over the 8-week postoperative period. While these results do not support the use of LI-BFR from a short-term perspective after ACLR, follow-up data will be used to investigate trends in re-injury and return-to-sport rates to evaluate the efficacy of LI-BFR from a long-term perspective.
ContributorsGlattke, Kaycee Elizabeth (Author) / Lockhart, Thurmon (Thesis advisor) / McDaniel, Troy (Committee member) / Banks, Scott (Committee member) / Peterson, Daniel (Committee member) / Lee, Hyunglae (Committee member) / Arizona State University (Publisher)
Created2022
171660-Thumbnail Image.png
Description
With an aging population, later-in-life health-related incidents such as stroke stand to become more prevalent. Unfortunately, the majority of those who are most at risk for debilitating health episodes are either uninsured or under-insured when it comes to long-term physical/occupational therapy. As insurance companies lower coverage and/or raise the prices of plans with sufficient coverage, the proportion of uninsured or under-insured people relative to fully insured people can be expected to rise. To address this, lower-cost alternative methods of treatment must be developed so that people can obtain the treatment required for a sufficient recovery. The presented robotic glove employs low-cost fabric soft pneumatic actuators driven by a closed-loop feedback controller based on readings from embedded soft sensors, giving the device proprioceptive abilities for the dynamic control of each independent actuator. Force and fatigue tests were performed to determine the viability of the actuator design, and a Box and Block test along with a motion capture study was completed to study the performance of the device. This paper presents the design and classification of a soft robotic glove with a feedback controller as an at-home stroke rehabilitation device.
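
A minimal sketch, under assumed gains and ranges, of the kind of per-actuator closed-loop control the abstract describes: each actuator's valve command is computed from the error between a target bend and the embedded soft-sensor reading. This is illustrative, not the glove's actual controller.

```python
class ActuatorPI:
    """Simple per-actuator PI bend controller driven by a soft-sensor reading."""
    def __init__(self, kp=1.2, ki=0.4, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, target_bend, measured_bend):
        error = target_bend - measured_bend
        self.integral += error * self.dt
        command = self.kp * error + self.ki * self.integral
        return max(0.0, min(1.0, command))     # clamp to a valve duty cycle in [0, 1]

# One controller per finger actuator; the measured bend would come from the embedded
# soft sensors, and the command would drive that actuator's pneumatic valve.
controllers = [ActuatorPI() for _ in range(5)]
commands = [c.step(target_bend=0.8, measured_bend=0.3) for c in controllers]
print(commands)
```
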
ContributorsAxman, Reed C (Author) / Zhang, Wenlong (Thesis advisor) / Santello, Marco (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2022