Matching Items (15)

Description

Deep learning (DL) has proved itself to be one of the most important developments to date, with far-reaching impacts in numerous fields such as robotics, computer vision, surveillance, speech processing, machine translation, and finance. DL models are now widely used for countless applications because of their ability to generalize to real-world data, their robustness to noise in previously unseen data, and their high inference accuracy. With the ability to learn useful features from raw sensor data, deep learning algorithms have outperformed traditional AI algorithms and pushed the boundaries of what can be achieved with AI. In this work, we demonstrate the power of deep learning by developing a neural network to automatically detect cough instances from audio recorded in unconstrained environments. For this, 24-hour-long recordings from 9 different patients were collected and carefully labeled by medical personnel. A pre-processing algorithm is proposed to convert the event-based cough dataset into a more informative dataset with the start and end of each cough, and data augmentation is introduced to regularize the training procedure. The proposed neural network achieves 92.3% leave-one-out accuracy on data captured in the real world.
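The event-to-span pre-processing and augmentation steps described above can be sketched as follows; the window half-width, noise level, shift range, and the helper names `events_to_windows` and `augment` are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def events_to_windows(event_times, half_window=0.4):
    """Convert point-event cough labels (seconds) into (start, end) spans.

    `half_window` is a hypothetical span half-width; the actual work derives
    cough boundaries from the audio itself, which is not reproduced here.
    """
    return [(max(0.0, t - half_window), t + half_window) for t in event_times]

def augment(signal, rng, noise_std=0.005, max_shift=160):
    """Simple regularizing augmentation: random circular time shift plus
    additive Gaussian noise on a 1-D audio array."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.roll(signal, shift)
    return shifted + rng.normal(0.0, noise_std, size=signal.shape)
```
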

Deep neural networks are composed of multiple layers that are compute- and memory-intensive. This makes it difficult to execute these algorithms in real time with low power consumption on existing general-purpose computers. In this work, we propose hardware accelerators for a traditional AI algorithm based on random forests and for two representative deep convolutional neural networks (AlexNet and VGG). With the proposed acceleration techniques, ~30x performance improvement over a CPU was achieved for random forests. For deep CNNs, we demonstrate that much higher performance can be achieved through architecture space exploration, using an optimization algorithm that takes system-level performance and area models of the hardware primitives as inputs and minimizes latency under given resource constraints. With this method, ~30 GOPs performance was achieved on Stratix V FPGA boards.
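The architecture space exploration described above can be illustrated with a minimal exhaustive search; the design space, latency model, and area model below are toy stand-ins for the system-level models mentioned in the text:

```python
from itertools import product

def explore(latency_model, area_model, design_space, area_budget):
    """Search every hardware design point and keep the lowest-latency
    configuration that fits the area budget. The two models are
    user-supplied callables over a configuration dict."""
    best = None
    for point in product(*design_space.values()):
        cfg = dict(zip(design_space.keys(), point))
        if area_model(cfg) <= area_budget:
            lat = latency_model(cfg)
            if best is None or lat < best[0]:
                best = (lat, cfg)
    return best

# Toy models: more parallel lanes cut latency but cost area.
space = {"lanes": [1, 2, 4, 8], "buffer_kb": [16, 32, 64]}
lat, cfg = explore(lambda c: 1000 / c["lanes"] + 64 / c["buffer_kb"],
                   lambda c: c["lanes"] * 10 + c["buffer_kb"] / 8,
                   space, area_budget=50)
```
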

Hardware acceleration of DL algorithms alone is not always the most efficient way, nor sufficient on its own, to achieve the desired performance. There is significant headroom for performance improvement when algorithms are designed with hardware limitations and bottlenecks in mind. This work achieves hardware-software co-optimization for the Non-Maximal Suppression (NMS) algorithm through the proposed algorithmic changes and a matching hardware architecture.
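For reference, the baseline algorithm being co-optimized is standard greedy NMS, sketched below (this is the conventional version, not the modified algorithm proposed in the work):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy Non-Maximal Suppression over [x1, y1, x2, y2] boxes:
    repeatedly keep the highest-scoring box and discard boxes that
    overlap it by more than `iou_thresh`."""
    order = np.argsort(scores)[::-1]          # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between the kept box and the rest.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # suppress heavy overlaps
    return keep
```
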

With CMOS scaling coming to an end and memory bandwidth bottlenecks increasing, CMOS-based systems might not scale well enough to accommodate the requirements of more complicated and deeper neural networks in the future. In this work, we explore RRAM crossbars and arrays as a compact, high-performing, and energy-efficient alternative to CMOS accelerators for deep learning training and inference. We propose and implement RRAM periphery read and write circuits and achieve ~3000x performance improvement in online dictionary learning compared to a CPU.
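The core operation that makes RRAM crossbars attractive is the in-memory analog matrix-vector multiply: row voltages drive cell conductances, and each column current sums the products by Kirchhoff's law. Numerically (ignoring device non-idealities) it reduces to a single matrix product:

```python
import numpy as np

def crossbar_mvm(conductances, voltages):
    """Idealized RRAM crossbar read-out: output current of each column is
    sum over rows of V_row * G_cell, i.e. the matrix-vector product V @ G."""
    return voltages @ conductances
```
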

This work also examines realistic RRAM devices and their non-idealities. We conduct an in-depth study of the effects of RRAM non-idealities on inference accuracy when a pretrained model is mapped to RRAM-based accelerators. To mitigate this issue, we propose Random Sparse Adaptation (RSA), a novel scheme that tunes the model to compensate for the faults of the RRAM array to which it is mapped. Our proposed method achieves inference accuracy much higher than the traditional Read-Verify-Write (R-V-W) method. RSA can also recover lost inference accuracy 100x to 1000x faster than R-V-W. Using 32-bit high-precision RSA cells, we achieved ~10% higher accuracy on faulty RRAM arrays than can be achieved by mapping a deep network to a 32-level RRAM array with no variations.
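A minimal sketch of the RSA idea follows, assuming the randomly chosen high-precision cells simply hold corrected weight values; the actual scheme tunes those cells by training, and the sparsity fraction and helper names here are illustrative, not the thesis parameters:

```python
import numpy as np

def apply_rsa(w_faulty, w_target, sparsity=0.1, seed=0):
    """Random Sparse Adaptation sketch: a random subset of weight positions
    is backed by reliable high-precision cells, which are set to restore
    the intended values and thereby cancel part of the mapping error."""
    rng = np.random.default_rng(seed)
    mask = rng.random(w_faulty.shape) < sparsity   # cells kept in high precision
    adapted = w_faulty.copy()
    adapted[mask] = w_target[mask]                 # exact values in sparse cells
    return adapted, mask

w_target = np.array([[0.5, -0.2], [0.1, 0.8]])
w_faulty = w_target + np.array([[0.3, 0.0], [0.0, -0.4]])   # stuck-at errors
adapted, mask = apply_rsa(w_faulty, w_target, sparsity=0.5)
```
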
ContributorsMohanty, Abinash (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2018
Description

This paper presents work done to create a system capable of facial expression recognition (FER) using deep convolutional neural networks (CNNs) and to test multiple configurations and methods. CNNs are able to extract powerful information about an image using multiple layers of generic feature detectors. The extracted information can be used to better understand the image by recognizing the different features present within it. Deep CNNs, however, require training sets that can exceed a million images in order to fine-tune their feature detectors, and no facial expression datasets of that scale are available. Due to this limited availability of training data, the idea of naïve domain adaptation is explored. Instead of creating a new CNN trained specifically to extract features related to FER, a CNN previously trained for another computer vision task is used. Work for this research involved creating a system that can run a CNN, extract feature vectors from it, and classify the extracted features. Once this system was built, different aspects of it were tested and tuned: the pre-trained CNN used, the layer from which features were extracted, the normalization applied to input images, and the training data for the classifier. Once properly tuned, the system returned results more accurate than previous attempts at facial expression recognition. Based on these positive results, naïve domain adaptation is shown to successfully leverage the advantages of deep CNNs for facial expression recognition.
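The extract-then-classify pipeline described above can be sketched as follows; the `extractor` callable stands in for a frozen, pre-trained CNN truncated at an intermediate layer (e.g., a torchvision model), and the nearest-centroid classifier is a dependency-free placeholder for the classifiers actually tested:

```python
import numpy as np

def extract_features(images, extractor):
    """Run each image through a frozen feature extractor and stack the
    resulting feature vectors into a (n_images, n_features) array."""
    return np.stack([extractor(img) for img in images])

def nearest_centroid_fit(features, labels):
    """Fit a minimal classifier over extracted features: one mean feature
    vector (centroid) per expression class."""
    labels = np.array(labels)
    return {c: features[labels == c].mean(axis=0) for c in sorted(set(labels))}

def nearest_centroid_predict(model, feature):
    """Predict the class whose centroid is closest to the feature vector."""
    return min(model, key=lambda c: np.linalg.norm(feature - model[c]))
```
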
ContributorsEusebio, Jose Miguel Ang (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2016-05
Description

This paper presents the design and evaluation of a haptic interface for augmenting human-human interpersonal interactions by delivering facial expressions of an interaction partner to an individual who is blind, using a visual-to-tactile mapping of facial action units and emotions. Pancake shaftless vibration motors are mounted on the back of a chair to provide vibrotactile stimulation in the context of a dyadic (one-on-one) interaction across a table. This work explores the design of spatiotemporal vibration patterns that can be used to convey the basic building blocks of facial movements according to the Facial Action Coding System. A behavioral study was conducted to explore the factors that influence the naturalness of conveying affect using vibrotactile cues.
ContributorsBala, Shantanu (Author) / Panchanathan, Sethuraman (Thesis director) / McDaniel, Troy (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Department of Psychology (Contributor)
Created2014-05
Description

Skin and muscle receptors in the leg and foot provide able-bodied humans with force and position information that is crucial for balance and movement control. In lower-limb amputees, however, this vital information is either missing or incomplete. Amputees typically compensate for the loss of sensory information by relying on haptic feedback from the stump-socket interface. Unfortunately, this is not an adequate substitute. Areas of the stump that directly interface with the socket are also prone to painful irritation, which further degrades haptic feedback. The lack of somatosensory feedback from prosthetic legs causes several problems for lower-limb amputees. Previous studies have established that the lack of adequate sensory feedback from prosthetic limbs contributes to poor balance and abnormal gait kinematics. These improper gait kinematics can, in turn, lead to the development of musculoskeletal diseases. Finally, the absence of sensory information has been shown to lead to steeper learning curves and increased rehabilitation times, which hampers amputees' recovery from the trauma. In this study, a novel haptic feedback system for lower-limb amputees was developed, and studies were performed to verify that the information presented was sufficiently accurate and precise in comparison to a Bertec 4060-NC force plate. The prototype device consisted of a sensorized insole, a belt-mounted microcontroller, and a linear array of four vibrotactile motors worn on the thigh. The prototype worked by calculating the center of pressure in the anteroposterior plane and applying a time-discrete vibrotactile stimulus based on the location of the center of pressure.
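The center-of-pressure computation and motor selection described above can be sketched as follows; the sensor positions, foot length, and linear mapping onto four motors are illustrative assumptions, not the prototype's calibration:

```python
def anteroposterior_cop(pressures, positions):
    """Center of pressure along the foot's anteroposterior axis: the
    pressure-weighted mean of sensor positions (e.g., cm from the heel).
    Returns None when the foot is unloaded."""
    total = sum(pressures)
    if total == 0:
        return None
    return sum(p * x for p, x in zip(pressures, positions)) / total

def motor_index(cop, foot_length, n_motors=4):
    """Map the CoP location onto one of `n_motors` thigh-mounted
    vibrotactile motors by dividing the foot into equal segments."""
    idx = int(cop / foot_length * n_motors)
    return min(idx, n_motors - 1)
```
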
ContributorsKaplan, Gabriel Benjamin (Author) / Abbas, James (Thesis director) / McDaniel, Troy (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
Description

This paper presents a system to deliver automated, noninvasive, and effective fine motor rehabilitation through a rhythm-based game using a Leap Motion Controller. The system is a rhythm game in which hand gestures are used as input and must match the rhythm and gestures shown on screen, allowing a physical therapist to represent an exercise session involving the user's hand and finger joints as a series of patterns. Fine motor rehabilitation plays an important role in recovery from the effects of stroke, Parkinson's disease, multiple sclerosis, and more. Individuals with these conditions possess a wide range of impairment in terms of fine motor movement. The serious game developed takes this into account and is designed to work with individuals with different levels of impairment. In a pilot study, conducted in partnership with South West Advanced Neurological Rehabilitation (SWAN Rehab) in Phoenix, Arizona, we compared the performance of individuals with fine motor impairment to individuals without this impairment to determine whether a human-centered approach, adapting to a user's range of motion, can allow an individual with fine motor impairment to perform at a similar level as a non-impaired user.
ContributorsShah, Vatsal Nimishkumar (Author) / McDaniel, Troy (Thesis director) / Tadayon, Ramin (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
Description

The impact of Artificial Intelligence (AI) has increased significantly in daily life. AI is taking big strides into critical areas of life such as healthcare, but also into areas such as entertainment and leisure. Deep neural networks have been pivotal in making all these advancements possible. However, a well-known problem with deep neural networks is the lack of explanations for the choices they make. To combat this, several methods have been tried in the field of research. One example is assigning rankings to individual features according to how influential they are in the decision-making process. In contrast, a newer class of methods focuses on Concept Activation Vectors (CAVs), which extract higher-level concepts from the trained model to capture more information as a mixture of several features rather than just one. The goal of this thesis is to employ concepts in a novel domain: to explain how a deep learning model uses computer vision to classify music into different genres. Due to the advances in the field of computer vision with deep learning for classification tasks, it is now standard practice to convert an audio clip into corresponding spectrograms and use those spectrograms as image inputs to the deep learning model. Thus, a pre-trained model can classify the spectrogram images (representing songs) into musical genres. The proposed explanation system, called "Why Pop?", tries to answer certain questions about the classification process, such as which parts of the spectrogram influence the model the most, what concepts were extracted, and how they differ across classes. These explanations help the user gain insights into the model's learnings, biases, and decision-making process.
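The audio-to-image step underlying this pipeline can be sketched with a plain magnitude spectrogram; real pipelines typically use mel-scaled, log-power spectrograms (e.g., via librosa), so this dependency-free version is only illustrative:

```python
import numpy as np

def spectrogram_image(signal, frame=256, hop=128):
    """Minimal log-magnitude STFT spectrogram, returned as a
    (freq_bins, time_frames) array that can be treated as an image."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame + 1, hop)]
    mags = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return np.log1p(mags).T

# One second of a 440 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram_image(np.sin(2 * np.pi * 440 * t))
```

With `frame=256` at 8 kHz, each frequency bin spans 31.25 Hz, so the 440 Hz tone peaks in bin 14; a genre classifier would consume such arrays as single-channel images.
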
ContributorsSharma, Shubham (Author) / Bryan, Chris (Thesis advisor) / McDaniel, Troy (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created2022
Description

In recent years, Artificial Intelligence (AI) (e.g., Deep Neural Networks (DNNs), Transformers) has shown great success in real-world applications due to its superior performance on various cognitive tasks. The impressive performance achieved by AI models normally comes at the cost of enormous model size and high computational complexity, which significantly hampers their implementation on resource-limited Cyber-Physical Systems (CPS), Internet-of-Things (IoT), or edge systems due to their tightly constrained energy, computing, size, and memory budgets. Thus, the urgent demand for enhancing the efficiency of DNNs has drawn significant research interest across various communities. Motivated by these concerns, this doctoral research has mainly focused on enabling deep learning at the edge: from efficient and dynamic inference to on-device learning. Specifically, from the inference perspective, this dissertation begins by investigating a hardware-friendly model compression method that effectively reduces the size of an AI model while simultaneously achieving improved speed on edge devices. Additionally, given the diverse resource constraints of different edge devices, this dissertation further explores dynamic inference, which allows real-time tuning of inference model size, computation, and latency to accommodate the limitations of each edge device. Regarding efficient on-device learning, this dissertation starts by analyzing memory usage during transfer learning training. Based on this analysis, a novel framework called "Reprogramming Network" (Rep-Net) is introduced that offers a fresh perspective on the on-device transfer learning problem. Rep-Net enables on-device transfer learning by directly learning to reprogram the intermediate features of a pre-trained model.
Lastly, this dissertation studies an efficient continual learning algorithm that facilitates learning multiple tasks without the risk of forgetting previously acquired knowledge. In practice, through the exploration of task correlation, an interesting phenomenon is observed: with a self-supervised pre-trained model, the intermediate features are highly correlated between tasks. Building upon this observation, a novel approach called progressive task-correlated layer freezing is proposed, which gradually freezes the subset of layers with the highest correlation ratios for each task, leading to improved training efficiency.
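The layer-freezing heuristic can be sketched as follows, assuming per-layer features from the previous and current tasks are available as flat vectors; the correlation measure and freeze ratio here are illustrative stand-ins for the dissertation's actual criterion:

```python
import numpy as np

def layers_to_freeze(prev_feats, curr_feats, freeze_ratio=0.5):
    """Rank layers by how correlated their intermediate features are
    between the previous and current task, and return the most-correlated
    fraction as the set of layers to freeze. Both arguments map layer
    name -> flattened activation vector."""
    corr = {name: abs(np.corrcoef(prev_feats[name], curr_feats[name])[0, 1])
            for name in prev_feats}
    ranked = sorted(corr, key=corr.get, reverse=True)
    k = int(len(ranked) * freeze_ratio)
    return ranked[:k]
```
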
ContributorsYang, Li (Author) / Fan, Deliang (Thesis advisor) / Seo, Jae-Sun (Committee member) / Zhang, Junshan (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2023
Description

One of the long-standing issues in the sports medicine field is identifying the ideal methodology to optimize recovery following anterior cruciate ligament reconstruction (ACLR). The perioperative period for ACLR is notoriously heterogeneous in nature, as it consists of many variables that can impact surgical outcomes. While extensive literature has been published regarding the efficacy of various recovery and rehabilitation topics, it is widely acknowledged that certain modalities within the field of ACLR rehabilitation, such as blood flow restriction (BFR) training, need further high-quality evidence to support their use in clinical practice. BFR training involves the application of a tourniquet-like cuff to the proximal aspect of a limb prior to exercise; the cuff is inflated so that it occludes venous flow but allows arterial inflow. BFR is usually combined with low-intensity (LI) resistance training, with resistance as low as 20% of one-repetition maximum (1RM). LI-BFR has been used as an emerging clinical modality to combat postoperative atrophy of the quadriceps muscles in those who have undergone ACLR, as these individuals cannot safely tolerate high muscular tension exercise after surgery. Impairments of the quadriceps are the major cause of poor functional status in patients following an otherwise successful ACLR procedure; however, these impairments can be mitigated with preoperative rehabilitation. It was hypothesized that the use of a preoperative LI-BFR training protocol could help improve postoperative outcomes following ACLR, primarily strength and hypertrophy of the quadriceps.
When compared with a SHAM control group, subjects who were randomized to a BFR intervention group made greater preoperative strength gains in the quadriceps and recovered quadriceps mass at an earlier timepoint after surgery than the SHAM group; however, the strength gains were not maintained over the 8-week postoperative period. While these results do not support the use of LI-BFR in the short term after ACLR, follow-up data will be used to investigate trends in re-injury and return-to-sport rates to evaluate the efficacy of LI-BFR from a long-term perspective.
ContributorsGlattke, Kaycee Elizabeth (Author) / Lockhart, Thurmon (Thesis advisor) / McDaniel, Troy (Committee member) / Banks, Scott (Committee member) / Peterson, Daniel (Committee member) / Lee, Hyunglae (Committee member) / Arizona State University (Publisher)
Created2022
Description

With an aging population, later-in-life health-related incidents like stroke stand to become more prevalent. Unfortunately, the majority of those who are most at risk for debilitating health episodes are either uninsured or underinsured when it comes to long-term physical/occupational therapy. As insurance companies lower coverage and/or raise the prices of plans with sufficient coverage, the proportion of uninsured/underinsured to fully insured people can be expected to rise. To address this, lower-cost alternative methods of treatment must be developed so that people can obtain the treatment required for a sufficient recovery. The presented robotic glove employs low-cost fabric soft pneumatic actuators driven by a closed-loop feedback controller based on readings from embedded soft sensors. This provides the device with proprioceptive abilities for the dynamic control of each independent actuator. Force and fatigue tests were performed to determine the viability of the actuator design. A Box and Block test, along with a motion capture study, was completed to study the performance of the device. This paper presents the design and classification of a soft robotic glove with a feedback controller as an at-home stroke rehabilitation device.
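The closed-loop control of each actuator can be sketched with a simple PI loop driving actuator pressure toward the bend angle reported by an embedded soft sensor; the gains, time step, and first-order plant in the usage note are illustrative assumptions, not the thesis design:

```python
class PIController:
    """Discrete proportional-integral feedback controller: at each step it
    returns a command proportional to the current error plus the
    accumulated (integrated) error."""
    def __init__(self, kp=0.8, ki=0.2, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral
```

Closing the loop around even a crude first-order plant model shows the measured state converging to the setpoint, which is the behavior the embedded soft sensors make possible on the physical glove.
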
ContributorsAxman, Reed C (Author) / Zhang, Wenlong (Thesis advisor) / Santello, Marco (Committee member) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2022
Description

Over the past decade, advancements in neural networks have been instrumental in achieving remarkable breakthroughs in the field of computer vision. One of the applications is in creating assistive technology to improve the lives of visually impaired people by making the world around them more accessible. Research in convolutional neural networks has led to human-level performance in different vision tasks, including image classification, object detection, instance segmentation, semantic segmentation, panoptic segmentation, and scene text recognition. All the aforementioned tasks, individually or in combination, have been used to create assistive technologies to improve accessibility for the blind.

This dissertation outlines various applications to improve accessibility and independence for visually impaired people during shopping by helping them identify products in retail stores. The dissertation includes the following contributions: (i) a dataset containing images of breakfast-cereal products and a classifier using a deep neural (ResNet) network; (ii) a dataset for training a text detection and scene-text recognition model; (iii) a model for text detection and scene-text recognition to identify products in images from a user-controlled camera; (iv) a dataset of twenty thousand products with product information and related images that can be used to train and test a system designed to identify products.
ContributorsPatel, Akshar (Author) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Thesis advisor) / McDaniel, Troy (Committee member) / Arizona State University (Publisher)
Created2020