Matching Items (3)
Description
Deep neural networks (DNNs), as a mainstream algorithm for various AI tasks, achieve higher accuracy at the cost of increased computational complexity and model size, posing great challenges to hardware platforms. This dissertation first tackles the design challenges of resistive random-access-memory (RRAM) based in-memory computing (IMC) architectures. A new metric, model stability from the loss landscape, is proposed to shed light on accuracy under device variations and model compression, and to guide a novel variation-aware training (VAT) solution. The proposed method effectively improves post-mapping accuracy across multiple datasets. Next, a hybrid RRAM/SRAM IMC DNN inference accelerator is developed that integrates an RRAM-based IMC macro, a reconfigurable SRAM-based multiply-accumulate (MAC) macro, and a programmable shifter. The hybrid IMC accelerator fully recovers inference accuracy after mapping. Furthermore, this dissertation investigates architectural optimizations for high IMC utilization, low on-chip communication cost, and low energy-delay product (EDP), including on-chip interconnect design, PE array utilization, and tile-to-router mapping and scheduling. The optimal choice of on-chip interconnect yields up to 6x improvement in energy-delay-area product for RRAM IMC architectures, while the PE and NoC optimizations show up to 62% improvement in PE utilization, 78% reduction in area, and 78% lower energy-area product for a wide range of modern DNNs. Finally, this dissertation proposes SIAM, a novel chiplet-based IMC benchmarking simulator, and a heterogeneous chiplet IMC architecture to address the limitations of a monolithic DNN accelerator. SIAM combines model-based and cycle-accurate simulation to provide a scalable and flexible evaluation framework, and is calibrated against SIMBA, a published silicon result from Nvidia. The heterogeneous architecture utilizes a custom mapping with a bank of big and little chiplets, and a hybrid network-on-package (NoP), to optimize utilization, interconnect bandwidth, and energy efficiency. The proposed big-little chiplet-based RRAM IMC architecture significantly improves energy efficiency at lower area compared to conventional GPUs. In summary, this dissertation comprehensively investigates novel methods spanning devices, circuits, architecture, packaging, and algorithms to design scalable, high-performance, and energy-efficient IMC architectures.
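To make the variation-aware training idea concrete, below is a minimal, hypothetical sketch of one common VAT approach: perturbing layer weights with multiplicative Gaussian noise (a stand-in for RRAM conductance variation) during each training forward pass. The class name, noise model, and sigma value are illustrative assumptions, not the dissertation's implementation.

```python
# Hypothetical sketch of variation-aware training (VAT): during each forward
# pass in training, weights are perturbed with multiplicative Gaussian noise
# that mimics RRAM conductance variation, so the trained model stays accurate
# after mapping. The noise model and sigma are assumptions, not the thesis's.
import torch
import torch.nn as nn

class VariationAwareLinear(nn.Linear):
    """Linear layer whose weights see simulated device variation in training."""
    def __init__(self, in_features, out_features, sigma=0.1):
        super().__init__(in_features, out_features)
        self.sigma = sigma  # assumed relative std of conductance variation

    def forward(self, x):
        if self.training:
            # Sample fresh variation every forward pass; gradients flow
            # through the perturbed weights, penalizing sharp loss regions.
            noise = 1.0 + self.sigma * torch.randn_like(self.weight)
            return nn.functional.linear(x, self.weight * noise, self.bias)
        return super().forward(x)
```

Training against fresh noise samples each step pushes the optimizer toward flatter regions of the loss landscape, which is consistent with the model-stability view described above.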
Contributors: Krishnan, Gokul (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Chakrabarti, Chaitali (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
In the era of artificial intelligence (AI), deep neural networks (DNNs) have achieved accuracy on par with humans on a variety of recognition tasks. However, the high computation and storage requirements of DNN training and inference have posed challenges to deploying or locally training DNNs on mobile and wearable devices; energy-efficient hardware innovation from the circuit to the architecture level is required. In this dissertation, a smart electrocardiogram (ECG) processor is first presented for ECG-based authentication as well as cardiac monitoring. The 65nm testchip consumes 1.06 μW at 0.55 V for real-time ECG authentication, achieving an equal error rate of 1.7% on an in-house 645-subject database. Next, two SRAM-based in-memory computing (IMC) accelerators for deep learning algorithms are presented. Two single-array macros, titled XNOR-SRAM and C3SRAM, are based on resistive and capacitive networks for XNOR-ACcumulation (XAC) operations, respectively. The XNOR-SRAM and C3SRAM macros in 65nm CMOS achieve energy efficiencies of 403 TOPS/W and 672 TOPS/W, respectively. Built on top of these two single-array macro designs, two multi-array architectures are presented. The XNOR-SRAM based architecture, titled "Vesti", is designed to seamlessly support configurable multi-bit activations and large-scale DNNs. Vesti employs double-buffering with two groups of in-memory computing SRAMs, effectively hiding the write latency of IMC SRAMs. The Vesti accelerator in 65nm CMOS achieves energy consumption of <20 nJ for MNIST classification and <40 μJ for CIFAR-10 classification at 1.0 V supply. More recently, a programmable IMC accelerator (PIMCA) integrating 108 C3SRAM macros with a total size of 3.4 Mb is proposed. The 28nm prototype chip achieves system-level energy efficiency of 437/62 TOPS/W at 40 MHz and 1 V supply for DNNs with 1b/2b precision.
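For readers unfamiliar with XAC, the following digital reference model shows what the XNOR-SRAM/C3SRAM macros compute in the analog domain: with binary values {-1, +1} encoded as bits {0, 1}, a dot product reduces to 2·popcount(XNOR) − N. The function and encoding are illustrative assumptions, not the macro circuits themselves.

```python
# Illustrative bit-level model of the XNOR-and-ACcumulate (XAC) operation the
# XNOR-SRAM/C3SRAM macros realize in analog. Binary values {-1,+1} are packed
# MSB-first as bits {0,1}; signs that agree contribute +1, others -1.
def xac(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-dimensional {-1,+1} vectors packed as n-bit ints."""
    mask = (1 << n) - 1
    matches = (~(a_bits ^ b_bits)) & mask   # XNOR: 1 where signs agree
    return 2 * bin(matches).count("1") - n  # popcount of agreements

# Example: a = (+1,-1,+1,+1) -> 0b1011, b = (+1,+1,-1,+1) -> 0b1101
assert xac(0b1011, 0b1101, 4) == 0          # two agreements, two disagreements
```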
In addition to the IMC works, this dissertation also presents a convolutional neural network (CNN) learning processor, which accelerates the stochastic gradient descent (SGD) with momentum training algorithm in 16-bit fixed-point precision. The 65nm CNN learning processor achieves a peak energy efficiency of 2.6 TOPS/W for 16-bit fixed-point operations, consuming 10.45 mW at 0.55 V. In summary, this dissertation presents several hardware innovations from the circuit to the architecture level, exploiting the reduced algorithm complexity offered by pruning and low-precision quantization techniques. In particular, the macro-level and system-level SRAM-based IMC works presented in this dissertation show that SRAM-based IMC is one of the promising solutions for energy-efficient intelligent systems.
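As a rough illustration of the arithmetic the learning processor accelerates, here is a hedged sketch of one SGD-with-momentum update step in 16-bit fixed point. The Q8.8 format, helper names, and coefficient values are assumptions for illustration, not the chip's actual datapath.

```python
# Hedged sketch of an SGD-with-momentum step in 16-bit fixed point (assumed
# Q8.8: 8 integer bits incl. sign, 8 fraction bits). Saturation logic on the
# adds is omitted for brevity; names and format are illustrative assumptions.
import numpy as np

FRAC = 8                                   # fraction bits in Q8.8
def to_fix(x):                             # float scalar -> int16 fixed point
    return np.int16(np.clip(round(x * (1 << FRAC)), -32768, 32767))
def fmul(a, b):                            # fixed-point multiply with rescale
    return ((np.int32(a) * b.astype(np.int32)) >> FRAC).astype(np.int16)

MU, LR = to_fix(0.9), to_fix(0.01)         # momentum and learning rate in Q8.8

def sgd_momentum_step(w, v, grad):
    """v <- mu*v + grad; w <- w - lr*v, all arrays held as int16 Q8.8."""
    v = fmul(MU, v) + grad
    return w - fmul(LR, v), v
```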
Contributors: Yin, Shihui (Author) / Seo, Jae-sun (Thesis advisor) / Cao, Yu (Committee member) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Deep neural networks (DNNs) have been successfully applied in many domains including computer vision, speech recognition, and others. As the complexity of DNN tasks increases, the number of weights or parameters in DNNs surges as well, leading to consistent demands for memories denser than SRAM. Conventional DNN accelerator systems have used DRAM to store the large number of DNN weights, but DRAM requires cumbersome refresh operations, and off-chip memory access incurs very high energy consumption. Instead of using off-chip memory, several recent accelerators employ embedded non-volatile memory (NVM), such as resistive RAM (RRAM), to store the weights fully on-chip and reduce the energy consumption of overall memory access. Non-volatile resistive devices such as RRAM can naturally support in-memory computing (IMC) operations with multiple rows turned on, where the weighted-sum current between the wordline voltage (representing DNN activations) and the RRAM conductance (representing DNN weights) represents the dot-product result. This dissertation first presents a circuit-/device-level optimization to improve the energy and density of RRAM-based in-memory computing architectures. Experimental results are reported based on a prototype chip design with 128×64 RRAM arrays and CMOS peripheral circuits, where the RRAM devices are monolithically integrated in a commercial 90nm CMOS technology. Next, this dissertation presents an IMC prototype with 2-bit-per-cell RRAM devices for area-/energy-efficient DNN inference. Optimizations on the four-level conductance distribution and on peripheral circuits with an input-splitting scheme have been performed, enabling high DNN accuracy and low area/energy consumption. Furthermore, this dissertation investigates the relaxation effects of multi-level RRAM-based in-memory computing on deep neural network inference. In addition, this dissertation presents the Progressive-wRite In-memory program-VErify (PRIVE) scheme, which is verified with an RRAM testchip for IMC-based hardware acceleration of DNNs. The scheme optimizes the progressive write operations on different bit positions of RRAM weights to enable error compensation and reduce programming latency/energy while achieving high DNN accuracy. Finally, as ongoing work, this dissertation reports progress on an RRAM-based hybrid in-memory computing scheme and on ferroelectric capacitive devices for next-generation AI hardware.
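To illustrate the crossbar weighted-sum described above, here is a minimal behavioral model in which each bitline current follows Ohm's and Kirchhoff's laws, I_j = Σ_i V_i · G_ij, followed by ADC quantization. The array size and 2-bit (four-level) cells mirror the text, while the specific conductance values, read voltage, and ADC resolution are assumptions.

```python
# Minimal behavioral model of the RRAM crossbar weighted sum: wordline
# voltages encode activations, cell conductances encode 2-bit weights, and
# each bitline current is an analog multiply-accumulate. Conductance levels,
# read voltage, and 5-bit ADC resolution are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
G_LEVELS = np.array([1e-6, 34e-6, 67e-6, 100e-6])  # assumed 4 states (siemens)

G = rng.choice(G_LEVELS, size=(128, 64))           # 2-bit weights as conductances
v_wl = rng.integers(0, 2, size=128) * 0.2          # binary activations as WL volts

i_bl = v_wl @ G                                    # bitline currents: I_j = sum V_i*G_ij
adc_out = np.round(i_bl / i_bl.max() * 31).astype(int)  # assumed 5-bit ADC readout
```

In the 2-bit-per-cell prototype, each weight maps to one of the four conductance states, which is why tightening the four-level conductance distribution matters for post-mapping accuracy.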
Contributors: He, Wangxin (Author) / Seo, Jae-sun (Thesis advisor) / Cao, Yu (Committee member) / Fan, Deliang (Committee member) / Marinella, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2023