Description

Deep neural networks (DNNs) have been successfully deployed in many applications, including computer vision and speech recognition. As the complexity of DNN tasks grows, the number of weights or parameters in DNNs surges as well, creating a persistent demand for memories denser than SRAM. Conventional DNN accelerator systems have used DRAM to store the large number of DNN weights, but DRAM requires cumbersome refresh operations, and off-chip memory access incurs very high energy consumption. Instead of using off-chip memory, several recent accelerators employ embedded non-volatile memory (NVM) such as resistive RAM (RRAM) to store all of the weights on-chip and reduce the energy of overall memory access. Non-volatile resistive devices such as RRAM naturally support in-memory computing (IMC) operations with multiple rows turned on, where the weighted-sum current formed by the wordline voltages (representing DNN activations) and the RRAM conductances (representing DNN weights) yields the dot-product result.

This dissertation first presents a circuit-/device-level optimization to improve the energy and density of RRAM-based in-memory computing architectures. Experimental results are reported from a prototype chip comprising 128$\times$64 RRAM arrays and CMOS peripheral circuits, where the RRAM devices are monolithically integrated in a commercial 90nm CMOS technology.

Next, this dissertation presents an IMC prototype with 2-bit-per-cell RRAM devices for area-/energy-efficient DNN inference. Optimizations of the four-level conductance distribution and of the peripheral circuits with an input-splitting scheme enable high DNN accuracy with low area and energy consumption.

Furthermore, this dissertation investigates the relaxation effects of multi-level RRAM-based in-memory computing on DNN inference.

In addition, this dissertation presents the Progressive-wRite In-memory program-VErify (PRIVE) scheme, which is verified with an RRAM test chip for IMC-based DNN hardware acceleration. The progressive write operations are optimized across the different bit positions of the RRAM weights to enable error compensation and to reduce programming latency/energy while achieving high DNN accuracy.

Finally, as ongoing work, this dissertation reports progress on an RRAM-based hybrid in-memory computing scheme and on ferroelectric capacitive devices for next-generation AI hardware.
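As a concrete illustration of the IMC operation described above, the sketch below numerically emulates the weighted-sum current of one RRAM column: wordline voltages encode binary activations, conductances encode normalized weights, and the column current is their dot product by Kirchhoff's current law. The read voltage and conductance range are illustrative assumptions, not values from the prototype chip.

```python
import numpy as np

V_READ = 0.2                # assumed read voltage in volts (illustrative)
G_MIN, G_MAX = 1e-6, 1e-5   # assumed RRAM conductance range in siemens

def imc_dot_product(activations, weights):
    """Weighted-sum current of one RRAM column with all rows on."""
    v_wl = activations * V_READ                 # wordline voltages encode activations
    g = G_MIN + (G_MAX - G_MIN) * weights       # conductances encode weights in [0, 1]
    return np.sum(v_wl * g)                     # Kirchhoff current summation on the bitline

acts = np.random.randint(0, 2, size=128)        # binary input activations (one per row)
w = np.random.rand(128)                         # normalized weights
print(f"column current: {imc_dot_product(acts, w):.3e} A")
```

In a real array, a column ADC would quantize this current into a digital partial sum; that step is omitted here for brevity.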
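The input-splitting scheme above can be realized in several ways at the circuit level; one common form is bit-serial input, where a multi-bit activation is split into binary bit-planes applied over successive IMC cycles and recombined by shift-and-add. A minimal sketch under that assumption (the function name and bit width are hypothetical):

```python
import numpy as np

def bit_serial_dot(activations, weights, n_bits=4):
    """Dot product with n_bits-wide activations split into binary
    bit-planes, one (emulated) analog IMC cycle per bit-plane."""
    total = 0.0
    for b in range(n_bits):
        bit_plane = (activations >> b) & 1           # 1-bit wordline inputs
        partial = float(np.dot(bit_plane, weights))  # one array read cycle
        total += partial * (1 << b)                  # digital shift-and-add
    return total

acts = np.random.randint(0, 16, size=128)            # 4-bit activations
w = np.random.rand(128)
assert np.isclose(bit_serial_dot(acts, w), np.dot(acts, w))
```

Splitting the inputs this way keeps the wordline drivers binary and relaxes the dynamic range the column ADC must resolve per cycle, at the cost of more cycles per dot product.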
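Relaxation refers to programmed conductances drifting from their target levels over time, which perturbs the weighted sum and can degrade inference accuracy. The toy model below adds a Gaussian conductance shift to four assumed 2-bit-per-cell levels and reports the resulting weighted-sum error; all device numbers are illustrative, not measured values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four assumed conductance levels (siemens) for a 2-bit-per-cell RRAM.
LEVELS = np.array([2e-6, 4e-6, 6e-6, 8e-6])

def apply_relaxation(g_programmed, sigma=0.3e-6):
    """Emulate post-programming relaxation as a Gaussian conductance
    shift; sigma is an assumed, illustrative spread."""
    return g_programmed + rng.normal(0.0, sigma, size=g_programmed.shape)

codes = rng.integers(0, 4, size=1024)        # 2-bit weight codes
g_ideal = LEVELS[codes]
g_relaxed = apply_relaxation(g_ideal)

acts = rng.integers(0, 2, size=1024)         # binary activations
err = np.dot(acts, g_relaxed) - np.dot(acts, g_ideal)
print(f"weighted-sum current error: {err:.3e} A")
```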
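The bit-position-aware optimizations of PRIVE are specific to this dissertation, but the underlying write primitive is an incremental program-and-verify loop: apply a pulse, read back, and stop once the cell reaches its target window. A generic sketch against a toy cell model follows; it is not the actual PRIVE algorithm, and the step sizes and tolerance are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

class RramCell:
    """Toy RRAM cell: each pulse nudges the conductance in the pulse
    direction with random device-to-device variation."""
    def __init__(self, g0=1e-6):
        self.g = g0
    def read(self):
        return self.g
    def pulse(self, direction):
        self.g += direction * rng.uniform(0.2e-6, 0.6e-6)  # assumed per-pulse step

def program_verify(cell, target, tol=0.05, max_pulses=50):
    """Incremental program-and-verify: pulse, read back, and stop when
    the conductance is within tolerance of the target level."""
    for n in range(1, max_pulses + 1):
        if abs(cell.read() - target) <= tol * target:
            return n - 1                          # pulses actually applied
        cell.pulse(+1 if cell.read() < target else -1)  # SET or RESET increment
    return max_pulses                             # did not converge within budget

cell = RramCell()
print("pulses used:", program_verify(cell, target=6e-6))
```

Counting the pulses needed per target level, as this loop does, is what makes programming latency/energy a quantity that can be traded against DNN accuracy across weight bit positions.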

    Details

    Title
    • RRAM Based In-Memory Computing for Area-/Energy-Efficient Deep Learning
    Date Created
    • 2023
    Resource Type
  • Text
    Note
    • Partial requirement for: Ph.D., Arizona State University, 2023
    • Field of study: Electrical Engineering
