Matching Items (100)

Description

Machine learning advancements have led to increasingly complex algorithms, resulting in significant energy consumption due to heightened memory-transfer requirements and inefficient vector-matrix multiplication (VMM). To address this issue, many have proposed ReRAM analog in-memory computing (AIMC) as a solution. AIMC enhances the time-energy efficiency of VMM operations beyond conventional digital VMM hardware, such as a tensor processing unit (TPU), while substantially reducing memory-transfer demands through in-memory computing. As AIMC gains prominence as a solution, it becomes crucial to optimize ReRAM and analog crossbar architecture characteristics. This thesis introduces an application-specific integrated circuit (ASIC) tailored for characterizing ReRAM within a crossbar array architecture and discusses the interfacing techniques employed. It discusses ReRAM forming and programming techniques and showcases the chip's ability to use the write-verify programming method to write image pixels as a conductance heat map. Additionally, this thesis assesses the ASIC's capability to characterize different aspects of ReRAM, including drift and noise characteristics. The research employs the chip to extract ReRAM data and models the data within a crossbar array simulator, enabling its application to the classification of the CIFAR-10 dataset.
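As a rough illustration of the write-verify programming approach mentioned above, the sketch below alternates program and verify steps until a cell's read-back conductance lands within a tolerance band around a target derived from a pixel value. The conductance range, tolerance, pulse budget, and device-access callbacks are illustrative assumptions, not parameters taken from the thesis.

```python
# Hedged sketch of write-verify programming for mapping image pixels to ReRAM
# conductances. All constants and the read/SET/RESET callbacks are assumed.
import numpy as np

G_MIN, G_MAX = 1e-6, 100e-6   # assumed programmable conductance range (S)
TOLERANCE = 0.02              # accept read-back within +/-2% of the target
MAX_PULSES = 50               # give up after this many program/verify cycles

def pixel_to_target(pixel):
    """Map an 8-bit pixel value to a target conductance."""
    return G_MIN + (pixel / 255.0) * (G_MAX - G_MIN)

def write_verify(read_cell, set_pulse, reset_pulse, target):
    """Alternate program and verify until the cell is inside the band."""
    for _ in range(MAX_PULSES):
        g = read_cell()
        error = (g - target) / target
        if abs(error) <= TOLERANCE:
            return g                  # converged
        if error < 0:
            set_pulse()               # conductance too low -> potentiate
        else:
            reset_pulse()             # conductance too high -> depress
    return read_cell()                # best effort after the pulse budget
```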
Contributors: Short, Jesse (Author) / Marinella, Matthew (Thesis advisor) / Barnaby, Hugh (Committee member) / Sanchez Esqueda, Ivan (Committee member) / Arizona State University (Publisher)
Created: 2023
Description

The Deep Neural Network (DNN) is one type of neuromorphic computing approach that has gained substantial interest today. To achieve continuous improvement in accuracy, the depth and size of the deep neural network need to increase significantly. As the scale of the neural network increases, it poses a severe challenge to hardware implementation with conventional Central Processing Units (CPUs) and Graphics Processing Units (GPUs) from the perspective of power, computation, and memory. To address this challenge, domain-specific digital neural network accelerators based on Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs) have been developed. However, limitations still exist in terms of on-chip memory capacity and off-chip memory access. As an alternative, Resistive Random Access Memories (RRAMs) have been proposed to store weights on chip with higher density and to enable fast analog computation with low power consumption. Conductive Bridge Random Access Memories (CBRAMs) are a subset of RRAMs whose conductance states are defined by the existence and modulation of a conductive metal filament. Ag-chalcogenide based CBRAM devices have demonstrated multiple resistive states, making them potential candidates for use as analog synapses in neuromorphic hardware. In this work the use of the Ag-Ge30Se70 device as an analog synaptic device has been explored. An Ag-Ge30Se70 CBRAM crossbar array was fabricated. The fabricated crossbar devices were subjected to different pulsing schemes, and their conductance linearity response was analyzed. An improved linear response is observed with non-identical pulse application, with the non-linearity factor improving from 6.65 to 1 for potentiation and from -2.25 to -0.95 for depression. The effect of improved linearity was quantified by simulating the devices in an artificial neural network. Simulations of the area, latency, and power consumption of the CBRAM device in a neural accelerator were conducted. Further, the changes caused by Total Ionizing Dose (TID) in the conductance of the analog response of Ag-Ge30Se70 CBRAM-based synapses are studied. The effect of irradiation was further analyzed by simulating the devices in an artificial neural network. Material characterization was performed to understand the observed change in conductance due to TID.
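For readers unfamiliar with the non-linearity factor quoted above, the short sketch below uses the widely cited exponential weight-update model to generate potentiation curves for a nearly linear device and a strongly non-linear one; the mapping from the reported non-linearity factor to the curvature parameter is an illustrative assumption, not the fitting procedure used in the thesis.

```python
# Illustrative potentiation curves from the common exponential update model;
# a non-linearity factor near 0-1 gives a nearly straight conductance ramp.
import numpy as np

def potentiation_curve(num_pulses, nl, g_min=0.0, g_max=1.0):
    """Conductance vs. identical-pulse count for a given non-linearity nl."""
    if abs(nl) < 1e-6:
        return np.linspace(g_min, g_max, num_pulses)   # ideal linear device
    p = np.arange(num_pulses)
    A = num_pulses / nl                                # assumed curvature mapping
    B = (g_max - g_min) / (1 - np.exp(-num_pulses / A))
    return g_min + B * (1 - np.exp(-p / A))

linear_like = potentiation_curve(64, 1.0)    # ~1 reported with shaped pulses
saturating  = potentiation_curve(64, 6.65)   # 6.65 reported with identical pulses
```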
Contributors: Apsangi, Priyanka (Author) / Barnaby, Hugh (Thesis advisor) / Kozicki, Michael (Committee member) / Sanchez Esqueda, Ivan (Committee member) / Marinella, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

Recently, the implementation of neuromorphic accelerator hardware has gradually shifted from traditional Von Neumann architectures to non-Von Neumann architectures due to the “memory wall” and “power wall”. Near-memory computing (NMC) and in-memory computing (IMC) are two common types of non-Von Neumann approaches. NMC can help reduce data movement, yet it cannot fully address the challenge of improving computational efficiency as the neural network size grows. IMC has been proposed as a superior alternative. This architecture performs computation inside the memory array using stackable synaptic devices to improve the latency and energy efficiency of neural network accelerators. Both volatile and non-volatile computational memory devices can achieve IMC. Fully complementary metal-oxide-semiconductor (CMOS) in-memory computing cells can be realized by adding transistors to a standard static random access memory (SRAM) bit-cell. The SRAM-based designs investigated in this dissertation perform bit-wise logical operations to obtain XNOR-and-accumulate computation (XAC) for deep neural networks (DNNs). Hybrid in-memory computing architectures combine CMOS with embedded non-volatile memory (eNVM). Resistive random access memory (RRAM) is one class of eNVM ideally suited for hybrid IMC. In a neural network, RRAM with programmable multi-level resistance/conductance states can naturally emulate weight transitions in the synaptic elements of neural networks. In this dissertation, the operation of both fully CMOS and hybrid IMCs, and the effects of ionizing radiation on them, are investigated. The fully CMOS architectures perform SRAM-based XAC computations. The hybrid architectures use multi-state RRAM synapses with CMOS neurons to perform multiply-and-accumulate computation (MAC). In the SRAM XAC array, an 8×8 XNOR IMC array is modeled with flipped-well enhanced-gate super low threshold voltage (EGSLVT) metal-oxide-semiconductor field-effect transistors (MOSFETs) from the GlobalFoundries 22nm fully depleted silicon-on-insulator (FDSOI) process. The impact of total ionizing dose (TID) on the XAC synaptic array is analyzed by using radiation-aware models to mimic TID-induced voltage shifts in MOSFETs. In the multi-state RRAM MAC array, 4-state conductances have been programmed in a hafnium-oxide (HfOx) RRAM 1-transistor-1-resistor (1T1R) array. The impact of total ionizing dose on the multi-state behavior of HfOx RRAM is evaluated by irradiating a 64kb 1T1R array with 90nm CMOS peripheral circuitry under Co-60 γ-ray irradiation.
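To make the XNOR-and-accumulate (XAC) operation concrete, the sketch below shows the standard binarized-network identity such SRAM cells exploit: with weights and activations restricted to {-1, +1} and stored as bits, a dot product reduces to an element-wise XNOR followed by a popcount. This is a generic numerical check, not the dissertation's circuit model.

```python
# XNOR-and-accumulate: bitwise agreement count mapped back to a signed dot product.
import numpy as np

def xac(w_bits, a_bits):
    """w_bits, a_bits: {0,1} arrays; returns the equivalent +/-1 dot product."""
    agree = (~(w_bits ^ a_bits)) & 1        # 1 where the bits match (XNOR)
    popcount = int(agree.sum())
    return 2 * popcount - len(w_bits)       # (#agree) - (#disagree)

rng = np.random.default_rng(0)
w = rng.integers(0, 2, 64)
a = rng.integers(0, 2, 64)
assert xac(w, a) == int(((2 * w - 1) * (2 * a - 1)).sum())   # matches signed MAC
```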
Contributors: Han, Xu (Author) / Barnaby, Hugh (Thesis advisor) / Kozicki, Michael (Committee member) / Marinella, Matthew (Committee member) / Esqueda, Ivan (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

Bipolar commercial-off-the-shelf (COTS) circuits are increasingly used in space missions due to their low cost per part. In space environments these devices are exposed to ionizing radiation that degrades their performance. Testing to evaluate the performance of these devices is a costly and lengthy process, so methods that can help predict a COTS part’s performance help alleviate these downsides. A modeling software package for predicting the effects of total ionizing dose (TID), enhanced low dose rate sensitivity (ELDRS), and hydrogen gas on bipolar parts is introduced and expanded upon. The model is then developed in several key ways that extend its features and usability in this field. A physics-based methodology of simulating interface traps (NIT), which expands the previously experimental-only database, is detailed. This new methodology is also compared to experimental data and used to establish a link between hydrogen concentration in the oxide and packaged hydrogen gas. Links are established between Technology Computer-Aided Design (TCAD), circuit simulation, and experimental data. These links are then used to establish a better foundation for the model. New methodologies are added to the modeling software so that it is possible to simulate transient characteristics such as slew rate.
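As a loose, first-order illustration of how interface-trap buildup feeds into bipolar degradation, the sketch below combines an assumed power-law NIT growth with the standard Delta(1/beta) proportionality; the functional forms and coefficients are placeholders, not the physics-based methodology developed in this work.

```python
# Hedged first-order picture: TID -> interface traps -> current-gain loss.
import numpy as np

def interface_traps(dose_krad, k=1e9, n=0.7):
    """Assumed power-law buildup of N_it (cm^-2) with total ionizing dose."""
    return k * dose_krad ** n

def gain_after_dose(beta0, dose_krad, c=2e-12):
    """Delta(1/beta) taken proportional to N_it; c lumps geometry and physics."""
    return 1.0 / (1.0 / beta0 + c * interface_traps(dose_krad))

doses = np.array([0.0, 10.0, 30.0, 100.0])             # krad(Si)
print([round(gain_after_dose(150.0, d), 1) for d in doses])
```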
Contributors: Roark, Samuel (Author) / Barnaby, Hugh (Thesis advisor) / Sanchez Esqueda, Ivan (Committee member) / Bakkaloglu, Bertan (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

Proton beam therapy has been proven to be effective for cancer treatment. Protons allow complete energy deposition to occur inside patients, rendering this a superior treatment compared to other types of radiotherapy based on photons or electrons. This same characteristic makes quality assurance critical, driving the need for detectors capable of direct beam positioning and fluence measurement. This work showcases a flexible and scalable data acquisition system for a multi-channel, segmented-readout parallel-plate ionization chamber instrument for proton beam fluence and position detection. Utilizing readily available, modern, off-the-shelf hardware components, including an FPGA with an embedded CPU in the same package, a data acquisition system for the detector was designed. The modest detector signal bandwidth allows the system to forgo ASICs and their associated costs and lead times. The data acquisition system is demonstrated experimentally for a 96-readout-channel detector, showing sub-millisecond beam characterization and beam reconstruction. The system demonstrated scalability up to 1064 readout channels, the limiting factors being FPGA I/O availability as well as amplification and sampling power consumption.
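As a simple illustration of how beam position can be reconstructed from a segmented readout of this kind, the sketch below computes a charge-weighted centroid over two orthogonal strip planes. The 48+48 channel split, strip pitch, and Gaussian test beam are assumptions for illustration, not the instrument's actual geometry or firmware.

```python
# Charge-weighted centroid reconstruction over two strip planes (toy example).
import numpy as np

STRIP_PITCH_MM = 2.0   # assumed strip pitch

def centroid(charges):
    """Charge-weighted centroid (mm) of one strip plane; None if no signal."""
    positions = np.arange(len(charges)) * STRIP_PITCH_MM
    total = charges.sum()
    return float((charges * positions).sum() / total) if total > 0 else None

# 48 X strips + 48 Y strips = 96 readout channels (an assumed split)
x_charges = np.exp(-0.5 * ((np.arange(48) - 20.3) / 3.0) ** 2)   # toy Gaussian beam
y_charges = np.exp(-0.5 * ((np.arange(48) - 25.8) / 3.0) ** 2)
print(centroid(x_charges), centroid(y_charges))                  # beam (x, y) in mm
```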
Contributors: Acuna Briceno, Rafael Andres (Author) / Barnaby, Hugh (Thesis advisor) / Brunhaver, John (Committee member) / Blyth, David (Committee member) / Arizona State University (Publisher)
Created: 2021
Description

In the rapidly evolving field of computer vision, propelled by advancements in deep learning, the integration of hardware-software co-design has become crucial to overcome the limitations of traditional imaging systems. This dissertation explores the integration of hardware-software co-design in computational imaging, particularly in light transport acquisition and Non-Line-of-Sight (NLOS) imaging. By leveraging projector-camera systems and computational techniques, this thesis addresses critical challenges in imaging complex environments, such as adverse weather conditions, low-light scenarios, and the imaging of reflective or transparent objects. The first contribution of this thesis is the theory, design, and implementation of a slope disparity gating system: a vertically aligned configuration of a synchronized raster-scanning projector and rolling-shutter camera that facilitates selective imaging through disparity-based triangulation. This system introduces a novel, hardware-oriented approach to selective imaging, circumventing the limitations of post-capture processing. The second contribution of this thesis is the realization of two innovative approaches for spotlight optimization to improve localization and tracking for NLOS imaging. The first approach utilizes radiosity-based optimization to improve 3D localization and object identification in small-scale laboratory settings. The second approach introduces a learning-based illumination network along with a differentiable renderer and NLOS estimation network to optimize human 2D localization and activity recognition. This approach is validated on a large, room-scale scene with complex line-of-sight geometries and occluders. The third contribution of this thesis is an attention-based neural network for passive NLOS settings where there is no controllable illumination. The thesis demonstrates real-time, dynamic NLOS human tracking where the camera is moving on a mobile robotic platform. In addition, this thesis contains an appendix featuring temporally consistent relighting for portrait videos, with applications in computer graphics and vision.
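To give a flavor of the disparity-based triangulation behind slope disparity gating, the sketch below converts a projector-camera row disparity into depth and keeps only pixels inside a chosen depth slab. The baseline, focal length, and gating band are assumptions; the real system enforces this selection in hardware through synchronized scanning rather than post-capture masking.

```python
# Conceptual depth gating from projector/camera row disparity (not system firmware).
import numpy as np

def depth_from_disparity(disparity_px, baseline_mm, focal_px):
    """Standard triangulation: depth is inversely proportional to disparity."""
    return baseline_mm * focal_px / disparity_px

def gate_mask(disparity_map, z_min_mm, z_max_mm, baseline_mm=150.0, focal_px=1200.0):
    """Keep only pixels whose triangulated depth lies inside [z_min, z_max]."""
    with np.errstate(divide="ignore"):
        depth = depth_from_disparity(disparity_map, baseline_mm, focal_px)
    return (depth >= z_min_mm) & (depth <= z_max_mm)

disparities = np.array([[0.0, 6.0, 12.0], [9.0, 10.0, 11.0]])      # pixels
print(gate_mask(disparities, z_min_mm=15000.0, z_max_mm=25000.0))  # depth slab in mm
```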
Contributors: Chandran, Sreenithy (Author) / Jayasuriya, Suren (Thesis advisor) / Turaga, Pavan (Committee member) / Dasarathy, Gautam (Committee member) / Kubo, Hiroyuki (Committee member) / Arizona State University (Publisher)
Created: 2024
Description

Quantum computing has the potential to revolutionize the signal-processing field by providing more efficient methods for analyzing signals. This thesis explores the application of quantum computing to signal analysis synthesis for compression applications. More specifically, the study focuses on two key approaches: the quantum Fourier transform (QFT) and quantum linear prediction (QLP). The research is motivated by the potential advantages offered by quantum computing in massive signal processing tasks and presents novel quantum circuit designs for the QFT, quantum autocorrelation, and QLP, enabling signal analysis synthesis using quantum algorithms. The two approaches are explained as follows. The quantum Fourier transform (QFT) demonstrates the potential for improved speed in quantum computing compared to classical methods. This thesis focuses on quantum encoding of signals and on designing quantum algorithms for signal analysis synthesis and signal compression using QFTs. Comparative studies are conducted to evaluate quantum computations for Fourier transform applications, considering signal-to-noise ratio (SNR) results. The effects of qubit precision and quantum noise are also analyzed. The QFT algorithm is also developed in the J-DSP simulation environment, providing hands-on laboratory experiences for signal-processing students. User-friendly simulation programs on QFT-based signal analysis synthesis using peak picking and perceptual selection using psychoacoustics are developed in J-DSP. Further, this research is extended to analyze the autocorrelation of the signal using QFTs and to develop a quantum linear prediction (QLP) algorithm for speech processing applications. QFTs and inverse QFTs (IQFTs) are used to compute the quantum autocorrelation of the signal, and the HHL algorithm is modified and used to compute the solutions of the linear equations using quantum computing. The performance of the QLP algorithm is evaluated for system identification, spectral estimation, and speech analysis synthesis, and comparisons are performed between QLP and classical linear prediction (CLP) results. The results demonstrate the following: effective quantum circuits for accurate QFT-based speech analysis synthesis, evaluation of performance with quantum noise, design of accurate quantum autocorrelation, and development of a modified HHL algorithm for efficient QLP. Overall, this thesis contributes to research on quantum computing for signal processing applications and provides a foundation for further exploration of quantum algorithms for signal analysis synthesis.
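As a purely classical sanity check of the transform at the heart of these designs, the sketch below builds the QFT as the normalized DFT matrix and verifies its unitarity and its agreement with an orthonormal inverse FFT on an amplitude-encoded signal; it is not the quantum circuit construction developed in the thesis.

```python
# Classical check: the n-qubit QFT is the N x N unitary with entries
# exp(2*pi*i*j*k/N)/sqrt(N), i.e. a normalized (inverse-convention) DFT.
import numpy as np

def qft_matrix(n_qubits):
    N = 2 ** n_qubits
    rows, cols = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * rows * cols / N) / np.sqrt(N)

U = qft_matrix(3)
assert np.allclose(U.conj().T @ U, np.eye(8))                    # unitarity

x = np.random.default_rng(1).normal(size=8)
state = x / np.linalg.norm(x)                                    # amplitude-encoded signal
assert np.allclose(U @ state, np.fft.ifft(state, norm="ortho"))  # matches orthonormal IDFT
```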
Contributors: Sharma, Aradhita (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2023
Description

Millimeter-wave (mmWave) and sub-terahertz (sub-THz) systems aim to utilize the large bandwidth available at these frequencies. This has the potential to enable several future applications that require high data rates, such as autonomous vehicles and digital twins. These systems, however, have several challenges that need to be addressed to realize their gains in practice. First, they need to deploy large antenna arrays and use narrow beams to guarantee sufficient receive power. Adjusting the narrow beams of the large antenna arrays incurs massive beam-training overhead. Second, sensitivity to blockages is a key challenge for mmWave and THz networks. Since these networks rely mainly on line-of-sight (LOS) links, sudden link blockages severely threaten their reliability. Further, when the LOS link is blocked, the network typically needs to hand off the user to another LOS base station, which may incur critical latency, especially if a search over a large codebook of narrow beams is needed. A promising way to tackle both challenges lies in leveraging additional side information such as visual, LiDAR, radar, and position data. These sensors provide rich information about the wireless environment, which can be utilized for fast beam and blockage prediction. This dissertation presents a machine-learning framework for sensing-aided beam and blockage prediction. In particular, for beam prediction, this work proposes to utilize visual and positional data to predict the optimal beam indices. For the first time, this work investigates the sensing-aided beam prediction task in real-world vehicle-to-infrastructure and drone communication scenarios. Similarly, for blockage prediction, this dissertation proposes a multi-modal wireless communication solution that utilizes bimodal machine learning to perform proactive blockage prediction and user hand-off. Evaluations on both real-world and synthetic datasets illustrate the promising performance of the proposed solutions and highlight their potential for next-generation communication and sensing systems.
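As a toy illustration of why side information can shortcut beam training, the snippet below maps a reported user position to the nearest beam in an assumed uniform codebook instead of sweeping all beams. It is a geometric stand-in for the learning-based predictors developed in the dissertation, and every constant in it is an assumption.

```python
# Position-aided beam selection: pick the codebook beam closest to the user's
# angle of departure rather than exhaustively sweeping the codebook (toy model).
import numpy as np

NUM_BEAMS = 64
codebook_angles = np.linspace(-np.pi / 2, np.pi / 2, NUM_BEAMS)   # beam boresights (rad)

def predict_beam(user_xy, bs_xy=(0.0, 0.0)):
    """Return the index of the beam pointing closest to the user's direction."""
    dx, dy = user_xy[0] - bs_xy[0], user_xy[1] - bs_xy[1]
    aod = np.arctan2(dx, dy)            # angle measured from the array boresight (+y axis)
    return int(np.argmin(np.abs(codebook_angles - aod)))

print(predict_beam((12.0, 30.0)))       # e.g. a vehicle 12 m east, 30 m ahead
```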
Contributors: Charan, Gouranga (Author) / Alkhateeb, Ahmed (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Turaga, Pavan (Committee member) / Michelusi, Nicolò (Committee member) / Arizona State University (Publisher)
Created: 2024
Description

The VCO, as a ubiquitous circuit in many systems, has demanding phase-noise requirements, and lowering the noise that migrates from the power supply has been a research focus for many years. Because a ring-oscillator (RO) based VCO is more sensitive to supply noise, finding an effective technique to reduce that noise is especially important. Beyond conventional supply-noise reduction techniques such as filtering, adjusting transistor channel lengths, and mutual cancellation of current noise, the 28nm UTBB-FD-SOI process from STMicroelectronics offers a new method: it allows the circuit designer to dynamically control the threshold voltage. In this thesis, a new linear coarse-fine ring VCO structure with a 1V supply voltage is designed. The structure also allows the frequency coverage to be tuned flexibly through fine and coarse tunable on-board resistors. The thesis presents a model of the phase-noise reduction method, and the model is validated with the newly designed VCO circuit. For instance, given 1μV/√Hz of white noise coupled onto the supply, the 3GHz VCO achieves more than 7dBc/Hz of phase-noise reduction at a 10MHz frequency offset.
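To connect the quoted numbers, the snippet below evaluates the standard narrowband-FM relation that converts a supply noise density into a phase-noise contribution through the VCO's supply-pushing gain; the pushing-gain value is an assumed placeholder rather than a measured figure from this design.

```python
# Supply noise -> phase noise via supply pushing (narrowband-FM approximation).
import math

def supply_noise_phase_noise(v_n, k_push_hz_per_v, f_m):
    """Phase-noise contribution (dBc/Hz) at offset f_m from supply noise v_n."""
    return 20 * math.log10(k_push_hz_per_v * v_n / (2 * f_m))

# Example: 1 uV/sqrt(Hz) supply noise, an assumed 200 MHz/V pushing gain,
# evaluated at the 10 MHz offset quoted in the abstract.
print(supply_noise_phase_noise(1e-6, 200e6, 10e6))   # -> -100 dBc/Hz contribution
```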
Contributors: Tang, Miao (Author) / Barnaby, Hugh (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Mikkola, Esko (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

The rapid improvement in computation capability has made deep convolutional neural networks (CNNs) a great success in recent years on many computer vision tasks with significantly improved accuracy. During the inference phase, many applications demand low-latency processing of one image with strict power consumption requirements, which reduces the efficiency of GPUs and other general-purpose platforms and brings opportunities for specialized acceleration hardware, e.g. FPGAs, where the digital circuit is customized for deep learning inference. However, deploying CNNs on portable and embedded systems is still challenging due to large data volume, intensive computation, varying algorithm structures, and frequent memory accesses. This dissertation proposes a complete design methodology and framework to accelerate the inference process of various CNN algorithms on FPGA hardware with high performance, efficiency and flexibility.

As convolution contributes most of the operations in CNNs, the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply-and-accumulate (MAC) operations with four levels of loops. Without fully studying convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse or manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g. memory access) of the CNN accelerator based on multiple design variables. An efficient dataflow and hardware architecture for CNN acceleration are proposed to minimize data communication while maximizing resource utilization to achieve high performance.
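For reference, the naive loop nest below spells out the four loop levels the paragraph refers to (kernel window, input feature maps, output pixel position, and output feature maps); an accelerator's dataflow is essentially a choice of how to unroll, tile, and reorder these loops. The shapes are arbitrary and the code is a functional sketch, not the proposed hardware dataflow.

```python
# Naive convolution loop nest exposing the MAC structure and reuse opportunities.
import numpy as np

def conv_layer(inp, weights):
    """inp: (Cin, H, W); weights: (Cout, Cin, K, K); stride 1, no padding."""
    Cin, H, W = inp.shape
    Cout, _, K, _ = weights.shape
    out = np.zeros((Cout, H - K + 1, W - K + 1))
    for co in range(Cout):                    # loop level 4: output feature maps
        for ci in range(Cin):                 # loop level 3: input feature maps
            for y in range(H - K + 1):        # loop level 2: output pixel position
                for x in range(W - K + 1):
                    for ky in range(K):       # loop level 1: kernel window
                        for kx in range(K):
                            out[co, y, x] += inp[ci, y + ky, x + kx] * weights[co, ci, ky, kx]
    return out
```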

Although great performance and efficiency can be achieved by customizing the FPGA hardware for each CNN model, significant effort and expertise are required, leading to long development times that make it difficult to catch up with the rapid development of CNN algorithms. In this work, we present an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable fast, high-level prototyping of CNNs from software to FPGA while keeping the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model different operations at each layer. The integration and dataflow of the physical modules are predefined in the top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule so that even highly irregular and complex network topologies, e.g. GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g. NiN, VGG, GoogLeNet and ResNet, on two different standalone FPGAs, achieving state-of-the-art performance.

Based on the optimized acceleration strategy, there are still many design options, e.g. the degree and dimension of computation parallelism, the size of on-chip buffers, and the external memory bandwidth, which impact the utilization of computation resources and the efficiency of data communication, and ultimately affect the performance and energy consumption of the accelerator. The large design space of the accelerator makes it impractical to explore the optimal design choice during the real implementation phase. Therefore, a performance model is proposed in this work to quantitatively estimate the accelerator performance and resource utilization. In this way, performance bottlenecks and design bounds can be identified, and the optimal design option can be explored early in the design phase.
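A minimal sketch of the kind of analytical estimate such a performance model provides is shown below: per-layer latency is bounded by the slower of the compute time and the external-memory transfer time, which is enough to flag whether a design point is compute- or bandwidth-bound. The parameter names and example values are illustrative assumptions, not the model proposed in this work.

```python
# Roofline-style latency estimate: compute-bound vs. bandwidth-bound (sketch).
def layer_latency_cycles(num_macs, bytes_dram, num_pes, bytes_per_cycle):
    compute_cycles = num_macs / num_pes            # all MAC units busy every cycle
    memory_cycles = bytes_dram / bytes_per_cycle   # external memory traffic
    return max(compute_cycles, memory_cycles)      # the slower side dominates

# Example: a VGG-style conv layer with assumed MAC count, DRAM traffic,
# 1024 parallel MAC units, and a 16 B/cycle external memory interface.
print(layer_latency_cycles(num_macs=1.85e9, bytes_dram=25e6,
                           num_pes=1024, bytes_per_cycle=16))
```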
Contributors: Ma, Yufei (Author) / Vrudhula, Sarma (Thesis advisor) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Barnaby, Hugh (Committee member) / Arizona State University (Publisher)
Created: 2018