Search Content

Reliable arithmetic circuit design inspired by SNP systems

Description

ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a…

ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a certain kind of membrane systems, is inspired by the way the neurons in brain interact using electrical spikes. Compared to the traditional Boolean logic, SNP systems not only perform similar functions but also provide a more promising solution for reliable computation. Two basic neuron types, Low Pass (LP) neurons and High Pass (HP) neurons, are introduced. These two basic types of neurons are capable to build an arbitrary SNP neuron. This leads to the conclusion that these two basic neuron types are Turing complete since SNP systems has been proved Turing complete. These two basic types of neurons are further used as the elements to construct general-purpose arithmetic circuits, such as adder, subtractor and comparator. In this thesis, erroneous behaviors of neurons are discussed. Transmission error (spike loss) is proved to be equivalent to threshold error, which makes threshold error discussion more universal. To improve the reliability, a new structure called motif is proposed. Compared to Triple Modular Redundancy improvement, motif design presents its efficiency and effectiveness in both single neuron and arithmetic circuit analysis. DRAM-based CMOS circuits are used to implement the two basic types of neurons. Functionality of basic type neurons is proved using the SPICE simulations. The motif improved adder and the comparator, as compared to conventional Boolean logic design, are much more reliable with lower leakage, and smaller silicon area. This leads to the conclusion that SNP system could provide a more promising solution for reliable computation than the conventional Boolean logic.

ContributorsAn, Pei (Author) / Cao, Yu (Thesis advisor) / Barnaby, Hugh (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created2013

Multilevel resistance programming in conductive bridge resistive memory

Description

This work focuses on the existence of multiple resistance states in a type of emerging non-volatile resistive memory device known commonly as Programmable Metallization Cell (PMC) or Conductive Bridge Random Access Memory (CBRAM), which can be important for applications such as multi-bit memory as well as non-volatile logic and neuromorphic…

This work focuses on the existence of multiple resistance states in a type of emerging non-volatile resistive memory device known commonly as Programmable Metallization Cell (PMC) or Conductive Bridge Random Access Memory (CBRAM), which can be important for applications such as multi-bit memory as well as non-volatile logic and neuromorphic computing. First, experimental data from small signal, quasi-static and pulsed mode electrical characterization of such devices are presented which clearly demonstrate the inherent multi-level resistance programmability property in CBRAM devices. A physics based analytical CBRAM compact model is then presented which simulates the ion-transport dynamics and filamentary growth mechanism that causes resistance change in such devices. Simulation results from the model are fitted to experimental dynamic resistance switching characteristics. The model designed using Verilog-a language is computation-efficient and can be integrated with industry standard circuit simulation tools for design and analysis of hybrid circuits involving both CMOS and CBRAM devices. Three main circuit applications for CBRAM devices are explored in this work. Firstly, the susceptibility of CBRAM memory arrays to single event induced upsets is analyzed via compact model simulation and experimental heavy ion testing data that show possibility of both high resistance to low resistance and low resistance to high resistance transitions due to ion strikes. Next, a non-volatile sense amplifier based flip-flop architecture is proposed which can help make leakage power consumption negligible by allowing complete shutdown of power supply while retaining its output data in CBRAM devices. Reliability and energy consumption of the flip-flop circuit for different CBRAM low resistance levels and supply voltage values are analyzed and compared to CMOS designs. Possible extension of this architecture for threshold logic function computation using the CBRAM devices as re-configurable resistive weights is also discussed. Lastly, Spike timing dependent plasticity (STDP) based gradual resistance change behavior in CBRAM device fabricated in back-end-of-line on a CMOS die containing integrate and fire CMOS neuron circuits is demonstrated for the first time which indicates the feasibility of using CBRAM devices as electronic synapses in spiking neural network hardware implementations for non-Boolean neuromorphic computing.

ContributorsMahalanabis, Debayan (Author) / Barnaby, Hugh J. (Thesis advisor) / Kozicki, Michael N. (Committee member) / Vrudhula, Sarma (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)

Created2015

Approximate neural networks for speech applications in resource-constrained environments

Description

Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementation of these systems have very good performance,

they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the…

Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementation of these systems have very good performance,

they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the memory and computation cost

of keyword detection and speech recognition networks (or DNNs) are presented.

The first technique is based on representing all weights and biases by a small number of bits and mapping all nodal computations into fixed-point ones with minimal degradation in the

accuracy. Experiments conducted on the Resource Management (RM) database show that for the keyword detection neural network, representing the weights by 5 bits results in a 6 fold reduction in memory compared to a floating point implementation with very little loss in performance. Similarly, for the speech recognition neural network, representing the weights by 6 bits results in a 5 fold reduction in memory while maintaining an error rate similar to a floating point implementation. Additional reduction in memory is achieved by a technique called weight pruning,

where the weights are classified as sensitive and insensitive and the sensitive weights are represented with higher precision. A combination of these two techniques helps reduce the memory

footprint by 81 - 84% for speech recognition and keyword detection networks respectively.

Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. The corresponding technique, termed coarse-grain sparsification, introduces

hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and significant reduction in the number of computations during classification without

loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision effectively reduced the weight memory

requirement by ~95% compared to a fully-connected network with double precision, while showing similar performance in keyword detection accuracy and word error rate.

ContributorsArunachalam, Sairam (Author) / Chakrabarti, Chaitali (Thesis advisor) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Arizona State University (Publisher)

Created2016

On-chip learning and inference acceleration of sparse representations

Description

The past decade has seen a tremendous surge in running machine learning (ML) functions on mobile devices, from mere novelty applications to now indispensable features for the next generation of devices.

While the mobile platform capabilities range widely, long battery life and reliability are common design concerns that are crucial to…

The past decade has seen a tremendous surge in running machine learning (ML) functions on mobile devices, from mere novelty applications to now indispensable features for the next generation of devices.

While the mobile platform capabilities range widely, long battery life and reliability are common design concerns that are crucial to remain competitive.

Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful CPUs with GPUs to accelerate the computation of deep neural networks (DNNs), which are the most common structures to perform ML operations.

But traditional von Neumann architectures are not optimized for the high memory bandwidth and massively parallel computation demands required by DNNs.

Hence, propelling research into non-von Neumann architectures to support the demands of DNNs.

The re-imagining of computer architectures to perform efficient DNN computations requires focusing on the prohibitive demands presented by DNNs and alleviating them. The two central challenges for efficient computation are (1) large memory storage and movement due to weights of the DNN and (2) massively parallel multiplications to compute the DNN output.

Introducing sparsity into the DNNs, where certain percentage of either the weights or the outputs of the DNN are zero, greatly helps with both challenges. This along with algorithm-hardware co-design to compress the DNNs is demonstrated to provide efficient solutions to greatly reduce the power consumption of hardware that compute DNNs. Additionally, exploring emerging technologies such as non-volatile memories and 3-D stacking of silicon in conjunction with algorithm-hardware co-design architectures will pave the way for the next generation of mobile devices.

Towards the objectives stated above, our specific contributions include (a) an architecture based on resistive crosspoint array that can update all values stored and compute matrix vector multiplication in parallel within a single cycle, (b) a framework of training DNNs with a block-wise sparsity to drastically reduce memory storage and total number of computations required to compute the output of DNNs, (c) the exploration of hardware implementations of sparse DNNs and architectural guidelines to reduce power consumption for the implementations in monolithic 3D integrated circuits, and (d) a prototype chip in 65nm CMOS accelerator for long-short term memory networks trained with the proposed block-wise sparsity scheme.

ContributorsKadetotad, Deepak Vinayak (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Vrudhula, Sarma (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)

Created2019

Filtering by

Reliable arithmetic circuit design inspired by SNP systems

Multilevel resistance programming in conductive bridge resistive memory

Approximate neural networks for speech applications in resource-constrained environments

On-chip learning and inference acceleration of sparse representations