The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation.
Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR.
Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths.
Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.
In the second theme of this dissertation, a new model is presented to predict single event transients in 1T-1R memory arrays as an inverter, where the PMC is modeled as a constant resistance while the OFF transistor is model as a diode in parallel to a capacitance. The model divides the output voltage transient response of an inverter into three time segments, where an ionizing particle striking through the drain–body junction of the OFF-state NMOS is represented as a photocurrent pulse. If this current source is large enough, the output voltage can drop to a negative voltage. In this model, the OFF-state NMOS is represented as the parallel combination of an ideal diode and the intrinsic capacitance of the drain–body junction, while a resistance represents an ON-state NMOS. The proposed model is verified by 3-D TCAD mixed-mode device simulations. In order to investigate the flexibility of the model, the effects of important parameters, such as ON-state PMOS resistance, doping concentration of p-region in the diode, and the photocurrent pulse are scrutinized.
The third theme of this dissertation develops various models together with TCAD simulations to model the behavior of different diamond-based devices, including PIN diodes and bipolar junction transistors (BJTs). Diamond is a very attractive material for contemporary power semiconductor devices because of its excellent material properties, such as high breakdown voltage and superior thermal conductivity compared to other materials. Collectively, this research project enhances the development of high power and high temperature electronics using diamond-based semiconductors. During the fabrication process of diamond-based devices, structural defects particularly threading dislocations (TDs), may affect the device electrical properties, and models were developed to account of such defects. Recognition of their behavior helps us understand and predict the performance of diamond-based devices. Here, the electrical conductance through TD sites is shown to be governed by the Poole-Frenkel emission (PFE) for the temperature (T) range of 323 K ˂ T ˂ 423 K. Analytical models were performed to fit with experimental data over the aforementioned temperature range. Next, the Silvaco Atlas tool, a drift-diffusion based TCAD commercial software, was used to model diamond-based BJTs. Here, some field plate methods are proposed in order to decrease the surface electric field. The models used in Atlas are modified to account for both hopping transport in the impurity bands associated with high activation energies for boron doped and phosphorus doped diamond.
For machine learning acceleration, traditional SRAM and DRAM based system suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM in four different levels of abstraction. With the proposed models, various simulations are conducted to investigate the performance, optimization, variability, reliability, and scalability.
Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology showing 900X speedup for the dictionary learning task compared to the CPU performance.
From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer number of computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency in learning.
This thesis primarily focuses on two important hardware aspects of an IoT system: (a) an FPAA based reconfigurable sensing front-end system and (b) an FPGA based reconfigurable processing system. To enable reconfiguration capability for any sensor type, Programmable ANalog Device Array (PANDA), a transistor-level analog reconfigurable platform is proposed. CAD tools required for implementation of front-end circuits on the platform are also developed. To demonstrate the capability of the platform on silicon, a small-scale array of 24×25 PANDA cells is fabricated in 65nm technology. Several analog circuit building blocks including amplifiers, bias circuits and filters are prototyped on the platform, which demonstrates the effectiveness of the platform for rapid prototyping IoT sensor interfaces.
IoT systems typically use machine learning algorithms that run on the servers to process the data in order to make decisions. Recently, embedded processors are being used to preprocess the data at the energy-constrained sensor node or at IoT gateway, which saves considerable energy for transmission and bandwidth. Using conventional CPU based systems for implementing the machine learning algorithms is not energy-efficient. Hence an FPGA based hardware accelerator is proposed and an optimization methodology is developed to maximize throughput of any convolutional neural network (CNN) based machine learning algorithm on a resource-constrained FPGA.
Precise electrical manipulation of nanoscale defects such as vacancy nano-filaments is highly desired for the multi-level control of ReRAM. In this paper we present a systematic investigation on the pulse-train operation scheme for reliable multi-level control of conductive filament evolution. By applying the pulse-train scheme to a 3 bit per cell HfO2 ReRAM, the relative standard deviations of resistance levels are improved up to 80% compared to the single-pulse scheme. The observed exponential relationship between the saturated resistance and the pulse amplitude provides evidence for the gap-formation model of the filament-rupture process.
Hardware implementation of neuromorphic computing is attractive as a computing paradigm beyond the conventional digital computing. In this work, we show that the SET (off-to-on) transition of metal oxide resistive switching memory becomes probabilistic under a weak programming condition. The switching variability of the binary synaptic device implements a stochastic learning rule. Such stochastic SET transition was statistically measured and modeled for a simulation of a winner-take-all network for competitive learning. The simulation illustrates that with such stochastic learning, the orientation classification function of input patterns can be effectively realized. The system performance metrics were compared between the conventional approach using the analog synapse and the approach in this work that employs the binary synapse utilizing the stochastic learning. The feasibility of using binary synapse in the neurormorphic computing may relax the constraints to engineer continuous multilevel intermediate states and widens the material choice for the synaptic device design.