Matching Items (8)
Filtering by

Clear all filters

149956-Thumbnail Image.png
Description
CMOS technology is expected to enter the 10nm regime for future integrated circuits (IC). Such aggressive scaling leads to vastly increased variability, posing a grand challenge to robust IC design. Variations in CMOS are often divided into two types: intrinsic variations and process-induced variations. Intrinsic variations are limited by fundamental

CMOS technology is expected to enter the 10nm regime for future integrated circuits (IC). Such aggressive scaling leads to vastly increased variability, posing a grand challenge to robust IC design. Variations in CMOS are often divided into two types: intrinsic variations and process-induced variations. Intrinsic variations are limited by fundamental physics. They are inherent to CMOS structure, considered as one of the ultimate barriers to the continual scaling of CMOS devices. In this work the three primary intrinsic variations sources are studied, including random dopant fluctuation (RDF), line-edge roughness (LER) and oxide thickness fluctuation (OTF). The research is focused on the modeling and simulation of those variations and their scaling trends. Besides the three variations, a time dependent variation source, Random Telegraph Noise (RTN) is also studied. Different from the other three variations, RTN does not contribute much to the total variation amount, but aggregate the worst case of Vth variations in CMOS. In this work a TCAD based simulation study on RTN is presented, and a new SPICE based simulation method for RTN is proposed for time domain circuit analysis. Process-induced variations arise from the imperfection in silicon fabrication, and vary from foundries to foundries. In this work the layout dependent Vth shift due to Rapid-Thermal Annealing (RTA) are investigated. In this work, we develop joint thermal/TCAD simulation and compact modeling tools to analyze performance variability under various layout pattern densities and RTA conditions. Moreover, we propose a suite of compact models that bridge the underlying RTA process with device parameter change for efficient design optimization.
ContributorsYe, Yun, Ph.D (Author) / Cao, Yu (Thesis advisor) / Yu, Hongbin (Committee member) / Song, Hongjiang (Committee member) / Clark, Lawrence (Committee member) / Arizona State University (Publisher)
Created2011
152139-Thumbnail Image.png
Description
ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a

ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a certain kind of membrane systems, is inspired by the way the neurons in brain interact using electrical spikes. Compared to the traditional Boolean logic, SNP systems not only perform similar functions but also provide a more promising solution for reliable computation. Two basic neuron types, Low Pass (LP) neurons and High Pass (HP) neurons, are introduced. These two basic types of neurons are capable to build an arbitrary SNP neuron. This leads to the conclusion that these two basic neuron types are Turing complete since SNP systems has been proved Turing complete. These two basic types of neurons are further used as the elements to construct general-purpose arithmetic circuits, such as adder, subtractor and comparator. In this thesis, erroneous behaviors of neurons are discussed. Transmission error (spike loss) is proved to be equivalent to threshold error, which makes threshold error discussion more universal. To improve the reliability, a new structure called motif is proposed. Compared to Triple Modular Redundancy improvement, motif design presents its efficiency and effectiveness in both single neuron and arithmetic circuit analysis. DRAM-based CMOS circuits are used to implement the two basic types of neurons. Functionality of basic type neurons is proved using the SPICE simulations. The motif improved adder and the comparator, as compared to conventional Boolean logic design, are much more reliable with lower leakage, and smaller silicon area. This leads to the conclusion that SNP system could provide a more promising solution for reliable computation than the conventional Boolean logic.
ContributorsAn, Pei (Author) / Cao, Yu (Thesis advisor) / Barnaby, Hugh (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created2013
149553-Thumbnail Image.png
Description
To extend the lifetime of complementary metal-oxide-semiconductors (CMOS), emerging process techniques are being proposed to conquer the manufacturing difficulties. New structures and materials are proposed with superior electrical properties to traditional CMOS, such as strain technology and feedback field-effect transistor (FB-FET). To continue the design success and make an impact

To extend the lifetime of complementary metal-oxide-semiconductors (CMOS), emerging process techniques are being proposed to conquer the manufacturing difficulties. New structures and materials are proposed with superior electrical properties to traditional CMOS, such as strain technology and feedback field-effect transistor (FB-FET). To continue the design success and make an impact on leading products, advanced circuit design exploration must begin concurrently with early silicon development. Therefore, an accurate and scalable model is desired to correctly capture those effects and flexible to extend to alternative process choices. For example, strain technology has been successfully integrated into CMOS fabrication to improve transistor performance but the stress is non-uniformly distributed in the channel, leading to systematic performance variations. In this dissertation, a new layout-dependent stress model is proposed as a function of layout, temperature, and other device parameters. Furthermore, a method of layout decomposition is developed to partition the layout into a set of simple patterns for model extraction. These solutions significantly reduce the complexity in stress modeling and simulation. On the other hand, semiconductor devices with self-feedback mechanisms are emerging as promising alternatives to CMOS. Fe-FET was proposed to improve the switching by integrating a ferroelectric material as gate insulator in a MOSFET structure. Under particular circumstances, ferroelectric capacitance is effectively negative, due to the negative slope of its polarization-electrical field curve. This property makes the ferroelectric layer a voltage amplifier to boost surface potential, achieving fast transition. A new threshold voltage model for Fe-FET is developed, and is further revealed that the impact of random dopant fluctuation (RDF) can be suppressed. Furthermore, through silicon via (TSV), a key technology that enables the 3D integration of chips, is studied. TSV structure is usually a cylindrical metal-oxide-semiconductors (MOS) capacitor. A piecewise capacitance model is proposed for 3D interconnect simulation. Due to the mismatch in coefficients of thermal expansion (CTE) among materials, thermal stress is observed in TSV process and impacts neighboring devices. The stress impact is investigated to support the interaction between silicon process and IC design at the early stage.
ContributorsWang, Chi-Chao (Author) / Cao, Yu (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Clark, Lawrence (Committee member) / Schroder, Dieter (Committee member) / Arizona State University (Publisher)
Created2011
157619-Thumbnail Image.png
Description
The past decade has seen a tremendous surge in running machine learning (ML) functions on mobile devices, from mere novelty applications to now indispensable features for the next generation of devices.

While the mobile platform capabilities range widely, long battery life and reliability are common design concerns that are crucial to

The past decade has seen a tremendous surge in running machine learning (ML) functions on mobile devices, from mere novelty applications to now indispensable features for the next generation of devices.

While the mobile platform capabilities range widely, long battery life and reliability are common design concerns that are crucial to remain competitive.

Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful CPUs with GPUs to accelerate the computation of deep neural networks (DNNs), which are the most common structures to perform ML operations.

But traditional von Neumann architectures are not optimized for the high memory bandwidth and massively parallel computation demands required by DNNs.

Hence, propelling research into non-von Neumann architectures to support the demands of DNNs.

The re-imagining of computer architectures to perform efficient DNN computations requires focusing on the prohibitive demands presented by DNNs and alleviating them. The two central challenges for efficient computation are (1) large memory storage and movement due to weights of the DNN and (2) massively parallel multiplications to compute the DNN output.

Introducing sparsity into the DNNs, where certain percentage of either the weights or the outputs of the DNN are zero, greatly helps with both challenges. This along with algorithm-hardware co-design to compress the DNNs is demonstrated to provide efficient solutions to greatly reduce the power consumption of hardware that compute DNNs. Additionally, exploring emerging technologies such as non-volatile memories and 3-D stacking of silicon in conjunction with algorithm-hardware co-design architectures will pave the way for the next generation of mobile devices.

Towards the objectives stated above, our specific contributions include (a) an architecture based on resistive crosspoint array that can update all values stored and compute matrix vector multiplication in parallel within a single cycle, (b) a framework of training DNNs with a block-wise sparsity to drastically reduce memory storage and total number of computations required to compute the output of DNNs, (c) the exploration of hardware implementations of sparse DNNs and architectural guidelines to reduce power consumption for the implementations in monolithic 3D integrated circuits, and (d) a prototype chip in 65nm CMOS accelerator for long-short term memory networks trained with the proposed block-wise sparsity scheme.
ContributorsKadetotad, Deepak Vinayak (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Vrudhula, Sarma (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2019
154803-Thumbnail Image.png
Description
Over decades, scientists have been scaling devices to increasingly smaller feature sizes for ever better performance of complementary metal-oxide semiconductor (CMOS) technology to meet requirements on speed, complexity, circuit density, power consumption and ultimately cost required by many advanced applications. However, going to these ultra-scaled CMOS devices also brings some

Over decades, scientists have been scaling devices to increasingly smaller feature sizes for ever better performance of complementary metal-oxide semiconductor (CMOS) technology to meet requirements on speed, complexity, circuit density, power consumption and ultimately cost required by many advanced applications. However, going to these ultra-scaled CMOS devices also brings some drawbacks. Aging due to bias-temperature-instability (BTI) and Hot carrier injection (HCI) is the dominant cause of functional failure in large scale logic circuits. The aging phenomena, on top of process variations, translate into complexity and reduced design margin for circuits. Such issues call for “Design for Reliability”. In order to increase the overall design efficiency, it is important to (i) study the impact of aging on circuit level along with the transistor level understanding (ii) calibrate the theoretical findings with measurement data (iii) implementing tools that analyze the impact of BTI and HCI reliability on circuit timing into VLSI design process at each stage. In this work, post silicon measurements of a 28nm HK-MG technology are done to study the effect of aging on Frequency Degradation of digital circuits. A novel voltage controlled ring oscillator (VCO) structure, developed by NIMO research group is used to determine the effect of aging mechanisms like NBTI, PBTI and SILC on circuit parameters. Accelerated aging mechanism is proposed to avoid the time consuming measurement process and extrapolation of data to the end of life thus instead of predicting the circuit behavior, one can measure it, within a short period of time. Finally, to bridge the gap between device level models and circuit level aging analysis, a System Level Reliability Analysis Flow (SyRA) developed by NIMO group, is implemented for a TSMC 65nm industrial level design to achieve one-step reliability prediction for digital design.
ContributorsBansal, Ankita (Author) / Cao, Yu (Thesis advisor) / Seo, Jae sun (Committee member) / Barnaby, Hugh (Committee member) / Arizona State University (Publisher)
Created2016
154317-Thumbnail Image.png
Description
Rail clamp circuits are widely used for electrostatic discharge (ESD) protection in semiconductor products today. A step-by-step design procedure for the traditional RC and single-inverter-based rail clamp circuit and the design, simulation, implementation, and operation of two novel rail clamp circuits are described for use in the ESD protection of

Rail clamp circuits are widely used for electrostatic discharge (ESD) protection in semiconductor products today. A step-by-step design procedure for the traditional RC and single-inverter-based rail clamp circuit and the design, simulation, implementation, and operation of two novel rail clamp circuits are described for use in the ESD protection of complementary metal-oxide-semiconductor (CMOS) circuits. The step-by-step design procedure for the traditional circuit is technology-node independent, can be fully automated, and aims to achieve a minimal area design that meets specified leakage and ESD specifications under all valid process, voltage, and temperature (PVT) conditions. The first novel rail clamp circuit presented employs a comparator inside the traditional circuit to reduce the value of the time constant needed. The second circuit uses a dynamic time constant approach in which the value of the time constant is dynamically adjusted after the clamp is triggered. Important metrics for the two new circuits such as ESD performance, latch-on immunity, clamp recovery time, supply noise immunity, fastest power-on time supported, and area are evaluated over an industry-standard PVT space using SPICE simulations and measurements on a fabricated 40 nm test chip.
ContributorsVenkatasubramanian, Ramachandran (Author) / Ozev, Sule (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Cao, Yu (Committee member) / Kitchen, Jennifer (Committee member) / Arizona State University (Publisher)
Created2016
153328-Thumbnail Image.png
Description
The aging process due to Bias Temperature Instability (both NBTI and PBTI) and Channel Hot Carrier (CHC) is a key limiting factor of circuit lifetime in CMOS design. Threshold voltage shift due to BTI is a strong function of stress voltage and temperature complicating stress and recovery prediction. This poses

The aging process due to Bias Temperature Instability (both NBTI and PBTI) and Channel Hot Carrier (CHC) is a key limiting factor of circuit lifetime in CMOS design. Threshold voltage shift due to BTI is a strong function of stress voltage and temperature complicating stress and recovery prediction. This poses a unique challenge for long-term aging prediction for wide range of stress patterns. Traditional approaches usually resort to an average stress waveform to simplify the lifetime prediction. They are efficient, but fail to capture circuit operation, especially under dynamic voltage scaling (DVS) or in analog/mixed signal designs where the stress waveform is much more random. This work presents a suite of modelling solutions for BTI that enable aging simulation under all possible stress conditions. Key features of this work are compact models to predict BTI aging based on Reaction-Diffusion theory when the stress voltage is varying. The results to both reaction-diffusion (RD) and trapping-detrapping (TD) mechanisms are presented to cover underlying physics. Silicon validation of these models is performed at 28nm, 45nm and 65nm technology nodes, at both device and circuit levels. Efficient simulation leveraging the BTI models under DVS and random input waveform is applied to both digital and analog representative circuits such as ring oscillators and LNA. Both physical mechanisms are combined into a unified model which improves prediction accuracy at 45nm and 65nm nodes. Critical failure condition is also illustrated based on NBTI and PBTI at 28nm. A comprehensive picture for duty cycle shift is shown. DC stress under clock gating schemes results in monotonic shift in duty cycle which an AC stress causes duty cycle to converge close to 50% value. Proposed work provides a general and comprehensive solution to aging analysis under random stress patterns under BTI.

Channel hot carrier (CHC) is another dominant degradation mechanism which affects analog and mixed signal circuits (AMS) as transistor operates continuously in saturation condition. New model is proposed to account for e-e scattering in advanced technology nodes due to high gate electric field. The model is validated with 28nm and 65nm thick oxide data for different stress voltages. It demonstrates shift in worst case CHC condition to Vgs=Vds from Vgs=0.5Vds. A novel iteration based aging simulation framework for AMS designs is proposed which eliminates limitation for conventional reliability tools. This approach helps us identify a unique positive feedback mechanism termed as Bias Runaway. Bias runaway, is rapid increase of the bias voltage in AMS circuits which occurs when the feedback between the bias current and the effect of channel hot carrier turns into positive. The degradation of CHC is a gradual process but under specific circumstances, the degradation rate can be dramatically accelerated. Such a catastrophic phenomenon is highly sensitive to the initial operation condition, as well as transistor gate length. Based on 65nm silicon data, our work investigates the critical condition that triggers bias runaway, and the impact of gate length tuning. We develop new compact models as well as the simulation methodology for circuit diagnosis, and propose design solutions and the trade-offs to avoid bias runaway, which is vitally important to reliable AMS designs.
ContributorsSutaria, Ketul (Author) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Chakrabarti, Chaitali (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)
Created2014
154757-Thumbnail Image.png
Description
Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementation of these systems have very good performance,

they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the

Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementation of these systems have very good performance,

they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the memory and computation cost

of keyword detection and speech recognition networks (or DNNs) are presented.

The first technique is based on representing all weights and biases by a small number of bits and mapping all nodal computations into fixed-point ones with minimal degradation in the

accuracy. Experiments conducted on the Resource Management (RM) database show that for the keyword detection neural network, representing the weights by 5 bits results in a 6 fold reduction in memory compared to a floating point implementation with very little loss in performance. Similarly, for the speech recognition neural network, representing the weights by 6 bits results in a 5 fold reduction in memory while maintaining an error rate similar to a floating point implementation. Additional reduction in memory is achieved by a technique called weight pruning,

where the weights are classified as sensitive and insensitive and the sensitive weights are represented with higher precision. A combination of these two techniques helps reduce the memory

footprint by 81 - 84% for speech recognition and keyword detection networks respectively.

Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. The corresponding technique, termed coarse-grain sparsification, introduces

hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and significant reduction in the number of computations during classification without

loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision effectively reduced the weight memory

requirement by ~95% compared to a fully-connected network with double precision, while showing similar performance in keyword detection accuracy and word error rate.
ContributorsArunachalam, Sairam (Author) / Chakrabarti, Chaitali (Thesis advisor) / Seo, Jae-Sun (Thesis advisor) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2016