Matching Items (59)
Filtering by

Clear all filters

153935-Thumbnail Image.png
Description
Clock generation and distribution are essential to CMOS microchips, providing synchronization to external devices and between internal sequential logic. Clocks in microprocessors are highly vulnerable to single event effects and designing reliable energy efficient clock networks for mission critical applications is a major challenge. This dissertation studies the basics of

Clock generation and distribution are essential to CMOS microchips, providing synchronization to external devices and between internal sequential logic. Clocks in microprocessors are highly vulnerable to single event effects and designing reliable energy efficient clock networks for mission critical applications is a major challenge. This dissertation studies the basics of radiation hardening, essentials of clock design and impact of particle strikes on clocks in detail and presents design techniques for hardening complete clock systems in digital ICs.

Since the sequential elements play a key role in deciding the robustness of any clocking strategy, hardened-by-design implementations of triple-mode redundant (TMR) pulse clocked latches and physical design methodologies for using TMR master-slave flip-flops in application specific ICs (ASICs) are proposed. A novel temporal pulse clocked latch design for low power radiation hardened applications is also proposed. Techniques for designing custom RHBD clock distribution networks (clock spines) and ASIC clock trees for a radiation hardened microprocessor using standard CAD tools are presented. A framework for analyzing the vulnerabilities of clock trees in general, and study the parameters that contribute the most to the tree’s failure, including impact on controlled latches is provided. This is then used to design an integrated temporally redundant clock tree and pulse clocked flip-flop based clocking scheme that is robust to single event transients (SETs) and single event upsets (SEUs). Subsequently, designing robust clock delay lines for use in double data rate (DDRx) memory applications is studied in detail. Several modules of the proposed radiation hardened all-digital delay locked loop are designed and studied. Many of the circuits proposed in this entire body of work have been implemented and tested on a standard low-power 90-nm process.
ContributorsChellappa, Srivatsan (Author) / Clark, Lawrence T (Thesis advisor) / Holbert, Keith E. (Committee member) / Cao, Yu (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)
Created2015
153890-Thumbnail Image.png
Description
The recent flurry of security breaches have raised serious concerns about the security of data communication and storage. A promising way to enhance the security of the system is through physical root of trust, such as, through use of physical unclonable functions (PUF). PUF leverages the inherent randomness in physical

The recent flurry of security breaches have raised serious concerns about the security of data communication and storage. A promising way to enhance the security of the system is through physical root of trust, such as, through use of physical unclonable functions (PUF). PUF leverages the inherent randomness in physical systems to provide device specific authentication and encryption.

In this thesis, first the design of a highly reliable resistive random access memory (RRAM) PUF is presented. Compared to existing 1 cell/bit RRAM, here the sum of the read-out currents of multiple RRAM cells are used for generating one response bit. This method statistically minimizes any early-lifetime failure due to RRAM retention degradation at high temperature or under voltage stress. Using a device model that was calibrated using IMEC HfOx RRAM experimental data, it was shown that an 8 cells/bit architecture achieves 99.9999% reliability for a lifetime >10 years at 125℃ . Also, the hardware area overhead of the proposed 8 cells/bit RRAM PUF architecture was smaller than 1 cell/bit RRAM PUF that requires error correction coding to achieve the same reliability.

Next, a basic security primitive is presented, where the RRAM PUF is embedded in the cryptographic module, SHA-256. This architecture is referred to as Embedded PUF or EPUF. EPUF has a security advantage over SHA-256 as it never exposes the PUF response to the outside world. Instead, in each round, the PUF response is used to change a few bits of the message word to produce a unique message digest for each IC. The use of EPUF as a key generation module for AES is also shown. The hardware area requirement for SHA-256 and AES-128 is then analyzed using synthesis results based on TSMC 65nm library. It is shown that the area overhead of 8 cells/bit RRAM PUF is only 1.08% of the SHA-256 module and 0.04% of the AES-128 module. The security analysis of the PUF based systems is also presented. It is shown that the EPUF-based systems are resistant towards standard attacks on PUFs, and that the security of the cryptographic modules is not compromised.
ContributorsShrivastava, Ayush (Author) / Chakrabarti, Chaitali (Thesis advisor) / Yu, Shimeng (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2015
154267-Thumbnail Image.png
Description
Internet of Things (IoT) has become a popular topic in industry over the recent years, which describes an ecosystem of internet-connected devices or things that enrich the everyday life by improving our productivity and efficiency. The primary components of the IoT ecosystem are hardware, software and services. While the software

Internet of Things (IoT) has become a popular topic in industry over the recent years, which describes an ecosystem of internet-connected devices or things that enrich the everyday life by improving our productivity and efficiency. The primary components of the IoT ecosystem are hardware, software and services. While the software and services of IoT system focus on data collection and processing to make decisions, the underlying hardware is responsible for sensing the information, preprocess and transmit it to the servers. Since the IoT ecosystem is still in infancy, there is a great need for rapid prototyping platforms that would help accelerate the hardware design process. However, depending on the target IoT application, different sensors are required to sense the signals such as heart-rate, temperature, pressure, acceleration, etc., and there is a great need for reconfigurable platforms that can prototype different sensor interfacing circuits.

This thesis primarily focuses on two important hardware aspects of an IoT system: (a) an FPAA based reconfigurable sensing front-end system and (b) an FPGA based reconfigurable processing system. To enable reconfiguration capability for any sensor type, Programmable ANalog Device Array (PANDA), a transistor-level analog reconfigurable platform is proposed. CAD tools required for implementation of front-end circuits on the platform are also developed. To demonstrate the capability of the platform on silicon, a small-scale array of 24×25 PANDA cells is fabricated in 65nm technology. Several analog circuit building blocks including amplifiers, bias circuits and filters are prototyped on the platform, which demonstrates the effectiveness of the platform for rapid prototyping IoT sensor interfaces.

IoT systems typically use machine learning algorithms that run on the servers to process the data in order to make decisions. Recently, embedded processors are being used to preprocess the data at the energy-constrained sensor node or at IoT gateway, which saves considerable energy for transmission and bandwidth. Using conventional CPU based systems for implementing the machine learning algorithms is not energy-efficient. Hence an FPGA based hardware accelerator is proposed and an optimization methodology is developed to maximize throughput of any convolutional neural network (CNN) based machine learning algorithm on a resource-constrained FPGA.
ContributorsSuda, Naveen (Author) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Ozev, Sule (Committee member) / Yu, Shimeng (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created2016
154185-Thumbnail Image.png
Description
Due to high level of integration in RF System on Chip (SOC), the test access points are limited to the baseband and RF inputs/outputs of the system. This limited access poses a big challenge particularly for advanced RF architectures where calibration of internal parameters is necessary and ensure proper operation.

Due to high level of integration in RF System on Chip (SOC), the test access points are limited to the baseband and RF inputs/outputs of the system. This limited access poses a big challenge particularly for advanced RF architectures where calibration of internal parameters is necessary and ensure proper operation. Therefore low-overhead built-in Self-Test (BIST) solution for advanced RF transceiver is proposed. In this dissertation. Firstly, comprehensive BIST solution for RF polar transceivers using on-chip resources is presented. In the receiver, phase and gain mismatches degrade sensitivity and error vector magnitude (EVM). In the transmitter, delay skew between the envelope and phase signals and the finite envelope bandwidth can create intermodulation distortion (IMD) that leads to violation of spectral mask requirements. Characterization and calibration of these parameters with analytical model would reduce the test time and cost considerably. Hence, a technique to measure and calibrate impairments of the polar transceiver in the loop-back mode is proposed.

Secondly, robust amplitude measurement technique for RF BIST application and BIST circuits for loop-back connection are discussed. Test techniques using analytical model are explained and BIST circuits are introduced.

Next, a self-compensating built-in self-test solution for RF Phased Array Mismatch is proposed. In the proposed method, a sinusoidal test signal with unknown amplitude is applied to the inputs of two adjacent phased array elements and measure the baseband output signal after down-conversion. Mathematical modeling of the circuit impairments and phased array behavior indicates that by using two distinct input amplitudes, both of which can remain unknown, it is possible to measure the important parameters of the phased array, such as gain and phase mismatch. In addition, proposed BIST system is designed and fabricated using IBM 180nm process and a prototype four-element phased-array PCB is also designed and fabricated for verifying the proposed method.

Finally, process independent gain measurement via BIST/DUT co-design is explained. Design methodology how to reduce performance impact significantly is discussed.

Simulation and hardware measurements results for the proposed techniques show that the proposed technique can characterize the targeted impairments accurately.
ContributorsJeong, Jae Woong (Author) / Ozev, Sule (Thesis advisor) / Kitchen, Jennifer (Committee member) / Cao, Yu (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)
Created2015
154317-Thumbnail Image.png
Description
Rail clamp circuits are widely used for electrostatic discharge (ESD) protection in semiconductor products today. A step-by-step design procedure for the traditional RC and single-inverter-based rail clamp circuit and the design, simulation, implementation, and operation of two novel rail clamp circuits are described for use in the ESD protection of

Rail clamp circuits are widely used for electrostatic discharge (ESD) protection in semiconductor products today. A step-by-step design procedure for the traditional RC and single-inverter-based rail clamp circuit and the design, simulation, implementation, and operation of two novel rail clamp circuits are described for use in the ESD protection of complementary metal-oxide-semiconductor (CMOS) circuits. The step-by-step design procedure for the traditional circuit is technology-node independent, can be fully automated, and aims to achieve a minimal area design that meets specified leakage and ESD specifications under all valid process, voltage, and temperature (PVT) conditions. The first novel rail clamp circuit presented employs a comparator inside the traditional circuit to reduce the value of the time constant needed. The second circuit uses a dynamic time constant approach in which the value of the time constant is dynamically adjusted after the clamp is triggered. Important metrics for the two new circuits such as ESD performance, latch-on immunity, clamp recovery time, supply noise immunity, fastest power-on time supported, and area are evaluated over an industry-standard PVT space using SPICE simulations and measurements on a fabricated 40 nm test chip.
ContributorsVenkatasubramanian, Ramachandran (Author) / Ozev, Sule (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Cao, Yu (Committee member) / Kitchen, Jennifer (Committee member) / Arizona State University (Publisher)
Created2016
152459-Thumbnail Image.png
Description
Non-volatile memories (NVM) are widely used in modern electronic devices due to their non-volatility, low static power consumption and high storage density. While Flash memories are the dominant NVM technology, resistive memories such as phase change access memory (PRAM) and spin torque transfer random access memory (STT-MRAM) are gaining ground.

Non-volatile memories (NVM) are widely used in modern electronic devices due to their non-volatility, low static power consumption and high storage density. While Flash memories are the dominant NVM technology, resistive memories such as phase change access memory (PRAM) and spin torque transfer random access memory (STT-MRAM) are gaining ground. All these technologies suffer from reliability degradation due to process variations, structural limits and material property shift. To address the reliability concerns of these NVM technologies, multi-level low cost solutions are proposed for each of them. My approach consists of first building a comprehensive error model. Next the error characteristics are exploited to develop low cost multi-level strategies to compensate for the errors. For instance, for NAND Flash memory, I first characterize errors due to threshold voltage variations as a function of the number of program/erase cycles. Next a flexible product code is designed to migrate to a stronger ECC scheme as program/erase cycles increases. An adaptive data refresh scheme is also proposed to improve memory reliability with low energy cost for applications with different data update frequencies. For PRAM, soft errors and hard errors models are built based on shifts in the resistance distributions. Next I developed a multi-level error control approach involving bit interleaving and subblock flipping at the architecture level, threshold resistance tuning at the circuit level and programming current profile tuning at the device level. This approach helped reduce the error rate significantly so that it was now sufficient to use a low cost ECC scheme to satisfy the memory reliability constraint. I also studied the reliability of a PRAM+DRAM hybrid memory system and analyzed the tradeoffs between memory performance, programming energy and lifetime. For STT-MRAM, I first developed an error model based on process variations. I developed a multi-level approach to reduce the error rates that consisted of increasing the W/L ratio of the access transistor, increasing the voltage difference across the memory cell and adjusting the current profile during write operation. This approach enabled use of a low cost BCH based ECC scheme to achieve very low block failure rates.
ContributorsYang, Chengen (Author) / Chakrabarti, Chaitali (Thesis advisor) / Cao, Yu (Committee member) / Ogras, Umit Y. (Committee member) / Bakkaloglu, Bertan (Committee member) / Arizona State University (Publisher)
Created2014
152691-Thumbnail Image.png
Description
Animals learn to choose a proper action among alternatives according to the circumstance. Through trial-and-error, animals improve their odds by making correct association between their behavioral choices and external stimuli. While there has been an extensive literature on the theory of learning, it is still unclear how individual neurons and

Animals learn to choose a proper action among alternatives according to the circumstance. Through trial-and-error, animals improve their odds by making correct association between their behavioral choices and external stimuli. While there has been an extensive literature on the theory of learning, it is still unclear how individual neurons and a neural network adapt as learning progresses. In this dissertation, single units in the medial and lateral agranular (AGm and AGl) cortices were recorded as rats learned a directional choice task. The task required the rat to make a left/right side lever press if a light cue appeared on the left/right side of the interface panel. Behavior analysis showed that rat's movement parameters during performance of directional choices became stereotyped very quickly (2-3 days) while learning to solve the directional choice problem took weeks to occur. The entire learning process was further broken down to 3 stages, each having similar number of recording sessions (days). Single unit based firing rate analysis revealed that 1) directional rate modulation was observed in both cortices; 2) the averaged mean rate between left and right trials in the neural ensemble each day did not change significantly among the three learning stages; 3) the rate difference between left and right trials of the ensemble did not change significantly either. Besides, for either left or right trials, the trial-to-trial firing variability of single neurons did not change significantly over the three stages. To explore the spatiotemporal neural pattern of the recorded ensemble, support vector machines (SVMs) were constructed each day to decode the direction of choice in single trials. Improved classification accuracy indicated enhanced discriminability between neural patterns of left and right choices as learning progressed. When using a restricted Boltzmann machine (RBM) model to extract features from neural activity patterns, results further supported the idea that neural firing patterns adapted during the three learning stages to facilitate the neural codes of directional choices. Put together, these findings suggest a spatiotemporal neural coding scheme in a rat AGl and AGm neural ensemble that may be responsible for and contributing to learning the directional choice task.
ContributorsMao, Hongwei (Author) / Si, Jennie (Thesis advisor) / Buneo, Christopher (Committee member) / Cao, Yu (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)
Created2014
153328-Thumbnail Image.png
Description
The aging process due to Bias Temperature Instability (both NBTI and PBTI) and Channel Hot Carrier (CHC) is a key limiting factor of circuit lifetime in CMOS design. Threshold voltage shift due to BTI is a strong function of stress voltage and temperature complicating stress and recovery prediction. This poses

The aging process due to Bias Temperature Instability (both NBTI and PBTI) and Channel Hot Carrier (CHC) is a key limiting factor of circuit lifetime in CMOS design. Threshold voltage shift due to BTI is a strong function of stress voltage and temperature complicating stress and recovery prediction. This poses a unique challenge for long-term aging prediction for wide range of stress patterns. Traditional approaches usually resort to an average stress waveform to simplify the lifetime prediction. They are efficient, but fail to capture circuit operation, especially under dynamic voltage scaling (DVS) or in analog/mixed signal designs where the stress waveform is much more random. This work presents a suite of modelling solutions for BTI that enable aging simulation under all possible stress conditions. Key features of this work are compact models to predict BTI aging based on Reaction-Diffusion theory when the stress voltage is varying. The results to both reaction-diffusion (RD) and trapping-detrapping (TD) mechanisms are presented to cover underlying physics. Silicon validation of these models is performed at 28nm, 45nm and 65nm technology nodes, at both device and circuit levels. Efficient simulation leveraging the BTI models under DVS and random input waveform is applied to both digital and analog representative circuits such as ring oscillators and LNA. Both physical mechanisms are combined into a unified model which improves prediction accuracy at 45nm and 65nm nodes. Critical failure condition is also illustrated based on NBTI and PBTI at 28nm. A comprehensive picture for duty cycle shift is shown. DC stress under clock gating schemes results in monotonic shift in duty cycle which an AC stress causes duty cycle to converge close to 50% value. Proposed work provides a general and comprehensive solution to aging analysis under random stress patterns under BTI.

Channel hot carrier (CHC) is another dominant degradation mechanism which affects analog and mixed signal circuits (AMS) as transistor operates continuously in saturation condition. New model is proposed to account for e-e scattering in advanced technology nodes due to high gate electric field. The model is validated with 28nm and 65nm thick oxide data for different stress voltages. It demonstrates shift in worst case CHC condition to Vgs=Vds from Vgs=0.5Vds. A novel iteration based aging simulation framework for AMS designs is proposed which eliminates limitation for conventional reliability tools. This approach helps us identify a unique positive feedback mechanism termed as Bias Runaway. Bias runaway, is rapid increase of the bias voltage in AMS circuits which occurs when the feedback between the bias current and the effect of channel hot carrier turns into positive. The degradation of CHC is a gradual process but under specific circumstances, the degradation rate can be dramatically accelerated. Such a catastrophic phenomenon is highly sensitive to the initial operation condition, as well as transistor gate length. Based on 65nm silicon data, our work investigates the critical condition that triggers bias runaway, and the impact of gate length tuning. We develop new compact models as well as the simulation methodology for circuit diagnosis, and propose design solutions and the trade-offs to avoid bias runaway, which is vitally important to reliable AMS designs.
ContributorsSutaria, Ketul (Author) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Chakrabarti, Chaitali (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)
Created2014
153033-Thumbnail Image.png
Description
Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else

Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else structure and select outcome of either branch to commit based on the

result of the conditional. This results in poor utilization of CGRA s computational

resources. Dual-issue scheme which is the state of the art technique for control flow

fetches instructions from both paths of the branch and selects one to execute at

runtime based on the result of the conditional. This technique has an overhead in

instruction fetch bandwidth. In this thesis, to improve performance of control flow

execution in CGRAs, I propose a solution in which the result of the conditional

expression that decides the branch outcome is communicated to the instruction fetch

unit to selectively issue instructions from the path taken by the branch at run time.

Experimental results show that my solution can achieve 34.6% better performance

and 52.1% improvement in energy efficiency on an average compared to state of the

art dual issue scheme without imposing any overhead in instruction fetch bandwidth.
ContributorsRajendran Radhika, Shri Hari (Author) / Shrivastava, Aviral (Thesis advisor) / Christen, Jennifer Blain (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2014
156189-Thumbnail Image.png
Description
Static CMOS logic has remained the dominant design style of digital systems for

more than four decades due to its robustness and near zero standby current. Static

CMOS logic circuits consist of a network of combinational logic cells and clocked sequential

elements, such as latches and flip-flops that are used for sequencing computations

over

Static CMOS logic has remained the dominant design style of digital systems for

more than four decades due to its robustness and near zero standby current. Static

CMOS logic circuits consist of a network of combinational logic cells and clocked sequential

elements, such as latches and flip-flops that are used for sequencing computations

over time. The majority of the digital design techniques to reduce power, area, and

leakage over the past four decades have focused almost entirely on optimizing the

combinational logic. This work explores alternate architectures for the flip-flops for

improving the overall circuit performance, power and area. It consists of three main

sections.

First, is the design of a multi-input configurable flip-flop structure with embedded

logic. A conventional D-type flip-flop may be viewed as realizing an identity function,

in which the output is simply the value of the input sampled at the clock edge. In

contrast, the proposed multi-input flip-flop, named PNAND, can be configured to

realize one of a family of Boolean functions called threshold functions. In essence,

the PNAND is a circuit implementation of the well-known binary perceptron. Unlike

other reconfigurable circuits, a PNAND can be configured by simply changing the

assignment of signals to its inputs. Using a standard cell library of such gates, a technology

mapping algorithm can be applied to transform a given netlist into one with

an optimal mixture of conventional logic gates and threshold gates. This approach

was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier

in 65nm LP technology. Simulation and chip measurements show more than 30%

improvement in dynamic power and more than 20% reduction in core area.

The functional yield of the PNAND reduces with geometry and voltage scaling.

The second part of this research investigates the use of two mechanisms to improve

the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM

devices for low voltage operation.

The third part of this research focused on the design of flip-flops with non-volatile

storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated

with both conventional D-flipflop and the PNAND circuits to implement non-volatile

logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of

system locally when a power interruption occurs. However, manufacturing variations

in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading

to an overly pessimistic design and consequently, higher energy consumption. A

detailed analysis of the design trade-offs in the driver circuitry for performing backup

and restore, and a novel method to design the energy optimal driver for a given yield is

presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented,

in which the backup time is determined on a per-chip basis, resulting in minimizing

the energy wastage and satisfying the yield constraint. To achieve a yield of 98%,

the conventional approach would have to expend nearly 5X more energy than the

minimum required, whereas the proposed tunable approach expends only 26% more

energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are

designed with the same backup and restore circuitry in 65nm technology. The embedded

logic in NV-TLFF compensates performance overhead of NVL. This leads to the

possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-

accumulate (MAC) unit is designed to demonstrate the performance benefits of the

proposed architecture. Based on the results of HSPICE simulations, the MAC circuit

with the proposed NV-TLFF cells is shown to consume at least 20% less power and

area as compared to the circuit designed with conventional DFFs, without sacrificing

any performance.
ContributorsYang, Jinghua (Author) / Vrudhula, Sarma (Thesis advisor) / Barnaby, Hugh (Committee member) / Cao, Yu (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created2018