Matching Items (57)
Filtering by

Clear all filters

153033-Thumbnail Image.png
Description
Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else

Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else structure and select outcome of either branch to commit based on the

result of the conditional. This results in poor utilization of CGRA s computational

resources. Dual-issue scheme which is the state of the art technique for control flow

fetches instructions from both paths of the branch and selects one to execute at

runtime based on the result of the conditional. This technique has an overhead in

instruction fetch bandwidth. In this thesis, to improve performance of control flow

execution in CGRAs, I propose a solution in which the result of the conditional

expression that decides the branch outcome is communicated to the instruction fetch

unit to selectively issue instructions from the path taken by the branch at run time.

Experimental results show that my solution can achieve 34.6% better performance

and 52.1% improvement in energy efficiency on an average compared to state of the

art dual issue scheme without imposing any overhead in instruction fetch bandwidth.
ContributorsRajendran Radhika, Shri Hari (Author) / Shrivastava, Aviral (Thesis advisor) / Christen, Jennifer Blain (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2014
150167-Thumbnail Image.png
Description
Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for

Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for implementing high-throughput DSP systems that are also energy-efficient. Data-path components such as adders and multipliers are evaluated with respect to critical path delay, energy and Energy-Delay Product (EDP). A new design for a RBR adder with very good EDP performance has been proposed. The corresponding RBR parallel adder has a much lower critical path delay and EDP compared to two's complement carry select and carry look-ahead adder implementations. Next, several RBR multiplier architectures are investigated and their performance compared to two's complement systems. These include two new multiplier architectures: a purely RBR multiplier where both the operands are in RBR form, and a hybrid multiplier where the multiplicand is in RBR form and the other operand is represented in conventional two's complement form. Both the RBR and hybrid designs are demonstrated to have better EDP performance compared to conventional two's complement multipliers. The hybrid multiplier is also shown to have a superior EDP performance compared to the RBR multiplier, with much lower implementation area. Analysis on the effect of bit-precision is also performed, and it is shown that the performance gain of RBR systems improves for higher bit precision. Next, in order to demonstrate the efficacy of the RBR representation at the system-level, the performance of RBR and hybrid implementations of some common DSP kernels such as Discrete Cosine Transform, edge detection using Sobel operator, complex multiplication, Lifting-based Discrete Wavelet Transform (9, 7) filter, and FIR filter, is compared with two's complement systems. It is shown that for relatively large computation modules, the RBR to two's complement conversion overhead gets amortized. In case of systems with high complexity, for iso-throughput, both the hybrid and RBR implementations are demonstrated to be superior with lower average energy consumption. For low complexity systems, the conversion overhead is significant, and overpowers the EDP performance gain obtained from the RBR computation operation.
ContributorsMahadevan, Rupa (Author) / Chakrabarti, Chaitali (Thesis advisor) / Kiaei, Sayfe (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2011
150375-Thumbnail Image.png
Description
Current sensing ability is one of the most desirable features of contemporary current or voltage mode controlled DC-DC converters. Current sensing can be used for over load protection, multi-stage converter load balancing, current-mode control, multi-phase converter current-sharing, load independent control, power efficiency improvement etc. There are handful existing approaches for

Current sensing ability is one of the most desirable features of contemporary current or voltage mode controlled DC-DC converters. Current sensing can be used for over load protection, multi-stage converter load balancing, current-mode control, multi-phase converter current-sharing, load independent control, power efficiency improvement etc. There are handful existing approaches for current sensing such as external resistor sensing, triode mode current mirroring, observer sensing, Hall-Effect sensors, transformers, DC Resistance (DCR) sensing, Gm-C filter sensing etc. However, each method has one or more issues that prevent them from being successfully applied in DC-DC converter, e.g. low accuracy, discontinuous sensing nature, high sensitivity to switching noise, high cost, requirement of known external power filter components, bulky size, etc. In this dissertation, an offset-independent inductor Built-In Self Test (BIST) architecture is proposed which is able to measure the inductor inductance and DCR. The measured DCR enables the proposed continuous, lossless, average current sensing scheme. A digital Voltage Mode Control (VMC) DC-DC buck converter with the inductor BIST and current sensing architecture is designed, fabricated, and experimentally tested. The average measurement errors for inductance, DCR and current sensing are 2.1%, 3.6%, and 1.5% respectively. For the 3.5mm by 3.5mm die area, inductor BIST and current sensing circuits including related pins only consume 5.2% of the die area. BIST mode draws 40mA current for a maximum time period of 200us upon start-up and the continuous current sensing consumes about 400uA quiescent current. This buck converter utilizes an adaptive compensator. It could update compensator internally so that the overall system has a proper loop response for large range inductance and load current. Next, a digital Average Current Mode Control (ACMC) DC-DC buck converter with the proposed average current sensing circuits is designed and tested. To reduce chip area and power consumption, a 9 bits hybrid Digital Pulse Width Modulator (DPWM) which uses a Mixed-mode DLL (MDLL) is also proposed. The DC-DC converter has a maximum of 12V input, 1-11 V output range, and a maximum of 3W output power. The maximum error of one least significant bit (LSB) delay of the proposed DPWM is less than 1%.
ContributorsLiu, Tao (Author) / Bakkaloglu, Bertan (Thesis advisor) / Ozev, Sule (Committee member) / Vermeire, Bert (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2011
150211-Thumbnail Image.png
Description
The first part describes Metal Semiconductor Field Effect Transistor (MESFET) based fundamental analog building blocks designed and fabricated in a single poly, 3-layer metal digital CMOS technology utilizing fully depletion mode MESFET devices. DC characteristics were measured by varying the power supply from 2.5V to 5.5V. The measured DC transfer

The first part describes Metal Semiconductor Field Effect Transistor (MESFET) based fundamental analog building blocks designed and fabricated in a single poly, 3-layer metal digital CMOS technology utilizing fully depletion mode MESFET devices. DC characteristics were measured by varying the power supply from 2.5V to 5.5V. The measured DC transfer curves of amplifiers show good agreement with the simulated ones with extracted models from the same process. The accuracy of the current mirror showing inverse operation is within ±15% for the current from 0 to 1.5mA with the power supply from 2.5 to 5.5V. The second part presents a low-power image recognition system with a novel MESFET device fabricated on a CMOS substrate. An analog image recognition system with power consumption of 2.4mW/cell and a response time of 6µs is designed, fabricated and characterized. The experimental results verified the accuracy of the extracted SPICE model of SOS MESFETs. The response times of 4µs and 6µs for one by four and one by eight arrays, respectively, are achieved with the line recognition. Each core cell for both arrays consumes only 2.4mW. The last part presents a CMOS low-power transceiver in MICS band is presented. The LNA core has an integrated mixer in a folded configuration. The baseband strip consists of a pseudo differential MOS-C band-pass filter achieving demodulation of 150kHz-offset BFSK signals. The SRO is used in a wakeup RX for the wake-up signal reception. The all digital frequency-locked loop drives a class AB power amplifier in a transmitter. The sensitivity of -85dBm in the wakeup RX is achieved with the power consumption of 320µW and 400µW at the data rates of 100kb/s and 200kb/s from 1.8V, respectively. The sensitivities of -70dBm and -98dBm in the data-link RX are achieved with NF of 40dB and 11dB at the data rate of 100kb/s while consuming only 600µW and 1.5mW at 1.2V and 1.8V, respectively.
ContributorsKim, Sung (Author) / Bakkaloglu, Bertan (Thesis advisor) / Christen, Jennifer Blain (Committee member) / Cao, Yu (Committee member) / Thornton, Trevor (Committee member) / Arizona State University (Publisher)
Created2011
150360-Thumbnail Image.png
Description
A workload-aware low-power neuromorphic controller for dynamic power and thermal management in VLSI systems is presented. The neuromorphic controller predicts future workload and temperature values based on the past values and CPU performance counters and preemptively regulates supply voltage and frequency. System-level measurements from stateof-the-art commercial microprocessors are used to

A workload-aware low-power neuromorphic controller for dynamic power and thermal management in VLSI systems is presented. The neuromorphic controller predicts future workload and temperature values based on the past values and CPU performance counters and preemptively regulates supply voltage and frequency. System-level measurements from stateof-the-art commercial microprocessors are used to get workload, temperature and CPU performance counter values. The controller is designed and simulated using circuit-design and synthesis tools. At device-level, on-chip planar inductors suffer from low inductance occupying large chip area. On-chip inductors with integrated magnetic materials are designed, simulated and fabricated to explore performance-efficiency trade offs and explore potential applications such as resonant clocking and on-chip voltage regulation. A system level study is conducted to evaluate the effect of on-chip voltage regulator employing magnetic inductors as the output filter. It is concluded that neuromorphic power controller is beneficial for fine-grained per-core power management in conjunction with on-chip voltage regulators utilizing scaled magnetic inductors.
ContributorsSinha, Saurabh (Author) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Yu, Hongbin (Committee member) / Christen, Jennifer B. (Committee member) / Arizona State University (Publisher)
Created2011
149852-Thumbnail Image.png
Description
Negative bias temperature instability (NBTI) and channel hot carrier (CHC) are important reliability issues impacting analog circuit performance and lifetime. Compact reliability models and efficient simulation methods are essential for circuit level reliability prediction. This work proposes a set of compact models of NBTI and CHC effects for analog and

Negative bias temperature instability (NBTI) and channel hot carrier (CHC) are important reliability issues impacting analog circuit performance and lifetime. Compact reliability models and efficient simulation methods are essential for circuit level reliability prediction. This work proposes a set of compact models of NBTI and CHC effects for analog and mixed-signal circuit, and a direct prediction method which is different from conventional simulation methods. This method is applied in circuit benchmarks and evaluated. This work helps with improving efficiency and accuracy of circuit aging prediction.
ContributorsZheng, Rui (Author) / Cao, Yu (Thesis advisor) / Yu, Hongyu (Committee member) / Bakkaloglu, Bertan (Committee member) / Arizona State University (Publisher)
Created2011
149956-Thumbnail Image.png
Description
CMOS technology is expected to enter the 10nm regime for future integrated circuits (IC). Such aggressive scaling leads to vastly increased variability, posing a grand challenge to robust IC design. Variations in CMOS are often divided into two types: intrinsic variations and process-induced variations. Intrinsic variations are limited by fundamental

CMOS technology is expected to enter the 10nm regime for future integrated circuits (IC). Such aggressive scaling leads to vastly increased variability, posing a grand challenge to robust IC design. Variations in CMOS are often divided into two types: intrinsic variations and process-induced variations. Intrinsic variations are limited by fundamental physics. They are inherent to CMOS structure, considered as one of the ultimate barriers to the continual scaling of CMOS devices. In this work the three primary intrinsic variations sources are studied, including random dopant fluctuation (RDF), line-edge roughness (LER) and oxide thickness fluctuation (OTF). The research is focused on the modeling and simulation of those variations and their scaling trends. Besides the three variations, a time dependent variation source, Random Telegraph Noise (RTN) is also studied. Different from the other three variations, RTN does not contribute much to the total variation amount, but aggregate the worst case of Vth variations in CMOS. In this work a TCAD based simulation study on RTN is presented, and a new SPICE based simulation method for RTN is proposed for time domain circuit analysis. Process-induced variations arise from the imperfection in silicon fabrication, and vary from foundries to foundries. In this work the layout dependent Vth shift due to Rapid-Thermal Annealing (RTA) are investigated. In this work, we develop joint thermal/TCAD simulation and compact modeling tools to analyze performance variability under various layout pattern densities and RTA conditions. Moreover, we propose a suite of compact models that bridge the underlying RTA process with device parameter change for efficient design optimization.
ContributorsYe, Yun, Ph.D (Author) / Cao, Yu (Thesis advisor) / Yu, Hongbin (Committee member) / Song, Hongjiang (Committee member) / Clark, Lawrence (Committee member) / Arizona State University (Publisher)
Created2011
149992-Thumbnail Image.png
Description
Process variations have become increasingly important for scaled technologies starting at 45nm. The increased variations are primarily due to random dopant fluctuations, line-edge roughness and oxide thickness fluctuation. These variations greatly impact all aspects of circuit performance and pose a grand challenge to future robust IC design. To improve robustness,

Process variations have become increasingly important for scaled technologies starting at 45nm. The increased variations are primarily due to random dopant fluctuations, line-edge roughness and oxide thickness fluctuation. These variations greatly impact all aspects of circuit performance and pose a grand challenge to future robust IC design. To improve robustness, efficient methodology is required that considers effect of variations in the design flow. Analyzing timing variability of complex circuits with HSPICE simulations is very time consuming. This thesis proposes an analytical model to predict variability in CMOS circuits that is quick and accurate. There are several analytical models to estimate nominal delay performance but very little work has been done to accurately model delay variability. The proposed model is comprehensive and estimates nominal delay and variability as a function of transistor width, load capacitance and transition time. First, models are developed for library gates and the accuracy of the models is verified with HSPICE simulations for 45nm and 32nm technology nodes. The difference between predicted and simulated σ/μ for the library gates is less than 1%. Next, the accuracy of the model for nominal delay is verified for larger circuits including ISCAS'85 benchmark circuits. The model predicted results are within 4% error of HSPICE simulated results and take a small fraction of the time, for 45nm technology. Delay variability is analyzed for various paths and it is observed that non-critical paths can become critical because of Vth variation. Variability on shortest paths show that rate of hold violations increase enormously with increasing Vth variation.
ContributorsGummalla, Samatha (Author) / Chakrabarti, Chaitali (Thesis advisor) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Arizona State University (Publisher)
Created2011
150476-Thumbnail Image.png
Description
Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns.

Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input size data which have to be stored in external memories based Synchronous Dynamic RAM (SDRAM). In this dissertation, first an efficient architecture to implement 2-D DFT for large-sized input data is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance measured and analyzed. The results shows that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision can be processed in less than 27 ms. Finally, transpose-free imaging flows for range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323ms and 162ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.
ContributorsYu, Chi-Li (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2012
151070-Thumbnail Image.png
Description
Built-in-Self-Test (BiST) for transmitters is a desirable choice since it eliminates the reliance on expensive instrumentation to do RF signal analysis. Existing on-chip resources, such as power or envelope detectors, or small additional circuitry can be used for BiST purposes. However, due to limited bandwidth, measurement of complex specifications, such

Built-in-Self-Test (BiST) for transmitters is a desirable choice since it eliminates the reliance on expensive instrumentation to do RF signal analysis. Existing on-chip resources, such as power or envelope detectors, or small additional circuitry can be used for BiST purposes. However, due to limited bandwidth, measurement of complex specifications, such as IQ imbalance, is challenging. In this work, a BiST technique to compute transmitter IQ imbalances using measurements out of a self-mixing envelope detector is proposed. Both the linear and non linear parameters of the RF transmitter path are extracted successfully. We first derive an analytical expression for the output signal. Using this expression, we devise test signals to isolate the effects of gain and phase imbalance, DC offsets, time skews and system nonlinearity from other parameters of the system. Once isolated, these parameters are calculated easily with a few mathematical operations. Simulations and hardware measurements show that the technique can provide accurate characterization of IQ imbalances. One of the glaring advantages of this method is that, the impairments are extracted from analyzing the response at baseband frequency and thereby eliminating the need of high frequency ATE (Automated Test Equipment).
ContributorsByregowda, Srinath (Author) / Ozev, Sule (Thesis advisor) / Cao, Yu (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)
Created2012