Search Content

System level power and thermal management on embedded processors

Description

Semiconductor scaling technology has led to a sharp growth in transistor counts. This has resulted in an exponential increase on both power dissipation and heat flux (or power density) in modern microprocessors. These microprocessors are integrated as the major components in many modern embedded devices, which offer richer features and…

Semiconductor scaling technology has led to a sharp growth in transistor counts. This has resulted in an exponential increase on both power dissipation and heat flux (or power density) in modern microprocessors. These microprocessors are integrated as the major components in many modern embedded devices, which offer richer features and attain higher performance than ever before. Therefore, power and thermal management have become the significant design considerations for modern embedded devices. Dynamic voltage/frequency scaling (DVFS) and dynamic power management (DPM) are two well-known hardware capabilities offered by modern embedded processors. However, the power or thermal aware performance optimization is not fully explored for the mainstream embedded processors with discrete DVFS and DPM capabilities. Many key problems have not been answered yet. What is the maximum performance that an embedded processor can achieve under power or thermal constraint for a periodic application? Does there exist an efficient algorithm for the power or thermal management problems with guaranteed quality bound? These questions are hard to be answered because the discrete settings of DVFS and DPM enhance the complexity of many power and thermal management problems, which are generally NP-hard. The dissertation presents a comprehensive study on these NP-hard power and thermal management problems for embedded processors with discrete DVFS and DPM capabilities. In the domain of power management, the dissertation addresses the power minimization problem for real-time schedules, the energy-constrained make-span minimization problem on homogeneous and heterogeneous chip multiprocessors (CMP) architectures, and the battery aware energy management problem with nonlinear battery discharging model. In the domain of thermal management, the work addresses several thermal-constrained performance maximization problems for periodic embedded applications. All the addressed problems are proved to be NP-hard or strongly NP-hard in the study. Then the work focuses on the design of the off-line optimal or polynomial time approximation algorithms as solutions in the problem design space. Several addressed NP-hard problems are tackled by dynamic programming with optimal solutions and pseudo-polynomial run time complexity. Because the optimal algorithms are not efficient in worst case, the fully polynomial time approximation algorithms are provided as more efficient solutions. Some efficient heuristic algorithms are also presented as solutions to several addressed problems. The comprehensive study answers the key questions in order to fully explore the power and thermal management potentials on embedded processors with discrete DVFS and DPM capabilities. The provided solutions enable the theoretical analysis of the maximum performance for periodic embedded applications under power or thermal constraints.

ContributorsZhang, Sushu (Author) / Chatha, Karam S (Thesis advisor) / Cao, Yu (Committee member) / Konjevod, Goran (Committee member) / Vrudhula, Sarma (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)

Created2012

Standalone mild hybrid system development and application for non-hybrid vehicles

Description

While the implementation of both mild hybrid and start-stop technology is widespread as a factory option in newer vehicles, the adaptation of hybrid technology to older or unequipped vehicles has not been fully realized. As such, a straight forward hybrid conversion system that is easily adapted to different vehicles regardless…

While the implementation of both mild hybrid and start-stop technology is widespread as a factory option in newer vehicles, the adaptation of hybrid technology to older or unequipped vehicles has not been fully realized. As such, a straight forward hybrid conversion system that is easily adapted to different vehicles regardless of drivetrain configuration, has been developed and applied to a test vehicle for less than $2,000. System performance was recorded both before and after hybridization using real world drive cycle tracking charts. The vehicle established a fuel economy baseline of 22.93 mpg, and achieved 26.58 mpg after the conversion. This corresponds to a 15.92% increase in fuel economy. Accounting for initial system costs and annual fuel saving, this corresponds to a 6-year payback period. Based on these results, it can be concluded that an inexpensive aftermarket hybrid system is both feasible and effective at improving fuel economy.

ContributorsBeeney, Tyler (Author) / Rogers, Bradley (Thesis advisor) / Madakannan, Arunachalanadar (Committee member) / Henderson, Mark (Committee member) / Arizona State University (Publisher)

Created2012

Analysis of pulsed gas metal arc welding (P-GMAW) heat input on UNS-S32101 lean duplex stainless steel

Description

This report presents the effects and analysis of the effects of Pulsed-Gas Metal Arc Welding's (P-GMAW) on Lean Duplex stainless steel. Although the welding of Duplex and Super Duplex Stainless steels have been well documented in both the laboratory and construction industry, the use of Lean Duplex has not. The…

This report presents the effects and analysis of the effects of Pulsed-Gas Metal Arc Welding's (P-GMAW) on Lean Duplex stainless steel. Although the welding of Duplex and Super Duplex Stainless steels have been well documented in both the laboratory and construction industry, the use of Lean Duplex has not. The purpose for conducting this research is to ensure that the correct Ferrite-Austenite phase balance along with the correct welding procedures are used in the creation of reactor cores for new construction nuclear power generation stations. In this project the effects of Lincoln Electrics ER-2209 GMAW wire are studied. Suggestions and improvements to the welding process are then proposed in order to increase the weldability, strength, gas selection, and ferrite count. The weldability will be measured using X-Ray photography in order to determine if any inclusions, lack of fusion, or voids are found post welding, along with welder feedback. The ferritic point count method in accordance with ASTM A562-08, is employed so that the amount of ferrite and austenite can be calculated in the same manor that is currently being used in industry. These will then be correlated to the tensile strength and impact toughness in the heat-affected zone (HAZ) of the weld based on the ASTM A923 testing method.

ContributorsCarter, Roger (Author) / Rogers, Bradley (Thesis advisor) / Gintz, Jerry (Committee member) / Georgeou, Trian (Committee member) / Arizona State University (Publisher)

Created2012

Digitally controlled average current mode buck converter

Description

During the past decade, different kinds of fancy functions are developed in portable electronic devices. This trend triggers the research of how to enhance battery lifetime to meet the requirement of fast growing demand of power in portable devices. DC-DC converter is the connection configuration between the battery and the…

During the past decade, different kinds of fancy functions are developed in portable electronic devices. This trend triggers the research of how to enhance battery lifetime to meet the requirement of fast growing demand of power in portable devices. DC-DC converter is the connection configuration between the battery and the functional circuitry. A good design of DC-DC converter will maximize the power efficiency and stabilize the power supply of following stages. As the representative of the DC-DC converter, Buck converter, which is a step down DC-DC converter that the output voltage level is smaller than the input voltage level, is the best-fit sample to start with. Digital control for DC-DC converters reduces noise sensitivity and enhances process, voltage and temperature (PVT) tolerance compared with analog control method. Also it will reduce the chip area and cost correspondingly. In battery-friendly perspective, current mode control has its advantage in over-current protection and parallel current sharing, which can form different structures to extend battery lifetime. In the thesis, the method to implement digitally average current mode control is introduced; including the FPGA based digital controller design flow. Based on the behavioral model of the close loop Buck converter with digital current control, the first FPGA based average current mode controller is burned into board and tested. With the analysis, the design metric of average current mode control is provided in the study. This will be the guideline of the parallel structure of future research.

ContributorsFu, Chao (Author) / Bakkaloglu, Bertan (Thesis advisor) / Cao, Yu (Committee member) / Vermeire, Bert (Committee member) / Arizona State University (Publisher)

Created2011

Built-in-self test of transmitter I/Q mismatch and nonlinearities using self-mixing envelope detector

Description

Built-in-Self-Test (BiST) for transmitters is a desirable choice since it eliminates the reliance on expensive instrumentation to do RF signal analysis. Existing on-chip resources, such as power or envelope detectors, or small additional circuitry can be used for BiST purposes. However, due to limited bandwidth, measurement of complex specifications, such…

Built-in-Self-Test (BiST) for transmitters is a desirable choice since it eliminates the reliance on expensive instrumentation to do RF signal analysis. Existing on-chip resources, such as power or envelope detectors, or small additional circuitry can be used for BiST purposes. However, due to limited bandwidth, measurement of complex specifications, such as IQ imbalance, is challenging. In this work, a BiST technique to compute transmitter IQ imbalances using measurements out of a self-mixing envelope detector is proposed. Both the linear and non linear parameters of the RF transmitter path are extracted successfully. We first derive an analytical expression for the output signal. Using this expression, we devise test signals to isolate the effects of gain and phase imbalance, DC offsets, time skews and system nonlinearity from other parameters of the system. Once isolated, these parameters are calculated easily with a few mathematical operations. Simulations and hardware measurements show that the technique can provide accurate characterization of IQ imbalances. One of the glaring advantages of this method is that, the impairments are extracted from analyzing the response at baseband frequency and thereby eliminating the need of high frequency ATE (Automated Test Equipment).

ContributorsByregowda, Srinath (Author) / Ozev, Sule (Thesis advisor) / Cao, Yu (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created2012

Scalable surface-potential-based compact model of high-voltage LDMOS transistors

Description

Lateral Double-diffused (LDMOS) transistors are commonly used in power management, high voltage/current, and RF circuits. Their characteristics include high breakdown voltage, low on-resistance, and compatibility with standard CMOS and BiCMOS manufacturing processes. As with other semiconductor devices, an accurate and physical compact model is critical for LDMOS-based circuit design. The…

Lateral Double-diffused (LDMOS) transistors are commonly used in power management, high voltage/current, and RF circuits. Their characteristics include high breakdown voltage, low on-resistance, and compatibility with standard CMOS and BiCMOS manufacturing processes. As with other semiconductor devices, an accurate and physical compact model is critical for LDMOS-based circuit design. The goal of this research work is to advance the state-of-the-art by developing a physics-based scalable compact model of LDMOS transistors. The new model, SP-HV, is constructed from a surface-potential-based bulk MOSFET model, PSP, and a nonlinear resistor model, R3. The use of independently verified and mature sub-models leads to increased accuracy and robustness of an overall LDMOS model. Improved geometry scaling and simplified statistical modeling are other useful and practical consequences of the approach. Extensions are made to both PSP and R3 for improved modeling of LDMOS devices, and one internal node is introduced to connect the two component models. The presence of the lightly-doped drift region in LDMOS transistors causes some characteristic device effects which are usually not observed in conventional MOSFETs. These include quasi-saturation, a sharp peak in transconductance at low VD, gate capacitance exceeding oxide capacitance at positive VD, negative transcapacitances CBG and CGB at positive VD, a "double-hump" IB(VG) current and expansion effects. SP-HV models these effects accurately. It also includes a scalable self-heating model which is important to model the geometry dependence of the expansion effect. SP-HV, including its scalability, is verified extensively by comparison both to TCAD simulations and experimental data. The close agreement confirms the validity of the model structure. Circuit simulation examples are presented to demonstrate its convergence.

ContributorsYao, Wei (Author) / Gildenblat, Gennady (Thesis advisor) / Barnaby, Hugh (Committee member) / Cao, Yu (Committee member) / McAndrew, Colin (Committee member) / Arizona State University (Publisher)

Created2012

Multidimensional DFT IP generators for FPGA platforms

Description

Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns.…

Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input size data which have to be stored in external memories based Synchronous Dynamic RAM (SDRAM). In this dissertation, first an efficient architecture to implement 2-D DFT for large-sized input data is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance measured and analyzed. The results shows that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision can be processed in less than 27 ms. Finally, transpose-free imaging flows for range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323ms and 162ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.

ContributorsYu, Chi-Li (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)

Created2012

Energy and quality-aware multimedia signal processing

Description

Today's mobile devices have to support computation-intensive multimedia applications with a limited energy budget. In this dissertation, we present architecture level and algorithm-level techniques that reduce energy consumption of these devices with minimal impact on system quality. First, we present novel techniques to mitigate the effects of SRAM memory failures…

Today's mobile devices have to support computation-intensive multimedia applications with a limited energy budget. In this dissertation, we present architecture level and algorithm-level techniques that reduce energy consumption of these devices with minimal impact on system quality. First, we present novel techniques to mitigate the effects of SRAM memory failures in JPEG2000 implementations operating in scaled voltages. We investigate error control coding schemes and propose an unequal error protection scheme tailored for JPEG2000 that reduces overhead without affecting the performance. Furthermore, we propose algorithm-specific techniques for error compensation that exploit the fact that in JPEG2000 the discrete wavelet transform outputs have larger values for low frequency subband coefficients and smaller values for high frequency subband coefficients. Next, we present use of voltage overscaling to reduce the data-path power consumption of JPEG codecs. We propose an algorithm-specific technique which exploits the characteristics of the quantized coefficients after zig-zag scan to mitigate errors introduced by aggressive voltage scaling. Third, we investigate the effect of reducing dynamic range for datapath energy reduction. We analyze the effect of truncation error and propose a scheme that estimates the mean value of the truncation error during the pre-computation stage and compensates for this error. Such a scheme is very effective for reducing the noise power in applications that are dominated by additions and multiplications such as FIR filter and transform computation. We also present a novel sum of absolute difference (SAD) scheme that is based on most significant bit truncation. The proposed scheme exploits the fact that most of the absolute difference (AD) calculations result in small values, and most of the large AD values do not contribute to the SAD values of the blocks that are selected. Such a scheme is highly effective in reducing the energy consumption of motion estimation and intra-prediction kernels in video codecs. Finally, we present several hybrid energy-saving techniques based on combination of voltage scaling, computation reduction and dynamic range reduction that further reduce the energy consumption while keeping the performance degradation very low. For instance, a combination of computation reduction and dynamic range reduction for Discrete Cosine Transform shows on average, 33% to 46% reduction in energy consumption while incurring only 0.5dB to 1.5dB loss in PSNR.

ContributorsEmre, Yunus (Author) / Chakrabarti, Chaitali (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Cao, Yu (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created2012

45-nm radiation hardened cache design

Description

Circuits on smaller technology nodes become more vulnerable to radiation-induced upset. Since this is a major problem for electronic circuits used in space applications, designers have a variety of solutions in hand. Radiation hardening by design (RHBD) is an approach, where electronic components are designed to work properly in certain…

Circuits on smaller technology nodes become more vulnerable to radiation-induced upset. Since this is a major problem for electronic circuits used in space applications, designers have a variety of solutions in hand. Radiation hardening by design (RHBD) is an approach, where electronic components are designed to work properly in certain radiation environments without the use of special fabrication processes. This work focuses on the cache design for a high performance microprocessor. The design tries to mitigate radiation effects like SEE, on a commercial foundry 45 nm SOI process. The design has been ported from a previously done cache design at the 90 nm process node. The cache design is a 16 KB, 4 way set associative, write-through design that uses a no-write allocate policy. The cache has been tested to write and read at above 2 GHz at VDD = 0.9 V. Interleaved layout, parity protection, dual redundancy, and checking circuits are used in the design to achieve radiation hardness. High speed is accomplished through the use of dynamic circuits and short wiring routes wherever possible. Gated clocks and optimized wire connections are used to reduce power. Structured methodology is used to build up the entire cache.

ContributorsXavier, Jerin (Author) / Clark, Lawrence T (Thesis advisor) / Cao, Yu (Committee member) / Allee, David R. (Committee member) / Arizona State University (Publisher)

Created2012

Compact modeling and simulation for digital circuit aging

Description

Negative bias temperature instability (NBTI) is a leading aging mechanism in modern digital and analog circuits. Recent NBTI data exhibits an excessive amount of randomness and fast recovery, which are difficult to be handled by conventional power-law model (tn). Such discrepancies further pose the challenge on long-term reliability prediction under…

Negative bias temperature instability (NBTI) is a leading aging mechanism in modern digital and analog circuits. Recent NBTI data exhibits an excessive amount of randomness and fast recovery, which are difficult to be handled by conventional power-law model (tn). Such discrepancies further pose the challenge on long-term reliability prediction under statistical variations and Dynamic Voltage Scaling (DVS) in real circuit operation. To overcome these barriers, the modeling effort in this work (1) practically explains the aging statistics due to randomness in number of traps with log(t) model, accurately predicting the mean and variance shift; (2) proposes cycle-to-cycle model (from the first-principle of trapping) to handle aging under multiple supply voltages, predicting the non-monotonic behavior under DVS (3) presents a long-term model to estimate a tight upper bound of dynamic aging over multiple cycles, and (4) comprehensively validates the new set of aging models with 65nm statistical silicon data. Compared to previous models, the new set of aging models capture the aging variability and the essential role of the recovery phase under DVS, reducing unnecessary guard-banding during the design stage. With CMOS technology scaling, design for reliability has become an important step in the design cycle, and increased the need for efficient and accurate aging simulation methods during the design stage. NBTI induced delay shifts in logic paths are asymmetric in nature, as opposed to averaging effect due to recovery assumed in traditional aging analysis. Timing violations due to aging, in particular, are very sensitive to the standby operation regime of a digital circuit. In this report, by identifying the critical moments in circuit operation and considering the asymmetric aging effects, timing violations under NBTI effect are correctly predicted. The unique contributions of the simulation flow include: (1) accurate modeling of aging induced delay shift due to threshold voltage (Vth) shift using only the delay dependence on supply voltage from cell library; (2) simulation flow for asymmetric aging analysis is proposed and conducted at critical points in circuit operation; (3) setup and hold timing violations due to NBTI aging in logic and clock buffer are investigated in sequential circuits and (4) proposed framework is tested in VLSI applications such DDR memory circuits. This methodology is comprehensively demonstrated with ISCAS89 benchmark circuits using a 45nm Nangate standard cell library characterized using predictive technology models. Our proposed design margin assessment provides design insights and enables resilient techniques for mitigating digital circuit aging.

ContributorsVelamala, Jyothi Bhaskarr (Author) / Cao, Yu (Thesis advisor) / Clark, Lawrence (Committee member) / Chakrabarti, Chaitali (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created2012