Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

As transistor counts increase and feature sizes shrink, reducing power consumption has become a major design constraint. This has given rise to aggressive architectural changes for on-chip power management and to the rapid development of energy-efficient hardware accelerators. Accordingly, the objective of this research is to help software developers leverage these hardware techniques and improve the energy efficiency of the system. To achieve this, I propose two solutions for the Linux kernel.

Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Although many models of processor power consumption are available in the literature, models that capture power consumption at the task level are lacking. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management, because the OS time-multiplexes tasks to share hardware resources. I propose a detailed design methodology for constructing an architecture-agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated on an Intel Sandy Bridge processor. It has a negligible overhead of less than 1% hardware resource consumption. The profiler's power prediction was demonstrated on application benchmarks from SPEC to PARSEC with less than 4% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use-case scenarios, which include heterogeneous computing and fine-grained per-core DVFS.

Along with architectural enhancements in general-purpose processors to improve energy efficiency, hardware accelerators such as coarse-grained reconfigurable architectures (CGRAs) are gaining popularity. Unlike vector processors, which rely on data parallelism, a CGRA can provide greater flexibility and compiler-level control, making it more suitable for present SoC environments. To provide a streamlined development environment for CGRAs, I propose a flexible framework in Linux for CGRA design space exploration. With accurate and flexible hardware models, fine-grained integration with an accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on a CGRA in a full-system environment.

Contributors: Desai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created: 2013

Unified framework for energy-proportional computing in multicore processors: novel algorithms and practical implementation

Description

Multicore processors have proliferated in nearly all forms of computing, from servers and desktops to smartphones. The primary reason for this wide adoption is their ability to overcome the power wall by providing higher performance at lower power consumption. With multiple cores, the need for dynamic energy management (DEM) is much greater than for single-core processors: DEM for multicore processors is no longer just a mechanism to keep a processor under specified temperature limits, but a set of techniques that manage various processor controls, such as dynamic voltage and frequency scaling (DVFS), task migration, and fan speed, to achieve a stated objective. The objectives span a wide range, including maximizing throughput, minimizing power consumption, reducing peak temperature, maximizing energy efficiency, and maximizing processor reliability, under an equally wide range of temperature, power, timing, and reliability constraints. Thus DEM can be very complex and challenging to achieve. Since many DEM techniques often operate together on a single processor, there is a need to unify them. This dissertation addresses that need.

In this work, a framework for DEM is proposed that provides a unifying processor model, comprising power, thermal, timing, and reliability models, and that supports various DEM control mechanisms and many different objective functions along with equally diverse constraint specifications. Using the framework, a range of novel solutions is derived for instances of DEM problems, including maximizing processor performance or energy efficiency, or minimizing power consumption or peak temperature, under constraints on maximum temperature, memory reliability, and task deadlines. Finally, a robust closed-loop controller is proposed that implements these solutions on a real processor platform with very low operational overhead. Along with the controller design, a model identification methodology for obtaining the power and thermal models required by the controller is discussed. The controller is architecture independent and hence easily portable across many platforms. It has been successfully deployed on an Intel Sandy Bridge processor, where it increased the processor's energy efficiency by over 30%.

Contributors: Hanumaiah, Vinay (Author) / Vrudhula, Sarma (Thesis advisor) / Chatha, Karamvir (Committee member) / Chakrabarti, Chaitali (Committee member) / Rodriguez, Armando (Committee member) / Askin, Ronald (Committee member) / Arizona State University (Publisher)

Created: 2013

Power management interface circuit for MEMS (Micro-Electro-Mechanical-Systems) bio-sensing and chemical sensing applications

Description

Power supply management is important for MEMS (Micro-Electro-Mechanical-Systems) bio-sensing and chemical sensing applications. The dissertation discusses access to different power sources and supply tuning in sensing applications.

First, the dissertation presents a high-efficiency DC-DC converter for a miniaturized Microbial Fuel Cell (MFC). The miniaturized MFC produces up to approximately 10 µW with an output voltage of 0.4-0.7 V. Such a low voltage, which is also load dependent, prevents the MFC from directly driving low-power electronics. A PFM (Pulse Frequency Modulation) DC-DC converter operating in DCM (Discontinuous Conduction Mode) is developed to address these challenges and provide a load-independent output voltage with high conversion efficiency. The DC-DC converter, implemented in UMC 0.18 µm technology, has been thoroughly characterized while coupled with the MFC. At 0.9 V output, the converter has a peak efficiency of 85% with a 9 µW load, the highest efficiency among prior publications.

Energy can also be harvested wirelessly, which often has a profound impact on system performance. The dissertation reports a side-by-side comparison of two wireless, passive sensing systems, based on inductive and electromagnetic (EM) coupling, for in-situ, real-time monitoring of wafer cleanliness in semiconductor facilities. The wireless system, which contains the MEMS sensor, operates without a battery. Both systems have been implemented. The working distance of the inductive coupling system is limited by signal-to-noise ratio (SNR), while that of the EM coupling system is limited by the coupled power. The implemented on-wafer transponders achieve working distances of 6 cm and 25 cm for inductive and EM coupling, respectively, with a concentration resolution of less than 2% (4 ppb for a 200 ppb solution).

Finally, supply tuning is presented in a bio-sensing application to mitigate temperature sensitivity. The FBAR (film bulk acoustic resonator) based oscillator is an attractive method for label-free sensing. Molecular interactions on the FBAR surface induce a mass change, which shifts the resonant frequency of the FBAR. While the FBAR has a high Q that makes it sensitive to molecular interactions, it also has finite temperature sensitivity. A temperature compensation technique is presented that improves the temperature coefficient of a 1.625 GHz FBAR-based oscillator from -118 ppm/K to less than 1 ppm/K by tuning the supply voltage of the oscillator. The tuning technique adds no additional component and has a large frequency tunability of -4305 ppm/V.
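
The two figures quoted for the FBAR oscillator suggest a first-order compensation rule. The following is only a rough sketch: it assumes the temperature drift and the supply tunability are both linear and independent, which the abstract does not state, and the function name is hypothetical.

```python
# Figures quoted in the abstract.
TEMP_COEFF_PPM_PER_K = -118.0      # uncompensated temperature coefficient
SUPPLY_TUNING_PPM_PER_V = -4305.0  # frequency tunability via supply voltage


def supply_shift_for_compensation(delta_temp_k: float) -> float:
    """Return the supply change (in volts) that cancels the drift from delta_temp_k.

    Assumed linear model (not from the source):
        drift_ppm = TEMP_COEFF * dT + SUPPLY_TUNING * dV
    Setting drift_ppm to zero and solving for dV gives the expression below.
    """
    return -TEMP_COEFF_PPM_PER_K * delta_temp_k / SUPPLY_TUNING_PPM_PER_V


def residual_drift_ppm(delta_temp_k: float, delta_supply_v: float) -> float:
    """Frequency drift (ppm) left over under the same assumed linear model."""
    return (TEMP_COEFF_PPM_PER_K * delta_temp_k
            + SUPPLY_TUNING_PPM_PER_V * delta_supply_v)


if __name__ == "__main__":
    dv = supply_shift_for_compensation(1.0)
    print(f"Supply change per kelvin: {dv * 1e3:.1f} mV")
```

Under this simplified model the supply would need to move by roughly 27 mV per kelvin, in the negative direction for rising temperature. The sub-1 ppm/K result reported in the abstract comes from the actual circuit, not from this idealized calculation.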

Contributors: Zhang, Xu (Author) / Chae, Junseok (Thesis advisor) / Kiaei, Sayfe (Committee member) / Bakkaloglu, Bertan (Committee member) / Kozicki, Michael (Committee member) / Phillips, Stephen (Committee member) / Arizona State University (Publisher)

Created: 2012

Power-Performance Modeling and Adaptive Management of Heterogeneous Mobile Platforms

Description

Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While the mobile platform capabilities range widely, responsiveness, long battery life and reliability are common design concerns that are crucial to remain competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful SoC with numerous other resources, including display, memory, power management IC, battery and wireless modems. Furthermore, the SoC itself is a heterogeneous resource that integrates many processing elements, such as CPU cores, GPU, video, image, and audio processors. Therefore, CPU cores do not dominate the platform power consumption under many application scenarios.

Competitive performance requires higher operating frequency, which leads to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on device reliability and user experience. As a result, allocating the power budget among the major platform resources and controlling temperature have become fundamental considerations for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources into sleep states, or by throttling their frequencies. However, an ad hoc approach could easily cripple performance if it slows down the performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time-varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. To address this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for heterogeneous CPUs, such as the ARM big.LITTLE architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each processing element (PE), (c) an adaptive GPU frame time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads.
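
To illustrate what "Pareto-optimal" means in contribution (a), the sketch below filters a set of candidate configurations down to those that no other configuration beats on both power and performance. It is a generic illustration, not the dissertation's framework: the `Config` type, the function name, and all numbers are hypothetical.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str       # hypothetical label for a frequency/core combination
    power_w: float  # modeled or measured power (lower is better)
    perf: float     # modeled or measured performance (higher is better)


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that are not dominated by any other.

    A configuration is dominated if another one uses no more power and
    delivers no less performance, and is strictly better in at least one.
    """
    front = []
    for c in configs:
        dominated = any(
            o.power_w <= c.power_w and o.perf >= c.perf
            and (o.power_w < c.power_w or o.perf > c.perf)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.power_w)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    candidates = [
        Config("A", 1.0, 10.0),
        Config("B", 2.0, 15.0),
        Config("C", 2.5, 14.0),  # worse than B on both axes
        Config("D", 3.0, 20.0),
    ]
    for cfg in pareto_front(candidates):
        print(cfg)
```

A run-time manager would then choose among the surviving configurations according to the current power budget or performance target. The quadratic scan is adequate for small candidate sets.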

Contributors: Gupta, Ujjwala (Author) / Ogras, Umit Y. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kishinevsky, Michael (Committee member) / Dutt, Nikil (Committee member) / Arizona State University (Publisher)

Created: 2018

Power management IC for single solar cell

Description

A single solar cell provides close to 0.5 V output at its maximum power point, which is too low for most electronic circuits to operate. To solve this problem, multiple solar cells are traditionally connected in series to obtain a higher voltage. The disadvantage of this approach is the efficiency loss under partial shading or mismatch. As little as 6-7% shading can result in more than 90% power loss. Therefore, Maximum Power Point Tracking (MPPT) at the single solar cell level is the most efficient way to extract power from a solar cell.

A Power Management IC (PMIC) used to extract power from a single solar cell needs to start at 0.3 V input. MPPT circuitry should be implemented with minimal power and area overhead. To start the PMIC at 0.3 V, a switched-capacitor charge pump is used as an auxiliary start-up circuit that generates a regulated 1.8 V auxiliary supply from the 0.3 V input. The auxiliary supply powers up an MPPT converter followed by a regulated converter. At start-up, both converters operate from a 100 kHz clock with 80% duty cycle and the system output voltage starts rising. When the system output crosses 2.7 V, the auxiliary start-up circuit is turned off and the supply voltage for both converters is derived from the system output itself. In steady state the system output is regulated to 3.0 V.

A fully integrated analog MPPT technique is proposed to extract maximum power from the solar cell. This technique does not require an Analog-to-Digital Converter (ADC) or a Digital Signal Processor (DSP), and thus reduces area and power overhead. The proposed MPPT technique includes a switched-capacitor based power sensor that senses the current of the boost converter without using a sense resistor. A complete system is designed that starts from a 0.3 V solar cell voltage and provides a regulated 3.0 V system output.

Contributors: Singh, Shrikant (Author) / Kiaei, Sayfe (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Kitchen, Jennifer (Committee member) / Arizona State University (Publisher)

Created: 2015

An Inductor Emulator Approach to Peak Current-mode Control in a 4-Phase Buck Regulator

Description

High-efficiency DC-DC converters are one of the important blocks of state-of-the-art power supplies. The trend toward a high level of transistor integration has caused load current demands to grow significantly. Supplying high output current while minimizing output current ripple has been a driving force behind the evolution of multi-phase topologies. The ability to supply large output current with improved efficiency, smaller filter components, and improved transient response makes multi-phase topologies a preferred choice for low-voltage, high-current applications.

Current sensing capability inside a system is much sought after for applications that include peak current-mode control, current limiting, and overload protection. Current sensing is extremely important for current sharing in multi-phase topologies. Existing approaches such as series-resistor, SenseFET, and inductor-DCR based current sensing are simple, but their drawbacks, such as low efficiency, low accuracy, and limited bandwidth, demand a novel current sensing scheme.

This research presents a systematic design procedure for a 5 V to 1.8 V, 8 A, 4-phase buck regulator with a novel current sensing scheme based on replication of the inductor current. The proposed solution consists of detailed system modeling in PLECS, which includes modification of the peak current-mode model to accommodate the new current sensing element, derivation of the power-stage and plant transfer functions, and controller design. The proposed model has been verified through PLECS simulations and compared with a transistor-level implementation of the system. The time-domain parameters, such as overshoot and settling time, simulated with the transistor-level implementation are in close agreement with the results obtained from the PLECS model.
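
For orientation, the headline specification (5 V input, 1.8 V output, 8 A load, 4 phases) can be turned into a few first-order operating-point numbers. This is a minimal sketch that assumes an ideal, lossless buck stage in continuous conduction with perfectly equal current sharing; none of these assumptions, nor the function name, come from the thesis.

```python
from typing import Tuple


def buck_operating_point(v_in: float, v_out: float,
                         i_load: float, n_phases: int) -> Tuple[float, float, float]:
    """Return (duty cycle, phase shift in degrees, average current per phase).

    Idealized relations (assumptions, not taken from the source):
      - duty cycle of an ideal buck converter: D = V_out / V_in
      - phases interleaved evenly over one switching period: 360 / N degrees
      - load current shared equally: I_load / N per phase
    """
    if n_phases < 1 or v_in <= 0 or not 0 <= v_out <= v_in:
        raise ValueError("invalid buck converter parameters")
    duty = v_out / v_in
    phase_shift_deg = 360.0 / n_phases
    i_phase_avg = i_load / n_phases
    return duty, phase_shift_deg, i_phase_avg


if __name__ == "__main__":
    duty, shift, i_phase = buck_operating_point(5.0, 1.8, 8.0, 4)
    print(f"duty = {duty:.2f}, phase shift = {shift:.0f} deg, "
          f"per-phase current = {i_phase:.1f} A")
```

In a real regulator the duty cycle is slightly higher because of conduction and switching losses, and equal sharing is exactly what the current sensing scheme has to enforce.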

Contributors: Burli, Venkatesh (Author) / Bakkaloglu, Bertan (Thesis advisor) / Garrity, Douglas (Committee member) / Kitchen, Jennifer (Committee member) / Arizona State University (Publisher)

Created: 2017

Monitoring for Reliable and Secure Power Management Integrated Circuits via Built-In Self-Test

Description

Power management circuits are employed in most electronic integrated systems, including applications for automotive, IoT, and smart wearables. Oftentimes, these power management circuits become a single point of system failure, and since they are present in most modern electronic devices, they become a target for hardware security attacks. Digital circuits are typically more prone to security attacks compared to analog circuits, but malfunctions in digital circuitry can affect the analog performance/parameters of power management circuits. This research studies the effect that these hacks will have on the analog performance of power circuits, specifically linear and switching power regulators/converters. Apart from security attacks, these circuits suffer from performance degradations due to temperature, aging, and load stress. Power management circuits usually consist of regulators or converters that regulate the load’s voltage supply by employing a feedback loop, and the stability of the feedback loop is a critical parameter in the system design. Oftentimes, the passive components employed in these circuits shift in value over varying conditions and may cause instability within the power converter. Therefore, variations in the passive components, as well as malicious hardware security attacks, can degrade regulator performance and affect the system’s stability. The traditional ways of detecting phase margin, which indicates system stability, employ techniques that require the converter to be in open loop, and hence can’t be used while the system is deployed in-the-field under normal operation. Aging of components and security attacks may occur after the power management systems have completed post-production test and have been deployed, and they may not cause catastrophic failure of the system, hence making them difficult to detect. 
These two issues of component variations and security attacks can be detected during normal operation over the product lifetime if the frequency response of the power converter can be monitored in-situ and in-field. This work presents a method to monitor the phase margin (stability) of a power converter without affecting its normal mode of operation by injecting white noise or a pseudo-random binary sequence (PRBS). Furthermore, this work investigates the analog performance parameters, including phase margin, that are affected by various digital hacks on the control circuitry associated with power converters. A case study of potential hardware attacks is completed for a linear low-dropout regulator (LDO).
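
A PRBS of the kind mentioned above is commonly produced with a linear-feedback shift register. The sketch below generates the widely used PRBS7 sequence (polynomial x^7 + x^6 + 1, period 127 bits). The abstract does not say which sequence length or polynomial the work uses, so the choice of PRBS7, the function names, and the amplitude mapping are all assumptions made for illustration.

```python
from typing import List


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """Generate n_bits of a PRBS7 sequence using a 7-bit LFSR.

    Feedback polynomial: x^7 + x^6 + 1 (taps on register bits 6 and 5).
    The seed must be a nonzero 7-bit value; an all-zero state never changes.
    """
    if not 0 < seed < 0x80:
        raise ValueError("seed must be a nonzero 7-bit value")
    state = seed
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def to_perturbation(bits: List[int], amplitude: float) -> List[float]:
    """Map bits {0, 1} to a zero-centered two-level signal {-amplitude, +amplitude}."""
    return [amplitude if b else -amplitude for b in bits]


if __name__ == "__main__":
    sequence = prbs7(127)
    # Hypothetical small-signal amplitude, chosen only for the example.
    perturbation = to_perturbation(sequence, amplitude=0.01)
    print(sequence[:16], perturbation[:4])
```

Because such a sequence has a nearly flat spectrum over a wide band, correlating the injected perturbation with the converter's response lets the loop frequency response be estimated while the loop stays closed, which is the property the in-field monitoring approach relies on.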

Contributors: Malakar, Pragya Priya (Author) / Kitchen, Jennifer (Thesis advisor) / Ozev, Sule (Committee member) / Brunhaver, John (Committee member) / Arizona State University (Publisher)

Created: 2019

Power, Performance, and Energy Management of Heterogeneous Architectures

Description

Modern many-core multiprocessor systems-on-chip offer tremendous power and performance optimization opportunities by tuning thousands of potential voltage, frequency and core configurations. Applications running on these architectures are becoming increasingly complex. As the basic building blocks that make up the application change during runtime, different configurations may become optimal with respect to power, performance or other metrics. Identifying the optimal configuration at runtime is a daunting task due to the large number of workloads and configurations. Therefore, there is a strong need to evaluate the metrics of interest as a function of the supported configurations.

This thesis focuses on two different types of modern multiprocessor systems-on-chip (SoC): mobile heterogeneous systems and the tile-based Intel Xeon Phi architecture.

For mobile heterogeneous systems, this thesis presents a novel methodology that can accurately instrument different types of applications with specific performance monitoring calls. These calls provide a rich set of performance statistics at the basic block level while the application runs on the target platform. The target architecture used for this work (Odroid XU3) is capable of running at 4940 different frequency and core combinations. With the help of the instrumented applications, a vast amount of characterization data is collected that provides details about performance, power and CPU state at every instrumented basic block across 19 different types of applications. The data collected has enabled two runtime schemes. The first work provides a methodology to find optimal configurations in a heterogeneous architecture using classifiers and demonstrates an average increase of 93%, 81% and 6% in performance per watt (PPW) compared to the interactive, ondemand and powersave governors, respectively. The second work, using the same data, presents a novel imitation learning framework for dynamically controlling the type, number, and frequencies of active cores to achieve an average PPW improvement of 109% compared to the default governors.

This work also presents how to accurately profile the tile-based Intel Xeon Phi architecture while training different types of neural networks using an open image dataset on a deep learning framework. The data collected allows deep exploratory analysis. It also showcases how different hardware parameters affect the performance of the Xeon Phi.

Contributors: Patil, Chetan Arvind (Author) / Ogras, Umit Y. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2019
