Search Content

System Identification, Diagnosis, and Built-In Self-Test of High Switching Frequency DC-DC Converters

Description

Complex electronic systems include multiple power domains and drastically varying dynamic power consumption patterns, requiring the use of multiple power conversion and regulation units. High frequency switching converters have been gaining prominence in the DC-DC converter market due to smaller solution size (higher power density) and higher efficiency. As the…

Complex electronic systems include multiple power domains and drastically varying dynamic power consumption patterns, requiring the use of multiple power conversion and regulation units. High frequency switching converters have been gaining prominence in the DC-DC converter market due to smaller solution size (higher power density) and higher efficiency. As the filter components become smaller in value and size, they are unfortunately also subject to higher process variations and worse degradation profiles jeopardizing stable operation of the power supply. This dissertation presents techniques to track changes in the dynamic loop characteristics of the DC-DC converters without disturbing the normal mode of operation. A digital pseudo-noise (PN) based stimulus is used to excite the DC-DC system at various circuit nodes to calculate the corresponding closed-loop impulse response. The test signal energy is spread over a wide bandwidth and the signal analysis is achieved by correlating the PN input sequence with the disturbed output generated, thereby

accumulating the desired behavior over time. A mixed-signal cross-correlation circuit is used to derive on-chip impulse responses, with smaller memory and lower computational requirement in comparison to a digital correlator approach. Model reference based parametric and non-parametric techniques are discussed to analyze the impulse response results in both time and frequency domain. The proposed techniques can extract open-loop phase margin and closed-loop unity-gain frequency within 5.2% and 4.1% error, respectively, for the load current range of 30-200mA. Converter parameters such as natural frequency (ω_n ), quality factor (Q), and center frequency (ω_c ) can be estimated within 3.6%, 4.7%, and 3.8% error respectively, over load inductance of 4.7-10.3µH, and filter capacitance of 200-400nF. A 5-MHz switching frequency, 5-8.125V input voltage range, voltage-mode controlled DC-DC buck converter is designed for the proposed built-in self-test (BIST) analysis. The converter output voltage range is 3.3-5V and the supported maximum

load current is 450mA. The peak efficiency of the converter is 87.93%. The proposed converter is fabricated on a 0.6µm 6-layer-metal Silicon-On-Insulator (SOI) technology with a die area of 9mm^2 . The area impact due to the system identification blocks including related I/O structures is 3.8% and they consume 530µA quiescent current during operation.

ContributorsBeohar, Navankur (Author) / Bakkaloglu, Bertan (Thesis advisor) / Ozev, Sule (Committee member) / Ayyanar, Raja (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2017

High Performance Power Management Integrated Circuits for Portable Devices

Description

Portable devices often require multiple power management IC (PMIC) to power different sub-modules, Li-ion batteries are well suited for portable devices because of its small size, high energy density and long life cycle. Since Li-ion battery is the major power source for portable device, fast and high-efficiency battery charging solution…

Portable devices often require multiple power management IC (PMIC) to power different sub-modules, Li-ion batteries are well suited for portable devices because of its small size, high energy density and long life cycle. Since Li-ion battery is the major power source for portable device, fast and high-efficiency battery charging solution has become a major requirement in portable device application.

In the first part of dissertation, a high performance Li-ion switching battery charger is proposed. Cascaded two loop (CTL) control architecture is used for seamless CC-CV transition, time based technique is utilized to minimize controller area and power consumption. Time domain controller is implemented by using voltage controlled oscillator (VCO) and voltage controlled delay line (VCDL). Several efficiency improvement techniques such as segmented power-FET, quasi-zero voltage switching (QZVS) and switching frequency reduction are proposed. The proposed switching battery charger is able to provide maximum 2 A charging current and has an peak efficiency of 93.3%. By configure the charger as boost converter, the charger is able to provide maximum 1.5 A charging current while achieving 96.3% peak efficiency.

The second part of dissertation presents a digital low dropout regulator (DLDO) for system on a chip (SoC) in portable devices application. The proposed DLDO achieve fast transient settling time, lower undershoot/overshoot and higher PSR performance compared to state of the art. By having a good PSR performance, the proposed DLDO is able to power mixed signal load. To achieve a fast load transient response, a load transient detector (LTD) enables boost mode operation of the digital PI controller. The boost mode operation achieves sub microsecond settling time, and reduces the settling time by 50% to 250 ns, undershoot/overshoot by 35% to 250 mV and 17% to 125 mV without compromising the system stability.

ContributorsLim, Chai Yong (Author) / Kiaei, Sayfe (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Ogras, Umit Y. (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2018

Ultra-low Quiescent Current NMOS Low Dropout Regulator With Fast Transient response for Always-On Internet-of-Things Applications

Description

The increased adoption of Internet-of-Things (IoT) for various applications like smart home, industrial automation, connected vehicles, medical instrumentation, etc. has resulted in a large scale distributed network of sensors, accompanied by their power supply regulator modules, control and data transfer circuitry. Depending on the application, the sensor location can be…

The increased adoption of Internet-of-Things (IoT) for various applications like smart home, industrial automation, connected vehicles, medical instrumentation, etc. has resulted in a large scale distributed network of sensors, accompanied by their power supply regulator modules, control and data transfer circuitry. Depending on the application, the sensor location can be virtually anywhere and therefore they are typically powered by a localized battery. To ensure long battery-life without replacement, the power consumption of the sensor nodes, the supply regulator and, control and data transmission unit, needs to be very low. Reduction in power consumption in the sensor, control and data transmission is typically done by duty-cycled operation such that they are on periodically only for short bursts of time or turn on only based on a trigger event and are otherwise powered down. These approaches reduce their power consumption significantly and therefore the overall system power is dominated by the consumption in the always-on supply regulator.

Besides having low power consumption, supply regulators for such IoT systems also need to have fast transient response to load current changes during a duty-cycled operation. Supply regulation using low quiescent current low dropout (LDO) regulators helps in extending the battery life of such power aware always-on applications with very long standby time. To serve as a supply regulator for such applications, a 1.24 µA quiescent current NMOS low dropout (LDO) is presented in this dissertation. This LDO uses a hybrid bias current generator (HBCG) to boost its bias current and improve the transient response. A scalable bias-current error amplifier with an on-demand buffer drives the NMOS pass device. The error amplifier is powered with an integrated dynamic frequency charge pump to ensure low dropout voltage. A low-power relaxation oscillator (LPRO) generates the charge pump clocks. Switched-capacitor pole tracking (SCPT) compensation scheme is proposed to ensure stability up to maximum load current of 150 mA for a low-ESR output capacitor range of 1 - 47µF. Designed in a 0.25 µm CMOS process, the LDO has an output voltage range of 1V – 3V, a dropout voltage of 240 mV, and a core area of 0.11 mm2.

ContributorsMagod Ramakrishna, Raveesh (Author) / Bakkaloglu, Bertan (Thesis advisor) / Garrity, Douglas (Committee member) / Kitchen, Jennifer (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2018

Semiconductor Memory Applications in Radiation Environment, Hardware Security and Machine Learning System

Description

Semiconductor memory is a key component of the computing systems. Beyond the conventional memory and data storage applications, in this dissertation, both mainstream and eNVM memory technologies are explored for radiation environment, hardware security system and machine learning applications.

In the radiation environment, e.g. aerospace, the memory devices face different…

Semiconductor memory is a key component of the computing systems. Beyond the conventional memory and data storage applications, in this dissertation, both mainstream and eNVM memory technologies are explored for radiation environment, hardware security system and machine learning applications.

In the radiation environment, e.g. aerospace, the memory devices face different energetic particles. The strike of these energetic particles can generate electron-hole pairs (directly or indirectly) as they pass through the semiconductor device, resulting in photo-induced current, and may change the memory state. First, the trend of radiation effects of the mainstream memory technologies with technology node scaling is reviewed. Then, single event effects of the oxide based resistive switching random memory (RRAM), one of eNVM technologies, is investigated from the circuit-level to the system level.

Physical Unclonable Function (PUF) has been widely investigated as a promising hardware security primitive, which employs the inherent randomness in a physical system (e.g. the intrinsic semiconductor manufacturing variability). In the dissertation, two RRAM-based PUF implementations are proposed for cryptographic key generation (weak PUF) and device authentication (strong PUF), respectively. The performance of the RRAM PUFs are evaluated with experiment and simulation. The impact of non-ideal circuit effects on the performance of the PUFs is also investigated and optimization strategies are proposed to solve the non-ideal effects. Besides, the security resistance against modeling and machine learning attacks is analyzed as well.

Deep neural networks (DNNs) have shown remarkable improvements in various intelligent applications such as image classification, speech classification and object localization and detection. Increasing efforts have been devoted to develop hardware accelerators. In this dissertation, two types of compute-in-memory (CIM) based hardware accelerator designs with SRAM and eNVM technologies are proposed for two binary neural networks, i.e. hybrid BNN (HBNN) and XNOR-BNN, respectively, which are explored for the hardware resource-limited platforms, e.g. edge devices.. These designs feature with high the throughput, scalability, low latency and high energy efficiency. Finally, we have successfully taped-out and validated the proposed designs with SRAM technology in TSMC 65 nm.

Overall, this dissertation paves the paths for memory technologies’ new applications towards the secure and energy-efficient artificial intelligence system.

ContributorsLiu, Rui (Author) / Yu, Shimeng (Thesis advisor, Committee member) / Cao, Yu (Committee member) / Barnaby, Hugh (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2018

Algorithm and Hardware Design for Efficient Deep Learning Inference

Description

Deep learning (DL) has proved itself be one of the most important developements till date with far reaching impacts in numerous fields like robotics, computer vision, surveillance, speech processing, machine translation, finance, etc. They are now widely used for countless applications because of their ability to generalize real world data,…

Deep learning (DL) has proved itself be one of the most important developements till date with far reaching impacts in numerous fields like robotics, computer vision, surveillance, speech processing, machine translation, finance, etc. They are now widely used for countless applications because of their ability to generalize real world data, robustness to noise in previously unseen data and high inference accuracy. With the ability to learn useful features from raw sensor data, deep learning algorithms have out-performed tradinal AI algorithms and pushed the boundaries of what can be achieved with AI. In this work, we demonstrate the power of deep learning by developing a neural network to automatically detect cough instances from audio recorded in un-constrained environments. For this, 24 hours long recordings from 9 dierent patients is collected and carefully labeled by medical personel. A pre-processing algorithm is proposed to convert event based cough dataset to a more informative dataset with start and end of coughs and also introduce data augmentation for regularizing the training procedure. The proposed neural network achieves 92.3% leave-one-out accuracy on data captured in real world.

Deep neural networks are composed of multiple layers that are compute/memory intensive. This makes it difficult to execute these algorithms real-time with low power consumption using existing general purpose computers. In this work, we propose hardware accelerators for a traditional AI algorithm based on random forest trees and two representative deep convolutional neural networks (AlexNet and VGG). With the proposed acceleration techniques, ~ 30x performance improvement was achieved compared to CPU for random forest trees. For deep CNNS, we demonstrate that much higher performance can be achieved with architecture space exploration using any optimization algorithms with system level performance and area models for hardware primitives as inputs and goal of minimizing latency with given resource constraints. With this method, ~30GOPs performance was achieved for Stratix V FPGA boards.

Hardware acceleration of DL algorithms alone is not always the most ecient way and sucient to achieve desired performance. There is a huge headroom available for performance improvement provided the algorithms are designed keeping in mind the hardware limitations and bottlenecks. This work achieves hardware-software co-optimization for Non-Maximal Suppression (NMS) algorithm. Using the proposed algorithmic changes and hardware architecture

With CMOS scaling coming to an end and increasing memory bandwidth bottlenecks, CMOS based system might not scale enough to accommodate requirements of more complicated and deeper neural networks in future. In this work, we explore RRAM crossbars and arrays as compact, high performing and energy efficient alternative to CMOS accelerators for deep learning training and inference. We propose and implement RRAM periphery read and write circuits and achieved ~3000x performance improvement in online dictionary learning compared to CPU.

This work also examines the realistic RRAM devices and their non-idealities. We do an in-depth study of the effects of RRAM non-idealities on inference accuracy when a pretrained model is mapped to RRAM based accelerators. To mitigate this issue, we propose Random Sparse Adaptation (RSA), a novel scheme aimed at tuning the model to take care of the faults of the RRAM array on which it is mapped. Our proposed method can achieve inference accuracy much higher than what traditional Read-Verify-Write (R-V-W) method could achieve. RSA can also recover lost inference accuracy 100x ~ 1000x faster compared to R-V-W. Using 32-bit high precision RSA cells, we achieved ~10% higher accuracy using fautly RRAM arrays compared to what can be achieved by mapping a deep network to an 32 level RRAM array with no variations.

ContributorsMohanty, Abinash (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created2018

Wide input common-mode range fully integrated low-dropout voltage regulators

Description

The modern era of consumer electronics is dominated by compact, portable, affordable smartphones and wearable computing devices. Power management integrated circuits (PMICs) play a crucial role in on-chip power management, extending battery life and efficiency of integrated analog, radio-frequency (RF), and mixed-signal cores. Low-dropout (LDO) regulators are commonly used to…

The modern era of consumer electronics is dominated by compact, portable, affordable smartphones and wearable computing devices. Power management integrated circuits (PMICs) play a crucial role in on-chip power management, extending battery life and efficiency of integrated analog, radio-frequency (RF), and mixed-signal cores. Low-dropout (LDO) regulators are commonly used to provide clean supply for low voltage integrated circuits, where point-of-load regulation is important. In System-On-Chip (SoC) applications, digital circuits can change their mode of operation regularly at a very high speed, imposing various load transient conditions for the regulator. These quick changes of load create a glitch in LDO output voltage, which hamper performance of the digital circuits unfavorably. For an LDO designer, minimizing output voltage variation and speeding up voltage glitch settling is an important task.

The presented research introduces two fully integrated LDO voltage regulators for SoC applications. N-type Metal-Oxide-Semiconductor (NMOS) power transistor based operation achieves high bandwidth owing to the source follower configuration of the regulation loop. A low input impedance and high output impedance error amplifier ensures wide regulation loop bandwidth and high gain. Current-reused dynamic biasing technique has been employed to increase slew-rate at the gate of power transistor during full-load variations, by a factor of two. Three design variations for a 1-1.8 V, 50 mA NMOS LDO voltage regulator have been implemented in a 180 nm Mixed-mode/RF process. The whole LDO core consumes 0.130 mA of nominal quiescent ground current at 50 mA load and occupies 0.21 mm x mm. LDO has a dropout voltage of 200 mV and is able to recover in 30 ns from a 65 mV of undershoot for 0-50 pF of on-chip load capacitance.

ContributorsDesai, Chirag (Author) / Kiaei, Sayfe (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2016

Energy-efficient digital circuit design using threshold logic gates

Description

Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have…

Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques, that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical.

The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation.

Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR.

Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths.

Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.

ContributorsKulkarni, Niranjan (Author) / Vrudhula, Sarma (Thesis advisor) / Colbourn, Charles (Committee member) / Seo, Jae-Sun (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)

Created2015

Mixed-mode adaptive ripple canceller for switching regulators

Description

State of art modern System-On-Chip architectures often require very low noise supplies without overhead on high efficiencies. Low noise supplies are especially important in noise sensitive analog blocks such as high precision Analog-to-Digital Converters, Phase Locked Loops etc., and analog signal processing blocks. Switching regulators, while providing high efficiency power…

State of art modern System-On-Chip architectures often require very low noise supplies without overhead on high efficiencies. Low noise supplies are especially important in noise sensitive analog blocks such as high precision Analog-to-Digital Converters, Phase Locked Loops etc., and analog signal processing blocks. Switching regulators, while providing high efficiency power conversion suffer from inherent ripple on their output. A typical solution for high efficiency low noise supply is to cascade switching regulators with Low Dropout linear regulators (LDO) which generate inherently quiet supplies. The switching frequencies of switching regulators keep scaling to higher values in order to reduce the sizes of the passive inductor and capacitors at the output of switching regulators. This poses a challenge for existing solutions of switching regulators followed by LDO since the Power Supply Rejection (PSR) of LDOs are band-limited. In order to achieve high PSR over a wideband, the penalty would be to increase the quiescent power consumed to increase the bandwidth of the LDO and increase in solution area of the LDO. Hence, an alternative to the existing approach is required which improves the ripple cancellation at the output of switching regulator while overcoming the deficiencies of the LDO.

This research focuses on developing an innovative technique to cancel the ripple at the output of switching regulator which is scalable across a wide range of switching frequencies. The proposed technique consists of a primary ripple canceller and an auxiliary ripple canceller, both of which facilitate in the generation of a quiet supply and help to attenuate the ripple at the output of buck converter by over 22dB. These techniques can be applied to any DC-DC converter and are scalable across frequency, load current, output voltage as compared to LDO without significant overhead on efficiency or area. The proposed technique also presents a fully integrated solution without the need of additional off-chip components which, considering the push for full-integration of Power Management Integrated Circuits, is a big advantage over using LDOs.

ContributorsJoshi, Kishan (Author) / Bakkaloglu, Bertan (Thesis advisor) / Garrity, Douglas (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2016

The feasibility of domain specific compilation for spatially programmable architectures

Description

Integrated circuits must be energy efficient. This efficiency affects all aspects of chip design, from the battery life of embedded devices to thermal heating on high performance servers. As technology scaling slows, future generations of transistors will lack the energy efficiency gains as it has had in previous generations. Therefore,…

Integrated circuits must be energy efficient. This efficiency affects all aspects of chip design, from the battery life of embedded devices to thermal heating on high performance servers. As technology scaling slows, future generations of transistors will lack the energy efficiency gains as it has had in previous generations. Therefore, other sources of energy efficiency will be much more important. Many computations have the potential to be executed for extreme energy efficiency but are not instigated because the platforms they run on are not optimized for efficient execution. ASICs improve energy efficiency by reducing flexibility and leveraging the properties of a specific computation. However, ASICs are fixed in function and therefore have incredible opportunity cost. FPGAs offer a reconfigurable solution but are 25x less energy efficient than ASIC implementation. Spatially programmable architectures (SPAs) are similar in design and structure to ASICs and FPGAs but are able bridge the ASIC-FPGA energy efficiency gap by trading flexibility for efficiency. However, SPAs are difficult to program because they do not share the same programming model as normal architectures that execute in time. This work addresses compiler challenges for coarse grained, locally interconnected SPA for domain efficiency (SPADE). A novel SPADE topology, called the wave pipeline, is introduced that is designed for the image signal processing domain that is both efficient and simple to compile to. A compiler for the wave pipeline is created that solves for maximum energy and area efficiency using low complexity, greedy methods. The wave pipeline topology and compiler allow for us to investigate and experiment with image signal processing applications to prove the feasibility of SPADE compilers.

ContributorsMackay, Curtis (Author) / Brunhaver, John (Thesis advisor) / Karam, Lina J (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2016

Visual Saliency Application in Object Detection for Search Space Reduction

Description

Vision is the ability to see and interpret any visual stimulus. It is one of the most fundamental and complex tasks the brain performs. Its complexity can be understood from the fact that close to 50% of the human brain is dedicated to vision. The brain receives an overwhelming amount…

Vision is the ability to see and interpret any visual stimulus. It is one of the most fundamental and complex tasks the brain performs. Its complexity can be understood from the fact that close to 50% of the human brain is dedicated to vision. The brain receives an overwhelming amount of sensory information from the retina – estimated at up to 100 Mbps per optic nerve. Parallel processing of the entire visual field in real time is likely impossible for even the most sophisticated brains due to the high computational complexity of the task [1]. Yet, organisms can efficiently process this information to parse complex scenes in real time. This amazing feat of nature relies on selective attention which allows the brain to filter sensory information to select only a small subset of it for further processing.

Today, Computer Vision has become ubiquitous in our society with several in image understanding, medicine, drones, self-driving cars and many more. With the advent of GPUs and the availability of huge datasets like ImageNet, Convolutional Neural Networks (CNNs) have come to play a very important role in solving computer vision tasks, e.g object detection. However, the size of the networks become

prohibitive when higher accuracies are needed, which in turn demands more hardware. This hinders the application of CNNs to mobile platforms and stops them from hitting the real-time mark. The computational efficiency of a computer vision task, like object detection, can be enhanced by adopting a selective attention mechanism into the algorithm. In this work, this idea is explored by using Visual Proto Object Saliency algorithm [1] to crop out the areas of an image without relevant objects before a computationally intensive network like the Faster R-CNN [2] processes it.

ContributorsGorthy, Sai Rama Srivatsava (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Vrudhula, Sarma (Committee member) / Arizona State University (Publisher)

Created2017

ASU Electronic Theses and Dissertations