ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
accumulating the desired behavior over time. A mixed-signal cross-correlation circuit is used to derive on-chip impulse responses, with smaller memory and lower computational requirement in comparison to a digital correlator approach. Model reference based parametric and non-parametric techniques are discussed to analyze the impulse response results in both time and frequency domain. The proposed techniques can extract open-loop phase margin and closed-loop unity-gain frequency within 5.2% and 4.1% error, respectively, for the load current range of 30-200mA. Converter parameters such as natural frequency (ω_n ), quality factor (Q), and center frequency (ω_c ) can be estimated within 3.6%, 4.7%, and 3.8% error respectively, over load inductance of 4.7-10.3µH, and filter capacitance of 200-400nF. A 5-MHz switching frequency, 5-8.125V input voltage range, voltage-mode controlled DC-DC buck converter is designed for the proposed built-in self-test (BIST) analysis. The converter output voltage range is 3.3-5V and the supported maximum
load current is 450mA. The peak efficiency of the converter is 87.93%. The proposed converter is fabricated on a 0.6µm 6-layer-metal Silicon-On-Insulator (SOI) technology with a die area of 9mm^2 . The area impact due to the system identification blocks including related I/O structures is 3.8% and they consume 530µA quiescent current during operation.
In the first part of dissertation, a high performance Li-ion switching battery charger is proposed. Cascaded two loop (CTL) control architecture is used for seamless CC-CV transition, time based technique is utilized to minimize controller area and power consumption. Time domain controller is implemented by using voltage controlled oscillator (VCO) and voltage controlled delay line (VCDL). Several efficiency improvement techniques such as segmented power-FET, quasi-zero voltage switching (QZVS) and switching frequency reduction are proposed. The proposed switching battery charger is able to provide maximum 2 A charging current and has an peak efficiency of 93.3%. By configure the charger as boost converter, the charger is able to provide maximum 1.5 A charging current while achieving 96.3% peak efficiency.
The second part of dissertation presents a digital low dropout regulator (DLDO) for system on a chip (SoC) in portable devices application. The proposed DLDO achieve fast transient settling time, lower undershoot/overshoot and higher PSR performance compared to state of the art. By having a good PSR performance, the proposed DLDO is able to power mixed signal load. To achieve a fast load transient response, a load transient detector (LTD) enables boost mode operation of the digital PI controller. The boost mode operation achieves sub microsecond settling time, and reduces the settling time by 50% to 250 ns, undershoot/overshoot by 35% to 250 mV and 17% to 125 mV without compromising the system stability.
Besides having low power consumption, supply regulators for such IoT systems also need to have fast transient response to load current changes during a duty-cycled operation. Supply regulation using low quiescent current low dropout (LDO) regulators helps in extending the battery life of such power aware always-on applications with very long standby time. To serve as a supply regulator for such applications, a 1.24 µA quiescent current NMOS low dropout (LDO) is presented in this dissertation. This LDO uses a hybrid bias current generator (HBCG) to boost its bias current and improve the transient response. A scalable bias-current error amplifier with an on-demand buffer drives the NMOS pass device. The error amplifier is powered with an integrated dynamic frequency charge pump to ensure low dropout voltage. A low-power relaxation oscillator (LPRO) generates the charge pump clocks. Switched-capacitor pole tracking (SCPT) compensation scheme is proposed to ensure stability up to maximum load current of 150 mA for a low-ESR output capacitor range of 1 - 47µF. Designed in a 0.25 µm CMOS process, the LDO has an output voltage range of 1V – 3V, a dropout voltage of 240 mV, and a core area of 0.11 mm2.
In the radiation environment, e.g. aerospace, the memory devices face different energetic particles. The strike of these energetic particles can generate electron-hole pairs (directly or indirectly) as they pass through the semiconductor device, resulting in photo-induced current, and may change the memory state. First, the trend of radiation effects of the mainstream memory technologies with technology node scaling is reviewed. Then, single event effects of the oxide based resistive switching random memory (RRAM), one of eNVM technologies, is investigated from the circuit-level to the system level.
Physical Unclonable Function (PUF) has been widely investigated as a promising hardware security primitive, which employs the inherent randomness in a physical system (e.g. the intrinsic semiconductor manufacturing variability). In the dissertation, two RRAM-based PUF implementations are proposed for cryptographic key generation (weak PUF) and device authentication (strong PUF), respectively. The performance of the RRAM PUFs are evaluated with experiment and simulation. The impact of non-ideal circuit effects on the performance of the PUFs is also investigated and optimization strategies are proposed to solve the non-ideal effects. Besides, the security resistance against modeling and machine learning attacks is analyzed as well.
Deep neural networks (DNNs) have shown remarkable improvements in various intelligent applications such as image classification, speech classification and object localization and detection. Increasing efforts have been devoted to develop hardware accelerators. In this dissertation, two types of compute-in-memory (CIM) based hardware accelerator designs with SRAM and eNVM technologies are proposed for two binary neural networks, i.e. hybrid BNN (HBNN) and XNOR-BNN, respectively, which are explored for the hardware resource-limited platforms, e.g. edge devices.. These designs feature with high the throughput, scalability, low latency and high energy efficiency. Finally, we have successfully taped-out and validated the proposed designs with SRAM technology in TSMC 65 nm.
Overall, this dissertation paves the paths for memory technologies’ new applications towards the secure and energy-efficient artificial intelligence system.
Deep neural networks are composed of multiple layers that are compute/memory intensive. This makes it difficult to execute these algorithms real-time with low power consumption using existing general purpose computers. In this work, we propose hardware accelerators for a traditional AI algorithm based on random forest trees and two representative deep convolutional neural networks (AlexNet and VGG). With the proposed acceleration techniques, ~ 30x performance improvement was achieved compared to CPU for random forest trees. For deep CNNS, we demonstrate that much higher performance can be achieved with architecture space exploration using any optimization algorithms with system level performance and area models for hardware primitives as inputs and goal of minimizing latency with given resource constraints. With this method, ~30GOPs performance was achieved for Stratix V FPGA boards.
Hardware acceleration of DL algorithms alone is not always the most ecient way and sucient to achieve desired performance. There is a huge headroom available for performance improvement provided the algorithms are designed keeping in mind the hardware limitations and bottlenecks. This work achieves hardware-software co-optimization for Non-Maximal Suppression (NMS) algorithm. Using the proposed algorithmic changes and hardware architecture
With CMOS scaling coming to an end and increasing memory bandwidth bottlenecks, CMOS based system might not scale enough to accommodate requirements of more complicated and deeper neural networks in future. In this work, we explore RRAM crossbars and arrays as compact, high performing and energy efficient alternative to CMOS accelerators for deep learning training and inference. We propose and implement RRAM periphery read and write circuits and achieved ~3000x performance improvement in online dictionary learning compared to CPU.
This work also examines the realistic RRAM devices and their non-idealities. We do an in-depth study of the effects of RRAM non-idealities on inference accuracy when a pretrained model is mapped to RRAM based accelerators. To mitigate this issue, we propose Random Sparse Adaptation (RSA), a novel scheme aimed at tuning the model to take care of the faults of the RRAM array on which it is mapped. Our proposed method can achieve inference accuracy much higher than what traditional Read-Verify-Write (R-V-W) method could achieve. RSA can also recover lost inference accuracy 100x ~ 1000x faster compared to R-V-W. Using 32-bit high precision RSA cells, we achieved ~10% higher accuracy using fautly RRAM arrays compared to what can be achieved by mapping a deep network to an 32 level RRAM array with no variations.
The presented research introduces two fully integrated LDO voltage regulators for SoC applications. N-type Metal-Oxide-Semiconductor (NMOS) power transistor based operation achieves high bandwidth owing to the source follower configuration of the regulation loop. A low input impedance and high output impedance error amplifier ensures wide regulation loop bandwidth and high gain. Current-reused dynamic biasing technique has been employed to increase slew-rate at the gate of power transistor during full-load variations, by a factor of two. Three design variations for a 1-1.8 V, 50 mA NMOS LDO voltage regulator have been implemented in a 180 nm Mixed-mode/RF process. The whole LDO core consumes 0.130 mA of nominal quiescent ground current at 50 mA load and occupies 0.21 mm x mm. LDO has a dropout voltage of 200 mV and is able to recover in 30 ns from a 65 mV of undershoot for 0-50 pF of on-chip load capacitance.
The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation.
Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR.
Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths.
Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.
This research focuses on developing an innovative technique to cancel the ripple at the output of switching regulator which is scalable across a wide range of switching frequencies. The proposed technique consists of a primary ripple canceller and an auxiliary ripple canceller, both of which facilitate in the generation of a quiet supply and help to attenuate the ripple at the output of buck converter by over 22dB. These techniques can be applied to any DC-DC converter and are scalable across frequency, load current, output voltage as compared to LDO without significant overhead on efficiency or area. The proposed technique also presents a fully integrated solution without the need of additional off-chip components which, considering the push for full-integration of Power Management Integrated Circuits, is a big advantage over using LDOs.
Today, Computer Vision has become ubiquitous in our society with several in image understanding, medicine, drones, self-driving cars and many more. With the advent of GPUs and the availability of huge datasets like ImageNet, Convolutional Neural Networks (CNNs) have come to play a very important role in solving computer vision tasks, e.g object detection. However, the size of the networks become
prohibitive when higher accuracies are needed, which in turn demands more hardware. This hinders the application of CNNs to mobile platforms and stops them from hitting the real-time mark. The computational efficiency of a computer vision task, like object detection, can be enhanced by adopting a selective attention mechanism into the algorithm. In this work, this idea is explored by using Visual Proto Object Saliency algorithm [1] to crop out the areas of an image without relevant objects before a computationally intensive network like the Faster R-CNN [2] processes it.