ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation or thesis includes degree information, committee members, an abstract, and any supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. In contrast, emerging memories such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM) are promising candidates that provide low standby power, high data density, fast access, and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability.
Emerging memory devices such as RRAM can be arranged as a 2-D crosspoint array to speed up the multiply-and-accumulate operations in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65 nm technology, showing a 900X speedup on the dictionary learning task compared to CPU performance.
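As a rough illustration of why a crosspoint array speeds up multiplication and accumulation, the sketch below models an idealized array in which each cell is a pure conductance. Applying a voltage to every row produces, on each column, a current equal to the weighted sum of those voltages (Ohm's law per cell, Kirchhoff's current law per column). The function name and the numeric values are hypothetical and chosen only for illustration; wire resistance, device nonlinearity, and the dissertation's programming scheme are not modeled.

```python
def crossbar_mac(conductances, voltages):
    """Column currents of an idealized crosspoint array.

    conductances[i][j]: conductance (siemens) of the cell joining row i to column j.
    voltages[i]: voltage (volts) applied to row i.
    Returns I[j] = sum_i G[i][j] * V[i].
    """
    if len(conductances) != len(voltages):
        raise ValueError("one voltage per row is required")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for g_row, v in zip(conductances, voltages):
        for j, g in enumerate(g_row):
            # Each cell contributes G*V to its column; the column wire sums them.
            currents[j] += g * v
    return currents


# Hypothetical 3x2 array (illustrative values, not taken from the dissertation).
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
V = [0.1, 0.2, 0.3]
I = crossbar_mac(G, V)
```

In hardware, all column sums are formed at once when the row voltages are applied, whereas the loop above computes them one product at a time.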
From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency of learning.
few decades, and that has led to an exponential increase in the creation of digital images and videos. Nearly all digital images pass through some image processing algorithm for purposes such as compression, transmission, and storage. Data is lost during this processing, which leaves a degraded image. Hence, to keep degradation minimal, quality assessment has become essential. Image quality assessment (IQA) has been researched and developed over the last several decades to predict quality scores in a manner that agrees with human judgments of quality. Modern IQA algorithms are quite effective in terms of prediction accuracy, but their development has not focused on improving computational performance. The existing serial implementation requires a relatively large run time, on the order of seconds for a single frame. Hardware acceleration using field-programmable gate arrays (FPGAs) provides a reconfigurable computing fabric that can be tailored to a broad range of applications. Traditionally, programming FPGAs has required expertise in hardware description languages (HDLs) or high-level synthesis (HLS) tools. OpenCL is an open standard for cross-platform, parallel programming of heterogeneous systems; together with the Altera OpenCL SDK, it enables developers to use an FPGA's potential without extensive hardware knowledge. Hence, this thesis focuses on accelerating the computationally intensive part of the most apparent distortion (MAD) algorithm on an FPGA using OpenCL. The results are compared with a CPU implementation to evaluate performance and efficiency gains.
Q-learning is a model-free reinforcement learning strategy that uses temporal differences to estimate the performance of state-action pairs, called Q values. A simple implementation of the Q-learning algorithm uses a Q table in memory to store and update the Q values. However, as the state space grows in a complex environment and the number of actions available to the agent increases, the Q table reaches its space limit and no longer scales well. Q-learning with neural networks eliminates the Q table by approximating the Q function with a neural network.
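The table-based variant described above can be sketched in a few lines. The example below is a minimal illustration, not the thesis's implementation: the five-state corridor environment, the reward of 1 at the goal, the learning rate, the discount factor, and the epsilon-greedy exploration are all assumptions made for the sake of a runnable example.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)   # hypothetical action set: step left or step right
GOAL = 4             # hypothetical corridor with states 0..4; state 4 is terminal


def step(state, action):
    """Toy environment: move along the corridor, reward 1.0 on reaching GOAL."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One temporal-difference update of the Q table entry for (s, a)."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def choose_action(q, s, rng, epsilon=0.2):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(s, a)] == best])


def train(episodes=50, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # the Q table: one entry per (state, action) pair
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            a = choose_action(q, s, rng)
            s_next, r = step(s, a)
            q_update(q, s, a, r, s_next)
            s = s_next
    return q


q = train()
# Greedy policy read back from the table for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
```

The table holds one entry per state-action pair, so its size grows with the product of the number of states and the number of actions. That growth is the scaling limit that motivates replacing the table with a neural network approximator.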
Autonomous agents need to develop cognitive properties and become self-adaptive to be deployable in any environment. Reinforcement learning with Q-learning has been very efficient in solving such problems. However, embedded systems such as space rovers and autonomous robots rarely implement these techniques because of constraints such as processing power, chip area, convergence rate, and chip cost. These problems create a need for a portable, low-power, area-efficient hardware accelerator for such learning.
This thesis addresses the problem by implementing a hardware schematic architecture for Q-learning using artificial neural networks. The architecture combines the massive parallelism of a neural network with the dedicated fine-grained parallelism of a field-programmable gate array (FPGA), thereby processing Q values at high throughput. Mars exploration rovers currently use Xilinx space-grade FPGA devices for image processing, pyrotechnic operation control, and obstacle avoidance. The hardware resource consumption of the architecture has been obtained by synthesis, with a Xilinx Virtex-7 FPGA as the target device.
In this thesis, a novel mixed-signal adaptive ripple cancellation technique is presented. The idea is to generate an artificial ripple current that has the same amplitude as the inductor current ripple but opposite phase, and that tracks it with high linearity. To generate the artificial triangular current, duty cycle information and inductor current ripple amplitude information are needed. The duty cycle information is obtained by sensing the switching node SW, and the amplitude of the artificial ripple current is regulated by feedback. The artificial ripple current cancels the inductor current ripple, resulting in a very low-ripple output current flowing to the load. In top-level simulation, 19.3 dB of ripple rejection can be achieved.
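To make the cancellation idea concrete, the following sketch adds an opposite-phase triangular current to an idealized inductor ripple and reports how much the peak-to-peak ripple shrinks. It is an illustrative model under stated assumptions, not the thesis's circuit: the waveform is an ideal triangle, the artificial current differs from the inductor ripple only by a hypothetical amplitude `mismatch`, and rejection is expressed as 20*log10 of the peak-to-peak ratio (the abstract does not state which convention it uses).

```python
import math


def triangle_ripple(t, period, duty, amplitude):
    """Zero-mean triangular ripple: rises for duty*period, then falls."""
    phase = (t % period) / period
    if phase < duty:
        frac = phase / duty
    else:
        frac = 1.0 - (phase - duty) / (1.0 - duty)
    return amplitude * (frac - 0.5)   # peak-to-peak value equals `amplitude`


def peak_to_peak(samples):
    return max(samples) - min(samples)


def rejection_db(duty=0.4, amplitude=1.0, mismatch=0.1, n=1000):
    """Ripple rejection when the artificial ripple is (1 - mismatch) times
    the inductor ripple, with opposite sign (assumed model)."""
    period = 1.0
    times = [k * period / n for k in range(n)]
    inductor = [triangle_ripple(t, period, duty, amplitude) for t in times]
    artificial = [-(1.0 - mismatch) * i for i in inductor]
    residual = [i + a for i, a in zip(inductor, artificial)]
    return 20.0 * math.log10(peak_to_peak(inductor) / peak_to_peak(residual))


# Amplitude mismatch that would correspond to 19.3 dB under this model.
mismatch_for_19_3_db = 10.0 ** (-19.3 / 20.0)
```

Under these assumptions, a 10% amplitude mismatch gives 20 dB of rejection, and 19.3 dB corresponds to a residual ripple of roughly 11% of the original. The result does not depend on the duty cycle in this model, because the residual is a scaled copy of the original waveform.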
Typical LDOs achieve high PSR within their loop bandwidth; however, their supply rejection performance degrades as loop gain falls outside the loop bandwidth. LDOs with external filtering capacitors may also show spectral peaking in their PSR response, causing excess system-level supply noise. This work presents an LDO design approach that achieves a PSR higher than 68 dB up to a frequency of 2 MHz and over a wide range of loads up to 250 mA. The wide PSR bandwidth is achieved using a current-mode feedforward ripple canceller (CFFRC) amplifier, which provides up to 25 dB of PSR improvement. The feedforward path gain is inherently matched to the forward gain of the LDO and requires no calibration. The LDO has a fast load transient response with a recovery time of 6.1 μs and a quiescent current of 5.6 μA. For a full load transition, the LDO settles with overshoot and undershoot voltages below 27.6 mV and 36.36 mV, respectively. The LDO is designed and fabricated in a 180 nm bipolar/CMOS/DMOS (BCD) technology. The CFFRC amplifier helps to achieve low quiescent power because of its inherent current-mode nature, which eliminates the need for supply ripple summing amplifiers and adaptive biasing.
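For readers less used to decibel figures, the short sketch below converts a PSR value into the corresponding attenuation of supply ripple. It assumes the usual voltage-ratio definition, PSR = 20*log10(supply ripple / output ripple). The 100 mV supply ripple is a hypothetical input, not a measurement from this work.

```python
def psr_db_to_ratio(psr_db):
    """Voltage attenuation factor for a PSR given in dB (20*log10 convention)."""
    return 10.0 ** (psr_db / 20.0)


def output_ripple(supply_ripple_v, psr_db):
    """Ripple remaining at the regulator output for a given supply ripple."""
    return supply_ripple_v / psr_db_to_ratio(psr_db)


# Hypothetical 100 mV supply ripple seen through 68 dB of PSR.
ripple_out = output_ripple(0.1, 68.0)
```

Under that convention, 68 dB corresponds to an attenuation factor of about 2500, so the hypothetical 100 mV of supply ripple would appear as roughly 40 μV at the output.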