ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- All Subjects: Computer Engineering
- Creators: Ogras, Umit Y.
To develop and implement flexible protocols, a flexible hardware very similar to a Software Defined Radio (SDR) is required. In this thesis, the Intel T2200 board is chosen as the SDR platform. It is a heterogeneous platform with ARM, CEVA DSP and several accelerators. A wide range of protocols is mapped onto this platform and their performance evaluated. These include two OFDM based protocols (WiFi-Lite-A, WiFi-Lite-B), one DFT-spread OFDM based protocol (SCFDM-Lite) and one single carrier based protocol (SC-Lite). The transmitter and receiver blocks of the different protocols are first mapped on ARM in the T2200 board. The timing results show that IFFT, FFT, and Viterbi decoder blocks take most of the transmitter and receiver execution time and so in the next step these are mapped onto CEVA DSP. Mapping onto CEVA DSP resulted in significant execution time savings. The savings for WiFi-Lite-A were 60%, for WiFi-Lite-B were 64%, and for SCFDM-Lite were 71.5%. No savings are reported for SC-Lite since it was not mapped onto CEVA DSP.
Significant reduction in execution time is achieved for WiFi-Lite-A and WiFi-Lite-B protocols by implementing the entire transmitter and receiver chains on CEVA DSP. For instance, for WiFi-Lite-A, the savings were as large as 90%. Such huge savings are because the entire transmitter or receiver chain are implemented on CEVA and the timing overhead due to ARM-CEVA communication is completely eliminated. Finally, over-the-air testing was done for WiFi-Lite-A and WiFi-Lite-B protocols. Data was sent over the air using one Intel T2200 WBS board and received using another Intel T2200 WBS board. The received frames were decoded with no errors, thereby validating the over-the-air-communications.
In this thesis, a low cost object tracking system is implemented on a hardware accelerator that is a warp based processor for SIMD/Vector style computations. First, the different foreground detection techniques are explored to figure out the best technique that involves the least number of computations without compromising on the performance. It is found that the Gaussian Mixture Model proposed by Zivkovic gives the best performance with respect to both accuracy and number of computations. Pixel level parallelization is applied to this algorithm and it is mapped onto the hardware accelerator.
Next, the different clustering algorithms are studied and it is found that while DBSCAN is highly accurate and robust to outliers, it is very computationally intensive. In contrast, K-means is computationally simple, but it requires that the number of means to be specified beforehand. So, a new clustering algorithm is proposed that uses a combination of both DBSCAN and K-means algorithm along with a diagnostic algorithm on K-means to estimate the right number of centroids. The proposed hybrid algorithm is shown to be faster than the DBSCAN algorithm by ~2.5x with minimal loss in accuracy. Also, the 1D Kalman filter is implemented assuming constant acceleration model. Since the computations involved in Kalman filter is just a set of recursive equations, the sequential model in itself exhibits good performance, thereby alleviating the need for parallelization. The tracking performance of the low cost implementation is evaluated against the sequential version. It is found that the proposed hybrid algorithm performs very close to the reference algorithm based on the DBSCAN algorithm.
of accelerating even non-parallel loops and loops with low trip-counts. One challenge
in compiling for CGRAs is to manage both recurring and nonrecurring variables in
the register file (RF) of the CGRA. Although prior works have managed recurring
variables via rotating RF, they access the nonrecurring variables through either a
global RF or from a constant memory. The former does not scale well, and the latter
degrades the mapping quality. This work proposes a hardware-software codesign
approach in order to manage all the variables in a local nonrotating RF. Hardware
provides modulo addition based indexing mechanism to enable correct addressing
of recurring variables in a nonrotating RF. The compiler determines the number of
registers required for each recurring variable and configures the boundary between the
registers used for recurring and nonrecurring variables. The compiler also pre-loads
the read-only variables and constants into the local registers in the prologue of the
schedule. Synthesis and place-and-route results of the previous and the proposed RF
design show that proposed solution achieves 17% better cycle time. Experiments of
mapping several important and performance-critical loops collected from MiBench
show proposed approach improves performance (through better mapping) by 18%,
compared to using constant memory.
Platform energy consumption and responsiveness are two major considerations for mobile systems since they determine the battery life and user satisfaction, respectively. In this work, the models for power consumption, response time, and energy consumption of heterogeneous mobile platforms are presented. Then, these models are used to optimize the energy consumption of baseline platforms under power, response time, and temperature constraints with and without introducing new resources. It is shown, the optimal design choices depend on dynamic power management algorithm, and adding new resources is more energy efficient than scaling existing resources alone. The framework is verified through actual experiments on Qualcomm Snapdragon 800 based tablet MDP/T. Furthermore, usage of the framework at both design and runtime optimization is also presented.
In this dissertation, algorithm-architecture co-design techniques that aim to make hand-held 3-D ultrasound a reality are presented. First, image enhancement methods to improve signal-to-noise ratio (SNR) are proposed. These include virtual source firing techniques and a low overhead digital front-end architecture using orthogonal chirps and orthogonal Golay codes.
Second, algorithm-architecture co-design techniques to reduce the power consumption of 3-D SAU imaging systems is presented. These include (i) a subaperture multiplexing strategy and the corresponding apodization method to alleviate the signal bandwidth bottleneck, and (ii) a highly efficient iterative delay calculation method to eliminate complex operations such as multiplications, divisions and square-root in delay calculation during beamforming. These techniques were used to define Sonic Millip3De, a 3-D die stacked architecture for digital beamforming in SAU systems. Sonic Millip3De produces 3-D high resolution images at 2 frames per second with system power consumption of 15W in 45nm technology.
Third, a new beamforming method based on separable delay decomposition is proposed to reduce the computational complexity of the beamforming unit in an SAU system. The method is based on minimizing the root-mean-square error (RMSE) due to delay decomposition. It reduces the beamforming complexity of a SAU system by 19x while providing high image fidelity that is comparable to non-separable beamforming. The resulting modified Sonic Millip3De architecture supports a frame rate of 32 volumes per second while maintaining power consumption of 15W in 45nm technology.
Next a 3-D plane-wave imaging system that utilizes both separable beamforming and coherent compounding is presented. The resulting system has computational complexity comparable to that of a non-separable non-compounding baseline system while significantly improving contrast-to-noise ratio and SNR. The modified Sonic Millip3De architecture is now capable of generating high resolution images at 1000 volumes per second with 9-fire-angle compounding.
In the first part, unmanned aerial system-to-ground communications channel models are derived from empirical data collected from software-defined radio transceivers in residential and mountainous desert environments using a small (< 20 kg) unmanned aerial system during low-altitude flight (< 130 m). The Kullback-Leibler divergence measure was employed to characterize model mismatch from the empirical data. Using this measure the derived models accurately describe the underlying data.
In the second part, an experimental joint sensing and communications system is implemented using a network of software-defined radio transceivers. A novel co-design receiver architecture is presented and demonstrated within a three-node joint multiple access system topology consisting of an independent radar and communications transmitter along with a joint radar and communications receiver. The receiver tracks an emulated target moving along a predefined path and simultaneously decodes a communications message. Experimental system performance bounds are characterized jointly using the communications channel capacity and novel estimation information rate.
Competitive performance requires higher operating frequency, and leads to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on the device reliability and user experience. As a result, allocating the power budget among the major platform resources and temperature control have become fundamental consideration for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources to sleep states, or throttling their frequencies. However, an adhoc approach could easily cripple the performance, if it slows down the performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. Towards this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for the heterogeneous CPUs, such as ARM big.Little architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each PE, (c) an adaptive GPU frame time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads.