Matching Items (2)
Description
The multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows, followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input data that have to be stored in external memories based on Synchronous Dynamic RAM (SDRAM). In this dissertation, an efficient architecture to implement the 2-D DFT for large input data is first proposed. This architecture achieves very high throughput by exploiting the inherent parallelism of a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than an RC decomposition based implementation under the same memory constraints, and it also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board, and their performance is measured and analyzed. The results show that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding the matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision data can be processed in less than 27 ms. Finally, transpose-free imaging flows for the range-Doppler algorithm (RDA) and the chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture, which is implemented on an FPGA platform. When clocked at 100 MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323 ms and 162 ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.
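For readers unfamiliar with the Row-Column (RC) decomposition that the abstract uses as its baseline, the following is a minimal software sketch, not the dissertation's architecture. It uses a naive O(n^2) 1-D DFT in pure Python, and the function names (`dft_1d`, `dft_2d_row_column`, `dft_2d_direct`) are hypothetical, chosen only for illustration. The column pass walks the data with a stride of one full row, which is the kind of access pattern the abstract describes as a poor fit for row-wise SDRAM bursts.

```python
import cmath


def dft_1d(x):
    """Naive 1-D DFT of a sequence (illustrative only, O(n^2))."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_2d_row_column(a):
    """2-D DFT by Row-Column decomposition: rows first, then columns."""
    # Pass 1: 1-D DFT along each row (contiguous, row-wise access).
    rows = [dft_1d(row) for row in a]
    # Pass 2: 1-D DFT along each column (strided access in row-major storage).
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]


def dft_2d_direct(a):
    """2-D DFT straight from the definition, used here only as a reference."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]
```

Comparing `dft_2d_row_column(a)` against `dft_2d_direct(a)` on a small matrix is a quick way to confirm that the two passes together compute the full 2-D transform.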
Contributors: Yu, Chi-Li (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
The purpose of the Very Long Instruction Word (VLIW) Remotely Reconfigurable DSP Element project is to use VLIW as a design process, to design the hardware components of a reconfigurable DSP element, and to ascertain the overall length of the very long instruction word. The project focuses solely on hardware components designed by hand to specifications set by General Dynamics Mission Systems, and on using those designs to find the overall length of the VLIW for use in future work. General Dynamics specified several requirements for each element, and each element was then designed individually according to those requirements. After the initial design, each element was sent back to General Dynamics for a design review, and after revision, all parts were linked together for an overall calculation of the length of the VLIW. The VLIW reconfigurable DSP element is not a new concept, but no proof of concept has yet been published. Future work includes a proof of concept with software (done by the ASU Capstone team), followed by further development by General Dynamics. Should General Dynamics choose to continue with this project, it will continue testing on FPGA boards and perhaps pursue future development into an ASIC. General Dynamics proposed this project primarily for deep space payloads, where it has the most applications.
Contributors: Yiin, Nathan Kehan (Author) / Clark, Lawrence (Thesis director) / Aberle, James (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12