Matching Items (2)
Filtering by

Clear all filters

150167-Thumbnail Image.png
Description
Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for

Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for implementing high-throughput DSP systems that are also energy-efficient. Data-path components such as adders and multipliers are evaluated with respect to critical path delay, energy and Energy-Delay Product (EDP). A new design for a RBR adder with very good EDP performance has been proposed. The corresponding RBR parallel adder has a much lower critical path delay and EDP compared to two's complement carry select and carry look-ahead adder implementations. Next, several RBR multiplier architectures are investigated and their performance compared to two's complement systems. These include two new multiplier architectures: a purely RBR multiplier where both the operands are in RBR form, and a hybrid multiplier where the multiplicand is in RBR form and the other operand is represented in conventional two's complement form. Both the RBR and hybrid designs are demonstrated to have better EDP performance compared to conventional two's complement multipliers. The hybrid multiplier is also shown to have a superior EDP performance compared to the RBR multiplier, with much lower implementation area. Analysis on the effect of bit-precision is also performed, and it is shown that the performance gain of RBR systems improves for higher bit precision. Next, in order to demonstrate the efficacy of the RBR representation at the system-level, the performance of RBR and hybrid implementations of some common DSP kernels such as Discrete Cosine Transform, edge detection using Sobel operator, complex multiplication, Lifting-based Discrete Wavelet Transform (9, 7) filter, and FIR filter, is compared with two's complement systems. It is shown that for relatively large computation modules, the RBR to two's complement conversion overhead gets amortized. In case of systems with high complexity, for iso-throughput, both the hybrid and RBR implementations are demonstrated to be superior with lower average energy consumption. For low complexity systems, the conversion overhead is significant, and overpowers the EDP performance gain obtained from the RBR computation operation.
ContributorsMahadevan, Rupa (Author) / Chakrabarti, Chaitali (Thesis advisor) / Kiaei, Sayfe (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2011
152360-Thumbnail Image.png
Description
In this work, we present approximate adders and multipliers to reduce data-path complexity of specialized hardware for various image processing systems. These approximate circuits have a lower area, latency and power consumption compared to their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders

In this work, we present approximate adders and multipliers to reduce data-path complexity of specialized hardware for various image processing systems. These approximate circuits have a lower area, latency and power consumption compared to their accurate counterparts and produce fairly accurate results. We build upon the work on approximate adders and multipliers presented in [23] and [24]. First, we show how choice of algorithm and parallel adder design can be used to implement 2D Discrete Cosine Transform (DCT) algorithm with good performance but low area. Our implementation of the 2D DCT has comparable PSNR performance with respect to the algorithm presented in [23] with ~35-50% reduction in area. Next, we use the approximate 2x2 multiplier presented in [24] to implement parallel approximate multipliers. We demonstrate that if some of the 2x2 multipliers in the design of the parallel multiplier are accurate, the accuracy of the multiplier improves significantly, especially when two large numbers are multiplied. We choose Gaussian FIR Filter and Fast Fourier Transform (FFT) algorithms to illustrate the efficacy of our proposed approximate multiplier. We show that application of the proposed approximate multiplier improves the PSNR performance of 32x32 FFT implementation by 4.7 dB compared to the implementation using the approximate multiplier described in [24]. We also implement a state-of-the-art image enlargement algorithm, namely Segment Adaptive Gradient Angle (SAGA) [29], in hardware. The algorithm is mapped to pipelined hardware blocks and we synthesized the design using 90 nm technology. We show that a 64x64 image can be processed in 496.48 µs when clocked at 100 MHz. The average PSNR performance of our implementation using accurate parallel adders and multipliers is 31.33 dB and that using approximate parallel adders and multipliers is 30.86 dB, when evaluated against the original image. The PSNR performance of both designs is comparable to the performance of the double precision floating point MATLAB implementation of the algorithm.
ContributorsVasudevan, Madhu (Author) / Chakrabarti, Chaitali (Thesis advisor) / Frakes, David (Committee member) / Gupta, Sandeep (Committee member) / Arizona State University (Publisher)
Created2013