Matching Items (4)

135132-Thumbnail Image.png

VLIW Remotely Reconfigurable DSP Element

Description

The purpose of the Very Long Instruction Word (VLIW) Remotely Reconfigurable DSP Element is to use VLIW as a design process and to design hardware components of a reconfigurable DSP

The purpose of the Very Long Instruction Word (VLIW) Remotely Reconfigurable DSP Element is to use VLIW as a design process and to design hardware components of a reconfigurable DSP Element and ascertaining the overall length of the Very Long Instruction Word. This project is focused solely on hardware components being designed by hand with regards to certain specifications deemed by General Dynamics Mission Systems, and using the designs, finding the overall length of the VLIW for use in future work. To design each of the elements, General Dynamics had specified several requirements. Each element was then designed individually according to the requirements. After the initial design, each was sent back for a design review from General Dynamics, and after revision, all parts were linked together for an overall calculation on the length of the VLIW. VLIW Reconfigurable DSP Elements is not a new concept, but has yet to have a proof of concept published. Future work includes a proof of concept with software (done by the ASU Capstone team), then future development by General Dynamics. Should they choose to continue with this project, they will continue testing on FPGA boards, and perhaps future development into an ASIC. Overall the purpose of General Dynamics for proposing this project is for deep space payloads, for which this project has the most applications.

Contributors

Agent

Created

Date Created
  • 2016-12

156962-Thumbnail Image.png

Algorithm Architecture Co-design for Dense and Sparse Matrix Computations

Description

With the end of Dennard scaling and Moore's law, architects have moved towards

heterogeneous designs consisting of specialized cores to achieve higher performance

and energy efficiency for a target application domain. Applications

With the end of Dennard scaling and Moore's law, architects have moved towards

heterogeneous designs consisting of specialized cores to achieve higher performance

and energy efficiency for a target application domain. Applications of linear algebra

are ubiquitous in the field of scientific computing, machine learning, statistics,

etc. with matrix computations being fundamental to these linear algebra based solutions.

Design of multiple dense (or sparse) matrix computation routines on the

same platform is quite challenging. Added to the complexity is the fact that dense

and sparse matrix computations have large differences in their storage and access

patterns and are difficult to optimize on the same architecture. This thesis addresses

this challenge and introduces a reconfigurable accelerator that supports both dense

and sparse matrix computations efficiently.

The reconfigurable architecture has been optimized to execute the following linear

algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM

(Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver),

LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication),

SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where

each core consists of a 2D array of processing elements (PE).

The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 sized matrix

updates efficiently. A sequence of such updates is used to solve a larger problem inside

a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR)

is used to perform sparse kernel updates. Scalable partitioning and mapping schemes

are presented that map input matrices of any given size to the multicore architecture.

Design trade-offs related to the PE array dimension, size of local memory inside a core

and the bandwidth between on-chip memories and the cores have been presented. An

optimal core configuration is developed from this analysis. Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of upto

32 GOPS using a single core.

Contributors

Agent

Created

Date Created
  • 2018

157717-Thumbnail Image.png

CMOS integrated power amplifiers for RF reconfigurable and digital transmitters

Description

This dissertation focuses on three different efficiency enhancement methods that are applicable to handset applications. These proposed designs are based on three critical requirements for handset application: 1) Small form

This dissertation focuses on three different efficiency enhancement methods that are applicable to handset applications. These proposed designs are based on three critical requirements for handset application: 1) Small form factor, 2) CMOS compatibility and 3) high power handling. The three presented methodologies are listed below:

1) A transformer-based power combiner architecture for out-phasing transmitters

2) A current steering DAC-based average power tracking circuit for on-chip power amplifiers (PA)

3) A CMOS-based driver stage for GaN-based switched-mode power amplifiers applicable to fully digital transmitters

This thesis highlights the trends in wireless handsets, the motivates the need for fully-integrated CMOS power amplifier solutions and presents the three novel techniques for reconfigurable and digital CMOS-based PAs. Chapter 3, presents the transformer-based power combiner for out-phasing transmitters. The simulation results reveal that this technique is able to shrink the power combiner area, which is one of the largest parts of the transmitter, by about 50% and as a result, enhances the output power density by 3dB.

The average power tracking technique (APT) integrated with an on-chip CMOS-based power amplifier is explained in Chapter 4. This system is able to achieve up to 32dBm saturated output power with a linear power gain of 20dB in a 45nm CMOS SOI process. The maximum efficiency improvement is about ∆η=15% compared to the same PA without APT. Measurement results show that the proposed method is able to amplify an enhanced-EDGE modulated input signal with a data rate of 70.83kb/sec and generate more than 27dBm of average output power with EVM<5%.

Although small form factor, high battery lifetime, and high volume integration motivate the need for fully digital CMOS transmitters, the output power generated by this type of transmitter is not high enough to satisfy the communication standards. As a result, compound materials such as GaN or GaAs are usually being used in handset applications to increase the output power. Chapter 5 focuses on the analysis and design of two CMOS based driver architectures (cascode and house of cards) for driving a GaN power amplifier. The presented results show that the drivers are able to generate ∆Vout=5V, which is required by the compound transistor, and operate up to 2GHz. Since the CMOS driver is expected to drive an off-chip capacitive load, the interface components, such as bond wires, and decoupling and pad capacitors, play a critical role in the output transient response. Therefore, extensive analysis and simulation results have been done on the interface circuits to investigate their effects on RF transmitter performance. The presented results show that the maximum operating frequency when the driver is connected to a 4pF capacitive load is about 2GHz, which is perfectly matched with the reported values in prior literature.

Contributors

Agent

Created

Date Created
  • 2019

156189-Thumbnail Image.png

Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

Description

Static CMOS logic has remained the dominant design style of digital systems for

more than four decades due to its robustness and near zero standby current. Static

CMOS logic circuits consist of

Static CMOS logic has remained the dominant design style of digital systems for

more than four decades due to its robustness and near zero standby current. Static

CMOS logic circuits consist of a network of combinational logic cells and clocked sequential

elements, such as latches and flip-flops that are used for sequencing computations

over time. The majority of the digital design techniques to reduce power, area, and

leakage over the past four decades have focused almost entirely on optimizing the

combinational logic. This work explores alternate architectures for the flip-flops for

improving the overall circuit performance, power and area. It consists of three main

sections.

First, is the design of a multi-input configurable flip-flop structure with embedded

logic. A conventional D-type flip-flop may be viewed as realizing an identity function,

in which the output is simply the value of the input sampled at the clock edge. In

contrast, the proposed multi-input flip-flop, named PNAND, can be configured to

realize one of a family of Boolean functions called threshold functions. In essence,

the PNAND is a circuit implementation of the well-known binary perceptron. Unlike

other reconfigurable circuits, a PNAND can be configured by simply changing the

assignment of signals to its inputs. Using a standard cell library of such gates, a technology

mapping algorithm can be applied to transform a given netlist into one with

an optimal mixture of conventional logic gates and threshold gates. This approach

was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier

in 65nm LP technology. Simulation and chip measurements show more than 30%

improvement in dynamic power and more than 20% reduction in core area.

The functional yield of the PNAND reduces with geometry and voltage scaling.

The second part of this research investigates the use of two mechanisms to improve

the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM

devices for low voltage operation.

The third part of this research focused on the design of flip-flops with non-volatile

storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated

with both conventional D-flipflop and the PNAND circuits to implement non-volatile

logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of

system locally when a power interruption occurs. However, manufacturing variations

in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading

to an overly pessimistic design and consequently, higher energy consumption. A

detailed analysis of the design trade-offs in the driver circuitry for performing backup

and restore, and a novel method to design the energy optimal driver for a given yield is

presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented,

in which the backup time is determined on a per-chip basis, resulting in minimizing

the energy wastage and satisfying the yield constraint. To achieve a yield of 98%,

the conventional approach would have to expend nearly 5X more energy than the

minimum required, whereas the proposed tunable approach expends only 26% more

energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are

designed with the same backup and restore circuitry in 65nm technology. The embedded

logic in NV-TLFF compensates performance overhead of NVL. This leads to the

possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-

accumulate (MAC) unit is designed to demonstrate the performance benefits of the

proposed architecture. Based on the results of HSPICE simulations, the MAC circuit

with the proposed NV-TLFF cells is shown to consume at least 20% less power and

area as compared to the circuit designed with conventional DFFs, without sacrificing

any performance.

Contributors

Agent

Created

Date Created
  • 2018