Matching Items (389)
ContributorsWard, Geoffrey Harris (Performer) / ASU Library. Music Library (Publisher)
Created2018-03-18
Description
Static CMOS logic has remained the dominant design style of digital systems for
more than four decades due to its robustness and near zero standby current. Static
CMOS logic circuits consist of a network of combinational logic cells and clocked sequential
elements, such as latches and flip-flops that are used for sequencing computations
over time. The majority of the digital design techniques to reduce power, area, and
leakage over the past four decades have focused almost entirely on optimizing the
combinational logic. This work explores alternate architectures for the flip-flops for
improving the overall circuit performance, power and area. It consists of three main
sections.
First, is the design of a multi-input configurable flip-flop structure with embedded
logic. A conventional D-type flip-flop may be viewed as realizing an identity function,
in which the output is simply the value of the input sampled at the clock edge. In
contrast, the proposed multi-input flip-flop, named PNAND, can be configured to
realize one of a family of Boolean functions called threshold functions. In essence,
the PNAND is a circuit implementation of the well-known binary perceptron. Unlike
other reconfigurable circuits, a PNAND can be configured by simply changing the
assignment of signals to its inputs. Using a standard cell library of such gates, a technology
mapping algorithm can be applied to transform a given netlist into one with
an optimal mixture of conventional logic gates and threshold gates. This approach
was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier
in 65nm LP technology. Simulation and chip measurements show more than 30%
improvement in dynamic power and more than 20% reduction in core area.
The functional yield of the PNAND reduces with geometry and voltage scaling.
The second part of this research investigates the use of two mechanisms to improve
the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM
devices for low voltage operation.
The third part of this research focused on the design of flip-flops with non-volatile
storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated
with both conventional D-flipflop and the PNAND circuits to implement non-volatile
logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of
system locally when a power interruption occurs. However, manufacturing variations
in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading
to an overly pessimistic design and consequently, higher energy consumption. A
detailed analysis of the design trade-offs in the driver circuitry for performing backup
and restore, and a novel method to design the energy optimal driver for a given yield is
presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented,
in which the backup time is determined on a per-chip basis, resulting in minimizing
the energy wastage and satisfying the yield constraint. To achieve a yield of 98%,
the conventional approach would have to expend nearly 5X more energy than the
minimum required, whereas the proposed tunable approach expends only 26% more
energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are
designed with the same backup and restore circuitry in 65nm technology. The embedded
logic in NV-TLFF compensates performance overhead of NVL. This leads to the
possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-
accumulate (MAC) unit is designed to demonstrate the performance benefits of the
proposed architecture. Based on the results of HSPICE simulations, the MAC circuit
with the proposed NV-TLFF cells is shown to consume at least 20% less power and
area as compared to the circuit designed with conventional DFFs, without sacrificing
any performance.
more than four decades due to its robustness and near zero standby current. Static
CMOS logic circuits consist of a network of combinational logic cells and clocked sequential
elements, such as latches and flip-flops that are used for sequencing computations
over time. The majority of the digital design techniques to reduce power, area, and
leakage over the past four decades have focused almost entirely on optimizing the
combinational logic. This work explores alternate architectures for the flip-flops for
improving the overall circuit performance, power and area. It consists of three main
sections.
First, is the design of a multi-input configurable flip-flop structure with embedded
logic. A conventional D-type flip-flop may be viewed as realizing an identity function,
in which the output is simply the value of the input sampled at the clock edge. In
contrast, the proposed multi-input flip-flop, named PNAND, can be configured to
realize one of a family of Boolean functions called threshold functions. In essence,
the PNAND is a circuit implementation of the well-known binary perceptron. Unlike
other reconfigurable circuits, a PNAND can be configured by simply changing the
assignment of signals to its inputs. Using a standard cell library of such gates, a technology
mapping algorithm can be applied to transform a given netlist into one with
an optimal mixture of conventional logic gates and threshold gates. This approach
was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier
in 65nm LP technology. Simulation and chip measurements show more than 30%
improvement in dynamic power and more than 20% reduction in core area.
The functional yield of the PNAND reduces with geometry and voltage scaling.
The second part of this research investigates the use of two mechanisms to improve
the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM
devices for low voltage operation.
The third part of this research focused on the design of flip-flops with non-volatile
storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated
with both conventional D-flipflop and the PNAND circuits to implement non-volatile
logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of
system locally when a power interruption occurs. However, manufacturing variations
in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading
to an overly pessimistic design and consequently, higher energy consumption. A
detailed analysis of the design trade-offs in the driver circuitry for performing backup
and restore, and a novel method to design the energy optimal driver for a given yield is
presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented,
in which the backup time is determined on a per-chip basis, resulting in minimizing
the energy wastage and satisfying the yield constraint. To achieve a yield of 98%,
the conventional approach would have to expend nearly 5X more energy than the
minimum required, whereas the proposed tunable approach expends only 26% more
energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are
designed with the same backup and restore circuitry in 65nm technology. The embedded
logic in NV-TLFF compensates performance overhead of NVL. This leads to the
possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-
accumulate (MAC) unit is designed to demonstrate the performance benefits of the
proposed architecture. Based on the results of HSPICE simulations, the MAC circuit
with the proposed NV-TLFF cells is shown to consume at least 20% less power and
area as compared to the circuit designed with conventional DFFs, without sacrificing
any performance.
ContributorsYang, Jinghua (Author) / Vrudhula, Sarma (Thesis advisor) / Barnaby, Hugh (Committee member) / Cao, Yu (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created2018
Description
With the end of Dennard scaling and Moore's law, architects have moved towards
heterogeneous designs consisting of specialized cores to achieve higher performance
and energy efficiency for a target application domain. Applications of linear algebra
are ubiquitous in the field of scientific computing, machine learning, statistics,
etc. with matrix computations being fundamental to these linear algebra based solutions.
Design of multiple dense (or sparse) matrix computation routines on the
same platform is quite challenging. Added to the complexity is the fact that dense
and sparse matrix computations have large differences in their storage and access
patterns and are difficult to optimize on the same architecture. This thesis addresses
this challenge and introduces a reconfigurable accelerator that supports both dense
and sparse matrix computations efficiently.
The reconfigurable architecture has been optimized to execute the following linear
algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM
(Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver),
LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication),
SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where
each core consists of a 2D array of processing elements (PE).
The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 sized matrix
updates efficiently. A sequence of such updates is used to solve a larger problem inside
a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR)
is used to perform sparse kernel updates. Scalable partitioning and mapping schemes
are presented that map input matrices of any given size to the multicore architecture.
Design trade-offs related to the PE array dimension, size of local memory inside a core
and the bandwidth between on-chip memories and the cores have been presented. An
optimal core configuration is developed from this analysis. Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of upto
32 GOPS using a single core.
heterogeneous designs consisting of specialized cores to achieve higher performance
and energy efficiency for a target application domain. Applications of linear algebra
are ubiquitous in the field of scientific computing, machine learning, statistics,
etc. with matrix computations being fundamental to these linear algebra based solutions.
Design of multiple dense (or sparse) matrix computation routines on the
same platform is quite challenging. Added to the complexity is the fact that dense
and sparse matrix computations have large differences in their storage and access
patterns and are difficult to optimize on the same architecture. This thesis addresses
this challenge and introduces a reconfigurable accelerator that supports both dense
and sparse matrix computations efficiently.
The reconfigurable architecture has been optimized to execute the following linear
algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM
(Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver),
LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication),
SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where
each core consists of a 2D array of processing elements (PE).
The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 sized matrix
updates efficiently. A sequence of such updates is used to solve a larger problem inside
a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR)
is used to perform sparse kernel updates. Scalable partitioning and mapping schemes
are presented that map input matrices of any given size to the multicore architecture.
Design trade-offs related to the PE array dimension, size of local memory inside a core
and the bandwidth between on-chip memories and the cores have been presented. An
optimal core configuration is developed from this analysis. Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of upto
32 GOPS using a single core.
ContributorsAnimesh, Saurabh (Author) / Chakrabarti, Chaitali (Thesis advisor) / Brunhaver, John (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)
Created2018
ContributorsBolari, John (Performer) / ASU Library. Music Library (Publisher)
Created2018-10-04
ContributorsOftedahl, Paul (Performer) / ASU Library. Music Library (Publisher)
Created2018-09-29
ContributorsMarshall, Kimberly (Performer) / Meszler, Alexander (Performer) / Yatso, Toby (Narrator) / ASU Library. Music Library (Publisher)
Created2018-09-16
Description
The purpose of the Very Long Instruction Word (VLIW) Remotely Reconfigurable DSP Element is to use VLIW as a design process and to design hardware components of a reconfigurable DSP Element and ascertaining the overall length of the Very Long Instruction Word. This project is focused solely on hardware components being designed by hand with regards to certain specifications deemed by General Dynamics Mission Systems, and using the designs, finding the overall length of the VLIW for use in future work. To design each of the elements, General Dynamics had specified several requirements. Each element was then designed individually according to the requirements. After the initial design, each was sent back for a design review from General Dynamics, and after revision, all parts were linked together for an overall calculation on the length of the VLIW. VLIW Reconfigurable DSP Elements is not a new concept, but has yet to have a proof of concept published. Future work includes a proof of concept with software (done by the ASU Capstone team), then future development by General Dynamics. Should they choose to continue with this project, they will continue testing on FPGA boards, and perhaps future development into an ASIC. Overall the purpose of General Dynamics for proposing this project is for deep space payloads, for which this project has the most applications.
ContributorsYiin, Nathan Kehan (Author) / Clark, Lawrence (Thesis director) / Aberle, James (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2016-12
ContributorsTaylor, Karen Stephens (Performer) / ASU Library. Music Library (Publisher)
Created2018-04-21
ContributorsCramer, Craig (Performer) / ASU Library. Music Library (Publisher)
Created1997-02-16
ContributorsMarshall, Kimberly (Performer) / ASU Library. Music Library (Publisher)
Created2019-03-17