ASU Electronic Theses and Dissertations
This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.
In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.
Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.
Filtering by
- All Subjects: accelerator
- Creators: Chakrabarti, Chaitali
heterogeneous designs consisting of specialized cores to achieve higher performance and energy efficiency for a target application domain. Linear algebra is ubiquitous in scientific computing, machine learning, statistics, and other fields, and matrix computations are fundamental to these linear-algebra-based solutions. Designing multiple dense (or sparse) matrix computation routines on the same platform is challenging. The difficulty is compounded because dense and sparse matrix computations differ greatly in their storage and access patterns and are hard to optimize on the same architecture. This thesis addresses this challenge and introduces a reconfigurable accelerator that supports both dense and sparse matrix computations efficiently.
The reconfigurable architecture has been optimized to execute the following linear algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM (Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver), LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication), and SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where each core consists of a 2D array of processing elements (PEs).
The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 matrix updates efficiently. A sequence of such updates is used to solve a larger problem inside a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR) is used to perform sparse kernel updates. Scalable partitioning and mapping schemes are presented that map input matrices of any given size to the multicore architecture.
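The abstract does not give the layout of PBCSC/PBCSR, so the sketch below does not attempt to reproduce it. It shows only the generic, textbook block compressed sparse row (BCSR) idea with 4x4 blocks, to illustrate how a sparse matrix-vector product can be expressed as a sequence of small dense block updates. The function names and the assumption that matrix dimensions are multiples of 4 are illustrative choices, not details from the thesis.

```python
BLOCK = 4  # block edge; chosen to match the 4x4 PE array mentioned in the abstract


def dense_to_bcsr(matrix, block=BLOCK):
    """Convert a dense list-of-lists matrix into plain block-CSR arrays.

    Returns (row_ptr, col_idx, blocks). Only blocks containing a nonzero
    are stored. Assumes both dimensions are multiples of `block`.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_ptr, col_idx, blocks = [0], [], []
    for br in range(0, n_rows, block):
        for bc in range(0, n_cols, block):
            tile = [row[bc:bc + block] for row in matrix[br:br + block]]
            if any(v != 0 for r in tile for v in r):
                col_idx.append(bc // block)   # block-column index of this tile
                blocks.append(tile)
        row_ptr.append(len(blocks))           # end of this block row
    return row_ptr, col_idx, blocks


def bcsr_spmv(row_ptr, col_idx, blocks, x, block=BLOCK):
    """Compute y = A @ x as a sequence of dense block-times-subvector updates."""
    y = [0] * ((len(row_ptr) - 1) * block)
    for brow in range(len(row_ptr) - 1):
        for k in range(row_ptr[brow], row_ptr[brow + 1]):
            x_off = col_idx[k] * block
            for i in range(block):
                acc = 0
                for j in range(block):
                    acc += blocks[k][i][j] * x[x_off + j]
                y[brow * block + i] += acc
    return y
```

Storing whole blocks trades some explicit zeros for regular, fixed-size work items, which is the usual motivation for block formats on array-style hardware.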
Design trade-offs related to the PE array dimension, the size of local memory inside a core, and the bandwidth between on-chip memories and the cores are presented. An optimal core configuration is developed from this analysis. Synthesis results using a 7 nm PDK show that the proposed accelerator can achieve a performance of up to 32 GOPS using a single core.
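The idea of solving a larger problem as a sequence of 4x4 matrix updates can be illustrated with a blocked GEMM. The following is a minimal software sketch, not the accelerator's actual schedule: the helper names are hypothetical, the loop order is an arbitrary choice, and all matrix dimensions are assumed to be multiples of 4.

```python
TILE = 4  # tile edge; matches the 4x4 update size named in the abstract


def get_tile(m, r, c):
    """Copy the TILE x TILE sub-matrix of m whose top-left corner is (r, c)."""
    return [row[c:c + TILE] for row in m[r:r + TILE]]


def tile_update(c_tile, a_tile, b_tile):
    """One 4x4 update, in place: c_tile += a_tile @ b_tile."""
    for i in range(TILE):
        for j in range(TILE):
            acc = 0
            for k in range(TILE):
                acc += a_tile[i][k] * b_tile[k][j]
            c_tile[i][j] += acc


def blocked_gemm(a, b):
    """Compute a @ b using only 4x4 tile updates (dimensions multiples of TILE)."""
    n, k_dim, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for r in range(0, n, TILE):
        for col in range(0, m, TILE):
            c_tile = [[0] * TILE for _ in range(TILE)]
            # Accumulate one output tile from a sequence of 4x4 updates.
            for k in range(0, k_dim, TILE):
                tile_update(c_tile, get_tile(a, r, k), get_tile(b, k, col))
            for i in range(TILE):
                c[r + i][col:col + TILE] = c_tile[i]
    return c
```

Each output tile depends only on one block row of `a` and one block column of `b`, which is what makes this kind of decomposition easy to partition across cores.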
Motivated by the aforementioned concerns, this dissertation comprehensively investigates the emerging efficiency and security issues of DNNs from both software and hardware design perspectives. From the efficiency perspective, model compression via quantization is elaborated as the foundational technique for efficient inference of a target DNN. To maximize the inference performance boost, the deployment of quantized DNNs on the revolutionary Computing-in-Memory based neural accelerator is presented in a cross-layer (device/circuit/system) fashion. From the security perspective, the well-known adversarial attack is investigated, from its original input attack form (a.k.a. adversarial example generation) to its parameter attack variant.
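The abstract does not say which quantization scheme the dissertation uses. As a generic illustration of model compression via quantization, the sketch below applies uniform symmetric quantization to a list of weights; the function names and the 8-bit default are assumptions made for this example only.

```python
def quantize_symmetric(weights, num_bits=8):
    """Uniform symmetric quantization of a list of floats.

    Returns (codes, scale), where each code is an integer in
    [-(2**(num_bits-1) - 1), 2**(num_bits-1) - 1] and w ~= code * scale.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]
```

With this scheme, the reconstruction error of each weight is bounded by about half the scale, so fewer bits mean a coarser grid and a larger worst-case error.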
In this thesis, algorithm and hardware design techniques to support a hand-held 3-D ultrasound imaging system are proposed. Synthetic aperture sequential beamforming (SASB) is chosen since its computations can be split into two stages, where the output of Stage 1 is significantly smaller than its input. This characteristic enables Stage 1 to be done in the front end while the Stage 1 output is sent out so that Stage 2 can be processed elsewhere.
The contributions of this thesis are as follows. First, 2-D SASB is extended to 3-D. Techniques to increase the volume rate of 3-D SASB through a new multi-line firing scheme and the use of a linear chirp as the excitation waveform are presented. A new sparse array design that not only reduces the number of active transducers but also avoids the imaging degradation caused by grating lobes is proposed. A combination of these techniques increases the volume rate of 3-D SASB by 4× without introducing extra computations at the front end.
Next, algorithmic techniques to further reduce the Stage 1 computations in the front end are presented. These include reducing the number of distinct apodization coefficients and operating with narrow-bit-width fixed-point data. A 3-D die stacked architecture is designed for the front end. This highly parallel architecture enables the signals received by 961 active transducers to be digitized, routed by a network-on-chip, and processed in parallel. The processed data are accumulated through a bus-based structure. This architecture is synthesized in the TSMC 28 nm technology node, and the estimated power consumption of the front end is less than 2 W.
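To give a feel for how narrow fixed-point data interacts with the number of distinct apodization coefficients, the sketch below quantizes a window to a given number of fractional bits and counts the distinct codes that remain. The Hann window, the 31-tap length, and the function names are assumptions made for illustration; the abstract does not specify the apodization function or the bit widths used in the thesis.

```python
import math


def hann_window(n):
    """Hann window of length n (n >= 2), with values in [0, 1]."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def to_fixed_point(values, frac_bits):
    """Round values in [0, 1] to unsigned codes with `frac_bits` fractional bits.

    A value of exactly 1.0 saturates to the largest code, 2**frac_bits - 1.
    """
    scale = 1 << frac_bits
    return [min(scale - 1, round(v * scale)) for v in values]


def distinct_coefficients(values, frac_bits):
    """Number of distinct fixed-point codes needed to represent `values`."""
    return len(set(to_fixed_point(values, frac_bits)))
```

Comparing `distinct_coefficients(hann_window(31), 4)` with `distinct_coefficients(hann_window(31), 12)` shows the general trend: a coarser grid merges neighboring coefficients, so fewer distinct values need to be stored, at the cost of a larger rounding error.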
Finally, the Stage 2 computations are mapped onto a reconfigurable multi-core architecture, TRANSFORMER, which supports different types of on-chip memory banks and run-time reconfigurable connections between general processing elements and memory banks. The matched filtering step and the beamforming step in Stage 2 are mapped onto TRANSFORMER with different memory configurations. Gem5 simulations show that the private cache mode yields shorter execution time and higher computation efficiency than the other cache modes. The overall execution time for Stage 2 is 14.73 ms. In a 14 nm technology node, the average power consumption is 0.14 W and the average efficiency is 103.84 giga-operations per second per watt.
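The matched filtering step, together with the linear chirp excitation mentioned earlier in this abstract, can be illustrated with a small pulse-compression example. This is a textbook sketch under assumed parameters, not the thesis implementation: the function names, the real-valued cosine chirp, and the direct (non-FFT) correlation are all choices made for this example.

```python
import math


def linear_chirp(f0, f1, duration, fs):
    """Samples of cos(2*pi*(f0*t + 0.5*k*t**2)), sweep rate k = (f1 - f0) / duration."""
    n = int(round(duration * fs))
    k = (f1 - f0) / duration
    return [math.cos(2 * math.pi * (f0 * (i / fs) + 0.5 * k * (i / fs) ** 2))
            for i in range(n)]


def matched_filter(received, template):
    """Cross-correlate `received` with `template`.

    output[lag] = sum_i received[lag + i] * template[i], for every lag at
    which the template fits entirely inside the received signal.
    """
    n, m = len(received), len(template)
    return [sum(received[lag + i] * template[i] for i in range(m))
            for lag in range(n - m + 1)]


def peak_lag(output):
    """Index of the largest matched-filter output sample."""
    return max(range(len(output)), key=lambda i: output[i])
```

For a noise-free echo, that is, a scaled copy of the chirp placed at some delay inside an otherwise zero signal, `peak_lag(matched_filter(received, chirp))` recovers that delay, because the correlation of the chirp with itself is largest at zero shift.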