Matching Items (2)
158894-Thumbnail Image.png
Description
QR decomposition (QRD) of a matrix is one of the most common linear algebra operationsused for the decomposition of a square
on-square matrix. It has a wide range
of applications especially in Multiple Input-Multiple Output (MIMO) communication
systems. Unfortunately it has high computation complexity { for matrix size of nxn,
QRD has O(n3) complexity

QR decomposition (QRD) of a matrix is one of the most common linear algebra operationsused for the decomposition of a square
on-square matrix. It has a wide range
of applications especially in Multiple Input-Multiple Output (MIMO) communication
systems. Unfortunately it has high computation complexity { for matrix size of nxn,
QRD has O(n3) complexity and back substitution, which is used to solve a system
of linear equations, has O(n2) complexity. Thus, as the matrix size increases, the
hardware resource requirement for QRD and back substitution increases signicantly.
This thesis presents the design and implementation of a
exible QRD and back substitution accelerator using a folded architecture. It can support matrix sizes of
4x4, 8x8, 12x12, 16x16, and 20x20 with low hardware resource requirement.
The proposed architecture is based on the systolic array implementation of the
Givens algorithm for QRD. It is built with three dierent types of computation blocks
which are connected in a 2-D array structure. These blocks are controlled by a
scheduler which facilitates reusability of the blocks to perform computation for any
input matrix size which is a multiple of 4. These blocks are designed using two
basic programming elements which support both the forward and backward paths to
compute matrix R in QRD and column-matrix X in back substitution computation.
The proposed architecture has been mapped to Xilinx Zynq Ultrascale+ FPGA
(Field Programmable Gate Array), ZCU102. All inputs are complex with precision
of 40 bits (38 fractional bits and 1 signed bit). The architecture can be clocked at
50 MHz. The synthesis results of the folded architecture for dierent matrix sizes
are presented. The results show that the folded architecture can support QRD and
back substitution for inputs of large sizes which otherwise cannot t on an FPGA
when implemented using a
at architecture. The memory sizes required for dierent
matrix sizes are also presented.
ContributorsKanagala, Srimayee (Author) / Chakrabarti, Chaitali (Thesis advisor) / Bliss, Daniel (Committee member) / Cao, Yu (Kevin) (Committee member) / Arizona State University (Publisher)
Created2020
168530-Thumbnail Image.png
Description
Edge computing applications have recently gained prominence as the world of internet-of-things becomes increasingly embedded into people's lives. Performing computations at the edge addresses multiple issues, such as memory bandwidth-latency bottlenecks, exposure of sensitive data to external attackers, etc. It is important to protect the data collected and processed by

Edge computing applications have recently gained prominence as the world of internet-of-things becomes increasingly embedded into people's lives. Performing computations at the edge addresses multiple issues, such as memory bandwidth-latency bottlenecks, exposure of sensitive data to external attackers, etc. It is important to protect the data collected and processed by edge devices, and also to prevent unauthorized access to such data. It is also important to ensure that the computing hardware fits well within the tight energy and area budgets for the edge devices which are being progressively scaled-down in size. Firstly, a novel low-power smart security prototype chip that combines multiple entropy sources, such as real-time electrocardiogram (ECG) data, and SRAM-based physical unclonable functions (PUF), for authentication and cryptography applications is proposed. Up to ~12X improvement in the equal error rate compared to a prior ECG-only authentication system is achieved by combining feature vectors obtained from ECG, heart rate variability, and SRAM PUF. The resulting vectors can also be utilized for secure cryptography applications. Secondly, a novel in-memory computing (IMC) hardware noise-aware training algorithms that make DNNs more robust to hardware noise is developed and evaluated. Up to 17% accuracy was recovered in deep neural networks (DNNs) deployed on IMC prototype hardware. The noise-aware training principles are also used to improve the adversarial robustness of DNNs, and successfully defend against both adversarial input and weight attacks. Up to ~10\% improvement in robustness against adversarial input attacks, and up to 33% improvement in robustness against adversarial weight attacks are achieved. Finally, a DNN training algorithm that pursues and optimises both activation and weight sparsity simultaneously is proposed and evaluated to obtain highly compressed DNNs. This lead to up to 4.7x reduction in the total number of flops required to perform complex image recognition tasks. A custom sparse inference accelerator is designed and synthesized to evaluate the benefits of the above flop reduction. A speedup of 4.24x is achieved. In summary, this dissertation contains innovative algorithm and hardware design techniques aided by machine learning, which enhance the security and efficiency of edge computing applications.
ContributorsCherupally, Sai Kiran (Author) / Seo, Jae-Sun (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Cao, Yu (Kevin) (Committee member) / Fan, Deliang (Committee member) / Arizona State University (Publisher)
Created2022