Reconfigurable High-Performance Computing of Sparse Linear Algebra

Bank Tavakoli, Erfan

This thesis presents novel software/hardware co-design methodologies for accelerating sparse linear algebra applications in High-Performance Computing (HPC). The motivation stems from the limitations of conventional CPU- and GPU-based solutions for sparse linear algebra, which are hindered by fixed hardware architectures and memory hierarchies, frequent off-chip memory access, and high energy consumption. In response, this work explores the deployment of Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs), whose customizable nature addresses these challenges and offers gains in performance and energy efficiency. The thesis has three main parts. First, it introduces a framework that combines an FPGA computational kernel with a novel scheduling algorithm running on a host processor to accelerate the supernodal multifrontal algorithm for sparse Cholesky factorization. This approach minimizes off-chip memory access and on-chip memory requirements by efficiently managing data dependencies and enhancing data locality. Second, it presents FSpGEMM, an OpenCL-based framework for accelerating general sparse matrix-matrix multiplication on FPGAs. FSpGEMM exploits a new compressed sparse vector (CSV) format and a custom buffering scheme tailored to Gustavson's algorithm, significantly improving computational performance by optimizing memory access patterns. In addition, a row reordering technique increases the data reuse enabled by the CSV format. Third, the thesis proposes an ASIC design for a Sparse Tensor Core, which uses a Hardware Merge Sorter to increase parallelism in the processing units without compromising operating frequency, offering a high-speed solution for sparse linear algebra operations.
In summary, the thesis addresses the challenges of implementing sparse linear algebra algorithms on FPGAs and ASICs, such as the complexity of data dependencies and the need for efficient memory management. By proposing solutions that enhance computational performance, reduce energy consumption, and improve the usability of FPGAs and ASICs in HPC infrastructures, this work offers a pathway toward more efficient and sustainable computing for complex, data-intensive applications.
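
For readers unfamiliar with Gustavson's algorithm, which the abstract names as the basis of FSpGEMM, the following is a minimal software sketch of the row-wise formulation. It assumes the standard compressed sparse row (CSR) layout, not the thesis's CSV format or its FPGA buffering scheme. The function name `spgemm_gustavson` and the dictionary-based accumulator are illustrative choices and are not taken from the thesis.

```python
def spgemm_gustavson(a_indptr, a_indices, a_data, b_indptr, b_indices, b_data):
    """Row-wise (Gustavson-style) product C = A * B, all matrices in CSR form.

    Illustrative sketch only: plain CSR and a dict accumulator are assumptions,
    not the CSV format or hardware design described in the thesis.
    """
    c_indptr, c_indices, c_data = [0], [], []
    n_rows = len(a_indptr) - 1
    for i in range(n_rows):
        acc = {}  # sparse accumulator for row i of C: column -> partial sum
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k, a_ik = a_indices[p], a_data[p]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for q in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_data[q]
        # Emit row i of C with column indices in ascending order.
        for j in sorted(acc):
            c_indices.append(j)
            c_data.append(acc[j])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data
```

Each nonzero A[i, k] selects row k of B, so B is read row by row in a data-dependent order. That irregular access pattern is the kind of memory behavior the abstract says the CSV format and custom buffering scheme are designed to optimize.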

Copyright Statement