Code Generation Framework for Fine-Grained Reconfigurable Array Architectures

Murugan, Narayanan

Digital signal processing accelerator architectures are designed to provide either high energy efficiency or high programmability, depending on the targeted application and use case. For example, the Domain Adaptive Processor (DAP), a highly reconfigurable array architecture designed at the University of Michigan for signal processing workloads, is highly energy efficient but difficult to program. DAP consists of an 8x8 array of processing elements (PEs), with each PE containing four heterogeneous SUB-PEs. Each SUB-PE has its own instruction memory and is capable of executing Very Long Instruction Word (VLIW) instructions. Unfortunately, instructions have to be written for every cycle of computation for each SUB-PE used in the application and handcrafted so that all inter-PE dependencies are synchronized. This thesis builds upon prior work at Arizona State University (ASU) to make DAP more programmable. First, the compiler back-end developed at ASU is extended with more features. Prior work introduced the DAP Instruction Set Architecture (ISA), an assembly instruction format, and proposed a compiler framework, called DAP Assembler, with optimization passes to reduce the complexity of programming applications on DAP. While this back-end infrastructure made it easier to generate code than writing VLIW code by hand, the generated code was not software-pipelined, and the code generated for each PE had to be manually synchronized. In this thesis, the DAP Assembler tool is therefore extended to support software pipelining for high-throughput applications. Further, a generic synchronization tool is proposed to synchronize instructions in a multi-PE setup and is integrated with DAP Assembler to generate synchronized, high-throughput application code.
Second, a Multi-Level Intermediate Representation (MLIR) based compiler front-end infrastructure is proposed. It first lowers the application code written by the programmer to an Intermediate Representation (IR) suitable for generic array architectures, and then converts that IR to a DAP-specific IR that can be used to generate machine code for DAP using the DAP ISA. This two-stage process makes the infrastructure easier to adapt to other array architectures. The first conversion pass uses designer-provided, modular hardware architecture information, called the Resource Registry, to allocate operations in the input IR to resources in the Resource Registry and to capture all data movement. While the Resource Registry changes from architecture to architecture, the conversion pass algorithm is generic and can be used for other architectures. The second conversion pass is geared towards DAP and integrates DAP-specific constructs to generate optimized instructions in the DAP ISA. Multiple kernels, such as matrix multiplication and vector-vector addition, were implemented using this infrastructure, and the code generated by the tool was verified to be functionally correct.
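
As a rough illustration of the idea behind the first conversion pass, the following sketch allocates operations to resources listed in a registry and records the data movement this implies. It is not the thesis implementation: the names `Resource`, `Operation`, and `allocate`, the resource labels such as `pe0.mul`, and the greedy least-loaded placement policy are all assumptions made for this example, and the abstract does not specify the actual algorithm or data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical registry entry: a named unit and the op kinds it supports."""
    name: str
    supported_ops: frozenset


@dataclass(frozen=True)
class Operation:
    """Hypothetical IR operation: a name, a kind, and its producer op names."""
    name: str
    kind: str
    inputs: tuple = ()


def allocate(ops, registry):
    """Greedily place each op on the least-loaded compatible resource.

    Assumes `ops` is already in topological order. Returns the placement
    (op name -> resource name) and a list of data moves
    (value, source resource, destination resource).
    """
    placement = {}
    load = {r.name: 0 for r in registry}
    moves = []
    for op in ops:
        candidates = [r for r in registry if op.kind in r.supported_ops]
        if not candidates:
            raise ValueError(f"no resource supports operation kind {op.kind!r}")
        chosen = min(candidates, key=lambda r: load[r.name])
        placement[op.name] = chosen.name
        load[chosen.name] += 1
        # Record a move whenever a producer lives on a different resource.
        for src in op.inputs:
            if src in placement and placement[src] != chosen.name:
                moves.append((src, placement[src], chosen.name))
    return placement, moves


# Assumed toy registry and kernel fragment: two multiplies feeding one add.
registry = [
    Resource("pe0.mul", frozenset({"mul"})),
    Resource("pe0.add", frozenset({"add"})),
]
ops = [
    Operation("m0", "mul"),
    Operation("m1", "mul"),
    Operation("a0", "add", ("m0", "m1")),
]
placement, moves = allocate(ops, registry)
```

Because only the registry contents describe the hardware, the same `allocate` routine could in principle be reused with a different registry, which mirrors the separation between the generic pass and the architecture-specific Resource Registry described above.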