Search Content

Matching Items (3)

Filtering by

All Subjects: CGRA
Genre: Masters Thesis
Creators: Shrivastava, Aviral

Register file organization for coarse-grained reconfigurable architectures: compiler-microarchitecture perspective

Description

Coarse-Grained Reconfigurable Architectures (CGRA) are a promising fabric for improving the performance and power-efficiency of computing devices. CGRAs are composed of components that are well-optimized to execute loops and rotating register file is an example of such a component present in CGRAs. Due to the rotating nature of register indexes in rotating register file, it is very challenging, if at all possible, to hold and properly index memory addresses (pointers) and static values. In this Thesis, different structures for CGRA register files are investigated. Those structures are experimentally compared in terms of performance of mapped applications, design frequency, and area. It is shown that a register file that can logically be partitioned into rotating and non-rotating regions is an excellent choice because it imposes the minimum restriction on underlying CGRA mapping algorithm while resulting in efficient resource utilization.

ContributorsSaluja, Dipal (Author) / Shrivastava, Aviral (Thesis advisor) / Lee, Yann-Hang (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2014

Improving CGRA utilization by enabling multi-threading for power-efficient embedded systems

Description

Performance improvements have largely followed Moore's Law due to the help from technology scaling. In order to continue improving performance, power-efficiency must be reduced. Better technology has improved power-efficiency, but this has a limit. Multi-core architectures have been shown to be an additional aid to this crusade of increased power-efficiency. Accelerators are growing in popularity as the next means of achieving power-efficient performance. Accelerators such as Intel SSE are ideal, but prove difficult to program. FPGAs, on the other hand, are less efficient due to their fine-grained reconfigurability. A middle ground is found in CGRAs, which are highly power-efficient, but largely programmable accelerators. Power-efficiencies of 100s of GOPs/W have been estimated, more than 2 orders of magnitude greater than current processors. Currently, CGRAs are limited in their applicability due to their ability to only accelerate a single thread at a time. This limitation becomes especially apparent as multi-core/multi-threaded processors have moved into the mainstream. This limitation is removed by enabling multi-threading on CGRAs through a software-oriented approach. The key capability in this solution is enabling quick run-time transformation of schedules to execute on targeted portions of the CGRA. This allows the CGRA to be shared among multiple threads simultaneously. Analysis shows that enabling multi-threading has very small costs but provides very large benefits (less than 1% single-threaded performance loss but nearly 300% CGRA throughput increase). By increasing dynamism of CGRA scheduling, system performance is shown to increase overall system performance of an optimized system by almost 350% over that of a single-threaded CGRA and nearly 20x faster than the same system with no CGRA in a highly threaded environment.

ContributorsPager, Jared (Author) / Shrivastava, Aviral (Thesis advisor) / Gupta, Sandeep (Committee member) / Speyer, Gil (Committee member) / Arizona State University (Publisher)

Created2011

COMSAT: Modified Modulo Scheduling Techniques for Acceleration on Unknown Trip Count and Early Exit Loops

Description

Coarse-grain reconfigurable architectures (CGRAs) have shown significant improvements as hardware accelerator whilst demanding low power. Such acceleration inherits from the nature of instruction-level parallelism and exploited by many techniques. Modulo scheduling is a popular approach to software pipelining techniques that provides an efficient heuristic to accelerations on loops, repetitive regions of an application. Existing scheduling algorithms for modulo scheduling heuristic persist on loop exiting problems that limit CGRA acceleration to only loops with known trip count and no exit statements. Another notable limitation is the early exit problem, where loops can only terminate after certain iterations as CGRA moves to kernel stage. In attempts to circumvent such obstacles, COMSAT introduces a modified modulo scheduling technique that acts as an external module and can be applied to any existing scheduling/mapping algorithms with minimal hardware changes. Experiments from MiBench and Rodinia benchmark suites have shown that COMSAT achieved an average speedup of 3x in overall benchmarks and 10x speedup in kernel regions. Without COMSAT techniques, only 25% of said loops would have been able to accelerate, reducing benchmark and kernel speedups to 1.25x and 3.63x respectively.

ContributorsTa, Vinh (Author) / Shrivastava, Aviral (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kinsey, Michel (Committee member) / Arizona State University (Publisher)

Created2022