Matching Items (2)
Description
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in numerous applications such as computer vision, natural language processing, and robotics. The advancement of High-Performance Computing systems equipped with dedicated hardware accelerators has also paved the way for the success of compute-intensive CNNs. Graphics Processing Units (GPUs), with their massive processing capability, have been of general interest for the acceleration of CNNs. Recently, Field Programmable Gate Arrays (FPGAs) have shown promise for CNN acceleration, since they offer high performance while remaining re-configurable to support the evolution of CNNs. This work focuses on a design methodology to accelerate CNNs on FPGAs with low inference latency and high throughput, which are crucial for scenarios such as self-driving cars and video surveillance. It also includes optimizations that reduce resource utilization by a large margin with only a small degradation in performance, making the design suitable for low-end FPGA devices as well.

FPGA accelerators often suffer from limited main-memory bandwidth, and highly parallel designs with high resource utilization often achieve low operating frequency due to poor routing. This work employs data fetch and buffer mechanisms, designed specifically for the memory access pattern of CNNs, that overlap computation with memory access. It also proposes a novel arrangement of the systolic processing-element array to achieve high frequency while consuming fewer resources than existing works, and extends support to more complex CNNs for video processing. On an Intel Arria 10 GX1150, the design operates at a frequency as high as 258 MHz and performs a single inference of VGG-16 and C3D in 23.5 ms and 45.6 ms respectively, offering throughputs of 66.1 and 23.98 inferences/s. This design can outperform other FPGA 2D CNN accelerators by up to 9.7 times and 3D CNN accelerators by up to 2.7 times.
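A minimal sketch of the kind of compute/memory overlap the abstract refers to, using a double-buffered (ping-pong) tile loop; all names (load_tile, compute_tile, NUM_TILES, TILE_WORDS) are illustrative assumptions and are not taken from the thesis:

    #include <stddef.h>
    #include <string.h>

    #define NUM_TILES  64
    #define TILE_WORDS 4096

    /* Illustrative stand-in for external (DDR) memory holding one layer's input. */
    static float ddr_input[NUM_TILES * TILE_WORDS];

    /* Fetch one input tile from external memory into an on-chip buffer. */
    static void load_tile(float *dst, size_t tile_idx)
    {
        memcpy(dst, &ddr_input[tile_idx * TILE_WORDS], TILE_WORDS * sizeof(float));
    }

    /* Stand-in for the systolic PE array: accumulate the tile into the output. */
    static void compute_tile(const float *src, float *acc)
    {
        for (size_t i = 0; i < TILE_WORDS; ++i)
            acc[i] += src[i];
    }

    void run_layer(float out[TILE_WORDS])
    {
        static float buf[2][TILE_WORDS];     /* ping-pong on-chip buffers */

        load_tile(buf[0], 0);                /* prologue: prefetch the first tile */

        for (size_t t = 0; t < NUM_TILES; ++t) {
            size_t cur = t & 1, nxt = cur ^ 1;

            if (t + 1 < NUM_TILES)
                load_tile(buf[nxt], t + 1);  /* fetch the next tile ...          */

            compute_tile(buf[cur], out);     /* ... while computing the current  */
            /* In hardware the fetch and the compute above run concurrently;
             * this sequential C only sketches the schedule. */
        }
    }

In an FPGA implementation the fetch and compute become separate hardware stages fed from the two buffers, which is how memory latency is hidden behind computation.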
Contributors: Ravi, Pravin Kumar (Author) / Zhao, Ming (Thesis advisor) / Li, Baoxin (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
As the Internet of Things continues to expand, not only must our computing power grow alongside it, but our very approach must also evolve. While the recent trend has been to centralize computing resources in the cloud, it now looks beneficial to push more computing power towards the "edge" with so-called edge computing, reducing the immense strain on cloud servers and the latency experienced by IoT devices. A new computing paradigm also brings new opportunities for innovation, and one such innovation could be the use of FPGAs as edge servers. In this research project, I learn the design flow for developing OpenCL kernels and custom FPGA BSPs. Using these tools, I investigate the viability of using FPGAs as standalone edge computing devices. I conclude that, although the technology is a great fit, the current need for dynamically reprogrammable FPGAs to be closely coupled with a host CPU holds them back from this purpose. I propose a modification to the architecture of the Intel Arria 10 GX that would allow it to be decoupled from its host CPU, allowing it to truly serve as a viable edge computing solution.
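For context, a minimal OpenCL kernel in the single work-item style commonly used when targeting Intel FPGAs; the kernel name and arguments are illustrative assumptions, not taken from the thesis:

    __kernel void vec_add(__global const float *restrict a,
                          __global const float *restrict b,
                          __global float *restrict c,
                          const unsigned int n)
    {
        /* Single work-item style: the FPGA OpenCL compiler pipelines this loop
         * into hardware rather than spreading it across many threads. */
        for (unsigned int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

In this flow the kernel is compiled offline to an FPGA bitstream and loaded by a host program through the board support package (BSP), which is the host-CPU coupling the thesis identifies as the obstacle to standalone edge deployment.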
Contributors: Barth, Brandon Albert (Author) / Ren, Fengbo (Thesis director) / Vrudhula, Sarma (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05