Matching Items (2)
Filtering by

Clear all filters

155631-Thumbnail Image.png
Description
The information era has brought about many technological advancements in the past

few decades, and that has led to an exponential increase in the creation of digital images and

videos. Constantly, all digital images go through some image processing algorithm for

various reasons like compression, transmission, storage, etc. There is data loss during

The information era has brought about many technological advancements in the past

few decades, and that has led to an exponential increase in the creation of digital images and

videos. Constantly, all digital images go through some image processing algorithm for

various reasons like compression, transmission, storage, etc. There is data loss during this

process which leaves us with a degraded image. Hence, to ensure minimal degradation of

images, the requirement for quality assessment has become mandatory. Image Quality

Assessment (IQA) has been researched and developed over the last several decades to

predict the quality score in a manner that agrees with human judgments of quality. Modern

image quality assessment (IQA) algorithms are quite effective at prediction accuracy, and

their development has not focused on improving computational performance. The existing

serial implementation requires a relatively large run-time on the order of seconds for a single

frame. Hardware acceleration using Field programmable gate arrays (FPGAs) provides

reconfigurable computing fabric that can be tailored for a broad range of applications.

Usually, programming FPGAs has required expertise in hardware descriptive languages

(HDLs) or high-level synthesis (HLS) tool. OpenCL is an open standard for cross-platform,

parallel programming of heterogeneous systems along with Altera OpenCL SDK, enabling

developers to use FPGA's potential without extensive hardware knowledge. Hence, this

thesis focuses on accelerating the computationally intensive part of the most apparent

distortion (MAD) algorithm on FPGA using OpenCL. The results are compared with CPU

implementation to evaluate performance and efficiency gains.
ContributorsGunavelu Mohan, Aswin (Author) / Sohoni, Sohum (Thesis advisor) / Ren, Fengbo (Thesis advisor) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created2017
158677-Thumbnail Image.png
Description
Convolutional Neural Network (CNN) has achieved state-of-the-art performance in numerous applications like computer vision, natural language processing, robotics etc. The advancement of High-Performance Computing systems equipped with dedicated hardware accelerators has also paved the way towards the success of compute intensive CNNs. Graphics Processing Units (GPUs), with massive processing capability,

Convolutional Neural Network (CNN) has achieved state-of-the-art performance in numerous applications like computer vision, natural language processing, robotics etc. The advancement of High-Performance Computing systems equipped with dedicated hardware accelerators has also paved the way towards the success of compute intensive CNNs. Graphics Processing Units (GPUs), with massive processing capability, have been of general interest for the acceleration of CNNs. Recently, Field Programmable Gate Arrays (FPGAs) have been promising in CNN acceleration since they offer high performance while also being re-configurable to support the evolution of CNNs. This work focuses on a design methodology to accelerate CNNs on FPGA with low inference latency and high-throughput which are crucial for scenarios like self-driving cars, video surveillance etc. It also includes optimizations which reduce the resource utilization by a large margin with a small degradation in performance thus making the design suitable for low-end FPGA devices as well.

FPGA accelerators often suffer due to the limited main memory bandwidth. Also, highly parallel designs with large resource utilization often end up achieving low operating frequency due to poor routing. This work employs data fetch and buffer mechanisms, designed specifically for the memory access pattern of CNNs, that overlap computation with memory access. This work proposes a novel arrangement of the systolic processing element array to achieve high frequency and consume less resources than the existing works. Also, support has been extended to more complicated CNNs to do video processing. On Intel Arria 10 GX1150, the design operates at a frequency as high as 258MHz and performs single inference of VGG-16 and C3D in 23.5ms and 45.6ms respectively. For VGG-16 and C3D the design offers a throughput of 66.1 and 23.98 inferences/s respectively. This design can outperform other FPGA 2D CNN accelerators by up to 9.7 times and 3D CNN accelerators by up to 2.7 times.
ContributorsRavi, Pravin Kumar (Author) / Zhao, Ming (Thesis advisor) / Li, Baoxin (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)
Created2020