Search Content

A fast fluid simulator using smoothed-particle hydrodynamics

Description

This document presents a new implementation of the Smoothed Particles Hydrodynamics algorithm using DirectX 11 and DirectCompute. The main goal of this document is to present to the reader an alternative solution to the largely studied and researched problem of fluid simulation. Most other solutions have been implemented using the…

This document presents a new implementation of the Smoothed Particles Hydrodynamics algorithm using DirectX 11 and DirectCompute. The main goal of this document is to present to the reader an alternative solution to the largely studied and researched problem of fluid simulation. Most other solutions have been implemented using the NVIDIA CUDA framework; however, the proposed solution in this document uses the Microsoft general-purpose computing on graphics processing units API. The implementation allows for the simulation of a large number of particles in a real-time scenario. The solution presented here uses the Smoothed Particles Hydrodynamics algorithm to calculate the forces within the fluid; this algorithm provides a Lagrangian approach for discretizes the Navier-Stockes equations into a set of particles. Our solution uses the DirectCompute compute shaders to evaluate each particle using the multithreading and multi-core capabilities of the GPU increasing the overall performance. The solution then describes a method for extracting the fluid surface using the Marching Cubes method and the programmable interfaces exposed by the DirectX pipeline. Particularly, this document presents a method for using the Geometry Shader Stage to generate the triangle mesh as defined by the Marching Cubes method. The implementation results show the ability to simulate over 64K particles at a rate of 900 and 400 frames per second, not including the surface reconstruction steps and including the Marching Cubes steps respectively.

ContributorsFigueroa, Gustavo (Author) / Farin, Gerald (Thesis advisor) / Maciejewski, Ross (Committee member) / Wang, Yalin (Committee member) / Arizona State University (Publisher)

Created2012

The C++ implementation of refined level set grid (RLSG) method

Description

In this thesis, a FORTRAN code is rewritten in C++ with an object oriented ap-

proach. There are several reasons for this purpose. The first reason is to establish

the basis of a GPU programming. To write programs that utilize GPU hardware,

CUDA or OpenCL is used which only support C and C++.…

In this thesis, a FORTRAN code is rewritten in C++ with an object oriented ap-

proach. There are several reasons for this purpose. The first reason is to establish

the basis of a GPU programming. To write programs that utilize GPU hardware,

CUDA or OpenCL is used which only support C and C++. FORTRAN has a feature

that lets its programs to call C/C++ functions. FORTRAN sends relevant data to

C/C++, which in turn sends that data to OpenCL. Although this approach works,

it makes the code messy and bulky and in the end more difficult to deal with. More-

over, there is a slight performance decrease from the additional data copy. This is

the motivation to have the code entirely written in C++ to make it more uniform,

efficient and clean. The second reason is the object oriented feature of the C++. The

“abstraction”, “inheritance” and “run-time polymorphism” features of C++ provide

some form of classes and objects, the ability to build new abstractions, and some

form of run-time binding, respectively. In recent years, some of popular codes has

been rewritten in C++ which were initially in FORTRAN. One of these softwares is

LAMMPS.

In this code the level set equation is solved by RLSG method to track the interface in

two phase flow. In gas/fluid flows, the surface tension is important and only exists at

the interface. Therefore, the location and some geometric features of interface need

to be evaluated which can be achieved by solving the level set equation.

ContributorsSafarkhani, Salar (Author) / Herrmann, Mrcus (Thesis advisor) / Oswald, Jay (Committee member) / Rykczewski, Konrad (Committee member) / Arizona State University (Publisher)

Created2015

GPGPU based implementation of BLIINDS-II NR-IQA

Description

The technological advances in the past few decades have made possible creation and consumption of digital visual content at an explosive rate. Consequently, there is a need for efficient quality monitoring systems to ensure minimal degradation of images and videos during various processing operations like compression, transmission, storage etc. Objective…

The technological advances in the past few decades have made possible creation and consumption of digital visual content at an explosive rate. Consequently, there is a need for efficient quality monitoring systems to ensure minimal degradation of images and videos during various processing operations like compression, transmission, storage etc. Objective Image Quality Assessment (IQA) algorithms have been developed that predict quality scores which match well with human subjective quality assessment. However, a lot of research still remains to be done before IQA algorithms can be deployed in real world systems. Long runtimes for one frame of image is a major hurdle. Graphics Processing Units (GPUs), equipped with massive number of computational cores, provide an opportunity to accelerate IQA algorithms by performing computations in parallel. Indeed, General Purpose Graphics Processing Units (GPGPU) techniques have been applied to a few Full Reference IQA algorithms which fall under the. We present a GPGPU implementation of Blind Image Integrity Notator using DCT Statistics (BLIINDS-II), which falls under the No Reference IQA algorithm paradigm. We have been able to achieve a speedup of over 30x over the previous CPU version of this algorithm. We test our implementation using various distorted images from the CSIQ database and present the performance trends observed. We achieve a very consistent performance of around 9 milliseconds per distorted image, which made possible the execution of over 100 images per second (100 fps).

ContributorsYadav, Aman (Author) / Sohoni, Sohum (Thesis advisor) / Aukes, Daniel (Committee member) / Redkar, Sangram (Committee member) / Arizona State University (Publisher)

Created2016

Analysis of hardware usage of shuffle instruction based performance optimization in the Blinds-II image quality assessment algorithm

Description

With the advent of GPGPU, many applications are being accelerated by using CUDA programing paradigm. We are able to achieve around 10x -100x speedups by simply porting the application on to the GPU and running the parallel chunk of code on its multi cored SIMT (Single instruction multiple thread) architecture.…

With the advent of GPGPU, many applications are being accelerated by using CUDA programing paradigm. We are able to achieve around 10x -100x speedups by simply porting the application on to the GPU and running the parallel chunk of code on its multi cored SIMT (Single instruction multiple thread) architecture. But for optimal performance it is necessary to make sure that all the GPU resources are efficiently used, and the latencies in the application are minimized. For this, it is essential to monitor the Hardware usage of the algorithm and thus diagnose the compute and memory bottlenecks in the implementation. In the following thesis, we will be analyzing the mapping of CUDA implementation of BLIINDS-II algorithm on the underlying GPU hardware, and come up with a Kepler architecture specific solution of using shuffle instruction via CUB library to tackle the two major bottlenecks in the algorithm. Experiments were conducted to convey the advantage of using shuffle instru3ction in algorithm over only using shared memory as a buffer to global memory. With the new implementation of BLIINDS-II algorithm using CUB library, a speedup of around 13.7% was achieved.

ContributorsWadekar, Ameya (Author) / Sohoni, Sohum (Thesis advisor) / Aukes, Daniel (Committee member) / Redkar, Sangram (Committee member) / Arizona State University (Publisher)

Created2017

An analysis of the memory bottleneck and cache performance of most apparent distortion image quality assessment algorithm on GPU

Description

As digital images are transmitted over the network or stored on a disk, image processing is done as part of the standard for efficient storage and bandwidth. This causes some amount of distortion or artifacts in the image which demands the need for quality assessment. Subjective image quality assessment is…

As digital images are transmitted over the network or stored on a disk, image processing is done as part of the standard for efficient storage and bandwidth. This causes some amount of distortion or artifacts in the image which demands the need for quality assessment. Subjective image quality assessment is expensive, time consuming and influenced by the subject's perception. Hence, there is a need for developing mathematical models that are capable of predicting the quality evaluation. With the advent of the information era and an exponential growth in image/video generation and consumption, the requirement for automated quality assessment has become mandatory to assess the degradation. The last few decades have seen research on automated image quality assessment (IQA) algorithms gaining prominence. However, the focus has been on achieving better predication accuracy, and not on improving computational performance. As a result, existing serial implementations require a lot of time in processing a single frame. In the last 5 years, research on general-purpose graphic processing unit (GPGPU) based image quality assessment (IQA) algorithm implementation has shown promising results for single images. Still, the implementations are not efficient enough for deployment in real world applications, especially for live videos at high resolution. Hence, in this thesis, it is proposed that microarchitecture-conscious coding on a graphics processing unit (GPU) combined with detailed understanding of the image quality assessment (IQA) algorithm can result in non-trivial speedups without compromising quality prediction accuracy. This document focusses on the microarchitectural analysis of the most apparent distortion (MAD) algorithm. The results are analyzed in-depth and one of the major bottlenecks is identified. With the knowledge of underlying microarchitecture, the implementation is restructured thereby resolving the bottleneck and improving the performance.

ContributorsKannan, Vignesh (Author) / Sohoni, Sohum (Thesis advisor) / Ren, Fengbo (Committee member) / Sayeed, Mohamed (Committee member) / Arizona State University (Publisher)

Created2016

Filtering by

A fast fluid simulator using smoothed-particle hydrodynamics

The C++ implementation of refined level set grid (RLSG) method

GPGPU based implementation of BLIINDS-II NR-IQA

Analysis of hardware usage of shuffle instruction based performance optimization in the Blinds-II image quality assessment algorithm

An analysis of the memory bottleneck and cache performance of most apparent distortion image quality assessment algorithm on GPU