Description
This document presents a new implementation of the Smoothed Particle Hydrodynamics algorithm using DirectX 11 and DirectCompute. The main goal of this document is to present to the reader an alternative solution to the widely studied problem of fluid simulation. Most other solutions have been implemented using the NVIDIA CUDA framework; the solution proposed in this document instead uses Microsoft's general-purpose GPU computing API. The implementation allows for the simulation of a large number of particles in a real-time scenario. The solution presented here uses the Smoothed Particle Hydrodynamics algorithm to calculate the forces within the fluid; this algorithm provides a Lagrangian approach that discretizes the Navier-Stokes equations into a set of particles. Our solution uses DirectCompute compute shaders to evaluate each particle, exploiting the multithreading and multi-core capabilities of the GPU to increase overall performance. The solution then describes a method for extracting the fluid surface using the Marching Cubes method and the programmable stages exposed by the DirectX pipeline. In particular, this document presents a method for using the Geometry Shader stage to generate the triangle mesh defined by the Marching Cubes method. The implementation results show the ability to simulate over 64K particles at roughly 900 frames per second without surface reconstruction, and at roughly 400 frames per second with the Marching Cubes steps included.
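To make the per-particle structure concrete, here is a minimal sketch of the SPH density evaluation as a GPU kernel. It is written in CUDA C++ only for readability; the thesis itself uses DirectCompute (HLSL) compute shaders, and the brute-force O(N^2) neighbor loop, the Particle layout, and names such as computeDensity are illustrative assumptions rather than the thesis code:

#include <cuda_runtime.h>
#include <math.h>

struct Particle { float3 pos; float density; };

// One thread per particle: accumulate density contributions from every
// neighbor within the smoothing radius h, using the standard poly6 kernel.
__global__ void computeDensity(Particle* p, int n, float h, float mass) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float h2 = h * h;
    // Poly6 normalization constant: 315 / (64 * pi * h^9).
    float poly6 = 315.0f / (64.0f * 3.14159265f * powf(h, 9.0f));
    float rho = 0.0f;
    for (int j = 0; j < n; ++j) {  // brute force; real code uses a spatial grid
        float dx = p[i].pos.x - p[j].pos.x;
        float dy = p[i].pos.y - p[j].pos.y;
        float dz = p[i].pos.z - p[j].pos.z;
        float r2 = dx * dx + dy * dy + dz * dz;
        if (r2 < h2) {
            float t = h2 - r2;
            rho += mass * poly6 * t * t * t;  // W(r) = poly6 * (h^2 - r^2)^3
        }
    }
    p[i].density = rho;
}

Pressure and viscosity forces follow the same one-thread-per-particle pattern with different smoothing kernels, which is what makes the algorithm map so naturally onto compute shaders.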
Contributors: Figueroa, Gustavo (Author) / Farin, Gerald (Thesis advisor) / Maciejewski, Ross (Committee member) / Wang, Yalin (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Feature representation for raw data is one of the most important components in a machine learning system. Traditionally, features are hand-crafted by domain experts, which is often a time-consuming process; furthermore, such features do not generalize well to unseen data and novel tasks. Recently, there have been many efforts to generate data-driven representations using clustering and sparse models. This dissertation focuses on building data-driven unsupervised models for analyzing raw data and developing efficient feature representations.

Simultaneous segmentation and feature extraction approaches for silicon-pore sensor data are considered. Aggregating data into a matrix and performing low-rank and sparse matrix decompositions with additional smoothness constraints are proposed to solve this problem. Comparisons of several variants of the approaches and results for signal de-noising and translocation/trapping event extraction are presented. Algorithms based on matrix completion are presented to improve transform-domain features for ion-channel time-series signals. The improved features achieve better performance in classification tasks and reduce false alarm rates when applied to analyte detection.

Developing representations for multimedia is an important and challenging problem, with applications ranging from scene recognition, multimedia retrieval and personal life-logging systems to field robot navigation. In this dissertation, we present a new framework for feature extraction for challenging natural environment sounds. The proposed features outperform traditional spectral features on challenging environmental sound datasets. Several algorithms are proposed that perform supervised tasks such as recognition and tag annotation, and ensemble methods are proposed to improve the tag annotation process.

To facilitate the use of large datasets, fast implementations are developed for sparse coding, the key component in our algorithms. Several strategies to speed up the Orthogonal Matching Pursuit algorithm using CUDA kernels on a GPU are proposed. Implementations are also developed for a large-scale image retrieval system. Image-based "exact search" and "visually similar search" using image patch sparse codes are performed. Results demonstrate large speed-ups over CPU implementations, and good retrieval performance is also achieved.
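As a rough illustration of why Orthogonal Matching Pursuit parallelizes well, the sketch below implements the atom-selection step as a CUDA kernel: each thread correlates one dictionary atom with the current residual, and the host then takes the arg-max. The column-major dictionary layout and all names are assumptions for illustration, not the dissertation's implementation:

// D is an m x k dictionary stored column-major; residual has length m.
// Each thread computes |<d_atom, residual>| for one atom.
__global__ void atomCorrelations(const float* D, const float* residual,
                                 float* corr, int m, int k) {
    int atom = blockIdx.x * blockDim.x + threadIdx.x;
    if (atom >= k) return;
    float dot = 0.0f;
    for (int i = 0; i < m; ++i)
        dot += D[atom * m + i] * residual[i];  // column-major access
    corr[atom] = fabsf(dot);  // OMP selects the atom with maximum |correlation|
}

The subsequent least-squares update over the few selected atoms is small, so this correlation step dominates the run time and is where the GPU speed-up comes from.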
Contributors: Sattigeri, Prasanna S (Author) / Spanias, Andreas (Thesis advisor) / Thornton, Trevor (Committee member) / Goryll, Michael (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While the mobile platform capabilities range widely, responsiveness, long battery life and reliability are common design concerns that are crucial to remain competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful SoC with numerous other resources, including display, memory, power management IC, battery and wireless modems. Furthermore, the SoC itself is a heterogeneous resource that integrates many processing elements, such as CPU cores, GPU, video, image, and audio processors. Therefore, CPU cores do not dominate the platform power consumption under many application scenarios.

Competitive performance requires higher operating frequencies, which lead to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on device reliability and user experience. As a result, allocating the power budget among the major platform resources and controlling temperature have become fundamental considerations for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources into sleep states, or by throttling their frequencies. However, an ad hoc approach can easily cripple performance if it slows down a performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time-varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. Toward this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for heterogeneous CPUs, such as the ARM big.LITTLE architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each processing element, (c) an adaptive GPU frame-time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads.
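A minimal host-side C++ sketch of the idea behind contribution (a), assuming a precomputed Pareto frontier of (active cores, frequency) configurations with predicted performance and power; the Config fields and pickConfig are hypothetical names, and the dissertation builds and adapts such models at run time rather than using a fixed table:

#include <vector>

struct Config {
    int activeCores;   // e.g. number of big/little cores enabled
    int freqMHz;       // operating frequency
    double perf;       // predicted performance (higher is better)
    double powerW;     // predicted power consumption
};

// frontier holds Pareto-optimal configs only, so none is dominated by another.
// Return the fastest configuration that fits the current power budget.
const Config* pickConfig(const std::vector<Config>& frontier, double budgetW) {
    const Config* best = nullptr;
    for (const Config& c : frontier)
        if (c.powerW <= budgetW && (!best || c.perf > best->perf))
            best = &c;
    return best;  // nullptr if even the lowest-power config exceeds the budget
}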
Contributors: Gupta, Ujjwala (Author) / Ogras, Umit Y. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kishinevsky, Michael (Committee member) / Dutt, Nikil (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
This dissertation describes a process for interface capturing via an arbitrary-order, nearly quadrature free, discontinuous Galerkin (DG) scheme for the conservative level set method (Olsson et al., 2005, 2008). The DG numerical method is utilized to solve both advection and reinitialization, and executed on a refined level set grid (Herrmann, 2008) for effective use of processing power. Computation is executed in parallel utilizing both CPU and GPU architectures to make the method feasible at high order. Finally, a sparse data structure is implemented to take full advantage of parallelism on the GPU, where performance relies on well-managed memory operations.

With solution variables projected into a kth order polynomial basis, a k+1 order convergence rate is found for both advection and reinitialization tests using the method of manufactured solutions. Other standard test cases, such as Zalesak's disk and deformation of columns and spheres in periodic vortices are also performed, showing several orders of magnitude improvement over traditional WENO level set methods. These tests also show the impact of reinitialization, which often increases shape and volume errors as a result of level set scalar trapping by normal vectors calculated from the local level set field.

Accelerating advection via GPU hardware is found to provide a 30x speedup when comparing serial execution on a 2.0 GHz Intel Xeon E5-2620 CPU against an Nvidia Tesla K20 GPU, with speedup factors increasing with polynomial degree until shared memory is filled. A similar algorithm is implemented for reinitialization, which relies more heavily on shared and global memory, fills them more quickly, and as a result produces a smaller speedup of 18x.
Contributors: Jibben, Zechariah J (Author) / Herrmann, Marcus (Thesis advisor) / Squires, Kyle (Committee member) / Adrian, Ronald (Committee member) / Chen, Kangping (Committee member) / Treacy, Michael (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Unmanned aerial vehicles have received increased attention in the last decade due to their versatility, as well as the availability of inexpensive sensors (e.g. GPS, IMU) for their navigation and control. Multirotor vehicles, specifically quadrotors, have formed a fast-growing field in robotics, with the range of applications spanning from surveillance and reconnaissance to agriculture and large-area mapping. Although in most applications single quadrotors are used, there is an increasing interest in architectures controlling multiple quadrotors executing a collaborative task. This thesis introduces a new concept of control involving more than one quadrotor, according to which two quadrotors can be physically coupled in mid-flight. This concept equips the quadrotors with new capabilities, e.g. increased payload or the pursuit and capture of other quadrotors. A comprehensive simulation environment is built to model the coupled quadrotors. The dynamics and modeling of the coupled system are presented, together with a discussion of the coupling mechanism, impact modeling and additional considerations that have been investigated. Simulation results are presented for cases of static coupling as well as enemy quadrotor pursuit and capture, together with an analysis of the control methodology and gain tuning. Practical implementations are introduced, as results show the feasibility of this design.
Contributors: Larsson, Daniel (Author) / Artemiadis, Panagiotis (Thesis advisor) / Marvi, Hamidreza (Committee member) / Berman, Spring (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Accurate quantitative information about tumor/lesion volume plays a critical role in diagnosis and treatment assessment. Current clinical practice emphasizes efficiency but sacrifices accuracy (bias and precision). On the other hand, many computational algorithms focus on improving accuracy but are often time consuming and cumbersome to use, and most of them lack validation studies on real clinical data. All of this hinders the translation of these advanced methods from bench to bedside.

In this dissertation, I present a user-interactive image application to rapidly extract accurate quantitative information about abnormalities (tumors/lesions) from multi-spectral medical images, such as measuring brain tumor volume from MRI. This is enabled by a GPU level set method, an intelligent algorithm that learns image features from user inputs, and a simple and intuitive graphical user interface with 2D/3D visualization. In addition, a comprehensive workflow is presented to validate image quantification methods for clinical studies.
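For illustration, below is a minimal CUDA sketch of one explicit 2D level-set update of the general form phi += dt * F * |grad phi|. The dissertation's actual GPU level set method, and the speed function F it learns from user input, are not reproduced here, so the kernel and every name in it are assumptions:

// One explicit update of the level-set field phi on an nx x ny grid.
// speed encodes the (image-derived) propagation speed F at each pixel.
__global__ void levelSetStep(const float* phi, float* phiOut,
                             const float* speed, int nx, int ny, float dt) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || x >= nx - 1 || y < 1 || y >= ny - 1) return;
    int id = y * nx + x;
    // Central differences for the gradient magnitude |grad phi|.
    float gx = 0.5f * (phi[id + 1] - phi[id - 1]);
    float gy = 0.5f * (phi[id + nx] - phi[id - nx]);
    float grad = sqrtf(gx * gx + gy * gy);
    phiOut[id] = phi[id] + dt * speed[id] * grad;  // move the zero contour
}

Because every pixel updates independently, each sweep maps directly onto the GPU, which is what makes interactive-rate segmentation feasible.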

This application has been evaluated and validated in multiple cases, including quantifying healthy brain white matter volume from MRI and brain lesion volume from CT or MRI. The evaluation studies show that this application achieves results comparable to state-of-the-art computer algorithms. More importantly, the retrospective validation study on measuring intracerebral hemorrhage volume from CT scans demonstrates that the measurement attributes are not only superior to the current practice method in terms of bias and precision, but are also achieved without a significant delay in acquisition time. In other words, it could be useful in clinical trials and clinical practice, especially when intervention and prognostication rely upon an accurate baseline lesion volume or upon detecting change in serial lesion volumetric measurements. This application is also useful for biomedical research that requires accurate quantitative information about anatomies in medical images. In addition, morphological information is retained, which benefits research requiring an accurate delineation of anatomic structures, such as surgery simulation and planning.
Contributors: Xue, Wenzhe (Author) / Kaufman, David (Thesis advisor) / Mitchell, J. Ross (Thesis advisor) / Johnson, William (Committee member) / Scotch, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
A comparison of the performance of CUDA versus OpenMP for the Jacobi, Gauss-Seidel, and S.O.R. iterative methods for Laplace's equation with Dirichlet boundary conditions is presented. Both the number of cores and the grid size were varied for the OpenMP program, while the grid size was varied for the CUDA program. CUDA outperforms the 8-core OpenMP program with the Jacobi and Gauss-Seidel schemes for all grid sizes, and is competitive with S.O.R. for all grid sizes examined.
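As a point of reference for the CUDA side of the comparison, a minimal Jacobi sweep for Laplace's equation looks like the sketch below; kernel and variable names are illustrative, not the thesis code. Jacobi is the easiest of the three methods to parallelize because every point reads only old values, whereas Gauss-Seidel and S.O.R. need a red-black ordering to expose the same parallelism:

// One Jacobi sweep on an n x n grid. Boundary values are Dirichlet data;
// the host copies u into uNew once so the boundary entries stay populated.
__global__ void jacobiSweep(const float* u, float* uNew, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (i < 1 || i >= n - 1 || j < 1 || j >= n - 1) return;
    // Laplace 5-point stencil: new value is the average of the 4 neighbors.
    uNew[i * n + j] = 0.25f * (u[(i - 1) * n + j] + u[(i + 1) * n + j] +
                               u[i * n + j - 1]   + u[i * n + j + 1]);
}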
Contributors: Prost, Spencer Arthur (Author) / Gardner, Carl (Thesis director) / Welfert, Bruno (Committee member) / Speyer, Gil (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2013-05
Description
Function-as-a-Service (FaaS) is emerging as an important cloud computing service model, as it can improve scalability and usability for a wide range of applications, especially Machine-Learning (ML) inference tasks that require scalable computation resources and complicated configurations. Many applications, including ML inference, rely on Graphics-Processing-Units (GPUs) to achieve high performance; however, support for GPUs is currently lacking in existing FaaS solutions. The unique event-triggered and short-lived nature of functions poses new challenges to enabling GPUs on FaaS, which must consider the overhead of transferring data (e.g., ML model parameters and inputs/outputs) between GPU and host memory. This thesis presents a new GPU-enabled FaaS solution that enables functions to efficiently utilize GPUs to accelerate computations such as model inference. First, the work extends existing open-source FaaS frameworks such as OpenFaaS to support the scheduling and execution of functions across GPUs in a FaaS cluster. Second, it provides caching of ML models in GPU memory to improve the performance of model inference functions, and global management of GPU memories to improve cache utilization. Third, it offers co-designed GPU function scheduling and cache management to optimize the performance of ML inference functions. Specifically, the thesis proposes locality-aware scheduling, which maximizes the utilization of both GPU memory for cache hits and GPU cores for parallel processing. A thorough evaluation based on real-world traces and ML models shows that the proposed GPU-enabled FaaS works well for ML inference tasks, and the proposed locality-aware scheduler achieves a speedup of 34x compared to the default, load-balancing-only scheduler.
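A minimal sketch, in plain host-side C++, of the locality-aware placement idea: prefer a GPU that already caches the requested model (avoiding a host-to-GPU parameter transfer), and break ties by load so cores stay evenly busy. The Gpu struct and placeFunction are hypothetical names; the thesis implements this logic inside an OpenFaaS-based scheduler:

#include <string>
#include <unordered_set>
#include <vector>

struct Gpu {
    std::unordered_set<std::string> cachedModels;  // models resident in GPU memory
    size_t freeMemory;                             // bytes available
    int activeFunctions;                           // current load
};

// Return the index of the chosen GPU, or -1 if no GPU can host the model.
int placeFunction(const std::vector<Gpu>& gpus,
                  const std::string& model, size_t modelSize) {
    int best = -1;
    for (int i = 0; i < (int)gpus.size(); ++i) {
        bool hit = gpus[i].cachedModels.count(model) > 0;
        if (!hit && gpus[i].freeMemory < modelSize) continue;  // cannot fit
        if (best == -1) { best = i; continue; }
        bool bestHit = gpus[best].cachedModels.count(model) > 0;
        // Cache hits win first (no model transfer); ties broken by load.
        if (hit != bestHit) { if (hit) best = i; }
        else if (gpus[i].activeFunctions < gpus[best].activeFunctions) best = i;
    }
    return best;
}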
Contributors: Hong, Sungho (Author) / Zhao, Ming (Thesis advisor) / Cao, Zhichao (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
This paper compares two approaches to implementing the Marching Cubes algorithm, a method of extracting a polygonal mesh from a 3D scalar field. One possible application of this algorithm is as a procedural terrain generation technique for use in video game development. The Marching Cubes algorithm is an easily parallelizable task, and as such benefits greatly from being executed on the GPU. The reason that the algorithm is so well suited for parallelization is that it breaks the problem of mesh generation into a large group of similar sub-problems that can be solved completely independently.
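To illustrate that independence, the sketch below classifies every cube of the scalar field in parallel, one CUDA thread per cell, producing the 8-bit case index that selects a triangle pattern from the standard Marching Cubes lookup tables (the tables and vertex interpolation are omitted). The field layout and the simple bit ordering of the corners are assumptions for illustration, not the paper's code:

// One thread per cell of the (nx-1) x (ny-1) x (nz-1) cube grid.
__global__ void classifyCubes(const float* field, unsigned char* cubeCase,
                              int nx, int ny, int nz, float iso) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= nx - 1 || y >= ny - 1 || z >= nz - 1) return;
    unsigned char c = 0;
    // Set one bit for each cube corner that lies below the isovalue.
    for (int k = 0; k < 8; ++k) {
        int cx = x + (k & 1), cy = y + ((k >> 1) & 1), cz = z + ((k >> 2) & 1);
        if (field[(cz * ny + cy) * nx + cx] < iso) c |= (unsigned char)(1 << k);
    }
    cubeCase[(z * (ny - 1) + y) * (nx - 1) + x] = c;
}

Since no cell depends on any other, the kernel scales with the number of available GPU cores, which is exactly the property the paper's comparison measures.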
Contributors: Lord, William (Author) / Kobayashi, Yoshihiro (Thesis director) / Hansford, Dianne (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Computing and Informatics Program (Contributor)
Created: 2022-12