Search Content

Analysis of hardware usage of shuffle instruction based performance optimization in the Blinds-II image quality assessment algorithm

Description

With the advent of GPGPU, many applications are being accelerated by using CUDA programing paradigm. We are able to achieve around 10x -100x speedups by simply porting the application on to the GPU and running the parallel chunk of code on its multi cored SIMT (Single instruction multiple thread) architecture.…

With the advent of GPGPU, many applications are being accelerated by using CUDA programing paradigm. We are able to achieve around 10x -100x speedups by simply porting the application on to the GPU and running the parallel chunk of code on its multi cored SIMT (Single instruction multiple thread) architecture. But for optimal performance it is necessary to make sure that all the GPU resources are efficiently used, and the latencies in the application are minimized. For this, it is essential to monitor the Hardware usage of the algorithm and thus diagnose the compute and memory bottlenecks in the implementation. In the following thesis, we will be analyzing the mapping of CUDA implementation of BLIINDS-II algorithm on the underlying GPU hardware, and come up with a Kepler architecture specific solution of using shuffle instruction via CUB library to tackle the two major bottlenecks in the algorithm. Experiments were conducted to convey the advantage of using shuffle instru3ction in algorithm over only using shared memory as a buffer to global memory. With the new implementation of BLIINDS-II algorithm using CUB library, a speedup of around 13.7% was achieved.

ContributorsWadekar, Ameya (Author) / Sohoni, Sohum (Thesis advisor) / Aukes, Daniel (Committee member) / Redkar, Sangram (Committee member) / Arizona State University (Publisher)

Created2017

Next-Generation Smart Cars: Towards a More Intelligent Interactive Infotainment System

Description

Today, in a world of automation, the impact of Artificial Intelligence can be seen in every aspect of our lives. Starting from smart homes to self-driving cars everything is run using intelligent, adaptive technologies. In this thesis, an attempt is made to analyze the correlation between driving quality and its…

Today, in a world of automation, the impact of Artificial Intelligence can be seen in every aspect of our lives. Starting from smart homes to self-driving cars everything is run using intelligent, adaptive technologies. In this thesis, an attempt is made to analyze the correlation between driving quality and its impact on the use of car infotainment system and vice versa and hence the driver distraction. Various internal and external driving factors have been identified to understand the dependency and seriousness of driver distraction caused due to the car infotainment system. We have seen a number UI/UX changes, speech recognition advancements in cars to reduce distraction. But reducing the number of casualties on road is still a persisting problem in hand as the cognitive load on the driver is considered to be one of the primary reasons for distractions leading to casualties. In this research, a pathway has been provided to move towards building an artificially intelligent, adaptive and interactive infotainment which is trained to behave differently by analyzing the driving quality without the intervention of the driver. The aim is to not only shift focus of the driver from screen to street view, but to also change the inherent behavior of the infotainment system based on the driving statistics at that point in time without the need for driver intervention.

ContributorsSuresh, Seema (Author) / Gaffar, Ashraf (Thesis advisor) / Sodemann, Angela (Committee member) / Gonzalez-Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created2017

Hardware Acceleration of Most Apparent Distortion Image Quality Assessment Algorithm on FPGA Using OpenCL

Description

The information era has brought about many technological advancements in the past

few decades, and that has led to an exponential increase in the creation of digital images and

videos. Constantly, all digital images go through some image processing algorithm for

various reasons like compression, transmission, storage, etc. There is data loss during…

The information era has brought about many technological advancements in the past

few decades, and that has led to an exponential increase in the creation of digital images and

videos. Constantly, all digital images go through some image processing algorithm for

various reasons like compression, transmission, storage, etc. There is data loss during this

process which leaves us with a degraded image. Hence, to ensure minimal degradation of

images, the requirement for quality assessment has become mandatory. Image Quality

Assessment (IQA) has been researched and developed over the last several decades to

predict the quality score in a manner that agrees with human judgments of quality. Modern

image quality assessment (IQA) algorithms are quite effective at prediction accuracy, and

their development has not focused on improving computational performance. The existing

serial implementation requires a relatively large run-time on the order of seconds for a single

frame. Hardware acceleration using Field programmable gate arrays (FPGAs) provides

reconfigurable computing fabric that can be tailored for a broad range of applications.

Usually, programming FPGAs has required expertise in hardware descriptive languages

(HDLs) or high-level synthesis (HLS) tool. OpenCL is an open standard for cross-platform,

parallel programming of heterogeneous systems along with Altera OpenCL SDK, enabling

developers to use FPGA's potential without extensive hardware knowledge. Hence, this

thesis focuses on accelerating the computationally intensive part of the most apparent

distortion (MAD) algorithm on FPGA using OpenCL. The results are compared with CPU

implementation to evaluate performance and efficiency gains.

ContributorsGunavelu Mohan, Aswin (Author) / Sohoni, Sohum (Thesis advisor) / Ren, Fengbo (Thesis advisor) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2017

New methodology of automatic design collaboration

Description

When software design teams attempt to collaborate on different design docu-

ments they suffer from a serious collaboration problem. Designers collaborate either in person or remotely. In person collaboration is expensive but effective. Remote collaboration is inexpensive but inefficient. In, order to gain the most benefit from collaboration there needs to…

When software design teams attempt to collaborate on different design docu-

ments they suffer from a serious collaboration problem. Designers collaborate either in person or remotely. In person collaboration is expensive but effective. Remote collaboration is inexpensive but inefficient. In, order to gain the most benefit from collaboration there needs to be remote collaboration that is not only cheap but also as efficient as physical collaboration.

Remotely collaborating on software design relies on general tools such as Word, and Excel. These tools are then shared in an inefficient manner by using either email, cloud based file locking tools, or something like google docs. Because these tools either increase the number of design building blocks, or limit the number

of available times in which one can work on a specific document, they drastically decrease productivity.

This thesis outlines a new methodology to increase design productivity, accom- plished by providing design specific collaboration. Using version control systems, this methodology allows for effective project collaboration between remotely lo- cated design teams. The methodology of this paper encompasses role management, policy management, and design artifact management, including nonfunctional re- quirements. Version control can be used for different design products, improving communication and productivity amongst design teams. This thesis outlines this methodology and then outlines a proof of concept tool that embodies the core of these principles.

ContributorsPike, Shawn (Author) / Gaffar, Ashraf (Thesis advisor) / Lindquist, Timothy (Committee member) / Whitehouse, Richard (Committee member) / Arizona State University (Publisher)

Created2016

Analysis and Performance Optimization of a GPGPU Implementation of Image Quality Assessment (IQA) Algorithm VSNR

Description

Image processing has changed the way we store, view and share images. One important component of sharing images over the networks is image compression. Lossy image compression techniques compromise the quality of images to reduce their size. To ensure that the distortion of images due to image compression is not…

Image processing has changed the way we store, view and share images. One important component of sharing images over the networks is image compression. Lossy image compression techniques compromise the quality of images to reduce their size. To ensure that the distortion of images due to image compression is not highly detectable by humans, the perceived quality of an image needs to be maintained over a certain threshold. Determining this threshold is best done using human subjects, but that is impractical in real-world scenarios. As a solution to this issue, image quality assessment (IQA) algorithms are used to automatically compute a fidelity score of an image.

However, poor performance of IQA algorithms has been observed due to complex statistical computations involved. General Purpose Graphics Processing Unit (GPGPU) programming is one of the solutions proposed to optimize the performance of these algorithms.

This thesis presents a Compute Unified Device Architecture (CUDA) based optimized implementation of full reference IQA algorithm, Visual Signal to Noise Ratio (VSNR) that uses M-level 2D Discrete Wavelet Transform (DWT) with 9/7 biorthogonal filters among other statistical computations. The presented implementation is tested upon four different image quality databases containing images with multiple distortions and sizes ranging from 512 x 512 to 1600 x 1280. The CUDA implementation of VSNR shows a speedup of over 32x for 1600 x 1280 images. It is observed that the speedup scales with the increase in size of images. The results showed that the implementation is fast enough to use VSNR on high definition videos with a frame rate of 60 fps. This work presents the optimizations made due to the use of GPU’s constant memory and reuse of allocated memory on the GPU. Also, it shows the performance improvement using profiler driven GPGPU development in CUDA. The presented implementation can be deployed in production combined with existing applications.

ContributorsGupta, Ayush (Author) / Sohoni, Sohum (Thesis advisor) / Amresh, Ashish (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2017

An analysis of the memory bottleneck and cache performance of most apparent distortion image quality assessment algorithm on GPU

Description

As digital images are transmitted over the network or stored on a disk, image processing is done as part of the standard for efficient storage and bandwidth. This causes some amount of distortion or artifacts in the image which demands the need for quality assessment. Subjective image quality assessment is…

As digital images are transmitted over the network or stored on a disk, image processing is done as part of the standard for efficient storage and bandwidth. This causes some amount of distortion or artifacts in the image which demands the need for quality assessment. Subjective image quality assessment is expensive, time consuming and influenced by the subject's perception. Hence, there is a need for developing mathematical models that are capable of predicting the quality evaluation. With the advent of the information era and an exponential growth in image/video generation and consumption, the requirement for automated quality assessment has become mandatory to assess the degradation. The last few decades have seen research on automated image quality assessment (IQA) algorithms gaining prominence. However, the focus has been on achieving better predication accuracy, and not on improving computational performance. As a result, existing serial implementations require a lot of time in processing a single frame. In the last 5 years, research on general-purpose graphic processing unit (GPGPU) based image quality assessment (IQA) algorithm implementation has shown promising results for single images. Still, the implementations are not efficient enough for deployment in real world applications, especially for live videos at high resolution. Hence, in this thesis, it is proposed that microarchitecture-conscious coding on a graphics processing unit (GPU) combined with detailed understanding of the image quality assessment (IQA) algorithm can result in non-trivial speedups without compromising quality prediction accuracy. This document focusses on the microarchitectural analysis of the most apparent distortion (MAD) algorithm. The results are analyzed in-depth and one of the major bottlenecks is identified. With the knowledge of underlying microarchitecture, the implementation is restructured thereby resolving the bottleneck and improving the performance.

ContributorsKannan, Vignesh (Author) / Sohoni, Sohum (Thesis advisor) / Ren, Fengbo (Committee member) / Sayeed, Mohamed (Committee member) / Arizona State University (Publisher)

Created2016

Cognitive software complexity analysis

Description

A well-defined Software Complexity Theory which captures the Cognitive means of algorithmic information comprehension is needed in the domain of cognitive informatics & computing. The existing complexity heuristics are vague and empirical. Industrial software is a combination of algorithms implemented. However, it would be wrong to conclude that algorithmic space…

A well-defined Software Complexity Theory which captures the Cognitive means of algorithmic information comprehension is needed in the domain of cognitive informatics & computing. The existing complexity heuristics are vague and empirical. Industrial software is a combination of algorithms implemented. However, it would be wrong to conclude that algorithmic space and time complexity is software complexity. An algorithm with multiple lines of pseudocode might sometimes be simpler to understand that the one with fewer lines. So, it is crucial to determine the Algorithmic Understandability for an algorithm, in order to better understand Software Complexity. This work deals with understanding Software Complexity from a cognitive angle. Also, it is vital to compute the effect of reducing cognitive complexity. The work aims to prove three important statements. The first being, that, while algorithmic complexity is a part of software complexity, software complexity does not solely and entirely mean algorithmic Complexity. Second, the work intends to bring to light the importance of cognitive understandability of algorithms. Third, is about the impact, reducing Cognitive Complexity, would have on Software Design and Development.

ContributorsMannava, Manasa Priyamvada (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created2016

Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment

Description

Parents fulfill a pivotal role in early childhood development of social and communication

skills. In children with autism, the development of these skills can be delayed. Applied

behavioral analysis (ABA) techniques have been created to aid in skill acquisition.

Among these, pivotal response treatment (PRT) has been empirically shown to foster

improvements. Research into…

Parents fulfill a pivotal role in early childhood development of social and communication

skills. In children with autism, the development of these skills can be delayed. Applied

behavioral analysis (ABA) techniques have been created to aid in skill acquisition.

Among these, pivotal response treatment (PRT) has been empirically shown to foster

improvements. Research into PRT implementation has also shown that parents can be

trained to be effective interventionists for their children. The current difficulty in PRT

training is how to disseminate training to parents who need it, and how to support and

motivate practitioners after training.

Evaluation of the parents’ fidelity to implementation is often undertaken using video

probes that depict the dyadic interaction occurring between the parent and the child during

PRT sessions. These videos are time consuming for clinicians to process, and often result

in only minimal feedback for the parents. Current trends in technology could be utilized to

alleviate the manual cost of extracting data from the videos, affording greater

opportunities for providing clinician created feedback as well as automated assessments.

The naturalistic context of the video probes along with the dependence on ubiquitous

recording devices creates a difficult scenario for classification tasks. The domain of the

PRT video probes can be expected to have high levels of both aleatory and epistemic

uncertainty. Addressing these challenges requires examination of the multimodal data

along with implementation and evaluation of classification algorithms. This is explored

through the use of a new dataset of PRT videos.

The relationship between the parent and the clinician is important. The clinician can

provide support and help build self-efficacy in addition to providing knowledge and

modeling of treatment procedures. Facilitating this relationship along with automated

feedback not only provides the opportunity to present expert feedback to the parent, but

also allows the clinician to aid in personalizing the classification models. By utilizing a

human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the

classification models by providing additional labeled samples. This will allow the system

to improve classification and provides a person-centered approach to extracting

multimodal data from PRT video probes.

ContributorsCopenhaver Heath, Corey D (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Venkateswara, Hemanth (Committee member) / Davulcu, Hasan (Committee member) / Gaffar, Ashraf (Committee member) / Arizona State University (Publisher)

Created2019

Sequencing Behavior in an Intelligent Pro-active Co-Driver System

Description

Driving is the coordinated operation of mind and body for movement of a vehicle, such as a car, or a bus. Driving, being considered an everyday activity for many people, still has an issue of safety. Driver distraction is becoming a critical safety problem. Speed, drunk driving as well as…

Driving is the coordinated operation of mind and body for movement of a vehicle, such as a car, or a bus. Driving, being considered an everyday activity for many people, still has an issue of safety. Driver distraction is becoming a critical safety problem. Speed, drunk driving as well as distracted driving are the three leading factors in the fatal car crashes. Distraction, which is defined as an excessive workload and limited attention, is the main paradigm that guides this research area. Driver behavior analysis can be used to address the distraction problem and provide an intelligent adaptive agent to work closely with the driver, fay beyond traditional algorithmic computational models. A variety of machine learning approaches has been proposed to estimate or predict drivers’ fatigue level using car data, driver status or a combination of them.

Three important features of intelligence and cognition are perception, attention and sensory memory. In this thesis, I focused on memory and attention as essential parts of highly intelligent systems. Without memory, systems will only show limited intelligence since their response would be exclusively based on spontaneous decision without considering the effect of previous events. I proposed a memory-based sequence to predict the driver behavior and distraction level using neural network. The work started with a large-scale experiment to collect data and make an artificial intelligence-friendly dataset. After that, the data was used to train a deep neural network to estimate the driver behavior. With a focus on memory by using Long Short Term Memory (LSTM) network to increase the level of intelligence in two dimensions: Forgiveness of minor glitches, and accumulation of anomalous behavior., I reduced the model error and computational expense by adding attention mechanism on the top of LSTM models. This system can be generalized to build and train highly intelligent agents in other domains.

ContributorsMonjezi Kouchak, Shokoufeh (Author) / Gaffar, Ashraf (Thesis advisor) / Doupe, Adam (Committee member) / Ben Amor, Hani (Committee member) / Cheeks, Loretta (Committee member) / Arizona State University (Publisher)

Created2020

Designing an AI-driven System at Scale for Detection of Abusive Head Trauma using Domain Modeling

Description

Traumatic injuries are the leading cause of death in children under 18, with head trauma being the leading cause of death in children below 5. A large but unknown number of traumatic injuries are non-accidental, i.e. inflicted. The lack of sensitivity and specificity required to diagnose Abusive Head Trauma (AHT)…

Traumatic injuries are the leading cause of death in children under 18, with head trauma being the leading cause of death in children below 5. A large but unknown number of traumatic injuries are non-accidental, i.e. inflicted. The lack of sensitivity and specificity required to diagnose Abusive Head Trauma (AHT) from radiological studies results in putting the children at risk of re-injury and death. Modern Deep Learning techniques can be utilized to detect Abusive Head Trauma using Computer Tomography (CT) scans. Training models using these techniques are only a part of building AI-driven Computer-Aided Diagnostic systems. There are challenges in deploying the models to make them highly available and scalable.

The thesis models the domain of Abusive Head Trauma using Deep Learning techniques and builds an AI-driven System at scale using best Software Engineering Practices. It has been done in collaboration with Phoenix Children Hospital (PCH). The thesis breaks down AHT into sub-domains of Medical Knowledge, Data Collection, Data Pre-processing, Image Generation, Image Classification, Building APIs, Containers and Kubernetes. Data Collection and Pre-processing were done at PCH with the help of trauma researchers and radiologists. Experiments are run using Deep Learning models such as DCGAN (for Image Generation), Pretrained 2D and custom 3D CNN classifiers for the classification tasks. The trained models are exposed as APIs using the Flask web framework, contained using Docker and deployed on a Kubernetes cluster.

The results are analyzed based on the accuracy of the models, the feasibility of their implementation as APIs and load testing the Kubernetes cluster. They suggest the need for Data Annotation at the Slice level for CT scans and an increase in the Data Collection process. Load Testing reveals the auto-scalability feature of the cluster to serve a high number of requests.

ContributorsVikram, Aditya (Author) / Sanchez, Javier Gonzalez (Thesis advisor) / Gaffar, Ashraf (Thesis advisor) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created2020