Quantization and Evaluation of AI Algorithms for Hardware Acceleration

Description

Artificial intelligence is one of the leading technologies that mimics the problem-solving and decision-making capabilities of the human brain. Machine learning algorithms, especially deep learning algorithms, are leading the way in terms of performance and robustness. They are used for various purposes, mainly computer vision, speech recognition, and object detection. The algorithms are usually tested for accuracy, and they utilize full floating-point precision (32 bits). Hardware would require a large amount of power and area to accommodate many parameters at full precision. In this exploratory work, a convolutional autoencoder is quantized for use with an event-based camera. The model is designed so that the autoencoder can work on-chip, which would sufficiently decrease the processing latency. Different quantization methods are used to quantize and binarize the weights and activations of this neural network model so that it is portable and power efficient. A sparsity term is added to make the model as robust and energy-efficient as possible. By selectively quantizing the layers of the encoder, the network model was able to recoup the accuracy lost to binarizing the weights and activations. This method of recouping accuracy gives enough flexibility to place the network on-chip and obtain real-time processing from systems like event-based cameras. Lately, computer vision, especially object detection, has made strides in accuracy. The algorithms can sufficiently detect and predict objects in real time. However, end-to-end detection is challenging due to the large number of parameters and the processing requirements. A change to the Non-Maximum Suppression algorithm in SSD (Single Shot Detector)-MobileNet-V1 resulted in lower computational complexity without a change in the quality of the output metric. The Mean Average Precision (mAP) calculated suggests that this method can be implemented in the post-processing of other networks.
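
As a rough illustration of what binarizing weights means, the sketch below maps each full-precision weight to one of two values, +alpha or -alpha, where alpha is the mean absolute weight. This scaled-sign scheme is an assumption chosen for illustration: the abstract does not state which quantization methods the thesis uses, and the function name `binarize` is hypothetical.

```python
def binarize(weights):
    """Map each weight to +alpha or -alpha, with alpha = mean(|w|).

    Illustrative scaled-sign binarization only; not necessarily the
    scheme used in the thesis.
    """
    if not weights:
        return [], 0.0
    alpha = sum(abs(w) for w in weights) / len(weights)
    # Zero is mapped to +alpha so that every weight needs exactly one bit.
    binary = [alpha if w >= 0 else -alpha for w in weights]
    return binary, alpha


weights = [0.5, -1.5, 1.0, -1.0]
binary, alpha = binarize(weights)
print(alpha, binary)
```

Under this scheme each weight can be stored as a single sign bit plus one shared scale per group of weights, instead of 32 bits per weight, which is where the power and area savings described above would come from.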

Contributors: Kuzhively, Ajay Balu (Author) / Cao, Yu (Thesis advisor) / Seo, Jae-Sun (Committee member) / Fan, Delian (Committee member) / Arizona State University (Publisher)

Created: 2021

Computer Vision: Improving Detection and Tracking for Occluded and Blurry Settings

Description

Computer vision and tracking have become an area of great interest for many reasons, including self-driving cars, identification of vehicles and drivers on roads, and security camera monitoring, all of which are expanding in the modern digital era. When working with practical systems that are constrained in multiple ways, such as video quality or viewing angle, algorithms that work well theoretically can have a high error rate in practice. This thesis studies several ways in which that error can be minimized. This thesis describes an application in a practical system. The project is to detect, track, and count people entering different lanes at an airport security checkpoint, using CCTV videos as a primary source. This thesis improves an existing algorithm that is not optimized for this particular problem and has a high error rate when comparing the algorithm counts with the true volume of users. The high error rate is caused by many people crowding into security lanes at the same time. The camera from which footage was captured is located at a poor angle, and thus many of the people occlude each other and cause the existing algorithm to miss people. One solution is to count only heads; since heads are smaller than a full body, they will occlude less, and in addition, since the camera is angled from above, the heads in back will appear higher and will not be occluded by people in front. One of the primary improvements to the algorithm is to combine both person detections and head detections to improve the accuracy. The proposed algorithm also improves the accuracy of detections. The existing algorithm used the COCO training dataset, which works well in scenarios where people are visible and not occluded. However, the available video quality in this project was not very good, with people often blocking each other from the camera’s view. Thus, a different training set was needed that could detect people even in poor-quality frames and with occlusion. The new training set is the first algorithmic improvement, and although it occasionally performed worse, it corrected the error by 7.25% on average.

Contributors: Larsen, Andrei (Author) / Askin, Ronald (Thesis advisor) / Sefair, Jorge (Thesis advisor) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2021

Bayesian Nonparametric Reinforcement Learning in LTE and Wi-Fi Coexistence

Description

With the formation of next-generation wireless communication, a growing number of new applications such as the Internet of Things, autonomous cars, and drones are crowding the unlicensed spectrum. Licensed networks such as LTE are also moving into the unlicensed spectrum to provide high-capacity content at low cost. However, LTE was not designed for sharing spectrum with others. A cooperation center for these networks is costly because they possess heterogeneous properties and everyone can enter and leave the spectrum unrestrictedly, so the design will be challenging. Since it is infeasible to incorporate potentially infinite scenarios in one unified design, an alternative solution is to let each network learn its own coexistence policy. Previous solutions only work on fixed scenarios. In this work we present a reinforcement learning algorithm to cope with the coexistence between Wi-Fi and LTE-LAA agents in the 5 GHz unlicensed spectrum. The coexistence problem was modeled as a Dec-POMDP, and a Bayesian approach was adopted for policy learning, with a nonparametric prior to accommodate the uncertainty of the policy for different agents. A fairness measure was introduced in the reward function to encourage fair sharing between agents. We turned the reinforcement learning into an optimization problem by treating the value function as a likelihood and using variational inference for posterior approximation. Simulation results demonstrate that this algorithm can reach high value with compact policy representations and stays computationally efficient when applied to a set of agents.
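
To make the idea of a fairness term in a reward concrete, the sketch below uses Jain's fairness index, a widely used measure that equals 1 when all agents receive the same throughput and 1/n when a single agent takes everything. The abstract does not say which fairness measure the thesis adopts, so the choice of Jain's index, the additive form of the reward, and the names `jain_index`, `shaped_reward`, and `fairness_weight` are all assumptions made for illustration.

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    if not throughputs:
        raise ValueError("need at least one agent")
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        # Convention for this sketch: all-idle agents are equally served.
        return 1.0
    total = sum(throughputs)
    return (total * total) / (len(throughputs) * squares)


def shaped_reward(throughputs, fairness_weight=1.0):
    """Hypothetical reward: total throughput plus a weighted fairness bonus."""
    return sum(throughputs) + fairness_weight * jain_index(throughputs)


# Two agents (say, one Wi-Fi and one LTE-LAA) with unequal throughput.
print(jain_index([3.0, 1.0]))
print(shaped_reward([3.0, 1.0], fairness_weight=2.0))
```

With a term like this, two allocations that deliver the same total throughput are no longer equivalent: the more evenly shared one scores higher, which is the kind of pressure toward fair sharing the abstract describes.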

Contributors: SHIH, PO-KAN (Author) / Moraffah, Bahman (Thesis advisor) / Papandreou-Suppappola, Antonia (Thesis advisor) / Dasarathy, Gautam (Committee member) / Shih, YiChang (Committee member) / Arizona State University (Publisher)

Created: 2021

Machine Learning and Vision Using Edge Devices for Multimodal Chatbots and Bio-meteorological Sensing

Description

Machine learning (ML) and deep learning (DL) have become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improved machine learning models. But this comes at the cost of high computation, which invariably increases power usage and the cost of the hardware. In this thesis we explore applications of ML techniques in two completely different fields, arts, media and theater, and urban climate research, using low-cost and low-powered edge devices. The multi-modal chatbot uses different machine learning techniques, natural language processing (NLP) and computer vision (CV), to understand the user's inputs and accordingly perform in the play and interact with the audience. This system is also equipped with other interactive hardware setups like movable LED systems; together they provide an experiential theatrical play tailored to each user. I will discuss how I used edge devices to achieve this AI system, which has created a new genre in theatrical play. I will then discuss MaRTiny, an AI-based bio-meteorological system that calculates mean radiant temperature (MRT), an important parameter for urban climate research. It is also equipped with a vision system that performs different machine learning tasks like pedestrian and shade detection. The entire system costs around $200 and can potentially replace the existing setup worth $20,000. I will further discuss how I used machine learning methods to overcome the inaccuracies in the MRT value caused by the system. These projects, although belonging to two very different fields, are implemented using edge devices and use similar ML techniques. In this thesis I will detail the different techniques that are shared between these two projects and how they can be used in several other applications using edge devices.

Contributors: Kulkarni, Karthik Kashinath (Author) / Jayasuriya, Suren (Thesis advisor) / Middel, Ariane (Thesis advisor) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created: 2021

Video Captioning with Commonsense Knowledge Anchors

Description

It is not merely an aggregation of static entities that a video clip carries, but also a variety of interactions and relations among these entities. Challenges remain for a video captioning system to generate natural language descriptions that focus on the prominent interest and align with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of a video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried from the generic knowledge atlas ATOMIC to complement each training caption, forming the commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator for the training of the video captioning model, and penalizes the model for generating semantically misaligned captions. In extensive empirical evaluations on the MSR-VTT, V2C, and VATEX datasets, CAVAN consistently improves the quality of generations and shows a higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.

Contributors: Shao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)

Created: 2022

NeRF Robustness Study Against Adversarial Bit Flip Attack

Description

Recently, there has been a notable surge in the development of generative models dedicated to synthesizing 3D scenes. In these research works, Neural Radiance Fields (NeRF) is one of the most popular AI approaches due to its outstanding performance with relatively small model size and fast training and rendering time. Owing to its popularity, it is important to investigate the security of the NeRF model. If it is widely used across applications while carrying fatal security issues, serious problems could result. Meanwhile, in AI security and model robustness research, an emerging adversarial Bit Flip Attack (BFA) has been demonstrated to greatly reduce AI model accuracy by flipping several bits out of millions of weight parameters stored in the computer's main memory. Such a malicious fault injection attack raises a model robustness concern for widely used NeRF-based 3D modeling. This master's thesis studies the robustness of the NeRF model against the adversarial bit flip attack. The study finds that the NeRF model is highly vulnerable to BFA: the rendered image quality degrades greatly with only several bit flips in the model parameters.
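
To show why a handful of flipped bits can matter so much, the sketch below flips a single bit in the IEEE 754 single-precision encoding of one weight. It assumes weights stored as 32-bit floats, which the abstract does not confirm (bit flip attacks are also studied on quantized integer weights), and the helper name `flip_bit` is hypothetical. It illustrates the fault itself, not the search procedure an attacker would use to choose which bits to flip.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit of a float32 (bit 0 = least significant, 31 = sign)."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the chosen bit, then reinterpret the result as a float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


print(flip_bit(0.5, 30))  # most significant exponent bit
print(flip_bit(1.0, 31))  # sign bit
print(flip_bit(1.0, 0))   # least significant mantissa bit
```

Flipping the top exponent bit turns 0.5 into 2^127 (about 1.7e38), flipping the sign bit turns 1.0 into -1.0, and flipping the lowest mantissa bit barely changes the value. The damage from a single flip therefore depends heavily on which bit is hit.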

Contributors: YU, Zhou (Author) / Fan, Deliang DF (Thesis advisor) / Chakrabart, Chaitali CC (Committee member) / Zhang, Yanchao YZ (Committee member) / Arizona State University (Publisher)

Created: 2023

Code Generation Framework for Fine-Grained Reconfigurable Array Architectures

Description

Digital signal processing accelerator architectures are designed to provide either high energy efficiency or high programmability, depending on the targeted application and use case. For example, Domain Adaptive Processor (DAP), a highly reconfigurable array architecture designed by the University of Michigan for signal processing workloads, is highly energy efficient but difficult to program. DAP consists of an 8x8 array of Processing Elements (PEs), with each PE containing four heterogeneous SUB-PEs. Each SUB-PE has its own instruction memory and is capable of executing Very Long Instruction Word (VLIW) instructions. Unfortunately, instructions have to be written for every cycle of computation for each SUB-PE used in the application and handcrafted such that all the inter-PE dependencies are synchronized. This thesis builds upon prior work at Arizona State University (ASU) to make DAP more programmable. First, the compiler back-end developed at ASU is extended with more features. Prior work introduced the DAP Instruction Set Architecture (ISA), an assembly instruction format, and proposed a compiler framework, called DAP Assembler, with optimization passes to reduce the complexity of programming applications in DAP. While this back-end infrastructure helped generate code with relative ease compared to writing VLIW code by hand, the generated code was not software-pipelined, and the code generated for each PE had to be manually synchronized. So in this thesis, the DAP Assembler tool is extended to support software pipelining for high-throughput applications. Further, a generic synchronization tool is proposed to synchronize instructions in a multi-PE setup and is integrated with DAP Assembler to generate synchronized high-throughput application code. Second, a Multi-Level Intermediate Representation (MLIR) based compiler front-end infrastructure is proposed to first lower the application code written by the programmer to an Intermediate Representation (IR) that is suitable for generic array architectures, and then convert it further to a DAP-specific IR that can be used for generating machine code for DAP using the DAP ISA. This two-stage process enables the infrastructure to be more easily adapted to other array architectures. The first conversion pass uses designer-provided modular hardware architecture information, called the Resource Registry, to allocate operations in the input IR to resources in the Resource Registry and capture all data movement. While the Resource Registry changes from architecture to architecture, the conversion pass algorithm is generic and can be used for other architectures. The second conversion pass is geared more towards DAP and integrates DAP-specific constructs to generate optimized instructions in the DAP ISA. Multiple kernels, such as matrix multiplication and vector-vector addition, were implemented using this infrastructure, and the code generated by the tool was verified to be functionally correct.

Contributors: Murugan, Narayanan (Author) / Chakrabarti, Chaitali Dr (Thesis advisor) / Akoglu, Ali Dr (Committee member) / Bliss, Daniel Dr (Committee member) / Arizona State University (Publisher)

Created: 2023
