End-to-End Performance Benchmarking Tool for High-Speed Memory Access in Deep Learning

Description

Because DRAM accesses are slow and energy-hungry, many convolutional neural network (CNN) accelerators face performance and energy-efficiency challenges, which are critical for embedded implementations. As these applications exploit larger datasets, their memory accesses are increasing. As a result, it is difficult to predict the combined dynamic random access memory (DRAM) workload behavior, which can undermine memory optimizations in software. To understand the impact of external memory access on CNN accelerators, and so help reduce the high DRAM access latency and energy, simulators such as RAMULATOR and VAMPIRE have been proposed in prior work. In this work, these simulators are used to benchmark external memory access in CNN accelerators. Experiments are performed with trace files generated from the number of parameters and the data precision, and with a trace file generated for a CNN accelerator on an Altera Arria 10 GX 1150 FPGA, to complete the end-to-end workflow with both simulators. In addition, the default VAMPIRE code was modified to implement functionality such as PREA (Precharge All) and REF (Refresh). Energies were then precomputed for DDR3, DDR4, and HBM based on the Micron model and entered in the DRAM specification file supplied to the VAMPIRE tool. An experimental comparison of DDR3, DDR4, and HBM showed that DDR4 is nearly 31% more energy-efficient than DDR3 and that HBM is 54% more energy-efficient than DDR3. Modeling and experimental analysis were also performed on a large data set and on the same data split into smaller sets; the results for a small set multiplied by the number of sets were nearly the same as the results for the large data set. Finally, a GUI was developed that wraps both simulators. The GUI provides user-friendly access and lets users analyze the parameters without much prior knowledge of how the simulators work.
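As a rough illustration of generating a trace from the number of parameters and the data precision, the sketch below emits one read request per 64-byte burst for a contiguous weight buffer. The function name, the burst size, the base address, and the `<hex address> <R|W>` line layout are assumptions made for this example; the abstract does not specify the trace format the thesis used.

```python
def generate_trace(num_params, bits_per_param, base_addr=0, burst_bytes=64, op="R"):
    """Return trace lines for streaming a weight buffer from DRAM.

    Assumptions (not from the thesis): weights are stored contiguously,
    every request moves `burst_bytes` bytes, and each trace line is
    "<hex address> <R|W>".
    """
    # Total footprint in bytes, rounded up to a whole byte.
    total_bytes = (num_params * bits_per_param + 7) // 8
    # Number of burst-sized requests needed to cover the footprint.
    num_requests = (total_bytes + burst_bytes - 1) // burst_bytes
    return [
        f"0x{base_addr + i * burst_bytes:08x} {op}"
        for i in range(num_requests)
    ]


if __name__ == "__main__":
    # 1000 parameters at 16-bit precision occupy 2000 bytes.
    for line in generate_trace(1000, 16)[:3]:
        print(line)
```

Halving the precision halves the footprint and therefore the number of requests, which is why precision appears as an input alongside the parameter count.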

Contributors: Pannala, Manvitha (Author) / Cao, Yu (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created: 2021

Bayesian Nonparametric Reinforcement Learning in LTE and Wi-Fi Coexistence

Description

With the formation of next generation wireless communication, a growing number of new applications like internet of things, autonomous car, and drone is crowding the unlicensed spectrum. Licensed network such as LTE also comes to the unlicensed spectrum for better providing high-capacity contents with low cost. However, LTE was not…

With the emergence of next-generation wireless communication, a growing number of new applications such as the Internet of Things, autonomous cars, and drones are crowding the unlicensed spectrum. Licensed networks such as LTE are also moving into the unlicensed spectrum to deliver high-capacity content at low cost. However, LTE was not designed to share spectrum with other networks. A cooperation center for these networks is costly because they have heterogeneous properties and any network can enter and leave the spectrum without restriction, so the design is challenging. Since it is infeasible to cover a potentially infinite set of scenarios with one unified design, an alternative is to let each network learn its own coexistence policy. Previous solutions work only in fixed scenarios. This work presents a reinforcement learning algorithm for coexistence between Wi-Fi and LTE-LAA agents in the 5 GHz unlicensed spectrum. The coexistence problem was modeled as a Dec-POMDP, and a Bayesian approach with a nonparametric prior was adopted for policy learning to accommodate the uncertainty of the policy across agents. A fairness measure was introduced in the reward function to encourage fair sharing between agents. Reinforcement learning was turned into an optimization problem by treating the value function as a likelihood and using variational inference for posterior approximation. Simulation results demonstrate that the algorithm reaches high value with compact policy representations and stays computationally efficient when applied to a set of agents.
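To make the idea of a fairness term in the reward concrete, the sketch below uses Jain's fairness index, a common measure of how evenly a resource is shared. This is an assumption chosen for illustration: the abstract does not say which fairness measure the thesis used, and the function names, the additive combination, and the `weight` parameter are hypothetical.

```python
def jain_fairness(throughputs):
    """Jain's index: (sum x)^2 / (n * sum x^2), ranging from 1/n to 1."""
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        # Convention for this sketch only: nothing to share counts as fair.
        return 1.0
    total = sum(throughputs)
    return (total * total) / (n * sum_sq)


def fair_reward(throughputs, weight=1.0):
    """Hypothetical reward: total throughput plus a weighted fairness bonus."""
    return sum(throughputs) + weight * jain_fairness(throughputs)


if __name__ == "__main__":
    print(jain_fairness([1.0, 1.0]))  # equal shares
    print(jain_fairness([1.0, 0.0]))  # one agent takes everything
```

With a term like this, two agents that split the channel evenly earn a higher reward than a pair with the same total throughput in which one agent starves the other.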

Contributors: SHIH, PO-KAN (Author) / Moraffah, Bahman (Thesis advisor) / Papandreou-Suppappola, Antonia (Thesis advisor) / Dasarathy, Gautam (Committee member) / Shih, YiChang (Committee member) / Arizona State University (Publisher)

Created: 2021

Machine Learning and Vision Using Edge Devices for Multimodal Chatbots and Bio-meteorological Sensing

Description

Machine learning (ML) and deep learning (DL) have become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improved machine learning models. But this comes at the cost of heavy computation, which invariably increases power usage and hardware cost. This thesis explores applications of ML techniques in two completely different fields, arts, media and theater on the one hand and urban climate research on the other, using low-cost, low-power edge devices. The multi-modal chatbot uses natural language processing (NLP) and computer vision (CV) to understand the user's inputs, perform in the play accordingly, and interact with the audience. The system is also equipped with other interactive hardware, such as movable LED systems; together they provide an experiential theatrical play tailored to each user. I will discuss how I used edge devices to build this AI system, which has created a new genre of theatrical play. I will then discuss MaRTiny, an AI-based bio-meteorological system that calculates mean radiant temperature (MRT), an important parameter in urban climate research. It is also equipped with a vision system that performs machine learning tasks such as pedestrian and shade detection. The entire system costs around $200 and can potentially replace the existing setup worth $20,000. I will further discuss how I used machine learning methods to overcome the inaccuracies in the MRT value caused by the system. Although these projects belong to two very different fields, both are implemented on edge devices and use similar ML techniques. In this thesis I will detail the techniques shared between the two projects and how they can be used in other applications on edge devices.

Contributors: Kulkarni, Karthik Kashinath (Author) / Jayasuriya, Suren (Thesis advisor) / Middel, Ariane (Thesis advisor) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created: 2021

Analyzing Multi-viewpoint Capabilities of Light Estimation Frameworks for Augmented Reality Using TCP/IP and UDP

Description

Realistic lighting is important for improving immersion and making mixed reality applications seem more plausible. To blend AR objects properly into the real scene, it is important to study the lighting of the environment. The existing illumination frameworks provided by Google’s ARCore (Google’s augmented reality software development kit) and Apple’s ARKit (Apple’s augmented reality software development kit) are computationally expensive and have very slow refresh rates, which makes them unsuitable for dynamic environments and low-end mobile devices. Recently, other illumination estimation frameworks such as GLEAM and Xihe have aimed to provide better illumination with faster refresh rates. GLEAM is an illumination estimation framework that understands the real scene by collecting pixel data from a reflective spherical light probe. GLEAM uses this data to form environment cubemaps, which are later mapped onto a reflection probe to generate illumination for AR objects. From a single viewpoint, only one half of the light probe can be observed at a time, which does not give complete information about the environment. This leads to the idea of multi-viewpoint estimation for better performance. This thesis analyzes the multi-viewpoint capabilities of AR illumination frameworks that use physical light probes to understand the environment. The work builds networking on GLEAM using the TCP and UDP protocols. It also documents how processor load is shared among the networked devices and how that benefits the performance of GLEAM on mobile devices. Some multi-threading enhancements have also been made to the existing GLEAM model to improve its performance.

Contributors: Gurram, Sahithi (Author) / LiKamWa, Robert (Thesis advisor) / Jayasuriya, Suren (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2022

Video Captioning with Commonsense Knowledge Anchors

Description

A video clip carries not merely an aggregation of static entities but also a variety of interactions and relations among these entities. It remains challenging for a video captioning system to generate natural language descriptions that focus on the prominent interest and align with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing approach (dubbed CAVAN). CAVAN exploits inferential commonsense knowledge to assist the training of a video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is retrieved for each training caption by querying ATOMIC, a generic knowledge atlas, and is used to form a commonsense-caption entailment corpus. A BERT-based language entailment model trained on this corpus then serves as a commonsense discriminator during the training of the video captioning model and penalizes the model for generating semantically misaligned captions. In extensive empirical evaluations on the MSR-VTT, V2C, and VATEX datasets, CAVAN consistently improves the quality of the generated captions and shows a higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveal that the use of commonsense knowledge contributes to video caption generation.

Contributors: Shao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)

Created: 2022

NeRF Robustness Study Against Adversarial Bit Flip Attack

Description

Recently, there has been a notable surge in the development of generative models dedicated to synthesizing 3D scenes. Among these works, Neural Radiance Fields (NeRF) is one of the most popular AI approaches because of its outstanding performance with a relatively small model size and fast training and rendering times. Owing to this popularity, it is important to investigate the security of NeRF models: if NeRF is widely used in different applications while it has fatal security issues, serious problems could follow. Meanwhile, in AI security and model robustness research, the emerging adversarial Bit Flip Attack (BFA) has been demonstrated to greatly reduce AI model accuracy by flipping several bits out of the millions of weight parameters stored in the computer's main memory. Such a malicious fault injection attack raises a new model robustness concern for the widely used NeRF-based 3D modeling. This master's thesis studies the robustness of the NeRF model against the adversarial bit flip attack. The research finds that the NeRF model is highly vulnerable to BFA: the rendered image quality degrades greatly with only several bit flips in the model parameters.
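To show why a handful of flipped bits can matter so much, the sketch below flips a single bit in the IEEE 754 single-precision encoding of one weight. It is only an illustration of the underlying effect: the helper name is hypothetical, the choice of 32-bit floating-point weights is an assumption (the abstract does not state the weight format), and no search for the most damaging bits is performed.

```python
import struct


def flip_bit(weight, bit):
    """Flip one bit (0 = least significant, 31 = sign) of a float32 value."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as a 32-bit unsigned integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", weight))
    # Toggle the selected bit.
    raw ^= 1 << bit
    # Reinterpret the modified bits as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


if __name__ == "__main__":
    w = 0.5
    print(w, "->", flip_bit(w, 30))  # top exponent bit: value explodes
    print(w, "->", flip_bit(w, 31))  # sign bit: value is negated
    print(w, "->", flip_bit(w, 0))   # lowest mantissa bit: tiny change
```

Flipping the top exponent bit turns a weight of 0.5 into 2^127, which is why corrupting a few well-chosen bits can dominate the output of a layer, while a flip in the low mantissa bits is almost harmless.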

Contributors: YU, Zhou (Author) / Fan, Deliang DF (Thesis advisor) / Chakrabart, Chaitali CC (Committee member) / Zhang, Yanchao YZ (Committee member) / Arizona State University (Publisher)

Created: 2023

Code Generation Framework for Fine-Grained Reconfigurable Array Architectures

Description

Digital signal processing accelerator architectures are designed to provide either high energy efficiency or high programmability, depending on the targeted application and use case. For example, the Domain Adaptive Processor (DAP), a highly reconfigurable array architecture designed by the University of Michigan for signal processing workloads, is highly energy efficient but difficult to program. DAP consists of an 8x8 array of processing elements (PEs), with each PE containing four heterogeneous SUB-PEs. Each SUB-PE has its own instruction memory and can execute Very Long Instruction Word (VLIW) instructions. Unfortunately, instructions have to be written for every cycle of computation for each SUB-PE used in the application and handcrafted so that all inter-PE dependencies are synchronized. This thesis builds on prior work at Arizona State University (ASU) to make DAP more programmable. First, the compiler back-end developed at ASU is extended with more features. Prior work introduced the DAP Instruction Set Architecture (ISA), an assembly instruction format, and proposed a compiler framework, called DAP Assembler, with optimization passes to reduce the complexity of programming applications in DAP. While this back-end infrastructure made it easier to generate code than writing VLIW code by hand, the generated code was not software-pipelined, and the code generated for each PE had to be synchronized manually. In this thesis, the DAP Assembler tool is therefore extended to support software pipelining for high-throughput applications. Further, a generic synchronization tool is proposed to synchronize instructions in a multi-PE setup and is integrated with DAP Assembler to generate synchronized high-throughput application code.
Second, a Multi-Level Intermediate Representation (MLIR) based compiler front-end infrastructure is proposed. It first lowers the application code written by the programmer to an intermediate representation (IR) suitable for generic array architectures and then converts it to a DAP-specific IR that can be used to generate machine code for DAP using the DAP ISA. This two-stage process makes the infrastructure easier to adapt to other array architectures. The first conversion pass uses designer-provided modular hardware architecture information, called the Resource Registry, to allocate operations in the input IR to resources in the Resource Registry and capture all data movement. While the Resource Registry changes from architecture to architecture, the conversion pass algorithm is generic and can be used for other architectures. The second conversion pass is geared towards DAP and integrates DAP-specific constructs to generate optimized instructions in the DAP ISA. Multiple kernels, such as matrix multiplication and vector-vector addition, were implemented using this infrastructure, and the code generated by the tool was verified to be functionally correct.
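The software pipelining mentioned in the first part of this abstract can be pictured with a small, generic scheduling sketch. It is not the DAP Assembler algorithm: it assumes every stage takes exactly one cycle, ignores resource conflicts and inter-PE synchronization, and uses hypothetical stage names. Stage `s` of iteration `i` is simply placed in cycle `i * initiation_interval + s`, so consecutive iterations overlap.

```python
def pipelined_schedule(num_iterations, stage_names, initiation_interval=1):
    """Map each cycle to the (iteration, stage) pairs that execute in it.

    Assumes one cycle per stage and no resource conflicts; a new
    iteration starts every `initiation_interval` cycles.
    """
    schedule = {}
    for iteration in range(num_iterations):
        for offset, stage in enumerate(stage_names):
            cycle = iteration * initiation_interval + offset
            schedule.setdefault(cycle, []).append((iteration, stage))
    return schedule


if __name__ == "__main__":
    stages = ["load", "mac", "store"]  # hypothetical loop body
    for cycle, work in sorted(pipelined_schedule(3, stages).items()):
        print(cycle, work)
```

Under these assumptions, three iterations of a three-stage loop body finish in five cycles instead of the nine needed when each iteration runs to completion before the next one starts, which is the throughput gain that software pipelining targets.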

Contributors: Murugan, Narayanan (Author) / Chakrabarti, Chaitali Dr (Thesis advisor) / Akoglu, Ali Dr (Committee member) / Bliss, Daniel Dr (Committee member) / Arizona State University (Publisher)

Created: 2023
