Improving the Programmability of a Systolic Array Processor

Description

This thesis presents a code generation tool to improve the programmability of systolic array processors such as the Domain Adaptive Processor (DAP), which was designed by researchers at the University of Michigan for wireless communication workloads. Unlike application-specific integrated circuits, DAP aims to achieve high performance without sacrificing much programmability or reconfigurability. The structure of typical DAP code for each Processing Element (PE) is very different from that of any other programming language. As a result, writing code for DAP requires the programmer to acquire processor-specific knowledge, including configuration rules and the cycle-accurate execution state of the memory and datapath components within each PE. Each program must be carefully handcrafted to meet strict timing and resource constraints, leading to very long programming times and low productivity. In this thesis, a code generation and optimization tool is introduced to improve the programmability of DAP and make code development easier. The tool consists of a configuration code generator, an optimizer, and a scheduler. An Instruction Set Architecture (ISA) has been designed specifically for DAP. The programmer writes the assembly code for each PE using the DAP ISA. The assembly code is then translated into low-level configuration code. This configuration code undergoes several optimization passes. The Level 1 (L1) optimization removes redundant instructions and performs loop optimizations through code movement. The Level 2 (L2) optimization exploits instruction-level parallelism. Using the L1 and L2 optimization passes results in code that has fewer instructions and requires fewer cycles. In addition, a scheduling tool has been introduced that performs final timing adjustments on the code to match the input data rate.
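As a rough illustration of what removing redundant instructions can mean, the sketch below drops a configuration write when the target component already holds the same value. The `(component, value)` instruction format and the name `remove_redundant_writes` are hypothetical and chosen only for this example; the abstract does not describe the DAP ISA or how the L1 pass is actually implemented.

```python
# Toy redundancy-elimination pass. The (component, value) instruction
# format is an assumption for illustration, not the real DAP encoding.
_UNSET = object()  # sentinel: component has not been written yet


def remove_redundant_writes(instructions):
    """Drop writes that set a component to the value it already holds."""
    state = {}   # last value written to each component
    kept = []
    for component, value in instructions:
        if state.get(component, _UNSET) == value:
            continue  # same value already in place, so the write is redundant
        state[component] = value
        kept.append((component, value))
    return kept


program = [
    ("alu_op", "add"),
    ("mem_addr", 0),
    ("alu_op", "add"),   # redundant: alu_op is already "add"
    ("mem_addr", 4),
    ("alu_op", "mul"),
]
optimized = remove_redundant_writes(program)
print(len(program), "->", len(optimized), "instructions")
```

A real pass would also have to respect timing, since the abstract notes that each PE program must meet cycle-accurate constraints; this sketch ignores that entirely.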

Contributors: Vipperla, Anish (Author) / Chakrabarti, Chaitali (Thesis advisor) / Bliss, Daniel (Committee member) / Akoglu, Ali (Committee member) / Arizona State University (Publisher)

Created: 2022

Blame-Free Motion Planning in Hybrid Traffic

Description

Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability of human-driven vehicles. Given that eliminating accidents is impossible, an achievable goal is to design AVs so that they will not be blamed for any accident in which they are involved. This work proposes BlaFT, a Blame-Free motion planning algorithm in hybrid Traffic. BlaFT is designed to be compatible with HVs and other AVs, and will not be blamed for accidents in a structured road environment. The work also proves that no accidents will happen if all AVs use the BlaFT motion planner and that, in hybrid traffic, an AV using BlaFT will be blame-free even if it is involved in a collision. The work instantiates scores of BlaFT vehicles and HVs in an urban roadscape loop in the Simulation of Urban MObility (SUMO) simulator, runs the simulation for several hours, and observes that as the percentage of BlaFT vehicles increases, the traffic becomes safer. Adding BlaFT vehicles to HVs also increases the efficiency of traffic as a whole by up to 34%.

Contributors: Park, Sanggu (Author) / Shrivastava, Aviral (Thesis advisor) / Wang, Ruoyu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2022

Computational Imaging for Energy-Efficient Cameras: Adaptive ROI-based Object Tracking and Optically Defocused Event-based Sensing.

Description

Computer vision is becoming an essential component of embedded system applications such as smartphones, wearables, autonomous systems, and the internet of things (IoT). These applications are generally deployed into environments with limited energy, memory bandwidth, and computational resources. This trend is driving the development of energy-efficient image processing solutions from sensing to computation. In this thesis, different alternatives are explored to implement energy-efficient computer vision systems. First, I present a field programmable gate array (FPGA) implementation of an adaptive subsampling algorithm for region-of-interest (ROI)-based object tracking. By implementing the computationally intensive sections of this algorithm on an FPGA, I aim to offload computation from energy-inefficient graphics processing units (GPUs) and/or general-purpose central processing units (CPUs). I also present a working system that executes this algorithm with near real-time latency on a standalone embedded device. Second, I present a neural network-based pipeline to improve the performance of event-based cameras in non-ideal optical conditions. Event-based cameras, or dynamic vision sensors (DVS), are bio-inspired sensors that measure logarithmic per-pixel brightness changes in a scene. Their advantages include high dynamic range, low latency, and ultra-low power consumption compared to standard frame-based cameras. Several tasks have been proposed to take advantage of these novel sensors, but they rely on perfectly calibrated optical lenses that are in focus. In this work, I propose a method to reconstruct events captured with an out-of-focus event camera so that they can be fed into an intensity reconstruction task. The network is trained with a dataset generated by simulating defocus blur in sequences from object tracking datasets such as LaSOT and OTB100. I also test the generalization performance of this network on scenes captured with a DAVIS event-based sensor equipped with an out-of-focus lens.
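The statement that event cameras measure logarithmic per-pixel brightness changes can be made concrete with the commonly used idealized event model: a pixel emits an event whenever its log intensity has moved by at least a contrast threshold since the last event. The sketch below simulates one pixel under that model. The function name `generate_events`, the threshold value, and the sample signal are assumptions for illustration and are not taken from the thesis.

```python
import math


def generate_events(intensities, threshold=0.4, eps=1e-6):
    """Simulate one idealized event-camera pixel.

    Returns a list of (sample_index, polarity) pairs, with polarity +1 for a
    brightness increase and -1 for a decrease. `eps` avoids log(0).
    """
    events = []
    ref = math.log(intensities[0] + eps)  # log intensity at the last event
    for i, value in enumerate(intensities[1:], start=1):
        log_i = math.log(value + eps)
        # A large change can cross the threshold several times in one sample.
        while abs(log_i - ref) >= threshold:
            polarity = 1 if log_i > ref else -1
            events.append((i, polarity))
            ref += polarity * threshold
    return events


# Brightness doubles at sample 2, then drops to a quarter of that at sample 4.
signal = [1.0, 1.0, 2.0, 2.0, 0.5]
print(generate_events(signal, threshold=0.4))
```

Real sensors add noise, refractory periods, and per-pixel threshold variation, none of which this sketch models.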

Contributors: Torres Muro, Victor Isaac (Author) / Jayasuriya, Suren (Thesis advisor) / Spanias, Andreas (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created: 2022

COMSAT: Modified Modulo Scheduling Techniques for Acceleration on Unknown Trip Count and Early Exit Loops

Description

Coarse-grain reconfigurable architectures (CGRAs) have shown significant improvements as hardware accelerators while demanding low power. Such acceleration derives from instruction-level parallelism, which many techniques exploit. Modulo scheduling is a popular software pipelining technique that provides an efficient heuristic for accelerating loops, the repetitive regions of an application. Existing modulo scheduling algorithms still suffer from loop-exit problems that limit CGRA acceleration to loops with a known trip count and no exit statements. Another notable limitation is the early exit problem, where loops can terminate only after a certain number of iterations as the CGRA moves to the kernel stage. To circumvent these obstacles, COMSAT introduces a modified modulo scheduling technique that acts as an external module and can be applied to any existing scheduling/mapping algorithm with minimal hardware changes. Experiments on the MiBench and Rodinia benchmark suites show that COMSAT achieves an average speedup of 3x across overall benchmarks and a 10x speedup in kernel regions. Without COMSAT techniques, only 25% of these loops could have been accelerated, reducing the benchmark and kernel speedups to 1.25x and 3.63x, respectively.
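For readers unfamiliar with modulo scheduling, the sketch below computes the textbook lower bound on the initiation interval (II), the number of cycles between the starts of consecutive loop iterations, and the resulting cycle count for a software-pipelined loop. This is the standard formulation, not COMSAT's modification, and the function names and sample numbers are assumptions for illustration. The cycle-count formula takes the trip count as an input, which hints at why loops with unknown trip counts or early exits need extra handling.

```python
import math


def res_mii(num_ops, num_pes):
    """Resource-constrained bound: operations must fit on the available PEs."""
    return math.ceil(num_ops / num_pes)


def rec_mii(recurrences):
    """Recurrence-constrained bound over (latency, distance) dependence cycles."""
    return max((math.ceil(lat / dist) for lat, dist in recurrences), default=1)


def min_ii(num_ops, num_pes, recurrences):
    """Minimum initiation interval is the larger of the two bounds."""
    return max(res_mii(num_ops, num_pes), rec_mii(recurrences))


def pipelined_cycles(trip_count, ii, schedule_length):
    """Cycles to run `trip_count` iterations with prologue, kernel, and epilogue."""
    stages = math.ceil(schedule_length / ii)
    return (trip_count + stages - 1) * ii


# 20 operations on a hypothetical 4x4 CGRA, with one loop-carried dependence
# cycle of total latency 3 spanning 1 iteration.
ii = min_ii(num_ops=20, num_pes=16, recurrences=[(3, 1)])
print("MII =", ii)
print("cycles for 100 iterations =", pipelined_cycles(100, ii, schedule_length=9))
```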

Contributors: Ta, Vinh (Author) / Shrivastava, Aviral (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Kinsey, Michel (Committee member) / Arizona State University (Publisher)

Created: 2022

Code Generation Techniques For Emerging Capability Architectures

Description

Memory safety and security issues continue to plague modern systems and are rapidly becoming a top priority. Capability architectures are a proposed solution that solves the problem at a fundamental hardware level, with several commercially viable options under active development. These new and evolving designs place higher demands on the software tools needed to develop software and ensure correct execution. Capabilities introduce ideas that challenge typical architecture assumptions about the representation of data and its location in memory. This calls for a new core system software ecosystem. A fundamental component of any software ecosystem is a compiler. Without a compiler, large critical components of the ecosystem must be written in assembly language, a tedious and error-prone task. A compiler for a capability architecture that emphasizes memory security must above all else ensure functional and correct code generation; raw performance and power efficiency are no longer the chief concerns. Compilers for these architectures have been developed, but as capability architectures mature in complexity, new compilation support is required. A set of techniques that help solve the compilation challenges for a capability architecture is presented in this work. These capability-aware compiler ideas are presented in their generalized forms to enable their adoption in other architectures and future extensions. Some of the ideas presented come out of work on a compiler for a new capability architecture, Zeno. The Zeno compiler utilizes the extensible RISC-V instruction set and adds a set of global memory extensions, xBGAS (Extended Base Global Address Space), which is used to provide memory security. The Zeno compiler is described in detail as an implementation of the generalized capability-aware compiler. Rather than focusing on the runtime performance of code generated by the Zeno compiler, this work evaluates the compiler based on a static analysis of the generated assembly code. We find the code produced by the Zeno compiler sufficient to enable further testing of the Zeno architecture and drive its development.

Contributors: Abraham, Jacob (Author) / Kinsy, Michel (Thesis advisor) / Rudd, Kevin (Committee member) / Glew, Andy (Committee member) / Arizona State University (Publisher)

Created: 2022

A Performance Study of Different Deep Learning Architectures For Detecting Construction Equipment in Sites

Description

There are relatively few available construction equipment detector models that use deep learning architectures; many of these use old object detection architectures like CNN (Convolutional Neural Networks), RCNN (Region-Based Convolutional Neural Network), and early versions of You Only Look Once (YOLO), such as V1. It can be challenging to deploy these models in practice for tracking construction equipment while working on site. This thesis aims to provide a clear guide on how to train and evaluate the performance of different deep learning architecture models for detecting construction equipment on site. It uses two YOLO architectures, YOLOv5s and YOLOR, to detect three classes of construction equipment: Excavators, Dump Trucks, and Loaders. The thesis also provides a simple solution to deploy the trained models. Additionally, this thesis describes a specialized, high-quality dataset of three thousand pictures created to train these models on real data by considering a typical worksite scene, various motions, and varying perspectives and angles of construction equipment on the site. The results presented herein show that after 150 epochs of training, YOLOR-P6 has the best mAP at 0.981, while the YOLOv5s mAP is 0.936. However, YOLOv5s had the shortest training time on a Tesla P100 GPU in a Google Colab notebook: YOLOv5s needed 4 hours and 52 minutes, whereas YOLOR-P6 needed 14 hours and 35 minutes to finish training. The final findings of this study show that YOLOv5s is the most efficient model to use when building an artificial intelligence model to detect construction equipment because of the size of its weights file relative to other YOLO models: 14.4 MB for YOLOv5s vs. 288 MB for YOLOR-P6. This strongly affects the performance of the processing unit used to detect the construction equipment on site. In addition, the constructed dataset is published as a public dataset on the Roboflow platform, where it can serve as a foundation for future research and for improving newer deep learning architectures.
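The mAP figures above rest on intersection over union (IoU), the overlap measure used to decide whether a predicted box matches a ground-truth box. The sketch below computes IoU for axis-aligned boxes given as `(x1, y1, x2, y2)`. The function name and box format are assumptions for illustration, and the abstract does not state which IoU threshold its mAP values use.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)      # hypothetical detection
ground_truth = (1, 1, 3, 3)   # hypothetical label
print(round(iou(predicted, ground_truth), 3))
```

A detection typically counts as a true positive when its IoU with a same-class ground-truth box exceeds a chosen threshold; average precision is then computed per class and averaged to give mAP.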

Contributors: Sabek, Mohamed Mamdooh (Author) / Parrish, Kristen (Thesis advisor) / Czerniawski, Thomas (Committee member) / Ayer, Steven K (Committee member) / Arizona State University (Publisher)

Created: 2022

Analyzing Multi-viewpoint Capabilities of Light Estimation Frameworks for Augmented Reality Using TCP/IP and UDP

Description

Realistic lighting is important to improve immersion and make mixed reality applications seem more plausible. To properly blend AR objects into the real scene, it is important to study the lighting of the environment. The existing illumination frameworks proposed by Google's ARCore (Google's Augmented Reality Software Development Kit) and Apple's ARKit (Apple's Augmented Reality Software Development Kit) are computationally expensive and have very slow refresh rates, which makes them unsuitable for dynamic environments and low-end mobile devices. Recently, other illumination estimation frameworks such as GLEAM and Xihe have emerged, which aim to provide better illumination with faster refresh rates. GLEAM is an illumination estimation framework that understands the real scene by collecting pixel data from a reflective spherical light probe. GLEAM uses this data to form environment cubemaps, which are later mapped onto a reflection probe to generate illumination for AR objects. From a single viewpoint, only one half of the light probe can be observed at a time, which does not give complete information about the environment. This leads to the idea of multi-viewpoint estimation for better performance. This thesis analyzes the multi-viewpoint capabilities of AR illumination frameworks that use physical light probes to understand the environment. The work builds networking on GLEAM using the TCP and UDP protocols. It also documents how processor load is shared among networked devices and how that benefits the performance of GLEAM on mobile devices. Some enhancements using multi-threading have also been made to the existing GLEAM model to improve its performance.

Contributors: Gurram, Sahithi (Author) / LiKamWa, Robert (Thesis advisor) / Jayasuriya, Suren (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2022

Image Processing Techniques for Object Sorting by a Two Degree of Freedom Robotic Manipulator: A Comparative Computer Simulation Study

Description

Object sorting is a very common application, especially in industrial settings, but it is a labor-intensive and time-consuming process that proves challenging when done manually. Thanks to rapid developments in technology, almost all of these object sorting tasks are now partially or completely automated. Image processing techniques are essential for the full operation of such a pick-and-place robot, as they are responsible for perceiving the environment and correctly identifying, classifying, and localizing the different objects in it. This thesis discusses how different deep learning-based perception techniques can be used so that robots perform accurate object sorting with efficiency and stability. In the era of artificial intelligence, this sorting problem can be solved more efficiently than with existing techniques. This thesis presents different image processing techniques and algorithms that can be used to perform object sorting efficiently. A comparison of three different deep learning-based techniques is presented, and their pros and cons are discussed. Furthermore, this thesis presents a comprehensive study of the kinematics and dynamics of a 2 Degree of Freedom Robotic Manipulator.
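As a small illustration of the kinematics mentioned above, the sketch below computes the forward kinematics of a planar two-link arm, one common form of a 2 degree of freedom manipulator. The abstract does not specify the manipulator's geometry, so the planar two-revolute-joint layout, the link lengths, and the function name are all assumptions made for this example.

```python
import math


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) of a planar two-link arm.

    theta1 is the shoulder angle from the x-axis and theta2 is the elbow
    angle relative to the first link, both in radians. l1 and l2 are the
    (assumed) link lengths.
    """
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


# Arm stretched along the x-axis, then with the elbow bent by 90 degrees.
print(forward_kinematics(0.0, 0.0))
print(forward_kinematics(0.0, math.pi / 2))
```

In a sorting pipeline, the perception stage would supply a target position, and the inverse of this mapping would give the joint angles needed to reach it.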

Contributors: Ranganathan, Pavithra (Author) / Rodriguez, Armando (Thesis advisor) / Si, Jennie (Committee member) / Berman, Spring (Committee member) / Arizona State University (Publisher)

Created: 2021

Machine Learning and Vision Using Edge Devices for Multimodal Chatbots and Bio-meteorological Sensing

Description

Machine learning (ML) and deep learning (DL) have become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improved machine learning models. But this comes at the cost of high computation, which invariably increases power usage and the cost of the hardware. In this thesis, we explore applications of ML techniques to two completely different fields, arts, media, and theater on the one hand and urban climate research on the other, using low-cost and low-powered edge devices. The multi-modal chatbot uses different machine learning techniques, natural language processing (NLP) and computer vision (CV), to understand the user's inputs and accordingly perform in the play and interact with the audience. This system is also equipped with other interactive hardware setups such as movable LED systems; together, they provide an experiential theatrical play tailored to each user. I will discuss how I used edge devices to build this AI system, which has created a new genre of theatrical play. I will then discuss MaRTiny, an AI-based bio-meteorological system that calculates mean radiant temperature (MRT), an important parameter for urban climate research. It is also equipped with a vision system that performs different machine learning tasks such as pedestrian and shade detection. The entire system costs around $200 and can potentially replace the existing setup worth $20,000. I will further discuss how I used machine learning methods to overcome the inaccuracies in the MRT value caused by the system. These projects, although belonging to two very different fields, are implemented using edge devices and use similar ML techniques. In this thesis, I will detail the techniques that are shared between these two projects and how they can be used in several other applications using edge devices.

Contributors: Kulkarni, Karthik Kashinath (Author) / Jayasuriya, Suren (Thesis advisor) / Middel, Ariane (Thesis advisor) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created: 2021

Exploration of Edge Machine Learning-based Stress Detection Using Wearable Devices

Description

Stress is one of the critical factors in daily life, as it has a profound impact on performance at work and decision-making processes. With the development of IoT technology, smart wearables can handle diverse operations, including networking and recording biometric signals. It has also become easier for individual users to self-detect stress from recorded data, since these wearables and their accompanying smartphones now have data processing capability. Edge computing on such devices enables real-time feedback and, in turn, preemptive identification of reactions to stress. This can provide an opportunity to prevent more severe consequences that might result if stress is unaddressed. From a system perspective, edge computing saves energy and network bandwidth and reduces latency, since it processes data close to the data source. It can also strengthen privacy by performing stress prediction on local devices without transferring personal information to the public cloud. This thesis presents a framework for real-time stress prediction using Fitbit and machine learning with support from cloud computing. Fitbit is a wearable tracker that records biometric measurements using optical sensors on the wrist. It also provides developers with platforms to design custom applications. I developed an application for the Fitbit and the user's accompanying mobile device to collect heart rate fluctuations and the corresponding stress levels entered by users. I also established a dataset collected from police cadets during their academy training program. Machine learning classifiers for stress prediction are built using classic models and TensorFlow in the cloud. Lastly, the classifiers are optimized using model compression techniques for deployment on smartphones, and I analyze how efficiently stress prediction can be performed on the edge.

Contributors: Sim, Sang-Hun (Author) / Zhao, Ming (Thesis advisor) / Roberts, Nicole (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)

Created: 2022
