Search Content

Neuron-based Digital and Mixed-signal Circuit Design: From ASIC to SIMD Processors

Description

Among the many challenges facing circuit designers in deep sub-micron technologies, power, performance, area (PPA) and process variations are perhaps the most critical. Since existing strategies for reducing power and boosting the performance of the circuit designs have already matured to saturation, it is necessary to explore alternate unconventional strategies.…

Among the many challenges facing circuit designers in deep sub-micron technologies, power, performance, area (PPA) and process variations are perhaps the most critical. Since existing strategies for reducing power and boosting the performance of the circuit designs have already matured to saturation, it is necessary to explore alternate unconventional strategies. This investigation focuses on using perceptrons to enhance PPA in digital circuits and starts by constructing the perceptron using a combination of complementary metal-oxide-semiconductor (CMOS) and flash technology. The use of flash enables the perceptron to have a variable delay and functionality, making them robust to process, voltage, and temperature variations. By replacing parts of an application-specific integrated circuit (ASIC) with these perceptrons, improvements of up to 30% in the area and 20% in power can be achieved without affecting performance. Furthermore, the ability to vary the delay of a perceptron enables circuit designers to fix setup and hold-time violations post-fabrication, while reprogramming the functionality enables the obfuscation of the circuits. The study extends to field-programmable gate arrays (FPGAs), showing that traditional FPGA architectures can also achieve improved PPA by replacing some Look-Up-Tables (LUTs) with perceptrons. Considering that replacing parts of traditional digital circuits provides significant improvements in PPA, a natural extension was to see whether circuits built dedicatedly using perceptrons as its compute unit would lead to improvements in energy efficiency. This was demonstrated by developing perceptron-based compute elements and constructing an architecture using these elements for Quantized Neural Network acceleration. The resulting circuit delivered up to 50 times more energy efficiency compared to a CMOS-based accelerator without using standard low-power techniques such as voltage scaling and approximate computing.

ContributorsWagle, Ankit (Author) / Vrudhula, Sarma (Thesis advisor) / Khatri, Sunil (Committee member) / Shrivastava, Aviral (Committee member) / Seo, Jae-Sun (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)

Created2023

Formalizing Safety, Perception, and Mission Requirements for Testing and Planning in Autonomous Vehicles

Description

Autonomous Vehicles (AV) are inevitable entities in future mobility systems thatdemand safety and adaptability as two critical factors in replacing/assisting human drivers. Safety arises in defining, standardizing, quantifying, and monitoring requirements for all autonomous components. Adaptability, on the other hand, involves efficient handling of uncertainty and inconsistencies in models and data. First, I…

Autonomous Vehicles (AV) are inevitable entities in future mobility systems thatdemand safety and adaptability as two critical factors in replacing/assisting human drivers. Safety arises in defining, standardizing, quantifying, and monitoring requirements for all autonomous components. Adaptability, on the other hand, involves efficient handling of uncertainty and inconsistencies in models and data. First, I address safety by presenting a search-based test-case generation framework that can be used in training and testing deep-learning components of AV. Next, to address adaptability, I propose a framework based on multi-valued linear temporal logic syntax and semantics that allows autonomous agents to perform model-checking on systems with uncertainties. The search-based test-case generation framework provides safety assurance guarantees through formalizing and monitoring Responsibility Sensitive Safety (RSS) rules. I use the RSS rules in signal temporal logic as qualification specifications for monitoring and screening the quality of generated test-drive scenarios. Furthermore, to extend the existing temporal-based formal languages’ expressivity, I propose a new spatio-temporal perception logic that enables formalizing qualification specifications for perception systems. All-in-one, my test-generation framework can be used for reasoning about the quality of perception, prediction, and decision-making components in AV. Finally, my efforts resulted in publicly available software. One is an offline monitoring algorithm based on the proposed logic to reason about the quality of perception systems. The other is an optimal planner (model checker) that accepts mission specifications and model descriptions in the form of multi-valued logic and multi-valued sets, respectively. My monitoring framework is distributed with the publicly available S-TaLiRo and Sim-ATAV tools.

ContributorsHekmatnejad, Mohammad (Author) / Fainekos, Georgios (Thesis advisor) / Deshmukh, Jyotirmoy V (Committee member) / Karam, Lina (Committee member) / Pedrielli, Giulia (Committee member) / Shrivastava, Aviral (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2021

Compiler and architecture design for coarse-grained programmable accelerators

Description

The holy grail of computer hardware across all market segments has been to sustain performance improvement at the same pace as silicon technology scales. As the technology scales and the size of transistors shrinks, the power consumption and energy usage per transistor decrease. On the other hand, the transistor density…

The holy grail of computer hardware across all market segments has been to sustain performance improvement at the same pace as silicon technology scales. As the technology scales and the size of transistors shrinks, the power consumption and energy usage per transistor decrease. On the other hand, the transistor density increases significantly by technology scaling. Due to technology factors, the reduction in power consumption per transistor is not sufficient to offset the increase in power consumption per unit area. Therefore, to improve performance, increasing energy-efficiency must be addressed at all design levels from circuit level to application and algorithm levels.

At architectural level, one promising approach is to populate the system with hardware accelerators each optimized for a specific task. One drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low as they perform one specific function. Using software programmable accelerators is an alternative approach to achieve high energy-efficiency and programmability. Due to intrinsic characteristics of software accelerators, they can exploit both instruction level parallelism and data level parallelism.

Coarse-Grained Reconfigurable Architecture (CGRA) is a software programmable accelerator consists of a number of word-level functional units. Motivated by promising characteristics of software programmable accelerators, the potentials of CGRAs in future computing platforms is studied and an end-to-end CGRA research framework is developed. This framework consists of three different aspects: CGRA architectural design, integration in a computing system, and CGRA compiler. First, the design and implementation of a CGRA and its instruction set is presented. This design is then modeled in a cycle accurate system simulator. The simulation platform enables us to investigate several problems associated with a CGRA when it is deployed as an accelerator in a computing system. Next, the problem of mapping a compute intensive region of a program to CGRAs is formulated. From this formulation, several efficient algorithms are developed which effectively utilize CGRA scarce resources very well to minimize the running time of input applications. Finally, these mapping algorithms are integrated in a compiler framework to construct a compiler for CGRA

ContributorsHamzeh, Mahdi (Author) / Vrudhula, Sarma (Thesis advisor) / Gopalakrishnan, Kailash (Committee member) / Shrivastava, Aviral (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2015

Construction of GCCFG for inter-procedural optimizations in Software Managed Manycore (SMM)

Description

Software Managed Manycore (SMM) architectures - in which each core has only a scratch pad memory (instead of caches), - are a promising solution for scaling memory hierarchy to hundreds of cores. However, in these architectures, the code and data of the tasks mapped to the cores must be explicitly…

Software Managed Manycore (SMM) architectures - in which each core has only a scratch pad memory (instead of caches), - are a promising solution for scaling memory hierarchy to hundreds of cores. However, in these architectures, the code and data of the tasks mapped to the cores must be explicitly managed in the software by the compiler. State-of-the-art compiler techniques for SMM architectures require inter-procedural information and analysis. A call graph of the program does not have enough information, and Global CFG, i.e., combining all the control flow graphs of the program has too much information, and becomes too big. As a result, most new techniques have informally defined and used GCCFG (Global Call Control Flow Graph) - a whole program representation which captures the control-flow as well as function call information in a succinct way - to perform inter-procedural analysis. However, how to construct it has not been shown yet. We find that for several simple call and control flow graphs, constructing GCCFG is relatively straightforward, but there are several cases in common applications where unique graph transformation is needed in order to formally and correctly construct the GCCFG. This paper fills this gap, and develops graph transformations to allow the construction of GCCFG in (almost) all cases. Our experiments show that by using succinct representation (GCCFG) rather than elaborate representation (GlobalCFG), the compilation time of state-of-the-art code management technique [4] can be improved by an average of 5X, and that of stack management [20] can be improved by an average of 4X.

ContributorsHolton, Bryce (Author) / Shrivastava, Aviral (Thesis advisor) / Collofello, James (Committee member) / Richa, Andrea (Committee member) / Arizona State University (Publisher)

Created2014

EdgeFaaS: A Function-based Framework for Edge Computing

Description

The rapid growth of data generated from Internet of Things (IoTs) such as smart phones and smart home devices presents new challenges to cloud computing in transferring, storing, and processing the data. With increasingly more powerful edge devices, edge computing, on the other hand, has the potential to better responsiveness,…

The rapid growth of data generated from Internet of Things (IoTs) such as smart phones and smart home devices presents new challenges to cloud computing in transferring, storing, and processing the data. With increasingly more powerful edge devices, edge computing, on the other hand, has the potential to better responsiveness, privacy, and cost efficiency. However, resources across the cloud and edge are highly distributed and highly diverse. To address these challenges, this paper proposes EdgeFaaS, a Function-as-a-Service (FaaS) based computing framework that supports the flexible, convenient, and optimized use of distributed and heterogeneous resources across IoT, edge, and cloud systems. EdgeFaaS allows cluster resources and individual devices to be managed under the same framework and provide computational and storage resources for functions. It provides virtual function and virtual storage interfaces for consistent function management and storage management across heterogeneous compute and storage resources. It automatically optimizes the scheduling of functions and placement of data according to their performance and privacy requirements. EdgeFaaS is evaluated based on two edge workflows: video analytics workflow and federated learning workflow, both of which are representative edge applications and involve large amounts of input data generated from edge devices.

ContributorsJin, Runyu (Author) / Zhao, Ming (Thesis advisor) / Shrivastava, Aviral (Committee member) / Sarwat Abdelghany Aly Elsayed, Mohamed (Committee member) / Arizona State University (Publisher)

Created2021

TIPANGLE: A Machine Learning Approach for Accurate Spatial Pan and Tilt Angle Determination of Pan Tilt Traffic Cameras

Description

Pan Tilt Traffic Cameras (PTTC) are a vital component of traffic managementsystems for monitoring/surveillance. In a real world scenario, if a vehicle is in pursuit of another vehicle or an accident has occurred at an intersection causing traffic stoppages, accurate and venerable data from PTTC is necessary to quickly localize the cars on…

Pan Tilt Traffic Cameras (PTTC) are a vital component of traffic managementsystems for monitoring/surveillance. In a real world scenario, if a vehicle is in pursuit of another vehicle or an accident has occurred at an intersection causing traffic stoppages, accurate and venerable data from PTTC is necessary to quickly localize the cars on a map for adept emergency response as more and more traffic systems are getting automated using machine learning concepts. However, the position(orientation) of the PTTC with respect to the environment is often unknown as most of them lack Inertial Measurement Units or Encoders. Current State Of the Art systems 1. Demand high performance compute and use carbon footprint heavy Deep Neural Networks(DNN), 2. Are only applicable to scenarios with appropriate lane markings or only roundabouts, 3. Demand complex mathematical computations to determine focal length and optical center first before determining the pose. A compute light approach "TIPANGLE" is presented in this work. The approach uses the concept of Siamese Neural Networks(SNN) encompassing simple mathematical functions i.e., Euclidian Distance and Contrastive Loss to achieve the objective. The effectiveness of the approach is reckoned with a thorough comparison study with alternative approaches and also by executing the approach on an embedded system i.e., Raspberry Pi 3.

ContributorsJagadeesha, Shreehari (Author) / Shrivastava, Aviral (Thesis advisor) / Gopalan, Nakul (Committee member) / Arora, Aman (Committee member) / Arizona State University (Publisher)

Created2023

Filtering by

Neuron-based Digital and Mixed-signal Circuit Design: From ASIC to SIMD Processors

Formalizing Safety, Perception, and Mission Requirements for Testing and Planning in Autonomous Vehicles

Compiler and architecture design for coarse-grained programmable accelerators

Construction of GCCFG for inter-procedural optimizations in Software Managed Manycore (SMM)

EdgeFaaS: A Function-based Framework for Edge Computing

TIPANGLE: A Machine Learning Approach for Accurate Spatial Pan and Tilt Angle Determination of Pan Tilt Traffic Cameras