Matching Items (47)
Filtering by

Clear all filters

150197-Thumbnail Image.png
Description
Ever reducing time to market, along with short product lifetimes, has created a need to shorten the microprocessor design time. Verification of the design and its analysis are two major components of this design cycle. Design validation techniques can be broadly classified into two major categories: simulation based approaches and

Ever reducing time to market, along with short product lifetimes, has created a need to shorten the microprocessor design time. Verification of the design and its analysis are two major components of this design cycle. Design validation techniques can be broadly classified into two major categories: simulation based approaches and formal techniques. Simulation based microprocessor validation involves running millions of cycles using random or pseudo random tests and allows verification of the register transfer level (RTL) model against an architectural model, i.e., that the processor executes instructions as required. The validation effort involves model checking to a high level description or simulation of the design against the RTL implementation. Formal techniques exhaustively analyze parts of the design but, do not verify RTL against the architecture specification. The focus of this work is to implement a fully automated validation environment for a MIPS based radiation hardened microprocessor using simulation based approaches. The basic framework uses the classical validation approach in which the design to be validated is described in a Hardware Definition Language (HDL) such as VHDL or Verilog. To implement a simulation based approach a number of random or pseudo random tests are generated. The output of the HDL based design is compared against the one obtained from a "perfect" model implementing similar functionality, a mismatch in the results would thus indicate a bug in the HDL based design. Effort is made to design the environment in such a manner that it can support validation during different stages of the design cycle. The validation environment includes appropriate changes so as to support architecture changes which are introduced because of radiation hardening. The manner in which the validation environment is build is highly dependent on the specifications of the perfect model used for comparisons. This work implements the validation environment for two MIPS simulators as the reference model. Two bugs have been discovered in the RTL model, using simulation based approaches through the validation environment.
ContributorsSharma, Abhishek (Author) / Clark, Lawrence (Thesis advisor) / Holbert, Keith E. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2011
150460-Thumbnail Image.png
Description
Performance improvements have largely followed Moore's Law due to the help from technology scaling. In order to continue improving performance, power-efficiency must be reduced. Better technology has improved power-efficiency, but this has a limit. Multi-core architectures have been shown to be an additional aid to this crusade of increased power-efficiency.

Performance improvements have largely followed Moore's Law due to the help from technology scaling. In order to continue improving performance, power-efficiency must be reduced. Better technology has improved power-efficiency, but this has a limit. Multi-core architectures have been shown to be an additional aid to this crusade of increased power-efficiency. Accelerators are growing in popularity as the next means of achieving power-efficient performance. Accelerators such as Intel SSE are ideal, but prove difficult to program. FPGAs, on the other hand, are less efficient due to their fine-grained reconfigurability. A middle ground is found in CGRAs, which are highly power-efficient, but largely programmable accelerators. Power-efficiencies of 100s of GOPs/W have been estimated, more than 2 orders of magnitude greater than current processors. Currently, CGRAs are limited in their applicability due to their ability to only accelerate a single thread at a time. This limitation becomes especially apparent as multi-core/multi-threaded processors have moved into the mainstream. This limitation is removed by enabling multi-threading on CGRAs through a software-oriented approach. The key capability in this solution is enabling quick run-time transformation of schedules to execute on targeted portions of the CGRA. This allows the CGRA to be shared among multiple threads simultaneously. Analysis shows that enabling multi-threading has very small costs but provides very large benefits (less than 1% single-threaded performance loss but nearly 300% CGRA throughput increase). By increasing dynamism of CGRA scheduling, system performance is shown to increase overall system performance of an optimized system by almost 350% over that of a single-threaded CGRA and nearly 20x faster than the same system with no CGRA in a highly threaded environment.
ContributorsPager, Jared (Author) / Shrivastava, Aviral (Thesis advisor) / Gupta, Sandeep (Committee member) / Speyer, Gil (Committee member) / Arizona State University (Publisher)
Created2011
150544-Thumbnail Image.png
Description
Limited Local Memory (LLM) multicore architectures are promising powerefficient architectures will scalable memory hierarchy. In LLM multicores, each core can access only a small local memory. Accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. Standard Template Library (STL) is a

Limited Local Memory (LLM) multicore architectures are promising powerefficient architectures will scalable memory hierarchy. In LLM multicores, each core can access only a small local memory. Accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. Standard Template Library (STL) is a powerful programming tool and is widely used for software development. STLs provide dynamic data structures, algorithms, and iterators for vector, deque (double-ended queue), list, map (red-black tree), etc. Since the size of the local memory is limited in the cores of the LLM architecture, and data transfer is not automatically supported by hardware cache or OS, the usage of current STL implementation on LLM multicores is limited. Specifically, there is a hard limitation on the amount of data they can handle. In this article, we propose and implement a framework which manages the STL container classes on the local memory of LLM multicore architecture. Our proposal removes the data size limitation of the STL, and therefore improves the programmability on LLM multicore architectures with little change to the original program. Our implementation results in only about 12%-17% increase in static library code size and reasonable runtime overheads.
ContributorsLu, Di (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)
Created2012
149617-Thumbnail Image.png
Description
The ubiquity of embedded computational systems has exploded in recent years impacting everything from hand-held computers and automotive driver assistance to battlefield command and control and autonomous systems. Typical embedded computing systems are characterized by highly resource constrained operating environments. In particular, limited energy resources constrain performance in embedded systems

The ubiquity of embedded computational systems has exploded in recent years impacting everything from hand-held computers and automotive driver assistance to battlefield command and control and autonomous systems. Typical embedded computing systems are characterized by highly resource constrained operating environments. In particular, limited energy resources constrain performance in embedded systems often reliant on independent fuel or battery supplies. Ultimately, mitigating energy consumption without sacrificing performance in these systems is paramount. In this work power/performance optimization emphasizing prevailing data centric applications including video and signal processing is addressed for energy constrained embedded systems. Frameworks are presented which exchange quality of service (QoS) for reduced power consumption enabling power aware energy management. Power aware systems provide users with tools for precisely managing available energy resources in light of user priorities, extending availability when QoS can be sacrificed. Specifically, power aware management tools for next generation bistable electrophoretic displays and the state of the art H.264 video codec are introduced. The multiprocessor system on chip (MPSoC) paradigm is examined in the context of next generation many-core hand-held computing devices. MPSoC architectures promise to breach the power/performance wall prohibiting advancement of complex high performance single core architectures. Several many-core distributed memory MPSoC architectures are commercially available, while the tools necessary to effectively tap their enormous potential remain largely open for discovery. Adaptable scalability in many-core systems is addressed through a scalable high performance multicore H.264 video decoder implemented on the representative Cell Broadband Engine (CBE) architecture. The resulting agile performance scalable system enables efficient adaptive power optimization via decoding-rate driven sleep and voltage/frequency state management. The significant problem of mapping applications onto these architectures is additionally addressed from the perspective of instruction mapping for limited distributed memory architectures with a code overlay generator implemented on the CBE. Finally runtime scheduling and mapping of scalable applications in multitasking environments is addressed through the introduction of a lightweight work partitioning framework targeting streaming applications with low latency and near optimal throughput demonstrated on the CBE.
ContributorsBaker, Michael (Author) / Chatha, Karam S. (Thesis advisor) / Raupp, Gregory B. (Committee member) / Vrudhula, Sarma B. K. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2011
149343-Thumbnail Image.png
Description
Threshold logic has long been studied as a means of achieving higher performance and lower power dissipation, providing improvements by condensing simple logic gates into more complex primitives, effectively reducing gate count, pipeline depth, and number of interconnects. This work proposes a new physical implementation of threshold logic, the threshold

Threshold logic has long been studied as a means of achieving higher performance and lower power dissipation, providing improvements by condensing simple logic gates into more complex primitives, effectively reducing gate count, pipeline depth, and number of interconnects. This work proposes a new physical implementation of threshold logic, the threshold logic latch (TLL), which overcomes the difficulties observed in previous work, particularly with respect to gate reliability in the presence of noise and process variations. Simple but effective models were created to assess the delay, power, and noise margin of TLL gates for the purpose of determining the physical parameters and assignment of input signals that achieves the lowest delay subject to constraints on power and reliability. From these models, an optimized library of standard TLL cells was developed to supplement a commercial library of static CMOS gates. The new cells were then demonstrated on a number of automatically synthesized, placed, and routed designs. A two-stage 2's complement integer multiplier designed with CMOS and TLL gates utilized 19.5% less area, 28.0% less active power, and 61.5% less leakage power than an equivalent design with the same performance using only static CMOS gates. Additionally, a two-stage 32-instruction 4-way issue queue designed with CMOS and TLL gates utilized 30.6% less area, 31.0% less active power, and 58.9% less leakage power than an equivalent design with the same performance using only static CMOS gates.
ContributorsLeshner, Samuel (Author) / Vrudhula, Sarma (Thesis advisor) / Chatha, Karamvir (Committee member) / Clark, Lawrence (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2010
168821-Thumbnail Image.png
Description
It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents

It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed as CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried to complement per training caption by querying a generic knowledge atlas ATOMIC, and form the commonsense- caption entailment corpus. A BERT based language entailment model trained from this corpus then serves as a commonsense discriminator for the training of video captioning model, and penalizes the model from generating semantically misaligned captions. With extensive empirical evaluations on MSR-VTT, V2C and VATEX datasets, CAVAN consistently improves the quality of generations and shows higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveals that the use of commonsense knowledge contributes to the video caption generation.
ContributorsShao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)
Created2022
171583-Thumbnail Image.png
Description
With the breakdown of Dennard scaling, computer architects can no longer rely on integrated circuit energy efficiency to scale with transistor density, and must under-clock or power-gate parts of their designs in order to fit within given power budgets. Hardware accelerators may improve energy efficiency of some compute-intensive tasks, but

With the breakdown of Dennard scaling, computer architects can no longer rely on integrated circuit energy efficiency to scale with transistor density, and must under-clock or power-gate parts of their designs in order to fit within given power budgets. Hardware accelerators may improve energy efficiency of some compute-intensive tasks, but as more tasks are accelerated, the general-purpose portions of workloads account for a larger share of execution time while also leaving less instruction, data, or task-level parallelism to exploit. Adaptive computing systems have potential to address these challenges by modifying their behavior at runtime. Adaptation requires runtime decision-making, which can be performed both in hardware and software. While software-based decision-making is more flexible and can execute higher complexity operations compared to hardware, it also incurs a significant latency and power overhead. Hardware designs are more limited in the space of decisions they can make, but have direct access to their own internal microarchitectural states and can make faster decisions, allowing for better-informed adaptation and extracting previously unobtainable performance and security benefits. In this dissertation I study (i) the viability and trade-offs of general-purpose adaptive systems, (ii) the difficulty and complexity of making adaptation decisions, and (iii) how time spent in the observation-analysis-adaptation cycle affects adaptation benefits. I introduce techniques for (a) modeling and understanding high performance computing systems and microarchitecture, (b) enabling hardware learning and decision-making through low-latency networks, and (c) on securing hardware designs using runtime decision-making. I propose an always-awake and active learning `hardware nervous system' pervasive throughout the chip that can reason about the individual hardware module performance, energy usage, and security. I present the design and implementation of (1) a reference architecture and (2) a microarchitecture-aware static binary instrumentation tool. Finally, I provide results showing (1) that runtime adaptation is a necessary to continue improving performance on general-purpose tasks, (2) that significant performance loss and performance variation happens under the ISA-level, and is unobservable without hardware support, and (3) that hardware must possess decision-making and ‘self-awareness’ capabilities at the microarchitecture level in order to efficiently use its own faculties.
ContributorsIsakov, Mihailo (Author) / Kinsy, Michel (Thesis advisor) / Shrivastava, Aviral (Committee member) / Rudd, Kevin (Committee member) / Gadepally, Vijay (Committee member) / Arizona State University (Publisher)
Created2022
187599-Thumbnail Image.png
Description
5G Millimeter Wave (mmWave) technology holds great promise for Connected Autonomous Vehicles (CAVs) due to its ability to achieve data rates in the Gbps range. However, mmWave suffers high beamforming overhead and requirement of line of sight (LOS) to maintain a strong connection. For Vehicle-to-Infrastructure (V2I) scenarios, where CAVs connect

5G Millimeter Wave (mmWave) technology holds great promise for Connected Autonomous Vehicles (CAVs) due to its ability to achieve data rates in the Gbps range. However, mmWave suffers high beamforming overhead and requirement of line of sight (LOS) to maintain a strong connection. For Vehicle-to-Infrastructure (V2I) scenarios, where CAVs connect to roadside units (RSUs), these drawbacks become apparent. Because vehicles are dynamic, there is a large potential for link blockages, which in turn is detrimental to the connected applications running on the vehicle, such as cooperative perception and remote driver takeover. Existing RSU selection schemes base their decisions on signal strength and vehicle trajectory alone, which is not enough to prevent the blockage of links. Most recent CAVs motion planning algorithms routinely use other vehicle's near-future plans, either by explicit communication among vehicles, or by prediction. In this thesis, I make use of this knowledge (of the other vehicle's near future path plans) to further improve the RSU association mechanism for CAVs. I solve the RSU association problem by converting it to a shortest path problem with the objective to maximize the total communication bandwidth. Evaluations of B-AWARE in simulation using Simulated Urban Mobility (SUMO) and Digital twin for self-dRiving Intelligent VEhicles (DRIVE) on 12 highway and city street scenarios with varying traffic density and RSU placements show that B-AWARE results in a 1.05x improvement of the potential datarate in the average case and 1.28x in the best case vs. the state of the art. But more impressively, B-AWARE reduces the time spent with no connection by 48% in the average case and 251% in the best case as compared to the state-of-the-art methods. This is partly a result of B-AWARE reducing almost 100% of blockage occurrences in simulation.
ContributorsSzeto, Matthew (Author) / Shrivastava, Aviral (Thesis advisor) / LiKamWa, Robert (Committee member) / Meuth, Ryan (Committee member) / Arizona State University (Publisher)
Created2023
171818-Thumbnail Image.png
Description
Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability

Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability of human-driven vehicles. Given that eliminating accidents is impossible, an achievable goal of designing AVs is to design them in a way so that they will not be blamed for any accident in which they are involved in. This work proposes BlaFT – a Blame-Free motion planning algorithm in hybrid Traffic. BlaFT is designed to be compatible with HVs and other AVs, and will not be blamed for accidents in a structured road environment. Also, it proves that no accidents will happen if all AVs are using the BlaFT motion planner and that when in hybrid traffic, the AV using BlaFT will be blame-free even if it is involved in a collision. The work instantiated scores of BlaFT and HV vehicles in an urban road scape loop in the 'Simulation of Urban MObility', ran the simulation for several hours, and observe that as the percentage of BlaFT vehicles increases, the traffic becomes safer. Adding BlaFT vehicles to HVs also increases the efficiency of traffic as a whole by up to 34%.
ContributorsPark, Sanggu (Author) / Shrivastava, Aviral (Thesis advisor) / Wang, Ruoyu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2022
171862-Thumbnail Image.png
Description
Deep neural networks have been shown to be vulnerable to adversarial attacks. Typical attack strategies alter authentic data subtly so as to obtain adversarial samples that resemble the original but otherwise would cause a network's misbehavior such as a high misclassification rate. Various attack approaches have been reported, with some

Deep neural networks have been shown to be vulnerable to adversarial attacks. Typical attack strategies alter authentic data subtly so as to obtain adversarial samples that resemble the original but otherwise would cause a network's misbehavior such as a high misclassification rate. Various attack approaches have been reported, with some showing state-of-the-art performance in attacking certain networks. In the meanwhile, many defense mechanisms have been proposed in the literature, some of which are quite effective for guarding against typical attacks. Yet, most of these attacks fail when the targeted network modifies its architecture or uses another set of parameters and vice versa. Moreover, the emerging of more advanced deep neural networks, such as generative adversarial networks (GANs), has made the situation more complicated and the game between the attack and defense is continuing. This dissertation aims at exploring the venerability of the deep neural networks by investigating the mechanisms behind the success/failure of the existing attack and defense approaches. Therefore, several deep learning-based approaches have been proposed to study the problem from different perspectives. First, I developed an adversarial attack approach by exploring the unlearned region of a typical deep neural network which is often over-parameterized. Second, I proposed an end-to-end learning framework to analyze the images generated by different GAN models. Third, I developed a defense mechanism that can secure the deep neural network against adversarial attacks with a defense layer consisting of a set of orthogonal kernels. Substantial experiments are conducted to unveil the potential factors that contribute to attack/defense effectiveness. This dissertation also concludes with a discussion of possible future works of achieving a robust deep neural network.
ContributorsDing, Yuzhen (Author) / Li, Baoxin (Thesis advisor) / Davulcu, Hasan (Committee member) / Venkateswara, Hemanth Kumar Demakethepalli (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2022