Matching Items (37)
Filtering by

Clear all filters

151527-Thumbnail Image.png
Description
Rapid technology scaling, the main driver of the power and performance improvements of computing solutions, has also rendered our computing systems extremely susceptible to transient errors called soft errors. Among the arsenal of techniques to protect computation from soft errors, Control Flow Checking (CFC) based techniques have gained a reputation

Rapid technology scaling, the main driver of the power and performance improvements of computing solutions, has also rendered our computing systems extremely susceptible to transient errors called soft errors. Among the arsenal of techniques to protect computation from soft errors, Control Flow Checking (CFC) based techniques have gained a reputation of effective, yet low-cost protection mechanism. The basic idea is that, there is a high probability that a soft-fault in program execution will eventually alter the control flow of the program. Therefore just by making sure that the control flow of the program is correct, significant protection can be achieved. More than a dozen techniques for CFC have been developed over the last several decades, ranging from hardware techniques, software techniques, and hardware-software hybrid techniques as well. Our analysis shows that existing CFC techniques are not only ineffective in protecting from soft errors, but cause additional power and performance overheads. For this analysis, we develop and validate a simulation based experimental setup to accurately and quantitatively estimate the architectural vulnerability of a program execution on a processor micro-architecture. We model the protection achieved by various state-of-the-art CFC techniques in this quantitative vulnerability estimation setup, and find out that software only CFC protection schemes (CFCSS, CFCSS+NA, CEDA) increase system vulnerability by 18% to 21% with 17% to 38% performance overhead. Hybrid CFC protection (CFEDC) increases vulnerability by 5%, while the vulnerability remains almost the same for hardware only CFC protection (CFCET); notwithstanding the hardware overheads of design cost, area, and power incurred in the hardware modifications required for their implementations.
ContributorsRhisheekesan, Abhishek (Author) / Shrivastava, Aviral (Thesis advisor) / Colbourn, Charles Joseph (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2013
151851-Thumbnail Image.png
Description
In this thesis we deal with the problem of temporal logic robustness estimation. We present a dynamic programming algorithm for the robust estimation problem of Metric Temporal Logic (MTL) formulas regarding a finite trace of time stated sequence. This algorithm not only tests if the MTL specification is satisfied by

In this thesis we deal with the problem of temporal logic robustness estimation. We present a dynamic programming algorithm for the robust estimation problem of Metric Temporal Logic (MTL) formulas regarding a finite trace of time stated sequence. This algorithm not only tests if the MTL specification is satisfied by the given input which is a finite system trajectory, but also quantifies to what extend does the sequence satisfies or violates the MTL specification. The implementation of the algorithm is the DP-TALIRO toolbox for MATLAB. Currently it is used as the temporal logic robust computing engine of S-TALIRO which is a tool for MATLAB searching for trajectories of minimal robustness in Simulink/ Stateflow. DP-TALIRO is expected to have near linear running time and constant memory requirement depending on the structure of the MTL formula. DP-TALIRO toolbox also integrates new features not supported in its ancestor FW-TALIRO such as parameter replacement, most related iteration and most related predicate. A derivative of DP-TALIRO which is DP-T-TALIRO is also addressed in this thesis which applies dynamic programming algorithm for time robustness computation. We test the running time of DP-TALIRO and compare it with FW-TALIRO. Finally, we present an application where DP-TALIRO is used as the robustness computation core of S-TALIRO for a parameter estimation problem.
ContributorsYang, Hengyi (Author) / Fainekos, Georgios (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2013
152778-Thumbnail Image.png
Description
Software has a great impact on the energy efficiency of any computing system--it can manage the components of a system efficiently or inefficiently. The impact of software is amplified in the context of a wearable computing system used for activity recognition. The design space this platform opens up is immense

Software has a great impact on the energy efficiency of any computing system--it can manage the components of a system efficiently or inefficiently. The impact of software is amplified in the context of a wearable computing system used for activity recognition. The design space this platform opens up is immense and encompasses sensors, feature calculations, activity classification algorithms, sleep schedules, and transmission protocols. Design choices in each of these areas impact energy use, overall accuracy, and usefulness of the system. This thesis explores methods software can influence the trade-off between energy consumption and system accuracy. In general the more energy a system consumes the more accurate will be. We explore how finding the transitions between human activities is able to reduce the energy consumption of such systems without reducing much accuracy. We introduce the Log-likelihood Ratio Test as a method to detect transitions, and explore how choices of sensor, feature calculations, and parameters concerning time segmentation affect the accuracy of this method. We discovered an approximate 5X increase in energy efficiency could be achieved with only a 5% decrease in accuracy. We also address how a system's sleep mode, in which the processor enters a low-power state and sensors are turned off, affects a wearable computing platform that does activity recognition. We discuss the energy trade-offs in each stage of the activity recognition process. We find that careful analysis of these parameters can result in great increases in energy efficiency if small compromises in overall accuracy can be tolerated. We call this the ``Great Compromise.'' We found a 6X increase in efficiency with a 7% decrease in accuracy. We then consider how wireless transmission of data affects the overall energy efficiency of a wearable computing platform. We find that design decisions such as feature calculations and grouping size have a great impact on the energy consumption of the system because of the amount of data that is stored and transmitted. For example, storing and transmitting vector-based features such as FFT or DCT do not compress the signal and would use more energy than storing and transmitting the raw signal. The effect of grouping size on energy consumption depends on the feature. For scalar features energy consumption is proportional in the inverse of grouping size, so it's reduced as grouping size goes up. For features that depend on the grouping size, such as FFT, energy increases with the logarithm of grouping size, so energy consumption increases slowly as grouping size increases. We find that compressing data through activity classification and transition detection significantly reduces energy consumption and that the energy consumed for the classification overhead is negligible compared to the energy savings from data compression. We provide mathematical models of energy usage and data generation, and test our ideas using a mobile computing platform, the Texas Instruments Chronos watch.
ContributorsBoyd, Jeffrey Michael (Author) / Sundaram, Hari (Thesis advisor) / Li, Baoxin (Thesis advisor) / Shrivastava, Aviral (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2014
152905-Thumbnail Image.png
Description
Coarse-Grained Reconfigurable Architectures (CGRA) are a promising fabric for improving the performance and power-efficiency of computing devices. CGRAs are composed of components that are well-optimized to execute loops and rotating register file is an example of such a component present in CGRAs. Due to the rotating nature of register indexes

Coarse-Grained Reconfigurable Architectures (CGRA) are a promising fabric for improving the performance and power-efficiency of computing devices. CGRAs are composed of components that are well-optimized to execute loops and rotating register file is an example of such a component present in CGRAs. Due to the rotating nature of register indexes in rotating register file, it is very challenging, if at all possible, to hold and properly index memory addresses (pointers) and static values. In this Thesis, different structures for CGRA register files are investigated. Those structures are experimentally compared in terms of performance of mapped applications, design frequency, and area. It is shown that a register file that can logically be partitioned into rotating and non-rotating regions is an excellent choice because it imposes the minimum restriction on underlying CGRA mapping algorithm while resulting in efficient resource utilization.
ContributorsSaluja, Dipal (Author) / Shrivastava, Aviral (Thesis advisor) / Lee, Yann-Hang (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2014
153089-Thumbnail Image.png
Description
A benchmark suite that is representative of the programs a processor typically executes is necessary to understand a processor's performance or energy consumption characteristics. The first contribution of this work addresses this need for mobile platforms with MobileBench, a selection of representative smartphone applications. In smartphones, like any other

A benchmark suite that is representative of the programs a processor typically executes is necessary to understand a processor's performance or energy consumption characteristics. The first contribution of this work addresses this need for mobile platforms with MobileBench, a selection of representative smartphone applications. In smartphones, like any other portable computing systems, energy is a limited resource. Based on the energy characterization of a commercial widely-used smartphone, application cores are found to consume a significant part of the total energy consumption of the device. With this insight, the subsequent part of this thesis focuses on the portion of energy that is spent to move data from the memory system to the application core's internal registers. The primary motivation for this work comes from the relatively higher power consumption associated with a data movement instruction compared to that of an arithmetic instruction. The data movement energy cost is worsened esp. in a System on Chip (SoC) because the amount of data received and exchanged in a SoC based smartphone increases at an explosive rate. A detailed investigation is performed to quantify the impact of data movement

on the overall energy consumption of a smartphone device. To aid this study, microbenchmarks that generate desired data movement patterns between different levels of the memory hierarchy are designed. Energy costs of data movement are then computed by measuring the instantaneous power consumption of the device when the micro benchmarks are executed. This work makes an extensive use of hardware performance counters to validate the memory access behavior of microbenchmarks and to characterize the energy consumed in moving data. Finally, the calculated energy costs of data movement are used to characterize the portion of energy that MobileBench applications spend in moving data. The results of this study show that a significant 35% of the total device energy is spent in data movement alone. Energy is an increasingly important criteria in the context of designing architectures for future smartphones and this thesis offers insights into data movement energy consumption.
ContributorsPandiyan, Dhinakaran (Author) / Wu, Carole-Jean (Thesis advisor) / Shrivastava, Aviral (Committee member) / Lee, Yann-Hang (Committee member) / Arizona State University (Publisher)
Created2014
153033-Thumbnail Image.png
Description
Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else

Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of

achieving high performance at low power consumption. While CGRAs can efficiently

accelerate loop kernels, accelerating loops with control flow (loops with if-then-else

structures) is quite challenging. Techniques that handle control flow execution in

CGRAs generally use predication. Such techniques execute both branches of an

if-then-else structure and select outcome of either branch to commit based on the

result of the conditional. This results in poor utilization of CGRA s computational

resources. Dual-issue scheme which is the state of the art technique for control flow

fetches instructions from both paths of the branch and selects one to execute at

runtime based on the result of the conditional. This technique has an overhead in

instruction fetch bandwidth. In this thesis, to improve performance of control flow

execution in CGRAs, I propose a solution in which the result of the conditional

expression that decides the branch outcome is communicated to the instruction fetch

unit to selectively issue instructions from the path taken by the branch at run time.

Experimental results show that my solution can achieve 34.6% better performance

and 52.1% improvement in energy efficiency on an average compared to state of the

art dual issue scheme without imposing any overhead in instruction fetch bandwidth.
ContributorsRajendran Radhika, Shri Hari (Author) / Shrivastava, Aviral (Thesis advisor) / Christen, Jennifer Blain (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2014
153040-Thumbnail Image.png
Description
Android has been the dominant platform in which most of the mobile development is being done. By the end of the second quarter of 2014, 84.7 percent of the entire world mobile phones market share had been captured by Android. The Android library internally uses the modified Linux kernel as

Android has been the dominant platform in which most of the mobile development is being done. By the end of the second quarter of 2014, 84.7 percent of the entire world mobile phones market share had been captured by Android. The Android library internally uses the modified Linux kernel as the part of its stack. The I/O scheduler, is a part of the Linux kernel, responsible for scheduling data requests to the internal and the external memory devices that are attached to the mobile systems.

The usage of solid state drives in the Android tablet has also seen a rise owing to its speed of operation and mechanical stability. The I/O schedulers that exist in the present Linux kernel are not better suited for handling solid state drives in particular to exploit the inherent parallelism offered by the solid state drives. The Android provides information to the Linux kernel about the processes running in the foreground and background. Based on this information the kernel decides the process scheduling and the memory management, but no such information exists for the I/O scheduling. Research shows that the resource management could be done better if the operating system is aware of the characteristics of the requester. Thus, there is a need for a better I/O scheduler that could schedule I/O operations based on the application and also exploit the parallelism in the solid state drives. The scheduler proposed through this research does that. It contains two algorithms working in unison one focusing on the solid state drives and the other on the application awareness.

The Android application context aware scheduler has the features of increasing the responsiveness of the time sensitive applications and also increases the throughput by parallel scheduling of request in the solid state drive. The suggested scheduler is tested using standard benchmarks and real-time scenarios, the results convey that our scheduler outperforms the existing default completely fair queuing scheduler of the Android.
ContributorsSivasankaran, Jeevan Prasath (Author) / Lee, Yann Hang (Thesis advisor) / Wu, Carole-Jean (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2014
153193-Thumbnail Image.png
Description
As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware.

As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. This work presents a source-to-source translator written using CETUS that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the shared memory such that it enables execution on HSM architectures.
ContributorsRawat, Tushar (Author) / Shrivastava, Aviral (Thesis advisor) / Dasgupta, Partha (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created2014
150197-Thumbnail Image.png
Description
Ever reducing time to market, along with short product lifetimes, has created a need to shorten the microprocessor design time. Verification of the design and its analysis are two major components of this design cycle. Design validation techniques can be broadly classified into two major categories: simulation based approaches and

Ever reducing time to market, along with short product lifetimes, has created a need to shorten the microprocessor design time. Verification of the design and its analysis are two major components of this design cycle. Design validation techniques can be broadly classified into two major categories: simulation based approaches and formal techniques. Simulation based microprocessor validation involves running millions of cycles using random or pseudo random tests and allows verification of the register transfer level (RTL) model against an architectural model, i.e., that the processor executes instructions as required. The validation effort involves model checking to a high level description or simulation of the design against the RTL implementation. Formal techniques exhaustively analyze parts of the design but, do not verify RTL against the architecture specification. The focus of this work is to implement a fully automated validation environment for a MIPS based radiation hardened microprocessor using simulation based approaches. The basic framework uses the classical validation approach in which the design to be validated is described in a Hardware Definition Language (HDL) such as VHDL or Verilog. To implement a simulation based approach a number of random or pseudo random tests are generated. The output of the HDL based design is compared against the one obtained from a "perfect" model implementing similar functionality, a mismatch in the results would thus indicate a bug in the HDL based design. Effort is made to design the environment in such a manner that it can support validation during different stages of the design cycle. The validation environment includes appropriate changes so as to support architecture changes which are introduced because of radiation hardening. The manner in which the validation environment is build is highly dependent on the specifications of the perfect model used for comparisons. This work implements the validation environment for two MIPS simulators as the reference model. Two bugs have been discovered in the RTL model, using simulation based approaches through the validation environment.
ContributorsSharma, Abhishek (Author) / Clark, Lawrence (Thesis advisor) / Holbert, Keith E. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2011
150524-Thumbnail Image.png
Description
Network-on-Chip (NoC) architectures have emerged as the solution to the on-chip communication challenges of multi-core embedded processor architectures. Design space exploration and performance evaluation of a NoC design requires fast simulation infrastructure. Simulation of register transfer level model of NoC is too slow for any meaningful design space exploration. One

Network-on-Chip (NoC) architectures have emerged as the solution to the on-chip communication challenges of multi-core embedded processor architectures. Design space exploration and performance evaluation of a NoC design requires fast simulation infrastructure. Simulation of register transfer level model of NoC is too slow for any meaningful design space exploration. One of the solutions to reduce the speed of simulation is to increase the level of abstraction. SystemC TLM2.0 provides the capability to model hardware design at higher levels of abstraction with trade-off of simulation speed and accuracy. In this thesis, SystemC TLM2.0 models of NoC routers are developed at three levels of abstraction namely loosely-timed, approximately-timed, and cycle accurate. Simulation speed and accuracy of these three models are evaluated by a case study of a 4x4 mesh NoC.
ContributorsArlagadda Narasimharaju, Jyothi Swaroop (Author) / Chatha, Karamvir S (Thesis advisor) / Sen, Arunabha (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2012