This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations and theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses in the ASU Digital Repository, ASU Theses and Dissertations can also be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Description
As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non-Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. This work presents a source-to-source translator, written using CETUS, that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the shared memory, enabling execution on HSM architectures.
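To make the two compiler obligations concrete, here is a minimal Python sketch of the mapping step, using a toy access table in place of real compiler analysis; the variable names, thread IDs, and on-chip capacity are hypothetical, and the actual translator operates on the CETUS intermediate representation rather than a runtime log:

    from collections import defaultdict

    # Toy record of which threads access which variables, and how often.
    # In the real translator this comes from static analysis, not a log.
    accesses = [
        ("t0", "counter"), ("t1", "counter"), ("t0", "counter"),
        ("t0", "local_buf"), ("t1", "queue"), ("t0", "queue"),
        ("t1", "queue"), ("t1", "scratch"),
    ]

    threads_of = defaultdict(set)
    freq = defaultdict(int)
    for tid, var in accesses:
        threads_of[var].add(tid)
        freq[var] += 1

    # Conservative superset: anything touched by more than one thread
    # must live in shared memory.
    shared = {v for v, ts in threads_of.items() if len(ts) > 1}

    # Map the most frequently accessed shared data to the small on-chip
    # (SRAM) shared memory; the rest goes to off-chip (DRAM) memory.
    ON_CHIP_SLOTS = 1  # hypothetical SRAM capacity, in variables
    by_freq = sorted(shared, key=lambda v: -freq[v])
    on_chip = set(by_freq[:ON_CHIP_SLOTS])
    off_chip = set(by_freq[ON_CHIP_SLOTS:])

    print("on-chip :", on_chip)
    print("off-chip:", off_chip)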
Contributors: Rawat, Tushar (Author) / Shrivastava, Aviral (Thesis advisor) / Dasgupta, Partha (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Dynamic software update (DSU) enables a program to update while it is running. DSU aims to minimize the loss due to program downtime for updates. Usually DSU is done in three steps: suspending the execution of an old program, mapping the execution state from the old program to a new one, and resuming execution of the new program with the mapped state. The semantic correctness of DSU depends largely on the state mapping, which today is mostly constructed manually by developers. However, manual construction does not necessarily ensure a sound and dependable state mapping. This dissertation presents a methodology to assist developers by automating the construction of a partial state mapping with a guarantee of correctness.

This dissertation includes a detailed study of DSU correctness and automatic state mapping for server programs with an established user base. First, the dissertation presents a formal treatment of DSU correctness and the state mapping problem. Then the dissertation argues that for programs with an established user base, dynamic updates must be backward compatible. The dissertation next presents a general definition of backward compatibility that specifies the allowed changes in program interaction between an old version and a new version, and identifies patterns of code evolution that result in backward-compatible behavior. Thereafter the dissertation presents formal definitions of these patterns, together with proofs that any changes to programs following these patterns result in a backward-compatible update. To show the applicability of the results, the dissertation presents SitBack, a program analysis tool that takes an old version of a program and a new one as input and computes a partial state mapping under the assumption that the new version is backward compatible with the old version.

SitBack does not handle all kinds of changes, and it reports the incomplete parts of a state mapping to the user. The dissertation presents a detailed evaluation of SitBack which shows that the methodology of automatic state mapping is promising in dealing with real-world program updates. For example, SitBack produces state mappings for 17-75% of the changed functions. Furthermore, SitBack generates automatic state mappings that lead to successful DSU. In conclusion, the study presented in this dissertation assists developers in developing state mappings for DSU by automating the construction of state mappings with a correctness guarantee, which ultimately helps the adoption of DSU.
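As a toy illustration of what a state mapping is, the Python sketch below carries live state from an old server version to a new one; the field names and the particular changes (a rename plus an added field) are hypothetical:

    # Hypothetical old-version server state captured at update time.
    old_state = {"connections": [("10.0.0.5", 8080)], "hits": 42}

    def map_state(old):
        """State mapping for a backward-compatible update that renames
        'hits' to 'request_count' and adds a field with a safe default."""
        return {
            "connections": old["connections"],  # unchanged field: copied
            "request_count": old["hits"],       # renamed field: transferred
            "tls_enabled": False,               # new field: default value
        }

    print(map_state(old_state))

A tool in the spirit of SitBack would generate such mappings automatically for changes matching its patterns and flag the remaining fields for the developer.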
Contributors: Shen, Jun (Author) / Bazzi, Rida A (Thesis advisor) / Fainekos, Georgios (Committee member) / Neamtiu, Iulian (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of such programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes dynamic analysis difficult. In addition, the instrumentation overhead of gathering execution information may change the execution of a program and lead to distorted analysis results, i.e., the probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis consists largely of three parts. First, it presents a deterministic replay framework to provide reproducible execution. Once a program execution is recorded, software instrumentation can be safely applied during replay without probe effect. Second, it discusses the probe effect and proposes a simulation-based analysis to detect execution changes of a program caused by instrumentation overhead. The simulation-based analysis examines whether the recording instrumentation changes the original program execution. Lastly, the thesis discusses data race detection algorithms that help remove data races for correctness of the replay and the simulation-based analysis. The focus is to make the detection efficient for C/C++ programs and to increase scalability of the detection on multi-core machines.
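The record/replay idea can be sketched in a few lines of Python: record the order in which threads acquire a lock, then force that order during replay so that added instrumentation cannot perturb the interleaving. This is a toy at the granularity of a single lock; a real replay framework records far lower-level events:

    import threading

    log, log_lk = [], threading.Lock()     # record phase: acquisition order

    def worker(lk, name, counter):
        for _ in range(3):
            with lk:
                with log_lk:
                    log.append(name)        # who got the lock, in order
                counter[0] += 1             # the shared access to reproduce

    lk, counter = threading.Lock(), [0]
    threads = [threading.Thread(target=worker, args=(lk, n, counter)) for n in "AB"]
    for t in threads: t.start()
    for t in threads: t.join()

    # Replay phase: a turn queue enforces the recorded interleaving, so
    # instrumentation inserted here cannot change the schedule (no probe effect).
    turn, cv = list(log), threading.Condition()

    def replay_worker(name, counter):
        for _ in range(3):
            with cv:
                cv.wait_for(lambda: turn and turn[0] == name)
                counter[0] += 1
                turn.pop(0)
                cv.notify_all()

    counter2 = [0]
    threads = [threading.Thread(target=replay_worker, args=(n, counter2)) for n in "AB"]
    for t in threads: t.start()
    for t in threads: t.join()
    assert counter2[0] == counter[0] == 6   # same result, same schedule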
Contributors: Song, Young Wn (Author) / Lee, Yann-Hang (Thesis advisor) / Shrivastava, Aviral (Committee member) / Fainekos, Georgios (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Designers employ a variety of modeling theories and methodologies to create functional models of discrete network systems. These dynamical models are evaluated using verification and validation techniques throughout incremental design stages. Models created for these systems should directly represent their growing complexity with respect to composition and heterogeneity. As in software engineering practice, incremental model design is required for complex system design. As a result, models at early increments are significantly simpler relative to real systems. While experimenting (verification or validation) on models at early increments is computationally less demanding, the results of these experiments are less trustworthy and less rewarding. At any increment of design, a set of tools and techniques is required for controlling the complexity of models and experimentation.

A complex system such as a Network-on-Chip (NoC) may benefit from incremental design stages. Current design methods for NoCs rely on multiple models developed using various modeling frameworks. It is useful to develop frameworks that can formalize the relationships among these models. Fine-grain models are derived using their coarse-grain counterparts. Moreover, validation and verification capability at various design stages, enabled through disciplined model conversion, is very beneficial.

In this research, Multiresolution Modeling (MRM) is used for system-level design of NoCs. MRM aids in creating a family of models at different levels of scale and complexity with well-formed relationships. In addition, a variant of the Discrete Event System Specification (DEVS) formalism is proposed that supports model checking. Hierarchical models of Network-on-Chip components may be created at different resolutions, while each model can be validated using discrete-event simulation and verified via state exploration. System property expressions are defined in the DEVS language and developed as Transducers, which can be applied seamlessly for both model checking and simulation.

The Multiresolution Modeling and the verification and validation capabilities of this framework complement one another. MRM manages the scale and complexity of models, which in turn can reduce V&V time and effort; conversely, V&V helps ensure the correctness of models at multiple resolutions. This framework is realized by extending the DEVS-Suite simulator, and its applicability is demonstrated on exemplar NoC models.
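As a rough illustration of the ingredients, the Python sketch below defines a DEVS-style atomic model of a NoC buffer plus a transducer-style property check. It is a drastic simplification: the actual framework extends DEVS-Suite, and the model, delays, and property here are invented for illustration:

    INF = float("inf")

    class Buffer:
        """DEVS-style atomic model of a NoC router buffer: it queues
        packets and emits one after a fixed service delay."""
        def __init__(self, delay=2.0):
            self.delay, self.queue, self.sigma = delay, [], INF

        def ext_transition(self, packet):   # delta_ext: a packet arrives
            self.queue.append(packet)
            if len(self.queue) == 1:
                self.sigma = self.delay     # schedule the next output

        def output(self):                   # lambda: fires just before delta_int
            return self.queue[0]

        def int_transition(self):           # delta_int: the packet has departed
            self.queue.pop(0)
            self.sigma = self.delay if self.queue else INF

    # Transducer-style property: no packet waits longer than a bound.
    # The same predicate could drive simulation checks or state exploration.
    def within_bound(trace, bound=10.0):
        return all(done - arrived <= bound for arrived, done in trace)

    buf = Buffer()
    buf.ext_transition("p1")                # t = 0: packet arrives
    assert buf.output() == "p1"             # t = 2: output, then...
    buf.int_transition()
    assert buf.sigma == INF                 # ...the buffer goes idle
    assert within_bound([(0.0, 2.0)])       # packet waited 2.0 <= 10.0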
Contributors: Gholami, Soroosh (Author) / Sarjoughian, Hessam S. (Thesis advisor) / Fainekos, Georgios (Committee member) / Ogras, Umit Y. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines by which tasks must finish their execution. Missing a deadline can cause an unexpected outcome or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound on each task's execution time, the worst-case execution time (WCET), to guarantee the absence of any missed deadline. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are only optimized for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have a large impact on worst-case performance due to the expensive off-chip memory accesses involved in cache miss handling. In this regard, software-controlled scratchpad memories (SPMs) have become a promising alternative to caches. An SPM is a raw SRAM, controlled only by data movement instructions executed explicitly at runtime, and such explicit control facilitates static analyses that obtain safe and tight upper bounds on WCETs. SPM management techniques, used in compilers targeting SPM-based processors, determine how to use a given SPM space by deciding where to insert data movement instructions and what operations to perform at those program locations. This dissertation presents several management techniques for program code and stack data, which aim to optimize the WCET of a given program. The proposed code management techniques include optimal allocation algorithms and a polynomial-time heuristic for allocating functions to the SPM space, with or without the use of abstraction of SPM regions, and a heuristic for splitting functions into smaller partitions. The proposed stack data management technique, on the other hand, finds an optimal set of program locations at which to evict and restore stack frames to avoid stack overflows when the call stack resides in a size-limited SPM. In the evaluation, the WCETs of various benchmarks, including real-world automotive applications, are statically calculated for SPMs and caches in several different memory configurations.
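A greedy Python sketch of the flavor of code allocation is shown below; the function names, sizes, and benefit numbers are invented, and the dissertation's actual techniques include optimal (ILP-based) algorithms driven by worst-case path analysis rather than this simple density heuristic:

    # Each function: (name, code size in bytes, WCET reduction from placing
    # it in the SPM instead of running from slower memory). Numbers are made
    # up; real benefits come from worst-case path analysis.
    functions = [("isr", 512, 900), ("ctrl_loop", 2048, 2500),
                 ("logger", 1024, 300), ("filter", 1536, 2000)]
    SPM_SIZE = 4096  # hypothetical SPM capacity in bytes

    # Greedy knapsack-style heuristic: favor benefit density (benefit/size).
    chosen, free = [], SPM_SIZE
    for name, size, benefit in sorted(functions, key=lambda f: -f[2] / f[1]):
        if size <= free:
            chosen.append(name)
            free -= size

    print("in SPM:", chosen, "| leftover bytes:", free)
    # -> in SPM: ['isr', 'filter', 'ctrl_loop'] | leftover bytes: 0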
Contributors: Kim, Yooseong (Author) / Shrivastava, Aviral (Thesis advisor) / Broman, David (Committee member) / Fainekos, Georgios (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
For autonomous vehicles, intelligent autonomous intersection management will be required for safe and efficient operation. In order to achieve safe operation despite uncertainties in vehicle trajectory, intersection management techniques must consider a safety buffer around the vehicles. For truly safe operation, an extra buffer space should be added to account for the network and computational delay caused by communication with the Intersection Manager (IM). However, modeling the worst-case computation and network delay as an additional buffer around the vehicle degrades the throughput of the intersection. To avoid this problem, AIM, a popular state-of-the-art IM, adopts a query-based approach in which the vehicle requests to enter at a certain arrival time dictated by its current velocity and distance to the intersection, and the IM replies yes or no. Although this solution does not increase the position uncertainty, it ultimately results in poor intersection throughput. We present Crossroads, a time-sensitive programming method to program the interface of a vehicle and the IM. Without requiring an additional buffer to account for the effect of network and computational delay, Crossroads enables efficient intersection management. Test results on a 1/10-scale model intersection using TRAXXAS RC cars demonstrate that our Crossroads approach obviates the need for large buffers to accommodate network and computation delay, and can reduce the average wait time for vehicles at a single-lane intersection by 24%. To compare Crossroads with previous approaches, we perform extensive MATLAB simulations and find that Crossroads achieves on average 1.62X higher throughput than a simple VT-IM with extra safety buffer, and 1.36X better than AIM.
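The query-based interaction can be pictured as a reservation table over conflict cells: a vehicle asks to occupy certain cells during a time window, and the IM answers yes or no. This toy Python version (cell IDs and times are hypothetical) deliberately omits Crossroads' key contribution, the time-sensitive handling of network and computation delay:

    # Hypothetical intersection manager: the intersection is divided into
    # conflict cells, each holding the time intervals already reserved.
    reservations = {}   # cell id -> list of (t_start, t_end)

    def request(cells, t_in, t_out):
        """Vehicle asks to occupy `cells` during [t_in, t_out];
        the IM replies yes (and books the cells) or no."""
        for c in cells:
            for s, e in reservations.get(c, []):
                if t_in < e and s < t_out:       # intervals overlap: conflict
                    return False
        for c in cells:
            reservations.setdefault(c, []).append((t_in, t_out))
        return True

    print(request({(0, 0), (0, 1)}, 3.0, 5.0))   # True: intersection empty
    print(request({(0, 1)}, 4.0, 6.0))           # False: cell (0,1) is busy
    print(request({(0, 1)}, 5.0, 6.0))           # True: the interval is free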
Contributors: Andert, Edward (Author) / Shrivastava, Aviral (Thesis advisor) / Fainekos, Georgios (Committee member) / Ben Amor, Hani (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Reduced device dimensions, increased transistor densities, and smaller timing windows expose the vulnerability of processors to soft errors induced by charge-carrying particles. Since these factors are inevitable in the advancement of processor technology, the industry has been forced to improve reliability on general-purpose Chip Multiprocessors (CMPs). With the availability of increased hardware resources, redundancy-based techniques are the most promising methods to eradicate soft error failures in CMP systems. This work proposes a novel customizable and redundant CMP architecture (UnSync) that utilizes hardware-based detection mechanisms (most of which are readily available in the processor) to reduce overheads during error-free executions. In the presence of errors (which are infrequent), the always-forward execution recovery mechanism provides resilience in the system. The inherent nature of the UnSync architecture framework supports customization of the redundancy and thereby provides the means to achieve possible performance-reliability trade-offs in many-core systems. This work designs a detailed RTL model of the UnSync architecture and performs hardware synthesis to compare the hardware (power/area) overheads incurred. It then compares the same with those of the Reunion technique, a state-of-the-art redundant multi-core architecture. This work also performs cycle-accurate simulations over a wide range of SPEC2000 and MiBench benchmarks to evaluate the performance efficiency achieved over that of the Reunion architecture. Experimental results show that the UnSync architecture reduces power consumption by 34.5% and improves performance by up to 20% with 13.3% less area overhead, compared to the Reunion architecture for the same level of reliability.
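The principle of redundancy-based detection with forward recovery can be illustrated with a toy Python sketch: run the computation on two simulated cores, compare, and simply re-execute on a mismatch instead of rolling back to a checkpoint. In UnSync itself, detection uses hardware mechanisms already present in the processor; everything below is invented for illustration:

    import random

    def flaky_core(compute, p_soft_error=0.2):
        """Simulates a core whose integer result may be silently corrupted."""
        result = compute()
        if random.random() < p_soft_error:
            result ^= 1 << random.randrange(8)   # flip one random bit
        return result

    def redundant_execute(compute, max_retries=10):
        """Run the computation on two 'cores'; on mismatch, keep moving
        forward by re-executing rather than restoring a checkpoint."""
        for _ in range(max_retries):
            a, b = flaky_core(compute), flaky_core(compute)
            if a == b:            # agreement: accept (identical corruption
                return a          # on both cores is highly unlikely)
        raise RuntimeError("persistent disagreement between cores")

    print(redundant_execute(lambda: 2 + 3))   # 5, despite occasional bit flips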
Contributors: Hong, Fei (Author) / Shrivastava, Aviral (Thesis advisor) / Bazzi, Rida (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
User satisfaction is pivotal to the success of mobile applications. At the same time, it is imperative to maximize the energy efficiency of the mobile device, to ensure optimal use of its limited energy source while maintaining the necessary levels of user satisfaction. However, this is complicated by user interactions, numerous shared resources, and network conditions that introduce substantial uncertainty into the mobile device's performance and power characteristics. In this dissertation, a new approach is presented to characterize and control mobile devices that accurately models these uncertainties. The proposed modeling framework is a completely data-driven approach to predicting power and performance. The approach makes no assumptions about the distributions of the underlying sources of uncertainty and is capable of predicting power and performance with over 93% accuracy.

Using this data-driven prediction framework, a closed-loop solution to the DEM problem is derived to maximize the energy efficiency of the mobile device subject to various thermal, reliability and deadline constraints. The design of the controller imposes minimal operational overhead and is able to tune the performance and power prediction models to changing system conditions. The proposed controller is implemented on a real mobile platform, the Google Pixel smartphone, and demonstrates a 19% improvement in energy efficiency over the standard frequency governor implemented on all Android devices.
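A toy version of the closed loop might look like the following Python sketch: fit a power model from (made-up) measurements, then choose the lowest-energy frequency that still meets a deadline. The dissertation's framework is distribution-free and far richer; the linear model, the numbers, and the frequency set here are purely illustrative:

    import numpy as np

    # Hypothetical measurements: (frequency GHz, utilization) -> power (W).
    X = np.array([[0.8, 0.3], [0.8, 0.9], [1.4, 0.3], [1.4, 0.9],
                  [2.0, 0.3], [2.0, 0.9]])
    y = np.array([0.6, 1.1, 1.0, 1.9, 1.7, 3.0])

    # Least-squares fit of a simple linear power model (with intercept).
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict_power(freq, util):
        return np.array([freq, util, 1.0]) @ coef

    def pick_frequency(work_cycles, deadline_s, util, freqs=(0.8, 1.4, 2.0)):
        """Choose the frequency that meets the deadline with the lowest
        predicted energy (power * time)."""
        best = None
        for f in freqs:
            t = work_cycles / (f * 1e9)          # seconds at this frequency
            if t <= deadline_s:
                energy = predict_power(f, util) * t
                if best is None or energy < best[1]:
                    best = (f, energy)
        return best

    print(pick_frequency(1.5e9, deadline_s=1.2, util=0.6))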
Contributors: Gaudette, Benjamin David (Author) / Vrudhula, Sarma (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Fainekos, Georgios (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Autonomous Vehicles (AVs) have the potential to significantly transform transportation. AVs are expected to make transportation safer by avoiding accidents that happen due to human error. When AVs become connected, they can exchange information with the infrastructure or other Connected Autonomous Vehicles (CAVs) to efficiently plan their future motion and therefore increase road throughput and reduce energy consumption. Cooperative algorithms for CAVs will not be deployed in real life unless they are proven to be safe, robust, and resilient to different failure models. Since intersections are crucial areas where most accidents happen, this dissertation first focuses on making existing intersection management algorithms safe and resilient against network and computation delays, bounded model mismatches and external disturbances, and the existence of a rogue vehicle. Then, a generic algorithm for conflict resolution and cooperation of CAVs is proposed that ensures the safety of vehicles even when other vehicles suddenly change their plans. The proposed approach can also detect deadlock situations among CAVs and resolve them through a negotiation process. A testbed consisting of 1/10th-scale model CAVs is built to evaluate the proposed algorithms. In addition, a simulator is developed to perform tests at a large scale. Results from the conducted experiments indicate the robustness and resilience of the proposed approaches.
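Deadlock detection among CAVs can be pictured as cycle detection in a wait-for graph. The Python sketch below (vehicle IDs invented) finds one mutually blocked set, which a negotiation step could then break by asking one vehicle in the cycle to re-plan:

    # Wait-for graph: vehicle -> the vehicle it is currently yielding to.
    # A cycle means no one can move: a deadlock to resolve by negotiation.
    waits_for = {"car1": "car2", "car2": "car3", "car3": "car1", "car4": "car1"}

    def find_deadlock(graph):
        """Return one cycle of mutually blocked vehicles, or None."""
        for start in graph:
            path, node = [], start
            while node in graph and node not in path:
                path.append(node)
                node = graph[node]
            if node in path:                   # we closed a loop
                return path[path.index(node):]
        return None

    cycle = find_deadlock(waits_for)
    print("deadlock among:", cycle)            # ['car1', 'car2', 'car3']
    # A negotiation step could pick one vehicle in the cycle (say, the one
    # whose re-planned path costs least) to re-plan, breaking the cycle.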
Contributors: Khayatian, Mohammad (Author) / Shrivastava, Aviral (Thesis advisor) / Fainekos, Georgios (Committee member) / Ben Amor, Heni (Committee member) / Yang, Yezhou (Committee member) / Lou, Yingyan (Committee member) / Iannucci, Bob (Committee member) / Arizona State University (Publisher)
Created: 2021