System-level synthesis of dataplane subsystems for MPSoCs

Description

In recent years we have witnessed a shift towards multi-processor systems-on-chip (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, and luxury car features). Highly optimized MPSoCs are well suited to the complex applications that end users demand. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs) (general-purpose PEs and application-specific integrated circuits (ASICs)). A typical MPSoC is composed of an application processor, such as an ARM Cortex-A9 with a cache-coherent memory hierarchy, and several application sub-systems. Each of these sub-systems is composed of highly optimized instruction processors, graphics/DSP processors, and custom hardware accelerators. Typically, these sub-systems use scratchpad memories (SPMs) rather than supporting cache coherency. The overall architecture integrates the various sub-systems through a high-bandwidth system-level interconnect (such as a Network-on-Chip (NoC)). The shift to MPSoCs has been fueled by three major factors: demand for high performance, the use of component libraries, and short design turnaround time. As customers continue to desire more and more complex applications on their embedded devices, the performance demand for these devices continues to increase, and designers have turned to MPSoCs to address this demand. By using pre-made IP libraries, designers can quickly piece together an MPSoC that will meet the application demands of the end user with minimal time spent designing new hardware. Additionally, the use of MPSoCs allows designers to generate new devices very quickly, thus reducing the time to market.

In this work, a complete MPSoC synthesis design flow is presented. We first present a technique \cite{leary1_intro} to address the synthesis of the interconnect architecture (particularly the Network-on-Chip (NoC)). We then address the synthesis of the memory architecture of an MPSoC sub-system \cite{leary2_intro}. Lastly, we present a co-synthesis technique to generate the functional and memory architectures simultaneously. The validity and quality of each synthesis technique are demonstrated through extensive experimentation.

Contributors: Leary, Glenn (Author) / Chatha, Karamvir S (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Beraha, Rudy (Committee member) / Arizona State University (Publisher)

Created: 2013

StreamWorks: an energy-efficient embedded co-processor for stream computing

Description

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.

This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE), which is responsible for iteratively executing a stream kernel. The SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in an SE is locked to a Reservation Station (RS) and revitalizes itself after execution, thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to the functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the Instruction-Level Parallelism (ILP) and Data-Level Parallelism (DLP) within stream kernels.

Multiple SEs are grouped together to form a StreamCluster (SC), within which they communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the Task-Level Parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism are evaluated against existing data movement schemes for scratchpad-based multi-core architectures. Further, overlay schemes and architectural support are proposed that allow hosting any number of kernels on the StreamWorks architecture. The proposed overlay schemes for code management support kernel (context) switching for the most common use cases and can be adapted for any multi-core architecture that uses software-managed local memories.

The performance and energy efficiency of the StreamWorks architecture are evaluated on stream kernel and application benchmarks by implementing the architecture in a 45 nm TSMC process and comparing it with a low-power RISC core and a contemporary accelerator.

Contributors: Panda, Amrit (Author) / Chatha, Karam S. (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2014

Sustainable cloud computing

Description

Energy consumption of data centers worldwide is growing rapidly, fueled by ever-increasing demand for Cloud computing applications ranging from social networking to e-commerce. Understandably, ensuring the energy efficiency and sustainability of Cloud data centers without compromising performance is important for both economic and environmental reasons. This dissertation develops a cyber-physical multi-tier server and workload management architecture which operates at the local and the global (geo-distributed) data center level. We devise optimization frameworks for each tier to optimize the energy consumption, energy cost, and carbon footprint of the data centers. The proposed solutions are aware of various energy management tradeoffs that manifest due to the cyber-physical interactions in data centers, while providing provable guarantees on the solutions' computation efficiency and energy/cost efficiency. The local data center level energy management takes into account the impact of server consolidation on the cooling energy, avoids the cooling-computing power tradeoff, and optimizes the total energy (computing and cooling energy) considering the data centers' technology trends (servers' power proportionality and cooling system power efficiency). The global data center level cost management explores the diversity of the data centers to minimize the utility cost while satisfying the carbon cap requirement of the Cloud and dealing with the adverse effect of prediction error on the data center parameters. Finally, the synergy of the local and the global data center energy and cost optimization is shown to help achieve carbon neutrality (net-zero) in a cost-efficient manner.

Contributors: Abbasi, Zahra (Author) / Gupta, Sandeep K. S. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created: 2014

Determining the integrity of applications and operating systems using remote and local attesters

Description

This research describes software-based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform. The challenge is executable code which the client must execute, and the code generates results which are sent to the external entity. These results provide the external entity an assurance as to whether the client application and the OS are in pristine condition. This work also presents a technique for verifying that the attested application was not replaced by a different application after completion of the attestation. The implementation of these three techniques was achieved entirely in software and is backward compatible with legacy machines on the Intel x86 architecture. This research also presents two approaches to incorporating a software-based "root of trust" using Virtual Machine Monitors (VMMs). The first approach determines the integrity of an executing Guest OS from the Host OS using the Linux Kernel-based Virtual Machine (KVM) and QEMU emulation software. The second approach implements a small VMM called MIvmm that can be utilized as a trusted codebase to build security applications such as those implemented in this research. MIvmm was conceptualized and implemented without using any existing codebase; its minimal size allows it to be trustworthy. Both VMM approaches leverage processor support for virtualization in the Intel x86 architecture.
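
The challenge-response exchange described above can be illustrated with a deliberately simplified sketch. It is not the scheme from this research: here the challenge is reduced to a fresh random nonce rather than executable code, and the names `make_challenge`, `client_response`, `verify`, `PRISTINE`, and `TAMPERED` are hypothetical. It only shows why a fresh challenge mixed into a digest of the text section lets a verifier holding a pristine copy detect modification.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> bytes:
    """Verifier side: a fresh nonce so old responses cannot be replayed."""
    return secrets.token_bytes(16)

def client_response(nonce: bytes, text_section: bytes) -> bytes:
    """Client side: digest over the nonce and the bytes being attested."""
    return hashlib.sha256(nonce + text_section).digest()

def verify(nonce: bytes, response: bytes, pristine_text: bytes) -> bool:
    """Verifier side: recompute the digest from a known-good copy."""
    expected = hashlib.sha256(nonce + pristine_text).digest()
    return hmac.compare_digest(expected, response)

# Hypothetical stand-ins for a text section and a one-byte modification.
PRISTINE = b"\x55\x48\x89\xe5\xc3"
TAMPERED = b"\x55\x48\x89\xe5\x90"

nonce = make_challenge()
ok = verify(nonce, client_response(nonce, PRISTINE), PRISTINE)
bad = verify(nonce, client_response(nonce, TAMPERED), PRISTINE)
```

In this sketch `ok` is true and `bad` is false. A real software-based scheme must also stop the client from computing the response over a saved pristine copy, which is one reason the research sends executable code as the challenge.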

Contributors: Srinivasan, Raghunathan (Author) / Dasgupta, Partha (Thesis advisor) / Colbourn, Charles (Committee member) / Shrivastava, Aviral (Committee member) / Huang, Dijiang (Committee member) / Dewan, Prashant (Committee member) / Arizona State University (Publisher)

Created: 2011

Threshold logic properties and methods: applications to post-CMOS design automation and gene regulation modeling

Description

Threshold logic has been studied by at least two independent groups of researchers. One group studied threshold logic with the intention of building threshold logic circuits. The earliest research to this end was done in the 1960s. The major work at that time focused on studying the mathematical properties of threshold logic, as no efficient circuit implementations of threshold logic were available. Recently, many post-CMOS (Complementary Metal Oxide Semiconductor) technologies that implement threshold logic have been proposed, along with efficient CMOS implementations. This has renewed the effort to develop efficient threshold logic design automation techniques. This work contributes to this ongoing effort. Another group studied threshold logic because the building block of neural networks, the Perceptron, is identical to the threshold element implementing a threshold function. Neural networks are used for various purposes as data classifiers. This work contributes tangentially to this field by proposing new methods and techniques to study and analyze functions implemented by a Perceptron.

After completion of the Human Genome Project, it has become evident that most biological phenomena are caused not by the action of single genes but by the complex interaction of a system of genes. In recent times, the `systems approach' to the study of gene systems has been gaining popularity. Many different theories from mathematics and computer science have been used for this purpose. Among the systems approaches, the Boolean logic gene model has emerged as the currently most popular discrete gene model. This work proposes a new gene model based on threshold logic functions (which are a subset of Boolean logic functions). The biological relevance and utility of this model are argued and illustrated by using it to model different in-vivo as well as in-silico gene systems.
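
As a small illustration of what a threshold function is, and of why threshold functions form a proper subset of Boolean functions, the sketch below evaluates a weighted-sum-and-compare element and brute-forces small integer weights for a given truth table. The helper names and the search bound are assumptions made for this example; this is not one of the methods proposed in the work.

```python
from itertools import product

def threshold_fn(weights, threshold):
    """Return f(x) = 1 if sum(w_i * x_i) >= threshold, else 0."""
    def f(inputs):
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)
    return f

def truth_table(f, n):
    """Outputs of f over all n-bit inputs, in lexicographic order."""
    return [f(bits) for bits in product((0, 1), repeat=n)]

def find_threshold_realization(table, n, bound=3):
    """Brute-force search over integer weights and thresholds in [-bound, bound].

    Illustration only: returns (weights, threshold) or None if nothing
    in the searched range realizes the table.
    """
    values = range(-bound, bound + 1)
    for weights in product(values, repeat=n):
        for t in values:
            if truth_table(threshold_fn(weights, t), n) == table:
                return weights, t
    return None

AND_TABLE = [0, 0, 0, 1]   # inputs ordered 00, 01, 10, 11
XOR_TABLE = [0, 1, 1, 0]

and_real = find_threshold_realization(AND_TABLE, 2)
xor_real = find_threshold_realization(XOR_TABLE, 2)
majority = truth_table(threshold_fn((1, 1, 1), 2), 3)
```

The search finds weights for AND but none for XOR. XOR is not linearly separable, so no choice of weights and threshold realizes it, which is the classic example of a Boolean function that is not a threshold function.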

Contributors: Linge Gowda, Tejaswi (Author) / Vrudhula, Sarma (Thesis advisor) / Shrivastava, Aviral (Committee member) / Chatha, Karamvir (Committee member) / Kim, Seungchan (Committee member) / Arizona State University (Publisher)

Created: 2012

Compilation of stream programs onto embedded multicore architectures

Description

In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from programs in traditional sequential programming languages through well-defined independent actors, explicit data communication, and stable code/data access patterns. In order to achieve high performance and low power, scratch pad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM-enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping, and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which leads to lengthy tasks and inferior designs. In this work, optimization techniques that automatically compile stream programs onto embedded multi-core architectures are proposed. As an initial case study, we implemented an automatic target recognition (ATR) algorithm on the IBM Cell Broadband Engine (BE). Then integer linear programming (ILP) and heuristic approaches were proposed to schedule stream programs on a single-core embedded processor that has an SPM with code overlay. Later, ILP and heuristic approaches for Compiling Stream programs on SPM enhanced Multicore Processors (CSMP) were studied. The proposed CSMP ILP and heuristic approaches do not optimize for cycles in stream applications. Further, the number of software pipeline stages in the implementation depends on the actor to processing engine (PE) mapping and is uncontrollable. We next presented a Retiming technique for Throughput optimization on Embedded Multi-core processors (RTEM). The RTEM approach inherently handles cycles and can accept an upper bound on the number of software pipeline stages to be generated. We further enhanced RTEM by incorporating unrolling (URSTEM), which preserves all the beneficial properties of the RTEM heuristic and also scales with the number of PEs.

Contributors: Che, Weijia (Author) / Chatha, Karam Singh (Thesis advisor) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2012

A study of backward compatible dynamic software update

Description

Dynamic software update (DSU) enables a program to update while it is running. DSU aims to minimize the loss due to program downtime for updates. Usually DSU is done in three steps: suspending the execution of an old program, mapping the execution state from the old program to a new one, and resuming execution of the new program with the mapped state. The semantic correctness of DSU depends largely on the state mapping, which today is mostly composed manually by developers. However, manual construction does not necessarily ensure a sound and dependable state mapping. This dissertation presents a methodology to assist developers by automating the construction of a partial state mapping with a guarantee of correctness.
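
The three DSU steps named above (suspend, map state, resume) can be sketched with a toy example. Everything here is hypothetical and chosen only for illustration: two versions of a request counter, where the new version adds an `errors` field, and a hand-written `map_state` function of the kind the dissertation aims to construct automatically.

```python
from dataclasses import dataclass

@dataclass
class OldState:
    """State of the (hypothetical) old program version."""
    hits: int = 0

@dataclass
class NewState:
    """State of the (hypothetical) new version, which adds a field."""
    hits: int = 0
    errors: int = 0

def old_handle(state: OldState) -> None:
    state.hits += 1

def new_handle(state: NewState, failed: bool = False) -> None:
    state.hits += 1
    if failed:
        state.errors += 1

def map_state(old: OldState) -> NewState:
    """State mapping: carry over existing data, initialize what is new."""
    return NewState(hits=old.hits, errors=0)

def run_with_update(requests_before: int, requests_after: int) -> NewState:
    old = OldState()
    for _ in range(requests_before):
        old_handle(old)
    # Step 1: suspend the old version (here: simply stop calling old_handle).
    # Step 2: map the old execution state to the new version's state.
    new = map_state(old)
    # Step 3: resume execution in the new version with the mapped state.
    for _ in range(requests_after):
        new_handle(new)
    return new

final = run_with_update(requests_before=3, requests_after=2)
```

Here `final` holds five hits and zero errors: no requests counted before the update are lost. In a real program the mapped state also includes stack frames and heap data, which is what makes manual state mappings error-prone.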

This dissertation includes a detailed study of DSU correctness and automatic state mapping for server programs with an established user base. First, the dissertation presents a formal treatment of DSU correctness and the state mapping problem. Then the dissertation presents an argument that for programs with an established user base, dynamic updates must be backward compatible. The dissertation next presents a general definition of backward compatibility that specifies the allowed changes in program interaction between an old version and a new version, and identifies patterns of code evolution that result in backward compatible behavior. Thereafter the dissertation presents formal definitions of these patterns together with a proof that any changes to programs in these patterns will result in a backward compatible update. To show the applicability of the results, the dissertation presents SitBack, a program analysis tool that takes an old version of a program and a new one as input and computes a partial state mapping under the assumption that the new version is backward compatible with the old version.

SitBack does not handle all kinds of changes, and it reports the incomplete part of a state mapping to the user. The dissertation presents a detailed evaluation of SitBack which shows that the methodology of automatic state mapping is promising in dealing with real-world program updates. For example, SitBack produces state mappings for 17-75% of the changed functions. Furthermore, SitBack generates automatic state mappings that lead to successful DSU. In conclusion, the study presented in this dissertation assists developers in developing state mappings for DSU by automating the construction of state mappings with a correctness guarantee, which ultimately helps the adoption of DSU.

Contributors: Shen, Jun (Author) / Bazzi, Rida A (Thesis advisor) / Fainekos, Georgios (Committee member) / Neamtiu, Iulian (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2015

Dynamic analysis of embedded software

Description

Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of the programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes dynamic analysis difficult. In addition, the instrumentation overhead of gathering execution information may change the execution of a program and lead to distorted analysis results, i.e., the probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis largely consists of three parts. First, we discuss a deterministic replay framework to provide reproducible execution. Once a program execution is recorded, software instrumentation can be safely applied during replay without probe effect. Second, a discussion of probe effect is presented and a simulation-based analysis is proposed to detect execution changes of a program caused by instrumentation overhead. The simulation-based analysis examines whether the recording instrumentation changes the original program execution. Lastly, the thesis discusses data race detection algorithms that help to remove data races for correctness of the replay and the simulation-based analysis. The focus is to make the detection efficient for C/C++ programs and to increase the scalability of the detection on multi-core machines.

Contributors: Song, Young Wn (Author) / Lee, Yann-Hang (Thesis advisor) / Shrivastava, Aviral (Committee member) / Fainekos, Georgios (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)

Created: 2015

From Formal Requirement Analysis to Testing and Monitoring of Cyber-Physical Systems

Description

Cyber-Physical Systems (CPS) are being used in many safety-critical applications. Because of their important role in virtually every aspect of human life, it is crucial to make sure that a CPS works properly before its deployment. However, formal verification of CPS is a computationally hard problem. Therefore, lightweight verification methods such as testing and monitoring of the CPS are considered in the industry. The formal representation of the CPS requirements is a challenging task. In addition, checking the system outputs with respect to requirements is a computationally complex problem. In this dissertation, these problems for the verification of CPS are addressed. The first method provides a formal requirement analysis framework which can find logical issues in the requirements and help engineers to correct them. Also, a method is provided to detect tests which vacuously satisfy a requirement because of the requirement's structure. This method is used to improve the test generation framework for CPS. Finally, two runtime verification algorithms are developed for off-line/on-line monitoring with respect to real-time requirements. These monitoring algorithms are computationally efficient, and they can be used in practical applications for monitoring CPS with low runtime overhead.
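
To make the idea of off-line monitoring against a real-time requirement concrete, the sketch below checks one illustrative requirement over a recorded trace: every trigger event must be followed by a response event within a deadline. The requirement, the trace format (a list of `(time, event)` pairs), and the function name `check_bounded_response` are assumptions for this example and are not the algorithms developed in the dissertation.

```python
def check_bounded_response(trace, trigger, response, deadline):
    """Off-line check of: every `trigger` at time t has a `response` at some
    time r with t <= r <= t + deadline.

    `trace` is a list of (time, event) pairs. Returns the times of the
    trigger events that violate the requirement (empty list = satisfied).
    """
    response_times = [t for t, event in trace if event == response]
    violations = []
    for t, event in trace:
        if event == trigger:
            answered = any(t <= r <= t + deadline for r in response_times)
            if not answered:
                violations.append(t)
    return violations

# Hypothetical recorded trace: the second request is answered too late.
trace = [(0.0, "req"), (0.4, "ack"), (2.0, "req"), (3.5, "ack")]
violations = check_bounded_response(trace, "req", "ack", deadline=1.0)
```

For this trace, `violations` is `[2.0]`. The quadratic scan is acceptable only for short traces; keeping runtime overhead low on long or streaming traces is the kind of concern efficient monitoring algorithms address.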

Contributors: Dokhanchi, Adel (Author) / Fainekos, Georgios (Thesis advisor) / Lee, Yann-Hang (Committee member) / Sarjoughian, Hessam S. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2017

Hybrid Multiresolution Simulation & Model Checking: Network-On-Chip Systems

Description

Designers employ a variety of modeling theories and methodologies to create functional models of discrete network systems. These dynamical models are evaluated using verification and validation techniques throughout incremental design stages. Models created for these systems should directly represent their growing complexity with respect to composition and heterogeneity. Similar to software engineering practices, incremental model design is required for complex system design. As a result, models at early increments are significantly simpler relative to real systems. While experimenting (verification or validation) on models at early increments is computationally less demanding, the results of these experiments are less trustworthy and less rewarding. At any increment of design, a set of tools and techniques is required for controlling the complexity of models and experimentation.

A complex system such as Network-on-Chip (NoC) may benefit from incremental design stages. Current design methods for NoC rely on multiple models developed using various modeling frameworks. It is useful to develop frameworks that can formalize the relationships among these models. Fine-grain models are derived using their coarse-grain counterparts. Moreover, validation and verification capability at various design stages enabled through disciplined model conversion is very beneficial.

In this research, Multiresolution Modeling (MRM) is used for system level design of NoC. MRM aids in creating a family of models at different levels of scale and complexity with well-formed relationships. In addition, a variant of the Discrete Event System Specification (DEVS) formalism is proposed which supports model checking. Hierarchical models of Network-on-Chip components may be created at different resolutions while each model can be validated using discrete-event simulation and verified via state exploration. System property expressions are defined in the DEVS language and developed as Transducers which can be applied seamlessly for model checking and simulation purposes.

The Multiresolution Modeling and the verification and validation (V&V) capabilities of this framework complement one another. MRM manages the scale and complexity of models, which in turn can reduce V&V time and effort; conversely, V&V helps ensure the correctness of models at multiple resolutions. This framework is realized by extending the DEVS-Suite simulator, and its applicability is demonstrated for exemplar NoC models.

Contributors: Gholami, Soroosh (Author) / Sarjoughian, Hessam S. (Thesis advisor) / Fainekos, Georgios (Committee member) / Ogras, Umit Y. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2017