This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Description
In recent years we have witnessed a shift towards multi-processor systems-on-chip (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, luxury car features, etc.). Highly optimized MPSoCs are well-suited to tackle the complex application demands desired by the end user. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs) (general purpose PEs and application-specific integrated circuits (ASICs)). A typical MPSoC is composed of an application processor, such as an ARM Cortex-A9 with a cache-coherent memory hierarchy, and several application sub-systems. Each of these sub-systems is composed of highly optimized instruction processors, graphics/DSP processors, and custom hardware accelerators. Typically, these sub-systems utilize scratchpad memories (SPMs) rather than support cache coherency. The overall architecture is an integration of the various sub-systems through a high-bandwidth system-level interconnect (such as a Network-on-Chip (NoC)).

The shift to MPSoCs has been fueled by three major factors: demand for high performance, the use of component libraries, and short design turnaround time. As customers continue to desire more and more complex applications on their embedded devices, the performance demand for these devices continues to increase, and designers have turned to MPSoCs to address it. By using pre-made IP libraries, designers can quickly piece together an MPSoC that meets the application demands of the end user with minimal time spent designing new hardware. Additionally, the use of MPSoCs allows designers to generate new devices very quickly, thus reducing the time to market.

In this work, a complete MPSoC synthesis design flow is presented. We first present a technique \cite{leary1_intro} to address the synthesis of the interconnect architecture (particularly the Network-on-Chip (NoC)). We then address the synthesis of the memory architecture of an MPSoC sub-system \cite{leary2_intro}. Lastly, we present a co-synthesis technique to generate the functional and memory architectures simultaneously. The validity and quality of each synthesis technique are demonstrated through extensive experimentation.
Contributors: Leary, Glenn (Author) / Chatha, Karamvir S. (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Beraha, Rudy (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
We are expecting hundreds of cores per chip in the near future; however, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed not to scale to hundreds and thousands of cores. In addition, caches and coherence logic already account for 20-50% of the total power consumption of the processor and 30-60% of the die area. Therefore, a more scalable architecture is needed for manycore systems. Software Managed Manycore (SMM) architectures emerge as a solution. They have a scalable memory design in which each core has direct access only to its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but it also makes them difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required and, possibly, to evict it after use. Doing this, however, adds a lot of complexity to the programmer's job: programmers must now worry about data management on top of the functional correctness of the program, which is already quite complex.

This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed a Complete Circular Stack Management scheme. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases, so a Smart Stack Data Management (SSDM) scheme is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for it. Later on, we propose a general cost estimation algorithm, based on which the CMSM heuristic for the code mapping problem is developed. Finally, heap data is dynamic in nature and therefore hard to manage; we provide two schemes to manage an unlimited amount of heap data in a constant-sized region of the local memory. In addition to these separate schemes for different kinds of data, we also provide a memory partition methodology.
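To make the stack management idea concrete, the following is a minimal sketch of the kind of check-in/check-out code such a compiler inserts around call sites. All names and sizes here are illustrative assumptions (dma_put/dma_get stand in for the platform's DMA primitives, and the 4 KB region size is invented); it shows the general technique, not this dissertation's actual API or algorithm.

    /* Hypothetical sketch: managing stack frames between a core's
       scratchpad (SPM) and main memory on an SMM architecture. */
    #include <stddef.h>
    #include <stdint.h>

    #define SPM_STACK_SIZE 4096                 /* assumed SPM stack region */

    extern void dma_put(void *dram, const void *spm, size_t n); /* SPM -> DRAM */
    extern void dma_get(void *spm, const void *dram, size_t n); /* DRAM -> SPM */

    static uint8_t  spm_stack[SPM_STACK_SIZE];  /* stack region in local SPM  */
    static uint8_t *dram_backing;               /* eviction area, set at boot */
    static size_t   spm_top;                    /* live frame bytes in SPM    */
    static size_t   evicted;                    /* frame bytes spilled to DRAM */

    /* Inserted before a call whose callee needs `frame` bytes of stack. */
    void stack_check_in(size_t frame)
    {
        if (spm_top + frame > SPM_STACK_SIZE) {
            /* Evict the oldest frames (bottom of the SPM stack). */
            size_t spill = spm_top + frame - SPM_STACK_SIZE;
            dma_put(dram_backing + evicted, spm_stack, spill);
            for (size_t i = 0; i + spill < spm_top; i++)  /* compact; a truly */
                spm_stack[i] = spm_stack[i + spill];      /* circular scheme  */
            evicted += spill;                             /* avoids this copy */
            spm_top -= spill;
        }
        spm_top += frame;
    }

    /* Inserted after the call returns, freeing the callee's frame. */
    void stack_check_out(size_t frame)
    {
        spm_top -= frame;
        if (spm_top == 0 && evicted > 0) {
            /* Caller frames live in DRAM: fetch back the most recent batch. */
            size_t fetch = evicted < SPM_STACK_SIZE ? evicted : SPM_STACK_SIZE;
            dma_get(spm_stack, dram_backing + evicted - fetch, fetch);
            evicted -= fetch;
            spm_top = fetch;
        }
    }

In this style of management, a compiler pass inserts the check-in/check-out pair only around calls whose frames may overflow the scratchpad region, which is what a cost-driven heuristic like the ones described above would decide.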
Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Concrete is the most widely used infrastructure material worldwide. Production of portland cement, the main binding component in concrete, has been shown to require significant energy and to account for approximately 5-7% of global carbon dioxide emissions. The expected continued increase in concrete use over the coming decades indicates that this is an ideal time to implement sustainable binder technologies. The current work explores enhanced-sustainability concretes, primarily in the context of limestone and flow. Aspects such as hydration kinetics, hydration product formation, and pore structure add to the understanding of the strength development and potential durability characteristics of these binder systems. Two main strategies for enhancing sustainability are explored in this work: (i) the use of high-volume limestone in combination with other alternative cementitious materials to decrease the portland cement quantity in concrete, and (ii) the use of geopolymers as the binder phase in concrete. The first phase of the work investigates the use of fine limestone as cement replacement from the perspective of hydration, strength development, and pore structure; the nature of the potential synergistic benefit of limestone and alumina is explored. The second phase focuses on the rheological characterization of these materials in the fresh state, as well as a more general investigation of the rheological characterization of suspensions. The results of this work indicate several key ideas: (i) there is a potential synergistic benefit for strength, hydration, and pore structure from using alumina in portland limestone cements; (ii) the limestone in these systems is shown to react to some extent, and fine limestone is shown to accelerate hydration; (iii) the rheological characteristics of cementitious suspensions are complex and strongly dependent on several key parameters, including the solid loading, interparticle forces, surface area of the particles present, particle size distribution, and the rheological nature of the suspending medium; and (iv) a stress plateau method is proposed for the determination of rheological properties of concentrated suspensions, as it more accurately predicts apparent yield stress and is shown to correlate well with other viscoelastic properties of the suspensions.
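For context on the yield-stress discussion, cementitious suspensions are commonly described by a Bingham-type flow curve (this is the standard textbook relation, stated here as background rather than as the model used in this dissertation):

    \[
    \tau = \tau_0 + \mu_p \,\dot{\gamma} \qquad (\tau > \tau_0),
    \]

where \(\tau\) is the shear stress, \(\dot{\gamma}\) the shear rate, \(\tau_0\) the (apparent) yield stress, and \(\mu_p\) the plastic viscosity. A stress plateau approach reads \(\tau_0\) off the plateau of the measured stress response at vanishing flow rather than extrapolating a fitted line to \(\dot{\gamma} = 0\), which is why it can give a more accurate apparent yield stress for concentrated suspensions.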
Contributors: Vance, Kirk (Author) / Neithalath, Narayanan (Thesis advisor) / Rajan, Subramaniam D. (Committee member) / Mobasher, Barzin (Committee member) / Chawla, Nikhilesh (Committee member) / Marzke, Robert (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements.

This work presents StreamWorks, a multi-core embedded architecture for energy-efficient stream computing. The basic processing element in the StreamWorks architecture is the StreamEngine (SE), which is responsible for iteratively executing a stream kernel. The SE introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. Each instruction in an SE is locked to a Reservation Station (RS) and revitalizes itself after execution, thus never retiring from the RS. The entire kernel is hosted in RS Banks (RSBs) close to the functional units for energy-efficient instruction delivery. The dataflow semantics of stream kernels are captured by a context-aware dataflow execution mode that efficiently exploits the Instruction-Level Parallelism (ILP) and Data-Level Parallelism (DLP) within stream kernels.

Multiple SEs are grouped together to form a StreamCluster (SC), within which they communicate via a local interconnect. A novel software FIFO virtualization technique with split-join functionality is proposed for efficient and scalable stream communication across SEs. The proposed communication mechanism exploits the Task-Level Parallelism (TLP) of the stream application. The performance and scalability of the communication mechanism are evaluated against existing data movement schemes for scratchpad-based multi-core architectures. Further, overlay schemes and architectural support are proposed to allow hosting any number of kernels on the StreamWorks architecture. The proposed overlay schemes for code management support kernel (context) switching for the most common use cases and can be adapted to any multi-core architecture that uses software-managed local memories.
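As a flavor of what gets virtualized, the sketch below shows a bare single-producer/single-consumer software FIFO of the kind used for inter-SE streams; the names, depth, and layout are illustrative assumptions, and the split-join virtualization described above adds mapping and routing logic on top of such a primitive.

    /* Hypothetical single-producer/single-consumer software FIFO for
       inter-StreamEngine communication over a shared local memory. */
    #include <stdbool.h>
    #include <stdint.h>

    #define FIFO_DEPTH 64                     /* assumed; power of two */

    typedef struct {
        volatile uint32_t head;               /* advanced only by consumer */
        volatile uint32_t tail;               /* advanced only by producer */
        int32_t           buf[FIFO_DEPTH];    /* token storage */
    } sw_fifo_t;

    static bool fifo_push(sw_fifo_t *f, int32_t token)
    {
        uint32_t next = (f->tail + 1) & (FIFO_DEPTH - 1);
        if (next == f->head)
            return false;                     /* full: producer stalls */
        f->buf[f->tail] = token;
        f->tail = next;                       /* publish after the write */
        return true;
    }

    static bool fifo_pop(sw_fifo_t *f, int32_t *token)
    {
        if (f->head == f->tail)
            return false;                     /* empty: consumer stalls */
        *token = f->buf[f->head];
        f->head = (f->head + 1) & (FIFO_DEPTH - 1);
        return true;
    }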

The performance and energy efficiency of the StreamWorks architecture are evaluated on stream kernel and application benchmarks by implementing the architecture in a TSMC 45nm process and comparing it against a low-power RISC core and a contemporary accelerator.
Contributors: Panda, Amrit (Author) / Chatha, Karam S. (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Energy consumption of data centers worldwide is growing rapidly, fueled by the ever-increasing demand for Cloud computing applications ranging from social networking to e-commerce. Understandably, ensuring the energy efficiency and sustainability of Cloud data centers without compromising performance is important for both economic and environmental reasons. This dissertation develops a cyber-physical, multi-tier server and workload management architecture which operates at the local and the global (geo-distributed) data center level. We devise optimization frameworks for each tier to optimize the energy consumption, energy cost, and carbon footprint of the data centers. The proposed solutions are aware of the various energy management tradeoffs that manifest due to the cyber-physical interactions in data centers, while providing provable guarantees on the solutions' computational efficiency and energy/cost efficiency. The local, data-center-level energy management takes into account the impact of server consolidation on the cooling energy, avoids the cooling-computing power tradeoff, and optimizes the total energy (computing and cooling) considering the data centers' technology trends (servers' power proportionality and cooling system power efficiency). The global, data-center-level cost management exploits the diversity of the data centers to minimize the utility cost while satisfying the carbon cap requirement of the Cloud and while dealing with prediction error in the data center parameters. Finally, the synergy of the local and the global data center energy and cost optimization is shown to help achieve carbon neutrality (net-zero) in a cost-efficient manner.
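A highly simplified reading of the global-tier problem (an illustrative formulation under assumed notation, not the dissertation's exact model) is a utility-cost minimization over workload shares \(x_i\) assigned to geo-distributed data centers, subject to a carbon cap:

    \[
    \min_{x \ge 0} \;\sum_i c_i\,E_i(x_i)
    \quad \text{s.t.} \quad
    \sum_i x_i = \Lambda, \qquad
    \sum_i \phi_i\,E_i(x_i) \le C_{\mathrm{cap}},
    \]

where \(E_i\) is the total (computing plus cooling) energy of data center \(i\) as a function of its workload share, \(c_i\) its electricity price, \(\phi_i\) its carbon intensity, \(\Lambda\) the total workload, and \(C_{\mathrm{cap}}\) the Cloud's carbon cap. The diversity exploited above is precisely the variation of \(c_i\) and \(\phi_i\) across locations and time.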
Contributors: Abbasi, Zahra (Author) / Gupta, Sandeep K. S. (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Early-age cracks in fresh concrete occur mainly due to a high rate of surface evaporation and the restraint offered by the contracting solid phase. Available test methods that simulate severe drying conditions, however, were not originally designed to focus on the evaporation and transport characteristics of the liquid-gas phases in a hydrating cementitious microstructure. Therefore, these tests lack accurate measurement of the drying rate, and data interpretation based on the principles of transport properties is limited. A vacuum-based test method capable of simulating early-age cracks in 2-D cement paste is developed which continuously monitors weight loss and changes to the surface characteristics. 2-D crack evolution is documented using time-lapse photography. Effects of sample size, w/c ratio, initial curing, and fiber content are studied. In the subsequent analysis, the cement paste phase is considered a porous medium and moisture transport is described based on surface mass transfer and internal moisture transport characteristics. Results indicate that drying occurs in two stages: a constant drying rate period (stage I), followed by a falling drying rate period (stage II). Vapor diffusion in stage I and unsaturated flow within the porous medium in stage II determine the overall rate of evaporation. The mass loss results are analyzed using diffusion-based models. Results show that the moisture diffusivity in stage I is higher than its value in stage II by more than one order of magnitude. The drying model is used in conjunction with a shrinkage model to predict the development of capillary pressures. A similar approach is implemented on drying restrained ring specimens to predict 1-D crack width development. An analytical approach relates diffusion, shrinkage, creep, tensile, and fracture properties to interpret the experimental data. An evaporation potential is introduced based on the boundary layer concept, mass transfer, and a driving force consisting of the concentration gradient. The effect of wind velocity is reflected in the Reynolds number, which affects the boundary layer on the sample surface; this parameter, along with the Schmidt and Sherwood numbers, is used for prediction of the mass transfer coefficient. The concentration gradient is shown to be a strong function of temperature and relative humidity and is used to predict the evaporation potential. Results of the modeling efforts are compared with a variety of test results reported in the literature. Diffusivity data and results of 1-D and 2-D image analyses indicate significant effects of fibers on controlling early-age cracks. The presented models are capable of predicting evaporation rates and moisture flow through hydrating cement-based materials under early-age drying and shrinkage conditions.
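For reference, the standard film-theory relations that connect these dimensionless groups to an evaporation rate (given here as commonly stated in the mass transfer literature, not as this dissertation's exact correlation) are

    \[
    \mathrm{Re} = \frac{uL}{\nu}, \qquad
    \mathrm{Sc} = \frac{\nu}{D_v}, \qquad
    \mathrm{Sh} = \frac{h_m L}{D_v} = f(\mathrm{Re}, \mathrm{Sc}),
    \]

with, e.g., \(\mathrm{Sh} = 0.664\,\mathrm{Re}^{1/2}\mathrm{Sc}^{1/3}\) for laminar flow over a flat surface. The evaporative flux then follows from the concentration driving force as \(J = h_m\,(\rho_{v,s} - \rho_{v,\infty})\), where \(\rho_{v,s}\) and \(\rho_{v,\infty}\) are the vapor concentrations at the drying surface and in the ambient air, the quantities that the temperature and relative humidity dependence above enters through.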
Contributors: Bakhshi, Mehdi (Author) / Mobasher, Barzin (Thesis advisor) / Rajan, Subramaniam D. (Committee member) / Zapata, Claudia E. (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
This research describes software-based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform. The challenge is executable code which the client must execute, and the code generates results which are sent to the external entity. These results provide the external entity an assurance as to whether the client application and the OS are in pristine condition. This work also presents a technique to verify that the application which was attested did not get replaced by a different application after completion of the attestation. The implementation of these three techniques was achieved entirely in software and is backward compatible with legacy machines on the Intel x86 architecture. This research also presents two approaches to incorporating a software-based "root of trust" using Virtual Machine Monitors (VMMs). The first approach determines the integrity of an executing Guest OS from the Host OS using the Linux Kernel-based Virtual Machine (KVM) and QEMU emulation software. The second approach implements a small VMM called MIvmm that can be utilized as a trusted codebase to build security applications such as those implemented in this research. MIvmm was conceptualized and implemented without using any existing codebase; its minimal size allows it to be trustworthy. Both VMM approaches leverage processor support for virtualization in the Intel x86 architecture.
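As a flavor of what such a challenge might compute (purely illustrative; real challenges in this line of work are more elaborate and typically obfuscated), a verifier-supplied routine could hash the attested code region seeded with a verifier-chosen nonce, so that precomputed or replayed answers are useless:

    /* Illustrative attestation-style checksum: FNV-1a over a code
       region, seeded with a verifier-supplied nonce. */
    #include <stddef.h>
    #include <stdint.h>

    uint64_t attest_checksum(const uint8_t *region, size_t len, uint64_t nonce)
    {
        uint64_t h = 1469598103934665603ULL ^ nonce;  /* FNV offset basis ^ nonce */
        for (size_t i = 0; i < len; i++) {
            h ^= region[i];
            h *= 1099511628211ULL;                    /* 64-bit FNV-1a prime */
        }
        return h;  /* sent to the verifier, which recomputes the value
                      over a pristine image of the attested text section */
    }

The assurance comes from the verifier knowing the expected memory contents: any modification to the application or OS text section changes the returned value for the given nonce.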
Contributors: Srinivasan, Raghunathan (Author) / Dasgupta, Partha (Thesis advisor) / Colbourn, Charles (Committee member) / Shrivastava, Aviral (Committee member) / Huang, Dijiang (Committee member) / Dewan, Prashant (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
With the increasing focus on developing environmentally benign electronic packages, lead-free solder alloys have received a great deal of attention. Mishandling of packages during manufacture, assembly, or use may cause solder joint failure. A fundamental understanding of the behavior of lead-free solders under mechanical shock conditions is lacking, and reliable experimental and numerical analyses of lead-free solder joints in the intermediate strain rate regime are needed. This dissertation focuses on exploring the mechanical shock behavior of lead-free tin-rich solder alloys via multiscale modeling and numerical simulations. First, the macroscopic stress/strain behaviors of three bulk lead-free tin-rich solders were tested over a range of strain rates from 0.001/s to 30/s. Finite element analysis was conducted to determine an appropriate specimen geometry that could reach a homogeneous stress/strain field and a relatively high strain rate. A novel self-consistent true stress correction method is developed to compensate for the inaccuracy caused by the triaxial stress state at the post-necking stage. Then the material properties of micron-scale intermetallics were examined by micro-compression testing. The accuracy of this measurement is systematically validated by finite element analysis, and empirical adjustments are provided. Moreover, the interfacial properties of the solder/intermetallic interface are investigated, and a continuum traction-separation law for this interface is developed from an atomistic-based cohesive element method. The macroscopic stress/strain relation and microstructural properties are combined to form a multiscale material model via a stochastic approach for both the solder and the intermetallic: the solder is modeled by porous plasticity with random voids, and the intermetallic is characterized as a brittle material with random vulnerable regions. Thereafter, the porous plasticity fracture of the solders and the brittle fracture of the intermetallics are coupled in one finite element model. Finally, this study yields a multiscale model to understand and predict the mechanical shock behavior of lead-free tin-rich solder joints. Different fracture patterns are observed for various strain rates and/or intermetallic thicknesses, and the predictions agree well with theory and experiments.
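For orientation, the classical post-necking correction that self-consistent true-stress methods refine is Bridgman's (stated here as standard background, not as this dissertation's formula):

    \[
    \sigma_{\mathrm{true}} =
    \frac{\sigma_{\mathrm{avg}}}
         {\left(1 + \frac{2R}{a}\right)\ln\!\left(1 + \frac{a}{2R}\right)},
    \]

where \(\sigma_{\mathrm{avg}}\) is the axial load divided by the current neck cross-sectional area, \(a\) is the radius of the neck, and \(R\) is the profile radius of curvature at the neck. The triaxial stress state that develops in the neck is exactly what inflates \(\sigma_{\mathrm{avg}}\) above the true flow stress, motivating corrections of this kind.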
Contributors: Fei, Huiyang (Author) / Jiang, Hanqing (Thesis advisor) / Chawla, Nikhilesh (Thesis advisor) / Tasooji, Amaneh (Committee member) / Mobasher, Barzin (Committee member) / Rajan, Subramaniam D. (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Thanks to continuous technology scaling, intelligent, fast, and small digital systems are now available at affordable costs. As a result, digital systems have found use in a wide range of application areas that were not even imagined before, including medical (e.g., MRI, remote or post-operative monitoring devices, etc.), automotive (e.g., adaptive cruise control, anti-lock brakes, etc.), security systems (e.g., residential security gateways, surveillance devices, etc.), and in- and out-of-body sensing (e.g., capsules swallowed by patients measuring digestive system pH, heart monitors, etc.). Such computing systems, which are completely embedded within the application, are called embedded systems, as opposed to general purpose computing systems. In the design of such embedded systems, power consumption and reliability are indispensable system requirements. In battery operated portable devices, the battery is the single largest factor contributing to device cost, weight, recharging time and frequency, and ultimately its usability. For example, in the Apple iPhone 4 smart-phone, the battery is 40% of the device weight, occupies 36% of its volume, and allows only 7 hours (over 3G) of talk time. As embedded systems find use in a range of sensitive applications, from bio-medical applications to safety and security systems, the reliability of the computations performed becomes a crucial factor. At the current technology node, portable embedded systems can expect failures due to soft errors at a rate of about once per year; with aggressive technology scaling, this rate is predicted to increase exponentially to once per hour. Over the years, researchers have been successful in developing techniques, implemented at different layers of the design spectrum, to improve system power efficiency and reliability. Among the layers of design abstraction, I observe that the interface between the compiler and the processor micro-architecture possesses a unique potential for efficient design optimizations: a compiler designer is able to observe and analyze the application software at a fine granularity, while the processor architect analyzes the system output (power, performance, etc.) for each executed instruction. If the system knowledge at these two design layers can be integrated at the compiler/micro-architecture interface, design optimizations at the two layers can be modified to efficiently utilize available resources and thereby achieve appreciable system-level benefits. To this effect, the thesis statement is that "by merging system design information at the compiler and micro-architecture design layers, smart compilers can be developed, that achieve reliable and power-efficient embedded computing through: i) Pure compiler techniques, ii) Hybrid compiler micro-architecture techniques, and iii) Compiler-aware architectures". This dissertation demonstrates, through contributions in each of the three compiler-based techniques, the effectiveness of smart compilers in achieving power efficiency and reliability in embedded systems.
Contributors: Jeyapaul, Reiley (Author) / Shrivastava, Aviral (Thesis advisor) / Vrudhula, Sarma (Committee member) / Clark, Lawrence (Committee member) / Colbourn, Charles (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Threshold logic has been studied by at least two independent groups of researchers. One group studied threshold logic with the intention of building threshold logic circuits. The earliest research to this end was done in the 1960s; the major work at that time focused on studying the mathematical properties of threshold logic, as no efficient circuit implementations of threshold logic were available. Recently, many post-CMOS (Complementary Metal Oxide Semiconductor) technologies that implement threshold logic have been proposed, along with efficient CMOS implementations. This has renewed the effort to develop efficient threshold logic design automation techniques, and this work contributes to that ongoing effort.

The other group studied threshold logic because the building block of neural networks, the Perceptron, is identical to the threshold element implementing a threshold function. Neural networks are used as data classifiers for various purposes. This work contributes tangentially to this field by proposing new methods and techniques to study and analyze functions implemented by a Perceptron.

After completion of the Human Genome Project, it has become evident that most biological phenomena are caused not by the action of single genes, but by complex interactions involving systems of genes. In recent times, the 'systems approach' to the study of gene systems has been gaining popularity, and many different theories from mathematics and computer science have been used for this purpose. Among the systems approaches, the Boolean logic gene model has emerged as the currently most popular discrete gene model. This work proposes a new gene model based on threshold logic functions (which are a subset of Boolean logic functions). The biological relevance and utility of this model are argued and illustrated by using it to model different in-vivo as well as in-silico gene systems.
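To make the model concrete: a threshold (Perceptron) function over Boolean inputs fires exactly when a weighted sum crosses a threshold, f(x) = 1 iff w1*x1 + ... + wn*xn >= T. The sketch below evaluates such a function as a toy discrete gene-update rule; the weights, threshold, and regulatory roles are invented for illustration and do not come from any real gene system in the dissertation.

    /* A threshold logic function used as a toy gene-update rule. */
    #include <stdio.h>

    /* Returns 1 iff sum(w[i] * x[i]) >= T, with x[i] in {0,1}. */
    static int threshold_fn(const int x[], const int w[], int n, int T)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += w[i] * x[i];
        return sum >= T;
    }

    int main(void)
    {
        /* Hypothetical target gene regulated by three inputs: two
           activators (weights +2, +1) and one repressor (weight -2). */
        int w[3] = { 2, 1, -2 };
        int T    = 2;

        int state[3] = { 1, 1, 0 };   /* activators on, repressor off */
        printf("target = %d\n", threshold_fn(state, w, 3, T));  /* 1 */

        state[2] = 1;                 /* repressor switches on */
        printf("target = %d\n", threshold_fn(state, w, 3, T));  /* 0 */
        return 0;
    }

Because every threshold function is a Boolean function but not vice versa, a threshold-logic gene model restricts the Boolean model while adding graded, weight-based regulation strengths, which is the expressiveness trade-off argued above.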
Contributors: Linge Gowda, Tejaswi (Author) / Vrudhula, Sarma (Thesis advisor) / Shrivastava, Aviral (Committee member) / Chatha, Karamvir (Committee member) / Kim, Seungchan (Committee member) / Arizona State University (Publisher)
Created: 2012