Matching Items (46)
Filtering by

Clear all filters

155975-Thumbnail Image.png
Description
Cyber-Physical Systems (CPS) are being used in many safety-critical applications. Due to the important role in virtually every aspect of human life, it is crucial to make sure that a CPS works properly before its deployment. However, formal verification of CPS is a computationally hard problem. Therefore, lightweight verification methods

Cyber-Physical Systems (CPS) are being used in many safety-critical applications. Due to the important role in virtually every aspect of human life, it is crucial to make sure that a CPS works properly before its deployment. However, formal verification of CPS is a computationally hard problem. Therefore, lightweight verification methods such as testing and monitoring of the CPS are considered in the industry. The formal representation of the CPS requirements is a challenging task. In addition, checking the system outputs with respect to requirements is a computationally complex problem. In this dissertation, these problems for the verification of CPS are addressed. The first method provides a formal requirement analysis framework which can find logical issues in the requirements and help engineers to correct the requirements. Also, a method is provided to detect tests which vacuously satisfy the requirement because of the requirement structure. This method is used to improve the test generation framework for CPS. Finally, two runtime verification algorithms are developed for off-line/on-line monitoring with respect to real-time requirements. These monitoring algorithms are computationally efficient, and they can be used in practical applications for monitoring CPS with low runtime overhead.
ContributorsDokhanchi, Adel (Author) / Fainekos, Georgios (Thesis advisor) / Lee, Yann-Hang (Committee member) / Sarjoughian, Hessam S. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2017
156003-Thumbnail Image.png
Description
Designers employ a variety of modeling theories and methodologies to create functional models of discrete network systems. These dynamical models are evaluated using verification and validation techniques throughout incremental design stages. Models created for these systems should directly represent their growing complexity with respect to composition and heterogeneity. Similar to

Designers employ a variety of modeling theories and methodologies to create functional models of discrete network systems. These dynamical models are evaluated using verification and validation techniques throughout incremental design stages. Models created for these systems should directly represent their growing complexity with respect to composition and heterogeneity. Similar to software engineering practices, incremental model design is required for complex system design. As a result, models at early increments are significantly simpler relative to real systems. While experimenting (verification or validation) on models at early increments are computationally less demanding, the results of these experiments are less trustworthy and less rewarding. At any increment of design, a set of tools and technique are required for controlling the complexity of models and experimentation.

A complex system such as Network-on-Chip (NoC) may benefit from incremental design stages. Current design methods for NoC rely on multiple models developed using various modeling frameworks. It is useful to develop frameworks that can formalize the relationships among these models. Fine-grain models are derived using their coarse-grain counterparts. Moreover, validation and verification capability at various design stages enabled through disciplined model conversion is very beneficial.

In this research, Multiresolution Modeling (MRM) is used for system level design of NoC. MRM aids in creating a family of models at different levels of scale and complexity with well-formed relationships. In addition, a variant of the Discrete Event System Specification (DEVS) formalism is proposed which supports model checking. Hierarchical models of Network-on-Chip components may be created at different resolutions while each model can be validated using discrete-event simulation and verified via state exploration. System property expressions are defined in the DEVS language and developed as Transducers which can be applied seamlessly for model checking and simulation purposes.

Multiresolution Modeling with verification and validation capabilities of this framework complement one another. MRM manages the scale and complexity of models which in turn can reduces V&V time and effort and conversely the V&V helps ensure correctness of models at multiple resolutions. This framework is realized through extending the DEVS-Suite simulator and its applicability demonstrated for exemplar NoC models.
ContributorsGholami, Soroosh (Author) / Sarjoughian, Hessam S. (Thesis advisor) / Fainekos, Georgios (Committee member) / Ogras, Umit Y. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2017
155944-Thumbnail Image.png
Description
Caches have long been used to reduce memory access latency. However, the increased complexity of cache coherence brings significant challenges in processor design as the number of cores increases. While making caches scalable is still an important research problem, some researchers are exploring the possibility of a more power-efficient SRAM

Caches have long been used to reduce memory access latency. However, the increased complexity of cache coherence brings significant challenges in processor design as the number of cores increases. While making caches scalable is still an important research problem, some researchers are exploring the possibility of a more power-efficient SRAM called scratchpad memories or SPMs. SPMs consume significantly less area, and are more energy-efficient per access than caches, and therefore make the design of on-chip memories much simpler. Unlike caches, which fetch data from memories automatically, an SPM requires explicit instructions for data transfers. SPM-only architectures are thus named as software managed manycore (SMM), since the data movements of such architectures rely on software. SMM processors have been widely used in different areas, such as embedded computing, network processing, or even high performance computing. While SMM processors provide a low-power platform, the hardware alone does not guarantee power efficiency, if applications on such processors deliver low performance. Efficient software techniques are therefore required. A big body of management techniques for SMM architectures are compiler-directed, as inserting data movement operations by hand forces programmers to trace flow of data, which can be error-prone and sometimes difficult if not impossible. This thesis develops compiler-directed techniques to manage data transfers for embedded applications on SMMs efficiently. The techniques analyze and find out the proper program points and insert data movement instructions accordingly. The techniques manage code, stack and heap data of applications, and reduce execution time by 14%, 52% and 80% respectively compared to their predecessors on typical embedded applications. On top of managing local data, a technique is also developed for shared data in SMM architectures. Experimental results show it achieves more than 2X speedup than the previous technique on average.
ContributorsCai, Jian (Author) / Shrivastava, Aviral (Thesis advisor) / Wu, Carole (Committee member) / Ren, Fengbo (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)
Created2017
156791-Thumbnail Image.png
Description
General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs

General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions.

Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%.

Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications.

Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPG- PUs in the face of slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future.

In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.
ContributorsArunkumar, Akhil (Author) / Wu, Carole-Jean (Thesis advisor) / Shrivastava, Aviral (Committee member) / Lee, Yann-Hang (Committee member) / Bolotin, Evgeny (Committee member) / Arizona State University (Publisher)
Created2018
156829-Thumbnail Image.png
Description
Advances in semiconductor technology have brought computer-based systems intovirtually all aspects of human life. This unprecedented integration of semiconductor based systems in our lives has significantly increased the domain and the number

of safety-critical applications – application with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can

Advances in semiconductor technology have brought computer-based systems intovirtually all aspects of human life. This unprecedented integration of semiconductor based systems in our lives has significantly increased the domain and the number

of safety-critical applications – application with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can provide commercial-off-the-shelf microprocessors with adaptive and scalable reliability.

Among all software-level error resilience solutions, in-application instruction replication based approaches have been widely used and are deemed to be the most effective. However, existing instruction-based replication schemes only protect some part of computations i.e. arithmetic and logical instructions and leave the rest as unprotected. To improve the efficacy of instruction-level redundancy-based approaches, we developed several error detection and error correction schemes. nZDC (near Zero silent

Data Corruption) is an instruction duplication scheme which protects the execution of whole application. Rather than detecting errors on register operands of memory and control flow operations, nZDC checks the results of such operations. nZDC en

sures the correct execution of memory write instruction by reloading stored value and checking it against redundantly computed value. nZDC also introduces a novel control flow checking mechanism which replicates compare and branch instructions and

detects both wrong direction branches as well as unwanted jumps. Fault injection experiments show that nZDC can improve the error coverage of the state-of-the-art schemes by more than 10x, without incurring any more performance penalty. Further

more, we introduced two error recovery solutions. InCheck is our backward recovery solution which makes light-weighted error-free checkpoints at the basic block granularity. In the case of error, InCheck reverts the program execution to the beginning of last executed basic block and resumes the execution by the aid of preserved in formation. NEMESIS is our forward recovery scheme which runs three versions of computation and detects errors by checking the results of all memory write and branch

operations. In the case of a mismatch, NEMESIS diagnosis routine decides if the error is recoverable. If yes, NEMESIS recovery routine reverts the effect of error from the program state and resumes program normal execution from the error detection

point.
ContributorsDidehban, Moslem (Author) / Shrivastava, Aviral (Thesis advisor) / Wu, Carole-Jean (Committee member) / Clark, Lawrence (Committee member) / Mahlke, Scott (Committee member) / Arizona State University (Publisher)
Created2018
157100-Thumbnail Image.png
Description
One of the main goals of computer architecture design is to improve performance without much increase in the power consumption. It cannot be achieved by adding increasingly complex intelligent schemes in the hardware, since they will become increasingly less power-efficient. Therefore, parallelism comes up as the solution. In fact, the

One of the main goals of computer architecture design is to improve performance without much increase in the power consumption. It cannot be achieved by adding increasingly complex intelligent schemes in the hardware, since they will become increasingly less power-efficient. Therefore, parallelism comes up as the solution. In fact, the irrevocable trend of computer design in near future is still to keep increasing the number of cores while reducing the operating frequency. However, it is not easy to scale number of cores. One important challenge is that existing cores consume too much power. Another challenge is that cache-based memory hierarchy poses a serious limitation due to the rapidly increasing demand of area and power for coherence maintenance.

In this dissertation, opportunities to resolve the aforementioned issues were explored in two aspects.

Firstly, the possibility of removing hardware cache altogether, and replacing it with scratchpad memory with software management was explored. Scratchpad memory consumes much less power than caches. However, as data management logic is completely shifted to Software, how to reduce software overhead is challenging. This thesis presents techniques to manage scratchpad memory judiciously by exploiting application semantics and knowledge of data access patterns, thereby enabling optimization of data movement across the memory hierarchy. Experimental results show that the optimization was able to reduce stack data management overhead by 13X, produce better code mapping in more than 80% of the case, and improve performance by 83% in heap management.

Secondly, the possibility of using software branch hinting to replace hardware branch prediction to completely eliminate power consumption on corresponding hardware components was explored. As branch predictor is removed from hardware, software logic is responsible for reducing branch penalty. Techniques to minimize the branch penalty by optimizing branch hint placement were proposed, which can reduce branch penalty by 35.4% over the state-of-the-art.
ContributorsLu, Jing (Author) / Shrivastava, Aviral (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Wu, Carole-Jean (Committee member) / Doupe, Adam (Committee member) / Arizona State University (Publisher)
Created2019
156948-Thumbnail Image.png
Description
The Internet of Things ecosystem has spawned a wide variety of embedded real-time systems that complicate the identification and resolution of bugs in software. The methods of concurrent checkpoint provide a means to monitor the application state with the ability to replay the execution on like hardware and software,

The Internet of Things ecosystem has spawned a wide variety of embedded real-time systems that complicate the identification and resolution of bugs in software. The methods of concurrent checkpoint provide a means to monitor the application state with the ability to replay the execution on like hardware and software, without holding off and delaying the execution of application threads. In this thesis, it is accomplished by monitoring physical memory of the application using a soft-dirty page tracker and measuring the various types of overhead when employing concurrent checkpointing. The solution presented is an advancement of the Checkpoint and Replay In Userspace (CRIU) thereby eliminating the large stalls and parasitic operation for each successive checkpoint. Impact and performance is measured using the Parsec 3.0 Benchmark suite and 4.11.12-rt16+ Linux kernel on a MinnowBoard Turbot Quad-Core board.
ContributorsPrinke, Michael L (Author) / Lee, Yann-Hang (Thesis advisor) / Shrivastava, Aviral (Committee member) / Zhao, Ming (Committee member) / Arizona State University (Publisher)
Created2018
153668-Thumbnail Image.png
Description
Error correcting systems have put increasing demands on system designers, both due to increasing error correcting requirements and higher throughput targets. These requirements have led to greater silicon area, power consumption and have forced system designers to make trade-offs in Error Correcting Code (ECC) functionality. Solutions to increase the efficiency

Error correcting systems have put increasing demands on system designers, both due to increasing error correcting requirements and higher throughput targets. These requirements have led to greater silicon area, power consumption and have forced system designers to make trade-offs in Error Correcting Code (ECC) functionality. Solutions to increase the efficiency of ECC systems are very important to system designers and have become a heavily researched area.

Many such systems incorporate the Bose-Chaudhuri-Hocquenghem (BCH) method of error correcting in a multi-channel configuration. BCH is a commonly used code because of its configurability, low storage overhead, and low decoding requirements when compared to other codes. Multi-channel configurations are popular with system designers because they offer a straightforward way to increase bandwidth. The ECC hardware is duplicated for each channel and the throughput increases linearly with the number of channels. The combination of these two technologies provides a configurable and high throughput ECC architecture.

This research proposes a new method to optimize a BCH error correction decoder in multi-channel configurations. In this thesis, I examine how error frequency effects the utilization of BCH hardware. Rather than implement each decoder as a single pipeline of independent decoding stages, the channels are considered together and served by a pool of decoding stages. Modified hardware blocks for handling common cases are included and the pool is sized based on an acceptable, but negligible decrease in performance.
ContributorsDill, Russell (Author) / Shrivastava, Aviral (Thesis advisor) / Oh, Hyunok (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2015
154657-Thumbnail Image.png
Description
Several decades of transistor technology scaling has brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without accurate and quantitative estimation of system reliability. Vulnerability -- a

Several decades of transistor technology scaling has brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without accurate and quantitative estimation of system reliability. Vulnerability -- a metric that defines the probability of system-failure (reliability) through analytical models -- is the most effective mechanism for our current estimation and early design space exploration needs. Previous vulnerability estimation tools are based around the Sim-Alpha simulator which has been to shown to have several limitations. In this thesis, I present gemV: an accurate and comprehensive vulnerability estimation tool based on gem5. Gem5 is a popular cycle-accurate micro-architectural simulator that can model several different processor models in close to real hardware form. GemV can be used for fast and early design space exploration and also evaluate the protection afforded by commodity processors. gemV is comprehensive, since it models almost all sequential components of the processor. gemV is accurate because of fine-grain vulnerability tracking, accurate vulnerability modeling of squashed instructions, and accurate vulnerability modeling of shared data structures in gem5. gemV has been thoroughly validated against extensive fault injection experiments and achieves a 97\% accuracy with 95\% confidence. A micro-architect can use gemV to discover micro-architectural variants of a processor that minimize vulnerability for allowed performance penalty. A software developer can use gemV to explore the performance-vulnerability trade-off by choosing different algorithms and compiler optimizations, while the system designer can use gemV to explore the performance-vulnerability trade-offs of choosing different Insruction Set Architectures (ISA).
ContributorsTanikella, Srinivas Karthik (Author) / Shrivastava, Aviral (Thesis advisor) / Bazzi, Rida (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2016
155034-Thumbnail Image.png
Description
The availability of a wide range of general purpose as well as accelerator cores on

modern smartphones means that a significant number of applications can be executed

on a smartphone simultaneously, resulting in an ever increasing demand on the memory

subsystem. While the increased computation capability is intended for improving

user experience, memory requests

The availability of a wide range of general purpose as well as accelerator cores on

modern smartphones means that a significant number of applications can be executed

on a smartphone simultaneously, resulting in an ever increasing demand on the memory

subsystem. While the increased computation capability is intended for improving

user experience, memory requests from each concurrent application exhibit unique

memory access patterns as well as specific timing constraints. If not considered, this

could lead to significant memory contention and result in lowered user experience.

This work first analyzes the impact of memory degradation caused by the interference

at the memory system for a broad range of commonly-used smartphone applications.

The real system characterization results show that smartphone applications,

such as web browsing and media playback, suffer significant performance degradation.

This is caused by shared resource contention at the application processor’s last-level

cache, the communication fabric, and the main memory.

Based on the detailed characterization results, rest of this thesis focuses on the

design of an effective memory interference mitigation technique. Since web browsing,

being one of the most commonly-used smartphone applications and represents many

html-based smartphone applications, my thesis focuses on meeting the performance

requirement of a web browser on a smartphone in the presence of background processes

and co-scheduled applications. My thesis proposes a light-weight user space frequency

governor to mitigate the degradation caused by interfering applications, by predicting

the performance and power consumption of web browsing. The governor selects an

optimal energy-efficient frequency setting periodically by using the statically-trained

performance and power models with dynamically-varying architecture and system

conditions, such as the memory access intensity of background processes and/or coscheduled applications, and temperature of cores. The governor has been extensively evaluated on a Nexus 5 smartphone over a diverse range of mobile workloads. By

operating at the most energy-efficient frequency setting in the presence of interference,

energy efficiency is improved by as much as 35% and with an average of 18% compared

to the existing interactive governor, while maintaining the satisfactory performance

of web page loading under 3 seconds.
ContributorsShingari, Davesh (Author) / Wu, Carole-Jean (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2016