Matching Items (5)
Filtering by

Clear all filters

149518-Thumbnail Image.png
Description
Embedded Networked Systems (ENS) consist of various devices, which are embedded into physical objects (e.g., home appliances, vehicles, buidlings, people). With rapid advances in processing and networking technologies, these devices can be fully connected and pervasive in the environment. The devices can interact with the physical world, collaborate to share

Embedded Networked Systems (ENS) consist of various devices, which are embedded into physical objects (e.g., home appliances, vehicles, buidlings, people). With rapid advances in processing and networking technologies, these devices can be fully connected and pervasive in the environment. The devices can interact with the physical world, collaborate to share resources, and provide context-aware services. This dissertation focuses on collaboration in ENS to provide smart services. However, there are several challenges because the system must be - scalable to a huge number of devices; robust against noise, loss and failure; and secure despite communicating with strangers. To address these challenges, first, the dissertation focuses on designing a mobile gateway called Mobile Edge Computing Device (MECD) for Ubiquitous Sensor Networks (USN), a type of ENS. In order to reduce communication overhead with the server, an MECD is designed to provide local and distributed management of a network and data associated with a moving object (e.g., a person, car, pet). Furthermore, it supports collaboration with neighboring MECDs. The MECD is developed and tested for monitoring containers during shipment from Singapore to Taiwan and reachability to the remote server was a problem because of variance in connectivity (caused by high temperature variance) and high interference. The unreachability problem is addressed by using a mesh networking approach for collaboration of MECDs in sending data to a server. A hierarchical architecture is proposed in this regard to provide multi-level collaboration using dynamic mesh networks of MECDs at one layer. The mesh network is evaluated for an intelligent container scenario and results show complete connectivity with the server for temperature range from 25°C to 65°C. Finally, the authentication of mobile and pervasive devices in ENS for secure collaboration is investigated. This is a challenging problem because mutually unknown devices must be verified without knowledge of each other's identity. A self-organizing region-based authentication technique is proposed that uses environmental sound to autonomously verify if two devices are within the same region. The experimental results show sound could accurately authenticate devices within a small region.
ContributorsKim, Su-jin (Author) / Gupta, Sandeep K. S. (Thesis advisor) / Dasgupta, Partha (Committee member) / Davulcu, Hasan (Committee member) / Lee, Yann-Hang (Committee member) / Arizona State University (Publisher)
Created2010
154003-Thumbnail Image.png
Description
Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of the programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes the dynamic analysis difficult. In addition, instrumentation

Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of the programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes the dynamic analysis difficult. In addition, instrumentation overhead for gathering execution information may change the execution of a program, and lead to distorted analysis results, i.e., probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis largely consists of three parts. First of all, we discusses a deterministic replay framework to provide reproducible execution. Once a program execution is recorded, software instrumentation can be safely applied during replay without probe effect. Second, a discussion of probe effect is presented and a simulation-based analysis is proposed to detect execution changes of a program caused by instrumentation overhead. The simulation-based analysis examines if the recording instrumentation changes the original program execution. Lastly, the thesis discusses data race detection algorithms that help to remove data races for correctness of the replay and the simulation-based analysis. The focus is to make the detection efficient for C/C++ programs, and to increase scalability of the detection on multi-core machines.
ContributorsSong, Young Wn (Author) / Lee, Yann-Hang (Thesis advisor) / Shrivastava, Aviral (Committee member) / Fainekos, Georgios (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)
Created2015
155975-Thumbnail Image.png
Description
Cyber-Physical Systems (CPS) are being used in many safety-critical applications. Due to the important role in virtually every aspect of human life, it is crucial to make sure that a CPS works properly before its deployment. However, formal verification of CPS is a computationally hard problem. Therefore, lightweight verification methods

Cyber-Physical Systems (CPS) are being used in many safety-critical applications. Due to the important role in virtually every aspect of human life, it is crucial to make sure that a CPS works properly before its deployment. However, formal verification of CPS is a computationally hard problem. Therefore, lightweight verification methods such as testing and monitoring of the CPS are considered in the industry. The formal representation of the CPS requirements is a challenging task. In addition, checking the system outputs with respect to requirements is a computationally complex problem. In this dissertation, these problems for the verification of CPS are addressed. The first method provides a formal requirement analysis framework which can find logical issues in the requirements and help engineers to correct the requirements. Also, a method is provided to detect tests which vacuously satisfy the requirement because of the requirement structure. This method is used to improve the test generation framework for CPS. Finally, two runtime verification algorithms are developed for off-line/on-line monitoring with respect to real-time requirements. These monitoring algorithms are computationally efficient, and they can be used in practical applications for monitoring CPS with low runtime overhead.
ContributorsDokhanchi, Adel (Author) / Fainekos, Georgios (Thesis advisor) / Lee, Yann-Hang (Committee member) / Sarjoughian, Hessam S. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created2017
156791-Thumbnail Image.png
Description
General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs

General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions.

Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%.

Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications.

Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPG- PUs in the face of slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future.

In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.
ContributorsArunkumar, Akhil (Author) / Wu, Carole-Jean (Thesis advisor) / Shrivastava, Aviral (Committee member) / Lee, Yann-Hang (Committee member) / Bolotin, Evgeny (Committee member) / Arizona State University (Publisher)
Created2018
190906-Thumbnail Image.png
Description
Graphic Processing Units (GPUs) have become a key enabler of the big-data revolution, functioning as defacto co-processors to accelerate large-scale computation. As the GPU programming stack and tool support have matured, the technology has alsobecome accessible to programmers. However, optimizing programs to run efficiently on GPUs requires developers to have

Graphic Processing Units (GPUs) have become a key enabler of the big-data revolution, functioning as defacto co-processors to accelerate large-scale computation. As the GPU programming stack and tool support have matured, the technology has alsobecome accessible to programmers. However, optimizing programs to run efficiently on GPUs requires developers to have both detailed understanding of the application logic and significant knowledge of parallel programming and GPU architectures. This dissertation proposes GEVO, a tool for automatically tuning the performance of GPU kernels in the LLVM representation to meet desired criteria. GEVO uses population-based search to find edits to programs compiled to LLVM-IR which improves performance on desired criteria and retains required functionality. The evaluation of GEVO on the Rodinia benchmark suite demonstrates many runtime optimization techniques. One of the key insights is that semantic relaxation enables GEVO to discover these optimizations that are usually prohibited by the compiler. GEVO also explores many other optimizations, including architecture- and application-specific. A follow-up evaluation of three bioinformatics applications at their different stages of development suggests that GEVO can optimize programs as well as human experts, sometimes even into a code base that is beyond a programmer’s reach. Furthermore, to unshackle the constraint of GEVO in optimizing neural network (NN) models, GEVO-ML is proposed by extending the representation capability of GEVO, where NN models and the training/prediction process are uniformly represented in a single intermediate language. An evaluation of GEVO-ML shows that GEVO-ML can optimize network models similar to how human developers improve model design, for example, by changing learning rates or pruning non-essential parameters. These results showcase the potential of automated program optimization tools to both reduce the optimization burden for researchers and provide new insights for GPU experts.
ContributorsLiou, Jhe-Yu (Author) / Forrest, Stephanie (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Lee, Yann-Hang (Committee member) / Weimer, Westley (Committee member) / Arizona State University (Publisher)
Created2023