Matching Items (22)

Description
Ethernet-based technologies are emerging as the ubiquitous de facto form of communication due to their interoperability, capacity, cost, and reliability. Traditional Ethernet is designed with the goal of delivering best-effort services. However, several real-time and control applications require deterministic guarantees and Ultra Low Latency (ULL) that traditional Ethernet cannot provide. Current Industrial Automation and Control Systems (IACS) applications use semi-proprietary technologies that provide deterministic communication behavior for sporadic and periodic traffic, but can lead to closed systems that do not interoperate effectively. The convergence between the informational and operational technologies in modern industrial control networks cannot be achieved using traditional Ethernet. Time Sensitive Networking (TSN) is a suite of IEEE standards that augments traditional Ethernet with real-time deterministic properties ideal for Digital Signal Processing (DSP) applications. Similarly, Deterministic Networking (DetNet) is an Internet Engineering Task Force (IETF) standardization effort that enhances the network layer with the deterministic properties needed for IACS applications. This dissertation provides an in-depth survey and literature review of both the standards/research and 5G-related material on ULL. Recognizing the limitations of several features of the standards, this dissertation provides an empirical evaluation of these approaches and presents novel enhancements to the shapers and schedulers involved in TSN. More specifically, this dissertation investigates the Time Aware Shaper (TAS), Asynchronous Traffic Shaper (ATS), and Cyclic Queuing and Forwarding (CQF) schedulers. Moreover, IEEE 802.1Qcc (centralized management and control) and IEEE 802.1Qbv can be used to manage and control scheduled traffic streams with periodic properties along with best-effort traffic on the same network infrastructure.
Both the centralized network/distributed user model (hybrid model) and the fully distributed (decentralized) IEEE 802.1Qcc model are examined on a typical industrial control network with the goal of maximizing scheduled traffic streams. Finally, since industrial applications and cyber-physical systems require timely delivery, any channel or node fault can cause severe disruption to the operational continuity of the application. Therefore, IEEE 802.1CB, Frame Replication and Elimination for Reliability (FRER), is examined and tested using machine learning models to predict faulty scenarios and issue remedies seamlessly.
Contributors: Nasrallah, Ahmed (Author) / Reisslein, Martin (Thesis advisor) / Syrotiuk, Violet R. (Committee member) / LiKamWa, Robert (Committee member) / Thyagaturu, Akhilesh (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Huge advancements have been made over the years in modern image-sensing hardware and visual computing algorithms (e.g. computer vision, image processing, computational photography). However, to this day, a gap still exists between the hardware and software design in an imaging system, which silos one research domain from another. Bridging this gap is the key to unlocking new visual computing capabilities for end applications in commercial photography, industrial inspection, and robotics. This thesis explores avenues where hardware-software co-design of image sensors can be leveraged to replace conventional hardware components in an imaging system with software for enhanced reconfigurability. As a result, the user can program the image sensor in a way best suited to the end application. This is referred to as software-defined imaging (SDI), where image sensor behavior can be altered by the system software depending on the user's needs. The scope of this thesis covers the development and deployment of SDI algorithms for low-power computer vision. Strategies for sparse spatial sampling have been developed in this thesis for power optimization of the vision sensor. This dissertation shows how a hardware-compatible state-of-the-art object tracker can be coupled with a Kalman filter for energy gains at the sensor level. Extensive experiments reveal how adaptive spatial sampling of image frames with this hardware-friendly framework offers attractive energy-accuracy tradeoffs. Another thrust of this thesis is to demonstrate the benefits of reinforcement learning in this research avenue. A major finding reported in this dissertation shows how neural-network-based reinforcement learning can be exploited for the adaptive subsampling framework to achieve improved sampling performance, thereby optimizing the energy efficiency of the image sensor.
The last thrust of this thesis is to leverage emerging event-based SDI technology for building a low-power navigation system. A homography estimation pipeline has been proposed in this thesis which couples the right data representation with a differential scale-invariant feature transform (SIFT) module to extract rich visual cues from event streams. Positional encoding is leveraged with a multilayer perceptron (MLP) network to get robust homography estimation from event data.
Contributors: Iqbal, Odrika (Author) / Jayasuriya, Suren (Thesis advisor) / Spanias, Andreas (Thesis advisor) / LiKamWa, Robert (Committee member) / Owens, Chris (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Graphics Processing Units (GPUs) have become a key enabler of the big-data revolution, functioning as de facto co-processors to accelerate large-scale computation. As the GPU programming stack and tool support have matured, the technology has also become accessible to programmers. However, optimizing programs to run efficiently on GPUs requires developers to have both a detailed understanding of the application logic and significant knowledge of parallel programming and GPU architectures. This dissertation proposes GEVO, a tool for automatically tuning the performance of GPU kernels in the LLVM representation to meet desired criteria. GEVO uses population-based search to find edits to programs compiled to LLVM-IR that improve performance on desired criteria while retaining required functionality. The evaluation of GEVO on the Rodinia benchmark suite demonstrates many runtime optimization techniques. One of the key insights is that semantic relaxation enables GEVO to discover optimizations that are usually prohibited by the compiler. GEVO also explores many other optimizations, including architecture- and application-specific ones. A follow-up evaluation of three bioinformatics applications at different stages of development suggests that GEVO can optimize programs as well as human experts, sometimes even into a code base that is beyond a programmer’s reach. Furthermore, to unshackle the constraint of GEVO in optimizing neural network (NN) models, GEVO-ML is proposed by extending the representation capability of GEVO, where NN models and the training/prediction process are uniformly represented in a single intermediate language. An evaluation of GEVO-ML shows that GEVO-ML can optimize network models similar to how human developers improve model design, for example, by changing learning rates or pruning non-essential parameters.
These results showcase the potential of automated program optimization tools to both reduce the optimization burden for researchers and provide new insights for GPU experts.
Contributors: Liou, Jhe-Yu (Author) / Forrest, Stephanie (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Lee, Yann-Hang (Committee member) / Weimer, Westley (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Efficient visual sensing plays a pivotal role in enabling high-precision applications in augmented reality and low-power Internet of Things (IoT) devices. This dissertation addresses the primary challenges that hinder energy efficiency in visual sensing: the bottleneck of pixel traffic across camera and memory interfaces and the energy-intensive analog readout process in image sensors. To overcome the bottleneck of pixel traffic, this dissertation proposes a visual sensing pipeline architecture that enables application developers to dynamically adapt the spatial resolution and update rates for specific regions within the scene. By selectively capturing and processing high-resolution frames only where necessary, the system significantly reduces energy consumption associated with memory traffic. This is achieved by encoding only the relevant pixels from the commercial image sensors with standard raster-scan pixel read-out patterns, thus minimizing the data stored in memory. The stored rhythmic pixel region stream is decoded into traditional frame-based representations, enabling seamless integration into existing video pipelines. Moreover, the system includes runtime support that allows flexible specification of the region labels, giving developers fine-grained control over the resolution adaptation process. Experimental evaluations conducted on a Xilinx Field Programmable Gate Array (FPGA) platform demonstrate substantial reductions of 43-64% in interface traffic, while maintaining controllable task accuracy. In addition to the pixel traffic bottleneck, the dissertation tackles the energy intensive analog readout process in image sensors. To address this, the dissertation proposes aggressive scaling of the analog voltage supplied to the camera. Extensive characterization on off-the-shelf sensors demonstrates that analog voltage scaling can significantly reduce sensor power, albeit at the expense of image quality. 
To mitigate this trade-off, this research develops a pipeline that allows application developers to adapt the sensor voltage on a frame-by-frame basis. A voltage controller is integrated into the existing Raspberry Pi (RPi) based video streaming pipeline, generating the sensor voltage. On top of that, the system provides a software interface for vision applications to specify the desired voltage levels. Evaluation of the system across a range of voltage scaling policies on popular vision tasks demonstrates that the technique can deliver up to 73% sensor power savings while maintaining reasonable task fidelity.
Contributors: Kodukula, Venkatesh (Author) / LiKamWa, Robert (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Brunhaver, John (Committee member) / Nambi, Akshay (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Deep neural network-based methods have been proven to achieve outstanding performance on object detection and classification tasks. Deep neural networks follow the "deeper model with deeper confidence" belief to gain higher recognition accuracy. However, reducing these networks' computational costs remains a challenge, which impedes their deployment on embedded devices. For instance, the intersection management of Connected Autonomous Vehicles (CAVs) requires running computationally intensive object recognition algorithms on low-power traffic cameras. This dissertation aims to study the effect of a dynamic hardware and software approach to address this issue. Characteristics of real-world applications can facilitate this dynamic adjustment and reduce the computation. Specifically, this dissertation starts with a dynamic hardware approach that adjusts itself based on the toughness of the input and extracts deeper features if needed. Next, an adaptive learning mechanism is studied that uses features extracted from previous inputs to improve system performance. Finally, a system (ARGOS) is proposed and evaluated that can run on embedded systems while maintaining the desired accuracy. This system adopts shallow features at inference time, but it can switch to deep features if a higher accuracy is desired. To improve performance, ARGOS distills the temporal knowledge from deep features to the shallow system. Moreover, ARGOS further reduces the computation by focusing on regions of interest. Response time and mean average precision are used to evaluate the proposed ARGOS system.
Contributors: Farhadi, Mohammad (Author) / Yang, Yezhou (Thesis advisor) / Vrudhula, Sarma (Committee member) / Wu, Carole-Jean (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
With the rapid development of both hardware and software, mobile devices with their advantages in mobility, interactivity, and privacy have enabled various applications, including social networking, mixed reality, entertainment, authentication, and more. In diverse forms such as smartphones, glasses, and watches, the number of mobile devices is expected to increase by 1 billion per year in the future. These devices not only generate and exchange small data such as GPS data, but also large data including videos and point clouds. Such massive visual data presents many challenges for processing on mobile devices. First, continuously capturing and processing high-resolution visual data is energy-intensive, which can drain the battery of a mobile device very quickly. Second, data offloading for edge or cloud computing is helpful, but users are afraid that their privacy can be exposed to malicious developers. Third, interactivity and user experience are degraded if mobile devices cannot process large-scale visual data in real time, such as off-device high-precision point clouds. To deal with these challenges, this work presents three solutions towards fine-grained control of visual data in mobile systems, revolving around two core ideas: enabling resolution-based tradeoffs and adopting split-process to protect visual data. In particular, this work introduces: (1) the Banner media framework to remove resolution reconfiguration latency in the operating system for enabling seamless dynamic resolution-based tradeoffs; (2) the LensCap split-process application development framework to protect users' visual privacy against malicious data collection in cloud-based Augmented Reality (AR) applications by isolating the visual processing in a distinct process; (3) a novel voxel grid schema to enable adaptive sampling at the edge device that can sample point clouds flexibly for interactive 3D vision use cases across mobile devices and mobile networks.
The evaluation in several mobile environments demonstrates that, by controlling visual data at a fine granularity, energy efficiency can be improved by 49% when switching between resolutions, visual privacy can be protected through split-process with negligible overhead, and point clouds can be delivered at a high throughput meeting various requirements. Thus, this work can enable more continuous mobile vision applications for the future of a new reality.
Contributors: Hu, Jinhan (Author) / LiKamWa, Robert (Thesis advisor) / Wu, Carole-Jean (Committee member) / Doupe, Adam (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Virtual reality (VR) provides significant opportunities for students to experience immersive education. In VR, students can travel to the International Space Station, or go through a science experiment at home. However, the current tactile feedback provided by these systems does not feel real. Controllers do not provide the same tactile feedback experienced in the physical world. This dissertation aims to bridge the gap between the virtual and physical learning environments through the development of novel haptic devices capable of emulating tactile sensations found in physical science labs. My research explores haptic devices that can emulate the sensations of fluids in vessels within the virtual environment. Fluid handling is a cornerstone experience of science labs. I also explore how to emulate the handling of other science equipment. I describe and study four novel devices. These are 1) SWISH: A shifting-weight interface of simulated hydrodynamics for haptic perception of virtual fluid vessels, 2) Geppetteau, 3) Vibr-eau, and 4) Pneutouch. SWISH simulates the sensation of virtual fluids in vessels using a rack and pinion mechanism, while Geppetteau employs a string-driven mechanism to provide haptic feedback for a variety of vessel shapes. Vibr-eau utilizes vibrotactile actuators in the vessel’s interior to emulate the behavior of virtual liquids. Finally, Pneutouch enables users to interact with virtual objects through pneumatic inflatables. Through systematic evaluations and comparisons with baselines, the usability and effectiveness of these haptic devices in enhancing virtual experiences are demonstrated. The development of these haptic mechanisms and interfaces represents a significant step towards creating transformative educational tools that provide customizable, hands-on learning environments in both Mixed Reality (MR) and Virtual Reality (VR), now collectively called XR.
This dissertation contributes to advancing the field of haptics for virtual education and lays the foundation for future research in immersive learning technologies.
Contributors: Liu, Frank (Author) / LiKamWa, Robert (Thesis advisor) / Lahey, Byron (Committee member) / Johnson-Glenberg, Mina (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2024
Description
Artificial intelligence (AI) has the potential to drive us towards a future in which all of humanity flourishes. It also comes with substantial risks of oppression and calamity. For example, social media platforms have knowingly and surreptitiously promoted harmful content, e.g., the rampant instances of disinformation and hate speech. Machine learning algorithms designed for combating hate speech were also found biased against underrepresented and disadvantaged groups. In response, researchers and organizations have been working to publish principles and regulations for the responsible use of AI. However, these conceptual principles also need to be turned into actionable algorithms to materialize AI for good. The broad aim of my research is to design AI systems that responsibly serve users and develop applications with social impact. This dissertation seeks to develop the algorithmic solutions for Socially Responsible AI (SRAI), a systematic framework encompassing the responsible AI principles and algorithms, and the responsible use of AI. In particular, it first introduces an interdisciplinary definition of SRAI and the AI responsibility pyramid, in which four types of AI responsibilities are described. It then elucidates the purpose of SRAI: how to bridge from the conceptual definitions to responsible AI practice through the three human-centered operations -- to Protect and Inform users, and Prevent negative consequences. They are illustrated in the social media domain given that social media has revolutionized how people live but has also contributed to the rise of many societal issues. The three representative tasks for each dimension are cyberbullying detection, disinformation detection and dissemination, and unintended bias mitigation. The means of SRAI is to develop responsible AI algorithms. Many issues (e.g., discrimination and generalization) can arise when AI systems are trained to improve accuracy without knowing the underlying causal mechanism. 
Causal inference, therefore, is intrinsically related to understanding and resolving these challenging issues in AI. As a result, this dissertation also seeks to gain an in-depth understanding of AI by looking into the precise relationships between causes and effects. For illustration, it introduces a recent work that applies deep learning to estimating causal effects and shows that causal learning algorithms can outperform traditional methods.
Contributors: Cheng, Lu (Author) / Liu, Huan (Thesis advisor) / Varshney, Kush R. (Committee member) / Silva, Yasin N. (Committee member) / Wu, Carole-Jean (Committee member) / Candan, Kasim S. (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Advances in semiconductor technology have brought computer-based systems into virtually all aspects of human life. This unprecedented integration of semiconductor-based systems in our lives has significantly increased the domain and the number of safety-critical applications, that is, applications with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can provide commercial-off-the-shelf microprocessors with adaptive and scalable reliability.

Among all software-level error resilience solutions, in-application instruction replication based approaches have been widely used and are deemed to be the most effective. However, existing instruction-based replication schemes protect only part of the computation, i.e., arithmetic and logical instructions, and leave the rest unprotected. To improve the efficacy of instruction-level redundancy-based approaches, we developed several error detection and error correction schemes. nZDC (near Zero silent Data Corruption) is an instruction duplication scheme that protects the execution of the whole application. Rather than detecting errors on register operands of memory and control flow operations, nZDC checks the results of such operations. nZDC ensures the correct execution of memory write instructions by reloading the stored value and checking it against the redundantly computed value. nZDC also introduces a novel control flow checking mechanism that replicates compare and branch instructions and detects both wrong-direction branches and unwanted jumps. Fault injection experiments show that nZDC can improve the error coverage of state-of-the-art schemes by more than 10x without incurring any additional performance penalty. Furthermore, we introduced two error recovery solutions. InCheck is our backward recovery solution, which makes lightweight error-free checkpoints at basic block granularity. In the case of an error, InCheck reverts the program execution to the beginning of the last executed basic block and resumes execution with the aid of the preserved information. NEMESIS is our forward recovery scheme, which runs three versions of the computation and detects errors by checking the results of all memory write and branch operations. In the case of a mismatch, the NEMESIS diagnosis routine decides whether the error is recoverable. If so, the NEMESIS recovery routine reverts the effect of the error from the program state and resumes normal program execution from the error detection point.
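The store check described in this abstract (reload the stored value and compare it against a redundantly computed copy) can be illustrated with a small sketch. This is not the dissertation's implementation, which operates on machine instructions; the Python list standing in for memory, the `checked_store` and `run` helpers, and the `flip_lowest_bit` fault hook are all hypothetical names introduced here for illustration.

```python
class SilentDataCorruption(Exception):
    """Raised when a reloaded value disagrees with its redundant copy."""


def checked_store(memory, address, value, shadow_value, fault=None):
    """Store `value`, then reload it and compare against `shadow_value`.

    `fault` is a hypothetical hook that corrupts memory after the write,
    standing in for a transient hardware error.
    """
    memory[address] = value
    if fault is not None:
        fault(memory, address)
    reloaded = memory[address]  # check the result of the store, not its operands
    if reloaded != shadow_value:
        raise SilentDataCorruption(f"mismatch at address {address}")
    return reloaded


def scaled_sum(a, b, scale):
    return (a + b) * scale


def run(memory, a, b, scale, fault=None):
    # Main and redundant ("shadow") streams compute the same value independently.
    main = scaled_sum(a, b, scale)
    shadow = scaled_sum(a, b, scale)
    return checked_store(memory, 0, main, shadow, fault)


def flip_lowest_bit(memory, address):
    """Hypothetical fault model: flip one bit of the stored word."""
    memory[address] ^= 1
```

Calling `run([0], 2, 3, 4)` stores and returns the value unchanged, while passing `fault=flip_lowest_bit` makes the reload disagree with the shadow copy and raises `SilentDataCorruption`.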
Contributors: Didehban, Moslem (Author) / Shrivastava, Aviral (Thesis advisor) / Wu, Carole-Jean (Committee member) / Clark, Lawrence (Committee member) / Mahlke, Scott (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions.

Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLC while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%.

Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications.

Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPGPUs in the face of slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future.

In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.
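The eviction preference this abstract attributes to ReMAP (prefer lines with no reuse, otherwise lines that are cheap to fetch again) can be sketched as a victim-selection rule. The abstract does not say how reuse or memory access cost is measured, so the `predicted_reuse` flag, the `refetch_cost` number, and the tie-breaking order below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int
    predicted_reuse: bool  # hypothetical reuse prediction for this line
    refetch_cost: int      # hypothetical cost (e.g. cycles) to fetch it again


def choose_victim(cache_set):
    """Pick the line whose eviction is expected to cost the least.

    Lines with no predicted reuse are preferred (False sorts before True);
    among the rest, the line that is cheapest to refetch is chosen.
    """
    return min(cache_set, key=lambda line: (line.predicted_reuse, line.refetch_cost))
```

With this ordering, a line that will not be reused is evicted even if refetching it would be expensive, since the sketch assumes it will never be fetched again.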
Contributors: Arunkumar, Akhil (Author) / Wu, Carole-Jean (Thesis advisor) / Shrivastava, Aviral (Committee member) / Lee, Yann-Hang (Committee member) / Bolotin, Evgeny (Committee member) / Arizona State University (Publisher)
Created: 2018