Characterization of Energy and Performance Bottlenecks in an Omni-directional Camera System

Description

Generating real-world content for VR is challenging because it requires capturing and processing at high resolutions and high frame rates. The content must provide a truly immersive experience, in which the user can look around in a 360-degree view and perceive the depth of the scene. Existing solutions only capture on the device and offload the computation to a server. However, offloading large volumes of raw camera feeds incurs long latencies, which makes real-time applications difficult. By capturing and computing at the edge, we can integrate the subsystems closely and optimize for low latency. However, moving traditional stitching algorithms to a battery-constrained device requires a power reduction of at least three orders of magnitude. We believe that close integration of the capture and compute stages will reduce overall system power.

We approach the problem by building a hardware prototype and characterizing the end-to-end power and performance bottlenecks of the system. The prototype has six IMX274 cameras and uses an Nvidia Jetson TX2 development board for capture and computation. We found that capture is bottlenecked by sensor power and by data rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computation lead to high power, a large memory footprint, and high latency. Existing systems lack hardware-software co-design, leading to excessive data transfers across interfaces and expensive computations within individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency, emphasizing co-design across subsystems to reduce and reuse data. For example, reusing the motion vectors from the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on a custom FPGA can achieve real-time stitching.
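
To see why interface data rates become a bottleneck, the aggregate raw data rate of the camera array can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes each IMX274 streams 3840×2160 frames at 10 bits per pixel and 30 fps, which are hypothetical operating parameters rather than the prototype's measured configuration.

```python
def raw_data_rate_gbps(num_cameras, width, height, bits_per_pixel, fps):
    """Aggregate raw (unstitched, uncompressed) sensor output in gigabits per second."""
    bits_per_frame = width * height * bits_per_pixel
    return num_cameras * bits_per_frame * fps / 1e9


# Hypothetical operating point: 4K (3840x2160) 10-bit RAW at 30 fps per camera.
rate = raw_data_rate_gbps(num_cameras=6, width=3840, height=2160, bits_per_pixel=10, fps=30)
print(f"{rate:.2f} Gbit/s")  # 14.93 Gbit/s
```

Under these assumptions the six sensors together produce roughly 15 Gbit/s of raw pixel data before any stitching, which illustrates why moving raw feeds across interfaces, or off the device, is costly.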

Contributors: Gunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created: 2018

Stagioni: Temperature management to enable near-sensor processing for performance, fidelity, and energy-efficiency of vision and imaging workloads

Description

Vision processing on traditional architectures is inefficient due to energy-expensive off-chip data movements. Many researchers advocate pushing processing close to the sensor to substantially reduce data movements. However, continuous near-sensor processing raises the sensor temperature, impairing the fidelity of imaging/vision tasks.

The work characterizes the thermal implications of using 3D stacked image sensors with near-sensor vision processing units. The characterization reveals that near-sensor processing reduces system power but degrades image quality. For reasonable image fidelity, the sensor temperature needs to stay below a threshold, situationally determined by application needs. Fortunately, the characterization also identifies opportunities -- unique to the needs of near-sensor processing -- to regulate temperature based on dynamic visual task requirements and rapidly increase capture quality on demand.

Based on the characterization, the work proposes and investigates two thermal management strategies, stop-capture-go and seasonal migration, for imaging-aware thermal management. The work presents the parameters that govern the policy decisions and explores the trade-offs between system power and policy overhead. The evaluation shows that these novel dynamic thermal management strategies can unlock the energy-efficiency potential of near-sensor processing with minimal performance impact and without compromising image fidelity.
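
The stop-capture-go idea can be illustrated as a hysteresis controller: capture and near-sensor processing run until the sensor reaches a stop threshold, then pause until it cools to a resume threshold. This is a minimal sketch under a toy thermal model (linear heating while active, exponential relaxation toward ambient while paused); the thresholds, rates, and model are hypothetical and are not the thesis's policy parameters.

```python
def stop_capture_go(steps, t_ambient, t_stop, t_resume, heat_per_step, cool_fraction):
    """Simulate a stop-capture-go hysteresis policy on a toy thermal model.

    Returns the sensor temperature after each step and whether a frame was
    captured during that step.
    """
    temp = t_ambient
    capturing = True
    temps, captured = [], []
    for _ in range(steps):
        captured.append(capturing)
        if capturing:
            # Hypothetical: near-sensor processing adds a fixed amount of heat per step.
            temp += heat_per_step
        else:
            # Hypothetical: while paused, the sensor relaxes toward ambient temperature.
            temp -= cool_fraction * (temp - t_ambient)
        # Hysteresis: stop at the upper threshold, resume at the lower one.
        if capturing and temp >= t_stop:
            capturing = False
        elif not capturing and temp <= t_resume:
            capturing = True
        temps.append(temp)
    return temps, captured


temps, captured = stop_capture_go(
    steps=500, t_ambient=25.0, t_stop=60.0, t_resume=50.0,
    heat_per_step=1.0, cool_fraction=0.1,
)
duty_cycle = sum(captured) / len(captured)
print(f"max temperature {max(temps):.1f} C, capture duty cycle {duty_cycle:.2f}")
```

Widening the gap between `t_stop` and `t_resume` lengthens each pause but makes the policy toggle less often, a small-scale view of the trade-off between system power and policy overhead described above.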

Contributors: Kodukula, Venkatesh (Author) / LiKamWa, Robert (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Brunhaver, John (Committee member) / Arizona State University (Publisher)

Created: 2019

Generating Light Estimation for Mixed-reality Devices through Collaborative Visual Sensing

Description

Mixed reality mobile platforms co-locate virtual objects with physical spaces, creating immersive user experiences. To create visual harmony between virtual and physical spaces, the virtual scene must be accurately illuminated with realistic physical lighting. To this end, a system was designed that Generates Light Estimation Across Mixed-reality (GLEAM) devices to continually sense the realistic lighting of a physical scene in all directions. GLEAM can optionally operate across multiple mobile mixed-reality devices, leveraging collaborative multi-viewpoint sensing for improved estimation. The system implements policies that prioritize the resolution, coverage, or update interval of the illumination estimate depending on the situational needs of the virtual scene and physical environment.

To evaluate the runtime performance and perceptual efficacy of the system, GLEAM was implemented in the Unity 3D game engine and deployed on Android and iOS devices. On these implementations, GLEAM can prioritize dynamic estimation with update intervals as low as 15 ms, or prioritize high spatial quality with update intervals of 200 ms. User studies across 99 participants and 26 scene comparisons showed a preference for GLEAM over other lighting techniques in 66.67% of the presented augmented scenes and indifference in 12.57% of the scenes. A controlled lighting user study with 18 participants revealed a general preference for policies that strike a balance between resolution and update rate.

Contributors: Prakash, Siddhant (Author) / LiKamWa, Robert (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Hansford, Dianne (Committee member) / Arizona State University (Publisher)

Created: 2018

Fault-tolerance in Time Sensitive Network with Machine Learning Model

Description

Demand from the Internet of Things (IoT), automotive networking, and video applications is driving a shift from conventional Ethernet toward time-sensitive Ethernet. As the volume of transmitted data grows, so does the number of errors in the network, which makes a Time-Sensitive Network (TSN) important for this increased traffic. TSN is a technology that provides deterministic service for time-sensitive traffic in a time-synchronized Ethernet environment. Managing these errors efficiently requires countermeasures against them. A system that maintains its function even in the event of an internal fault or failure is called a fault-tolerant system. To this end, after configuring the network environment in OMNeT++, machine learning was used to estimate the optimal alternative routing path in case an error occurs during transmission. By setting an alternate path before an error occurs, I propose a method to minimize both delay and data loss when an error does occur. Four configurations were compared: no replication, ideal replication, random replication, and replication predicted by ML. In these experiments, ideal replication showed the best results, since every choice in that environment is optimal. Apart from this ideal case, however, the proposed ML-based replication prediction performed best. These results suggest that the proposed method is effective, although issues with efficiency and error control may remain, so directions for further improvement are also discussed.
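
The idea of configuring an alternate path before a fault occurs can be illustrated with a simple graph heuristic. Note that this sketch swaps in a non-ML technique: it takes the fewest-hop primary path and then greedily searches for a backup path that shares no links with it, whereas the thesis uses a machine learning model to predict the alternative path. The topology and node names are hypothetical.

```python
from collections import deque


def bfs_path(adj, src, dst, banned=frozenset()):
    """Fewest-hop path from src to dst that avoids every link in `banned`."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj.get(node, ()):
            link = frozenset((node, nxt))
            if nxt not in prev and link not in banned:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def primary_and_backup(adj, src, dst):
    """Precompute a primary path and a link-disjoint backup path (greedy)."""
    primary = bfs_path(adj, src, dst)
    if primary is None:
        return None, None
    used = {frozenset(link) for link in zip(primary, primary[1:])}
    return primary, bfs_path(adj, src, dst, banned=used)


# Hypothetical four-switch ring: A-B-C-D-A.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
primary, backup = primary_and_backup(ring, "A", "C")
print(primary, backup)  # ['A', 'B', 'C'] ['A', 'D', 'C']
```

Because the backup is computed ahead of time, traffic can be switched to it as soon as a link error is detected instead of waiting for a new route. Greedily excluding the primary's links can miss a disjoint pair in some topologies; Suurballe's algorithm finds a disjoint pair whenever one exists.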

Contributors: Lee, Sang hee (Author) / Reisslein, Martin (Thesis advisor) / LiKamWa, Robert (Committee member) / Thyagaturu, Akhilesh (Committee member) / Arizona State University (Publisher)

Created: 2022

Networked System for Volumetric Athletic Coaching in Augmented Reality

Description

Traditional sports coaching involves face-to-face instruction with athletes or playing back 2D videos of athletes’ training. However, if the coach is not in the same location as the athlete, the coach cannot see the athlete’s full body and thus cannot give precise guidance, limiting the athlete’s improvement. To address these challenges, this paper proposes Augmented Coach, an augmented reality platform where coaches can remotely view, manipulate, and comment on volumetric video data of athletes’ movements over the network. In particular, this work includes (a) capturing the athlete’s movement with Kinects and converting it into point cloud format, (b) transmitting the point cloud data to the coach’s Oculus headset over a 5G or wireless network, and (c) enabling the coach to comment on the athlete’s joints. The evaluation of Augmented Coach assesses its performance on five metrics in both the wireless and 5G network environments, as well as the coaches’ and athletes’ experience of using it. The results show that Augmented Coach enables coaches to instruct athletes from a distance and provide effective feedback for correcting athletes’ motions over the network.
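
Converting a depth frame into a point cloud amounts to back-projecting each pixel through a pinhole camera model: a pixel $(u, v)$ with depth $Z$ maps to $X = (u - c_x)Z / f_x$ and $Y = (v - c_y)Z / f_y$. The sketch below assumes depth in meters and hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; real values come from the sensor's calibration, and this is not necessarily how Augmented Coach performs the conversion.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an H x W depth image (meters) into an N x 3 array of XYZ points."""
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Pixels with zero depth carry no measurement; drop them.
    return points[points[:, 2] > 0]


depth = np.zeros((2, 2))
depth[1, 1] = 2.0
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(cloud)  # [[2. 2. 2.]]
```

Each captured frame yields one such array per camera, which must then be serialized and transmitted to the headset.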

Contributors: Qiao, Yunhan (Author) / LiKamWa, Robert (Thesis advisor) / Bansal, Ajay (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created: 2023

A Scalable and Programmable I/O Controller for Region-based Computing

Description

I present my work on a scalable and programmable I/O controller for region-based computing, which will be used in a rhythmic pixel-based camera pipeline. I provide a breakdown of the development and design of the I/O controller and how it fits into rhythmic pixel regions, along with a study on the memory traffic of rhythmic pixel regions and how it translates to energy efficiency. This rhythmic pixel region-based camera pipeline has been jointly developed within Dr. Robert LiKamWa’s research lab. High spatiotemporal resolutions allow high precision for vision applications, such as detecting features for augmented reality or face detection. However, high spatiotemporal resolution also comes with high memory throughput, leading to higher energy usage. This creates a tradeoff between precision and energy efficiency, which is especially important in mobile systems. In addition, not all pixels in a frame are necessary for the vision application; pixels that make up the background, for example, may not be. Rhythmic pixel regions aim to ease this tradeoff with a pipeline that allows an application developer to specify regions to capture at non-uniform spatiotemporal resolutions. This is accomplished by encoding the incoming image and sending only the pixels within the specified regions. These encoded representations are later decoded into a standard frame representation usable by traditional vision applications. My contribution to this effort has been the design, testing, and evaluation of the I/O controller.
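
The encode/decode idea can be sketched in a few lines. This is a minimal illustration, not the lab's pipeline or the I/O controller's actual format: each hypothetical `Region` carries a rectangle, a spatial `stride`, and a temporal `period` (capture every `period`-th frame). The encoder keeps only in-region pixels, and the decoder rebuilds a dense frame by upsampling each region and reusing the previous frame's pixels wherever nothing new arrived.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Region:
    x: int
    y: int
    w: int
    h: int
    stride: int = 1  # keep every stride-th pixel in each dimension
    period: int = 1  # capture this region every period-th frame


def encode(frame, regions, frame_idx):
    """Return (region, pixels) pairs for the regions due on this frame.

    Assumes every region lies fully inside the frame.
    """
    encoded = []
    for r in regions:
        if frame_idx % r.period == 0:
            pixels = frame[r.y:r.y + r.h:r.stride, r.x:r.x + r.w:r.stride].copy()
            encoded.append((r, pixels))
    return encoded


def decode(encoded, previous):
    """Rebuild a dense frame, reusing `previous` for pixels not updated this frame."""
    out = previous.copy()
    for r, pixels in encoded:
        # Nearest-neighbor upsampling back to the region's full size.
        up = np.repeat(np.repeat(pixels, r.stride, axis=0), r.stride, axis=1)
        out[r.y:r.y + r.h, r.x:r.x + r.w] = up[:r.h, :r.w]
    return out


frame = np.arange(64, dtype=np.uint8).reshape(8, 8)
regions = [Region(0, 0, 4, 4, stride=2), Region(4, 4, 4, 4, period=2)]
enc = encode(frame, regions, frame_idx=0)
sent = sum(p.size for _, p in enc)
print(f"sent {sent} of {frame.size} pixels")  # sent 20 of 64 pixels
decoded = decode(enc, np.zeros_like(frame))
```

Only 20 of the 64 pixels cross the interface in this toy frame, and on odd frames the second region is skipped entirely, which is the kind of memory-traffic reduction the study above quantifies.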

Contributors: Nguyen, Van (Author) / LiKamWa, Robert (Thesis advisor) / Jayasuriya, Suren (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2020

B-AWARE: Blockage Aware RSU Scheduling for 5G Enabled Autonomous Vehicles

Description

5G millimeter wave (mmWave) technology holds great promise for Connected Autonomous Vehicles (CAVs) due to its ability to achieve data rates in the Gbps range. However, mmWave suffers from high beamforming overhead and requires line of sight (LOS) to maintain a strong connection. These drawbacks become apparent in Vehicle-to-Infrastructure (V2I) scenarios, where CAVs connect to roadside units (RSUs). Because vehicles are dynamic, there is a large potential for link blockages, which in turn are detrimental to the connected applications running on the vehicle, such as cooperative perception and remote driver takeover. Existing RSU selection schemes base their decisions on signal strength and vehicle trajectory alone, which is not enough to prevent link blockages. Most recent CAV motion planning algorithms routinely use other vehicles' near-future plans, obtained either through explicit communication among vehicles or by prediction. In this thesis, I use this knowledge of other vehicles' near-future path plans to further improve the RSU association mechanism for CAVs. I solve the RSU association problem by converting it into a shortest path problem whose objective is to maximize the total communication bandwidth. Evaluations of B-AWARE in simulation using Simulation of Urban Mobility (SUMO) and Digital twin for self-dRiving Intelligent VEhicles (DRIVE) on 12 highway and city street scenarios with varying traffic density and RSU placements show that B-AWARE achieves a 1.05x improvement in potential data rate in the average case and 1.28x in the best case over the state of the art. More notably, B-AWARE reduces the time spent with no connection by 48% in the average case and 251% in the best case compared with state-of-the-art methods. This is partly because B-AWARE eliminates nearly all blockage occurrences in simulation.
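
Casting RSU association as a path problem can be sketched as follows. Each time step is a layer of a graph with one node per RSU; choosing RSU $k$ at step $t$ earns its predicted bandwidth, and a link predicted to be blocked (for example, from other vehicles' planned paths) is given zero bandwidth. Maximizing total bandwidth over this layered DAG is equivalent to a shortest path with negated weights, so dynamic programming solves it exactly. The bandwidth matrix and the handoff penalty `handoff_cost` are hypothetical inputs; B-AWARE's actual graph construction may differ.

```python
def best_rsu_schedule(bandwidth, handoff_cost=0.0):
    """Choose one RSU per time step to maximize total bandwidth minus handoff penalties.

    bandwidth[t][k] is the predicted bandwidth to RSU k at step t (0 if the link
    is predicted to be blocked). Assumes at least one step and one RSU.
    Returns (list of RSU indices per step, total score).
    """
    num_rsus = len(bandwidth[0])
    score = list(bandwidth[0])  # best total ending at each RSU so far
    back = []  # back[t-1][k] = best predecessor of RSU k at step t
    for t in range(1, len(bandwidth)):
        new_score, choice = [], []
        for k in range(num_rsus):
            def via(j):
                # Total so far if we were at RSU j and move to RSU k.
                return score[j] - (handoff_cost if j != k else 0.0)
            best_j = max(range(num_rsus), key=via)
            new_score.append(via(best_j) + bandwidth[t][k])
            choice.append(best_j)
        score = new_score
        back.append(choice)
    k = max(range(num_rsus), key=lambda j: score[j])
    total = score[k]
    path = [k]
    for choice in reversed(back):
        k = choice[k]
        path.append(k)
    return path[::-1], total


# Hypothetical: RSU 0 is strong at first but predicted to be blocked afterwards.
bandwidth = [[5.0, 1.0], [0.0, 4.0], [0.0, 4.0]]
schedule, total = best_rsu_schedule(bandwidth, handoff_cost=1.0)
print(schedule, total)  # [0, 1, 1] 12.0
```

A scheme that looks only at current signal strength would stay on RSU 0 and lose the link at the next step; using the predicted blockage, the schedule hands off early.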

Contributors: Szeto, Matthew (Author) / Shrivastava, Aviral (Thesis advisor) / LiKamWa, Robert (Committee member) / Meuth, Ryan (Committee member) / Arizona State University (Publisher)

Created: 2023

Accelerating Linear Algebra and Machine Learning Kernels on a Massively Parallel Reconfigurable Architecture

Description

This thesis presents efficient implementations of several linear algebra kernels, machine learning kernels, and a neural-network-based recommender systems engine on a massively parallel reconfigurable architecture, Transformer. The linear algebra kernels include Triangular Matrix Solver (TRSM), LU Decomposition (LUD), QR Decomposition (QRD), and Matrix Inversion. The machine learning kernels include a Long Short-Term Memory (LSTM) cell and a Gated Recurrent Unit (GRU) cell used in recurrent neural networks. The neural-network-based recommender systems engine consists of multiple kernels, including fully connected layers, an embedding layer, 1-D batch normalization, and the Adam optimizer, among others.
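
For reference, the TRSM kernel solves $LX = B$ for a lower-triangular matrix $L$ by forward substitution. The plain sequential sketch below shows the arithmetic only; it says nothing about how the thesis maps the kernel onto Transformer's tiles and GPEs. It does show one source of parallelism: each column of $B$ is an independent right-hand side.

```python
def trsm_lower(L, B):
    """Solve L X = B by forward substitution, with L an n x n lower-triangular matrix.

    B is n x m; each of its m columns is an independent right-hand side.
    """
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for j in range(m):  # columns are independent, so this loop could run in parallel
        for i in range(n):
            s = B[i][j]
            for k in range(i):
                s -= L[i][k] * X[k][j]
            X[i][j] = s / L[i][i]
    return X


L = [[2.0, 0.0], [1.0, 1.0]]
B = [[4.0], [5.0]]
print(trsm_lower(L, B))  # [[2.0], [3.0]]
```

Within a column, row $i$ depends on all earlier rows, so the inner loops are inherently sequential; this dependency chain is what makes the kernel sensitive to how data is shared between cores and caches.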

Transformer is a massively parallel reconfigurable multicore architecture designed at the University of Michigan. The Transformer configuration considered here has 4 tiles with 16 General Processing Elements (GPEs) per tile. It supports a two-level cache hierarchy in which the L1 and L2 caches can operate in shared (S) or private (P) mode. The architecture was modeled in Gem5, and cycle-accurate simulations were run to evaluate performance in terms of execution time, giga-operations per second per watt (GOPS/W), and giga-floating-point-operations per second per watt (GFLOPS/W).

This thesis shows that each linear algebra kernel achieves high performance in a particular cache mode, and that this mode can change with the matrix size. For smaller matrix sizes, L1P, L2P is the best cache mode for TRSM, L1S, L2S is best for LUD, and L1P, L2S is best for QRD. For each kernel, the optimal cache mode changes as the matrix size increases; for TRSM, for example, L1P, L2P is best for smaller matrix sizes ($N=64, 128, 256, 512$), while L1S, L2P is best for larger matrix sizes ($N=1024$). For the machine learning kernels, L1P, L2P is the best cache mode for all network parameter sizes.

Gem5 simulations show that the peak performance for TRSM, LUD, QRD and Matrix Inverse in the 14nm node is 97.5, 59.4, 133.0 and 83.05 GFLOPS/W, respectively. For LSTM and GRU, the peak performance is 44.06 and 69.3 GFLOPS/W.

The neural-network-based recommender system was implemented in the L1S, L2S cache mode. It includes a forward pass and a backward pass and is significantly more demanding in both computation and data movement. The most computationally intensive block is the fully connected layer, followed by the Adam optimizer. The overall performance of the recommender systems engine is 54.55 GFLOPS/W and 169.12 GOPS/W.
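
The Adam optimizer block applies an element-wise update to every parameter on each backward pass. The sketch below is the standard Adam update rule with its usual default hyperparameters, written over flat Python lists for clarity; it is not the thesis's Transformer implementation. Each parameter update reads and writes several values (parameter, gradient, and two moment estimates) for only a handful of arithmetic operations, so it has low arithmetic intensity compared with the fully connected layers.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update at step t (t starts at 1); returns new (params, m, v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment (uncentered variance) estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias corrections for zero initialization
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


params, m, v = [1.0], [0.0], [0.0]
params, m, v = adam_step(params, [0.5], m, v, t=1)
print(params)  # roughly [0.999]: the first step moves each parameter by about lr
```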

Contributors: Soorishetty, Anuraag (Author) / Chakrabarti, Chaitali (Thesis advisor) / Kim, Hun Seok (Committee member) / LiKamWa, Robert (Committee member) / Arizona State University (Publisher)

Created: 2019

Viewpoint Recommendation for Aesthetic Photography

Description

This thesis addresses the problem of recommending a viewpoint for aesthetic photography. Viewpoint recommendation is the task of suggesting the best camera pose for capturing a visually pleasing photograph of a subject of interest with an end-user device such as a drone, mobile robot, or smartphone. Solving this problem makes it possible to capture visually pleasing photographs autonomously in aerial, wildlife, landscape, or personal photography.

The viewpoint recommendation problem can be divided into two stages: (a) generating a dense set of novel views from the basis views captured around the subject, which helps in understanding the scene and how the subject looks from different viewpoints, and (b) scoring each novel view according to its aesthetic quality. The viewpoint with the highest aesthetic score is recommended for capturing a visually pleasing photograph.
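
The two stages map onto a generate-then-score loop. In the sketch below, `render_view` (stage a, novel view synthesis) and `aesthetic_score` (stage b) are hypothetical placeholders for whatever view-synthesis and aesthetic-assessment models are used; the toy usage replaces both with simple functions of a camera yaw angle.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Pose = TypeVar("Pose")
View = TypeVar("View")


def recommend_viewpoint(
    poses: Iterable[Pose],
    render_view: Callable[[Pose], View],
    aesthetic_score: Callable[[View], float],
) -> Tuple[Optional[Pose], float]:
    """Stage (a): synthesize a novel view per pose; stage (b): score it; keep the best."""
    best_pose, best_score = None, float("-inf")
    for pose in poses:
        score = aesthetic_score(render_view(pose))
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score


# Toy stand-ins: a "view" is just the yaw angle, and the score peaks at 120 degrees.
best_pose, best_score = recommend_viewpoint(
    range(0, 360, 10),
    render_view=lambda yaw: yaw,
    aesthetic_score=lambda view: -abs(view - 120),
)
print(best_pose, best_score)  # 120 0
```

Keeping the two stages behind separate callables lets the view synthesizer and the aesthetic model be swapped independently.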

Contributors: Katukuri, Sathish Kumar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created: 2019

Protecting Visual Information in Augmented Reality from Malicious Application Developers

Description

Visual applications – those that use camera frames as part of the application – provide a rich, context-aware experience. The continued development of mixed and augmented reality (MR/AR) computing environments furthers the richness of this experience by providing applications a continuous vision experience, where visual information continuously provides context for applications and the real world is augmented by the virtual. To understand user privacy concerns in continuous vision computing environments, this work studies three MR/AR applications (augmented markers, augmented faces, and text capture) to show that in a modern mobile system, the typical user is exposed to potential mass collection of sensitive information, posing privacy and security deficiencies to be addressed in future systems.

To address such deficiencies, a development framework is proposed that provides resource isolation between the user information contained in camera frames and application access to the network. The design is implemented as a proof of concept on the Android operating system using existing system utilities, and its viability is demonstrated with a modern state-of-the-art augmented reality library and several augmented reality applications. The design is evaluated on a Samsung Galaxy S8 phone by comparing the applications from the case study with modified versions that better protect user privacy. Early results show that the new design efficiently protects users against data collection in MR/AR applications with less than 0.7% performance overhead.

Contributors: Jensen, Jk (Author) / LiKamWa, Robert (Thesis advisor) / Doupe, Adam (Committee member) / Wang, Ruoyu (Committee member) / Arizona State University (Publisher)

Created: 2019