Matching Items (69)
Description
Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame rates. The content needs to represent a truly immersive experience, where the user can look around in a 360-degree view and perceive the depth of the scene. Existing solutions only capture on the device and offload the compute load to a server, but offloading large amounts of raw camera feeds incurs long latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the subsystems and optimize for low latency. However, moving traditional stitching algorithms to a battery-constrained device requires at least a three-orders-of-magnitude reduction in power. We believe that close integration of the capture and compute stages will lead to reduced overall system power.

We approach the problem by building a hardware prototype and characterizing the end-to-end power and performance bottlenecks of the system. The prototype has six IMX274 cameras and uses an NVIDIA Jetson TX2 development board for capture and computation. We find that capture is bottlenecked by sensor power and data rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computation lead to high power, a large memory footprint, and high latency. Existing systems lack hardware-software co-design, leading to excessive data transfers across interfaces and expensive computations within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-designing the different subsystems to reduce and reuse data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on a custom FPGA can achieve real-time stitching.
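As a back-of-the-envelope illustration of why offloading raw feeds is impractical, the sketch below estimates the rig's aggregate raw bandwidth; the 4K, 60 fps, 12-bit sensor mode is an assumption for illustration, not a measured figure from the prototype.

    # Aggregate raw bandwidth of a 6-camera rig, assuming each IMX274
    # streams 4K (3840x2160) at 60 fps with 12-bit raw pixels.
    cameras, width, height, fps, bits = 6, 3840, 2160, 60, 12
    gbps = cameras * width * height * fps * bits / 1e9
    print(f"~{gbps:.1f} Gbit/s of raw sensor data")  # ~35.8 Gbit/s

At such rates, interface transfers dominate latency and power, which motivates reducing and reusing data between subsystems rather than shipping raw frames.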
ContributorsGunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2018
Description
Mixture of experts is a machine learning ensemble approach that consists of individual models trained to be "experts" on subsets of the data, and a gating network that provides weights to combine the expert predictions into a single output. Mixture of experts models do not currently see wide use due to the difficulty of training diverse experts and their high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training and incorporate parameter sharing among experts to reduce computational requirements.

First, this work presents an application of mixture of experts models to quality-robust visual recognition. It is shown that human subjects outperform deep neural networks on the classification of distorted images, and a model, MixQualNet, is then proposed that is more robust to distortions. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters as well as increase performance.
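As a minimal sketch of the gated weighted-sum formulation described above (the toy experts, gating function, and shapes are illustrative assumptions, not the thesis implementation):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def mixture_of_experts(x, experts, gating):
        expert_out = np.stack([f(x) for f in experts])  # (E, C) expert predictions
        weights = softmax(gating(x))                    # (E,) gating weights
        return weights @ expert_out                     # (C,) weighted combination

    # Toy usage: two "experts" and a gating function over an 8-dim input.
    experts = [lambda x: x[:3], lambda x: x[3:6]]
    gating = lambda x: x[6:8]
    pred = mixture_of_experts(np.random.rand(8), experts, gating)

In MixQualNet the experts are distortion-specific networks and the gate is itself a network, but the combination rule is the same weighted sum.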

Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.
ContributorsDodge, Samuel Fuller (Author) / Karam, Lina (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2018
Description
Vision processing on traditional architectures is inefficient due to energy-expensive off-chip data movements. Many researchers advocate pushing processing close to the sensor to substantially reduce data movements. However, continuous near-sensor processing raises the sensor temperature, impairing the fidelity of imaging/vision tasks.

The work characterizes the thermal implications of using 3D stacked image sensors with near-sensor vision processing units. The characterization reveals that near-sensor processing reduces system power but degrades image quality. For reasonable image fidelity, the sensor temperature needs to stay below a threshold, situationally determined by application needs. Fortunately, the characterization also identifies opportunities -- unique to the needs of near-sensor processing -- to regulate temperature based on dynamic visual task requirements and rapidly increase capture quality on demand.

Based on the characterization, the work proposes and investigates two strategies -- stop-capture-go and seasonal migration -- for imaging-aware thermal management. The work presents the parameters that govern the policy decisions and explores the trade-offs between system power and policy overhead. The evaluation shows that these dynamic thermal management strategies can unlock the energy-efficiency potential of near-sensor processing with minimal performance impact and without compromising image fidelity.
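A minimal sketch of what a stop-capture-go control loop could look like (the temperature thresholds and callables here are hypothetical placeholders, not the evaluated policy parameters):

    def stop_capture_go(read_temp_c, capture_frame, process_frame,
                        num_frames, halt_c=55.0, resume_c=45.0):
        # Duty-cycle near-sensor capture on temperature: pause when the
        # sensor is too hot for clean imaging, resume once it cools.
        capturing = True
        done = 0
        while done < num_frames:
            t = read_temp_c()
            if capturing and t >= halt_c:
                capturing = False   # stop: fidelity threshold exceeded
            elif not capturing and t <= resume_c:
                capturing = True    # go: safe to capture again
            if capturing:
                process_frame(capture_frame())
                done += 1

Seasonal migration, loosely, instead moves the processing away from the hot sensor stack rather than pausing capture.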
ContributorsKodukula, Venkatesh (Author) / LiKamWa, Robert (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Brunhaver, John (Committee member) / Arizona State University (Publisher)
Created2019
Description
Mixed reality mobile platforms co-locate virtual objects with physical spaces, creating immersive user experiences. To create visual harmony between virtual and physical spaces, the virtual scene must be accurately illuminated with realistic physical lighting. To this end, a system was designed that Generates Light Estimation Across Mixed-reality (GLEAM) devices to continually sense the realistic lighting of a physical scene in all directions. GLEAM optionally operates across multiple mobile mixed-reality devices, leveraging collaborative multi-viewpoint sensing for improved estimation. The system implements policies that prioritize the resolution, coverage, or update interval of the illumination estimate depending on the situational needs of the virtual scene and physical environment.

To evaluate the runtime performance and perceptual efficacy of the system, GLEAM was implemented in the Unity 3D game engine and deployed on Android and iOS devices. On these implementations, GLEAM can prioritize dynamic estimation with update intervals as low as 15 ms, or prioritize high spatial quality with update intervals of 200 ms. User studies across 99 participants and 26 scene comparisons reported a preference for GLEAM over other lighting techniques in 66.67% of the presented augmented scenes and indifference in 12.57% of the scenes. A controlled-lighting user study with 18 participants revealed a general preference for policies that strike a balance between resolution and update rate.
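A sketch of how such a policy choice might map to concrete settings (the resolutions and the middle configuration are illustrative assumptions; only the 15 ms and 200 ms extremes come from the evaluation above):

    def choose_gleam_config(policy):
        # Illustrative trade-off between estimation update interval
        # and the spatial resolution of the estimated illumination.
        configs = {
            "dynamic":  {"cubemap_px": 16,  "update_ms": 15},   # fast-moving scenes
            "balanced": {"cubemap_px": 64,  "update_ms": 60},
            "quality":  {"cubemap_px": 128, "update_ms": 200},  # static, glossy content
        }
        return configs[policy]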
ContributorsPrakash, Siddhant (Author) / LiKamWa, Robert (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Hansford, Dianne (Committee member) / Arizona State University (Publisher)
Created2018
Description
Non-line-of-sight (NLOS) imaging of objects not visible to either the camera or the illumination source is a challenging task with vital applications including surveillance and robotics. Recent NLOS reconstruction advances have been achieved using time-resolved measurements, but acquiring these measurements requires expensive and specialized detectors and laser sources. This work proposes a data-driven approach for NLOS 3D localization that requires only a conventional camera and projector. Localization is formulated both as a voxel classification problem and as a regression problem. Accuracy greater than 90% is achieved in localizing an NLOS object to a 5 cm × 5 cm × 5 cm volume on real data. By adopting the regression approach, an object of width 10 cm is localized to within approximately 1.5 cm. To generalize to line-of-sight (LOS) scenes with non-planar surfaces, an adaptive lighting algorithm is adopted. This algorithm, based on radiosity, identifies and illuminates the scene patches in the LOS that contribute most to the NLOS light paths, and can factor in system power constraints. Improvements ranging from 6% to 15% in accuracy with a non-planar LOS wall are reported using adaptive lighting, demonstrating the advantage of combining the physics of light transport with active illumination for data-driven NLOS imaging.
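As a rough illustration of the classification-style localization target (a sketch with an assumed grid origin and extent; the thesis' actual volume and network are not reproduced here):

    import numpy as np

    def voxel_index_to_center(idx, origin=np.zeros(3), voxel_cm=5.0,
                              grid_shape=(8, 8, 8)):
        # Map a predicted voxel class index back to the 3D center (cm)
        # of its 5 cm x 5 cm x 5 cm cell.
        i, j, k = np.unravel_index(idx, grid_shape)
        return origin + (np.array([i, j, k]) + 0.5) * voxel_cm

    print(voxel_index_to_center(0))  # [2.5, 2.5, 2.5], center of the first cell

The regression variant instead predicts continuous coordinates directly, which is how the approximately 1.5 cm localization of a 10 cm object is obtained.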
ContributorsChandran, Sreenithy (Author) / Jayasuriya, Suren (Thesis advisor) / Turaga, Pavan (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)
Created2019
Description
The detection and segmentation of objects appearing in a natural scene, often referred to as object detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these methods are capable of detecting the salient objects in the scene when the number of proposals that can be generated is constrained by timing or computational limits during execution. Salient objects are objects that human subjects tend to fixate on more. The detection of salient objects is important in applications such as image collection browsing, image display on small devices, and perceptual compression.

This thesis proposes a novel evaluation framework that analyses the performance of popular existing object proposal generators in detecting the most salient objects. This work also shows that, by incorporating saliency constraints, the number of generated object proposals and thus the computational cost can be decreased significantly for a target true positive detection rate (TPR).

As part of the proposed framework, salient ground-truth masks are generated from a dataset's original ground-truth masks. Given an object detection dataset, this work constructs salient object location ground-truth data, referred to here as salient ground-truth data, that denotes only the locations of salient objects. This is obtained by first computing a saliency map for the input image and then using it to assign a saliency score to each object in the image. Objects whose saliency scores are sufficiently high are referred to as salient objects. Detection rates are then analyzed for existing object proposal generators with respect to both the original ground-truth masks and the generated salient ground-truth masks.
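A minimal sketch of this construction (the mean-pooled score and the threshold value are illustrative assumptions, not the exact scoring rule):

    import numpy as np

    def salient_ground_truth(saliency_map, object_masks, threshold=0.5):
        # Score each annotated object by the mean saliency inside its
        # boolean mask; keep only objects that clear the threshold.
        return [m for m in object_masks
                if saliency_map[m].mean() >= threshold]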

As part of this work, a salient object detection database with salient ground-truth masks was constructed from the PASCAL VOC 2007 dataset. Not only does this dataset aid in analyzing the performance of existing object detectors for salient object detection, but it also helps in the development of new object detection methods and evaluating their performance in terms of successful detection of salient objects.
ContributorsKotamraju, Sai Prajwal (Author) / Karam, Lina J (Thesis advisor) / Yu, Hongbin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2019
Description
Visualizations are an integral component for communicating and evaluating modern networks. As data becomes more complex, infographics require a balance between visual noise and effective storytelling that is often restricted by layouts unsuitable for scaling. The challenge then rests upon researchers to effectively structure their information in a way that allows for flexible, transparent illustration. We propose network graphs as an effective alternative to traditional charts, which are unable to look past numeric data, for demonstrating community behavior. In this paper, we explore methods for manipulating, processing, cleaning, and aggregating data in Python, a programming language tailored for handling structured data, which can then be formatted for analysis and modeling of social network tendencies in Gephi. We apply the Fruchterman-Reingold force-directed layout algorithm to datasets of Arizona State University's research and collaboration network. The result is a visualization that analyzes the university's infrastructure by providing insight into community behaviors between colleges. Furthermore, we highlight how the flexibility of this visualization provides a foundation for specific use cases by demonstrating centrality measures to find important liaisons that connect distant communities.
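The thesis performs the layout in Gephi; the NetworkX sketch below only illustrates the same Fruchterman-Reingold layout and a centrality measure on a toy graph (the graph and parameters are placeholders, not the ASU dataset):

    import networkx as nx

    G = nx.karate_club_graph()  # toy stand-in for a collaboration network

    # spring_layout implements the Fruchterman-Reingold force-directed
    # algorithm used for the visualization.
    pos = nx.spring_layout(G, k=0.3, iterations=100, seed=42)

    # Betweenness centrality surfaces "liaison" nodes that bridge
    # otherwise distant communities.
    liaisons = sorted(nx.betweenness_centrality(G).items(),
                      key=lambda kv: kv[1], reverse=True)[:5]
    print(liaisons)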
ContributorsMcMichael, Jacob Andrew (Author) / LiKamWa, Robert (Thesis director) / Anderson, Derrick (Committee member) / Goshert, Maxwell (Committee member) / Arts, Media and Engineering Sch T (Contributor) / Barrett, The Honors College (Contributor)
Created2020-05
Description
Emerging technologies, such as augmented reality (AR), are growing in popularity and accessibility at a fast pace. Developers are building more and more games and applications with this technology, but few have stopped to think about the best practices for creating a good user experience (UX). Currently, there are no universally accepted human-computer interaction guidelines for augmented reality because it is still relatively new. This paper examines three features - virtual content scale, indirect selection, and virtual buttons - in an attempt to discover their impact on the user experience in augmented reality. A Battleship game was developed using the Unity game engine with Vuforia, an augmented reality platform, and built as an iOS application to test these features. The hypothesis was that both virtual content scale and indirect selection would result in a more enjoyable and engaging user experience, whereas the virtual buttons would be too confusing for users to fully appreciate. Usability testing was conducted to gauge participants' responses to these features. After playing a base version of the game with no additional features and then a second version with one of the three features, participants rated their experiences and provided feedback in a four-part survey. It was observed during testing that people did not naturally move their devices around the augmented space and needed guidance to navigate the game. Most users were fascinated with the visuals of the game and two of the tested features. It was found that movement around the augmented space and feedback from the virtual content were critical aspects of creating a good user experience in augmented reality.
ContributorsBauman, Kirsten (Co-author) / Benson, Meera (Co-author) / Olson, Loren (Thesis director) / LiKamWa, Robert (Committee member) / School of the Arts, Media and Engineering (Contributor) / Barrett, The Honors College (Contributor)
Created2018-05
Description
Acoustic Ecology is an undervalued field that studies the relationship between the environment and sound. This project aims to educate people on this topic and demonstrate its importance by immersing them in virtual reality scenes. The scenes were created using VR180 content as well as 360° spatial audio.
ContributorsNeel, Jordan Tanner (Author) / LiKamWa, Robert (Thesis director) / Feisst, Sabine (Committee member) / Arts, Media and Engineering Sch T (Contributor) / Department of Psychology (Contributor) / Barrett, The Honors College (Contributor)
Created2019-05
Description
Dale and Edna is a hybrid animated film and videogame experienced in virtual reality, with dual storylines whose potential meanings increase through player interaction. Developed and played within Unreal Engine 4 using the HTC Vive, Oculus, or PlayStation VR, Dale and Edna allows players to passively enjoy the film element of the project or partake in the active videogame portion. Exploration of the virtual story world yields more information about that world, which may or may not alter the audience's perception of it. The film portion of the project is a static narrative with a plot that cannot be altered by players within the virtual world. In the static plot, the characters Dale and Edna discover and subsequently combat an alien invasion that appears to have the objective of demolishing Dale's prize pumpkin. However, the aliens in the film plot are merely projections created by AR headsets reflecting Jimmy's gameplay on his tablet. The audience is thus invited to question their perception of reality through the combined use of VR and AR. The game element is a dynamic narrative scaffold that does not unfold as a traditional narrative might; instead, what a player observes and interacts with in the sandbox level determines the meaning those players take away from the project. Both elements of the project feature modular code construction so developers can return to the film and game portions and make additions. This paper analyzes the chronological development of the project along with the guiding philosophy revealed in the result.
Keywords: virtual reality, film, videogame, sandbox
ContributorsKemp, Adam Lee (Co-author) / Kemp, Bradley (Co-author) / Kemp, Claire (Co-author) / LiKamWa, Robert (Thesis director) / Gilfillan, Daniel (Committee member) / Arts, Media and Engineering Sch T (Contributor) / Thunderbird School of Global Management (Contributor) / School of Film, Dance and Theatre (Contributor) / School of International Letters and Cultures (Contributor) / Barrett, The Honors College (Contributor)
Created2019-05