Matching Items (46)
Description
Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame rates. The content needs to represent a truly immersive experience, where the user can look around in a 360-degree view and perceive the depth of the scene. Existing solutions only capture on the device and offload the compute load to a server, but offloading large amounts of raw camera feeds incurs long latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving traditional stitching algorithms to a battery-constrained device requires at least a three-orders-of-magnitude reduction in power. We believe that close integration of the capture and compute stages will lead to reduced overall system power.

We approach the problem by building a hardware prototype and characterizing the end-to-end system bottlenecks of power and performance. The prototype has six IMX274 cameras and uses an NVIDIA Jetson TX2 development board for capture and computation. We found that capture is bottlenecked by sensor power and data rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computations lead to high power, a huge memory footprint, and high latency. Existing systems lack hardware-software co-design, leading to excessive data transfers across the interfaces and expensive computations within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-designing the different subsystems to reduce and reuse data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on a custom FPGA can achieve real-time stitching.
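To make the capture-side data-rate pressure concrete, a back-of-the-envelope sketch follows. The 4K (3840×2160), 60 fps, 10-bit RAW figures are assumptions about the sensor mode, not values taken from the thesis:

```python
# Back-of-the-envelope capture data-rate budget for a six-camera rig.
# Assumed figures: IMX274 sensors streaming 4K (3840x2160) at 60 fps with
# 10-bit RAW output; the exact modes used in the thesis may differ.

def raw_data_rate_gbps(width, height, bits_per_pixel, fps, num_cameras):
    """Aggregate uncompressed sensor data rate in gigabits per second."""
    return width * height * bits_per_pixel * fps * num_cameras / 1e9

rate = raw_data_rate_gbps(3840, 2160, 10, 60, 6)
print(f"Aggregate raw capture rate: {rate:.1f} Gbps")  # ~29.9 Gbps
```

At roughly 30 Gbps of raw pixel data, offloading uncompressed feeds quickly saturates practical interfaces, which motivates the on-edge capture-compute integration described above.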
Contributors
Gunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created
2018
Description
Mixture of experts is a machine learning ensemble approach that consists of individual models trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture of experts models do not currently see wide use due to the difficulty of training diverse experts and their high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training, and that incorporate parameter sharing among experts to reduce computational requirements.
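A minimal sketch of the general formulation described above, using toy linear experts and a linear gating network rather than the thesis's actual architectures:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mixture_of_experts(x, expert_weights, gate_weights):
    """Combine expert predictions with gating-network weights.

    x:              (d,) input feature vector
    expert_weights: list of (d, k) matrices, one toy linear 'expert' each
    gate_weights:   (d, n_experts) matrix for the gating network
    """
    expert_outputs = np.stack([softmax(x @ W) for W in expert_weights])  # (n, k)
    gates = softmax(x @ gate_weights)                                    # (n,)
    return gates @ expert_outputs  # weighted sum of expert predictions

rng = np.random.default_rng(0)
x = rng.normal(size=8)
experts = [rng.normal(size=(8, 4)) for _ in range(3)]
gate = rng.normal(size=(8, 3))
print(mixture_of_experts(x, experts, gate))  # (4,) combined prediction
```

The same structure carries through the applications below: only the experts, the gating input, and the output space change.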

First, this work presents an application of mixture of experts models to quality-robust visual recognition. It is first shown that human subjects outperform deep neural networks on classification of distorted images; a model, MixQualNet, is then proposed that is more robust to distortions. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters as well as to increase performance.

Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all, subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.
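One plausible way to construct such level-specific targets from subject fixation data is sketched below; the thresholds and construction are illustrative assumptions, not the thesis's procedure:

```python
import numpy as np

def saliency_levels(fixation_counts, n_subjects, hi=0.8, lo=0.2):
    """Split a per-pixel fixation-count map into high/low saliency targets.

    fixation_counts: (H, W) array, number of subjects fixating each pixel
    hi / lo:         illustrative fractions of subjects defining each level
    """
    frac = fixation_counts / n_subjects
    high = frac >= hi            # almost all subjects looked here
    low = (frac >= lo) & ~high   # some, but not all, subjects looked here
    return high, low
```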
Contributors
Dodge, Samuel Fuller (Author) / Karam, Lina (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created
2018
Description
Non-line-of-sight (NLOS) imaging of objects not visible to either the camera or illumination source is a challenging task with vital applications including surveillance and robotics. Recent NLOS reconstruction advances have been achieved using time-resolved measurements, but acquiring these time-resolved measurements requires expensive and specialized detectors and laser sources. This work proposes a data-driven approach for NLOS 3D localization requiring only a conventional camera and projector. The localization is performed using both a voxelization and a regression formulation. Accuracy of greater than 90% is achieved in localizing an NLOS object to a 5 cm × 5 cm × 5 cm volume in real data. By adopting the regression approach, an object of width 10 cm is localized to approximately 1.5 cm. To generalize to line-of-sight (LOS) scenes with non-planar surfaces, an adaptive lighting algorithm is adopted. This algorithm, based on radiosity, identifies and illuminates scene patches in the LOS which most contribute to the NLOS light paths, and can factor in system power constraints. Improvements ranging from 6% to 15% in accuracy with a non-planar LOS wall using adaptive lighting are reported, demonstrating the advantage of combining the physics of light transport with active illumination for data-driven NLOS imaging.
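As a concrete illustration of the voxelization formulation, the sketch below maps a predicted voxel index back to a 3D location. The grid shape and origin are hypothetical; the abstract only specifies the 5 cm cell size:

```python
import numpy as np

def voxel_center(voxel_index, grid_shape, cell_cm=5.0, origin_cm=(0.0, 0.0, 0.0)):
    """Map a predicted voxel index to the 3D center of its 5 cm cell.

    voxel_index: flat index into the voxel grid (e.g. a classifier's argmax)
    grid_shape:  (nx, ny, nz) number of cells along each axis
    """
    i, j, k = np.unravel_index(voxel_index, grid_shape)
    return np.asarray(origin_cm) + (np.array([i, j, k]) + 0.5) * cell_cm

# e.g. a hypothetical 10x10x10 grid covering a 50 cm cube behind the occluder
print(voxel_center(372, (10, 10, 10)))  # center of cell (3, 7, 2): [17.5 37.5 12.5]
```

The regression formulation instead predicts the continuous (x, y, z) coordinates directly, which is what allows the finer ~1.5 cm localization reported above.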
Contributors
Chandran, Sreenithy (Author) / Jayasuriya, Suren (Thesis advisor) / Turaga, Pavan (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)
Created
2019
Description
The detection and segmentation of objects appearing in a natural scene, often referred to as object detection, has gained a lot of interest in the computer vision field. Although most existing object detectors aim to detect all the objects in a given scene, it is important to evaluate whether these methods are capable of detecting the salient objects in the scene when the number of proposals that can be generated is constrained by timing or computation budgets during execution. Salient objects are objects that tend to be more fixated on by human subjects. The detection of salient objects is important in applications such as image collection browsing, image display on small devices, and perceptual compression.

This thesis proposes a novel evaluation framework that analyzes the performance of popular existing object proposal generators in detecting the most salient objects. This work also shows that, by incorporating saliency constraints, the number of generated object proposals, and thus the computational cost, can be decreased significantly for a target true positive detection rate (TPR).

As part of the proposed framework, salient ground-truth masks are generated from the original ground-truth masks of a given dataset. Given an object detection dataset, this work constructs salient object location ground-truth data, referred to here as salient ground-truth data for short, that denotes only the locations of salient objects. This is obtained by first computing a saliency map for the input image and then using it to assign a saliency score to each object in the image. Objects whose saliency scores are sufficiently high are referred to as salient objects. The detection rates of existing object proposal generators are analyzed with respect to both the original ground-truth masks and the generated salient ground-truth masks.
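A minimal sketch of the per-object scoring step described above, under the assumption that an object's score is the mean saliency inside its mask; the exact scoring rule and cutoff are illustrative, not the thesis's parameters:

```python
import numpy as np

def salient_objects(saliency_map, object_masks, threshold=0.5):
    """Score each object by its mean saliency and keep sufficiently salient ones.

    saliency_map: (H, W) float array in [0, 1] from any saliency model
    object_masks: list of (H, W) boolean masks, one per ground-truth object
    threshold:    illustrative cutoff on the per-object saliency score
    """
    scores = [saliency_map[m].mean() if m.any() else 0.0 for m in object_masks]
    salient_ids = [i for i, s in enumerate(scores) if s >= threshold]
    return salient_ids, scores
```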

As part of this work, a salient object detection database with salient ground-truth masks was constructed from the PASCAL VOC 2007 dataset. Not only does this dataset aid in analyzing the performance of existing object detectors for salient object detection, but it also helps in the development of new object detection methods and evaluating their performance in terms of successful detection of salient objects.
Contributors
Kotamraju, Sai Prajwal (Author) / Karam, Lina J (Thesis advisor) / Yu, Hongbin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created
2019
Description
Approximately 248 million people in the world are currently living with chronic Hepatitis B virus (HBV) infection. HBV and HCV infections are the primary cause of liver diseases such as cirrhosis and hepatocellular carcinoma worldwide, with an estimated 1.4 million deaths annually. HBV in the Republic of Peru was used as a case study of an emerging and rapidly spreading disease in a developing nation, wherein clinical diagnosis of HBV infections in at-risk communities such as the Amazon Region and the Andes Mountains is challenging for a myriad of reasons. High prices of clinical diagnosis and limited access to treatment are the most significant deterrents for individuals living in at-risk communities to get the much-needed help. Additionally, limited testing facilities, lack of adequate testing policies or national guidelines, poor laboratory capacity, resource-limited settings, geographical isolation, and public mistrust are among the chief reasons for low HBV testing. Preventative vaccination programs deployed by Peruvian health officials have, however, reduced the number of infected individuals by year and region. To significantly reduce or eradicate HBV in hyperendemic areas and countries such as Peru, preventative clinical diagnosis and vaccination programs are an absolute necessity. Consequently, the need for a portable, low-priced diagnostic platform for the detection of HBV and other diseases is substantial and urgent, not only in Peru but worldwide. Some of these concerns were addressed by designing a low-cost, rapid detection platform. An immunosignature technology (IMST) slide, used to test for reactivity against the presence of antibodies in a serum sample, was used to test for picture resolution and clarity. IMST slides were scanned using a smartphone camera placed on top of the designed device, which housed a circuit of 32 LED lights at 647 nm, a 15X optical magnifier, and a linear polarizing film sheet. Two 9V batteries powered the scanning device's LED circuit, ensuring enough lighting. The resulting pictures from the first prototype showed that, by lighting the device at 647 nm and using a smartphone camera, high-resolution images could be captured. These results conclusively indicate that with any modern smartphone camera, a small box lit at 647 nm, and an optical magnifier, a powerful and expensive laboratory scanning machine can be replaced by one that is inexpensive, portable, and ready to use anywhere.
Contributors
Makimaa, Heyde (Author) / Holechek, Susan (Thesis director) / Stafford, Phillip (Committee member) / Jayasuriya, Suren (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created
2018-05
Description

The purpose of this project is to create a useful tool for musicians that utilizes the harmonic content of their playing to recommend new, relevant chords to play. This is done by training various Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) on the lead sheets of 100 different jazz standards. A total of 200 unique datasets were produced and tested, resulting in the prediction of nearly 51 million chords. A note-prediction accuracy of 82.1% and a chord-prediction accuracy of 34.5% were achieved across all datasets. Methods of data representation that were rooted in valid music theory frameworks were found to increase the efficacy of harmonic prediction by up to 6%. Optimal LSTM input sizes were also determined for each method of data representation.
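A minimal sketch of such an LSTM-based next-chord predictor follows; the vocabulary size, layer widths, and token scheme here are hypothetical stand-ins, not the thesis's 200 datasets or data representations:

```python
import torch
import torch.nn as nn

class ChordLSTM(nn.Module):
    """Minimal next-chord predictor: embed chord tokens, run an LSTM,
    and project the final hidden state onto the chord vocabulary."""

    def __init__(self, vocab_size=120, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, chord_ids):          # (batch, seq_len) integer tokens
        h, _ = self.lstm(self.embed(chord_ids))
        return self.out(h[:, -1])          # logits over the next chord

model = ChordLSTM()
context = torch.randint(0, 120, (1, 8))    # a hypothetical 8-chord context
next_chord = model(context).argmax(dim=-1)
```

The music-theory-informed data representations the thesis credits with up to 6% gains would enter through the tokenization feeding `chord_ids`, not through the network itself.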

Contributors
Rangaswami, Sriram Madhav (Author) / Lalitha, Sankar (Thesis director) / Jayasuriya, Suren (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created
2021-05
Description
Visual navigation is a useful and important task for a variety of applications. As the prevalence of robots increases, there is an increasing need for energy-efficient navigation methods as well. Many aspects of efficient visual navigation algorithms have been implemented in the literature, but there is a lack of work on evaluating the efficiency of the image sensors themselves. In this thesis, two approaches are evaluated: adaptive image sensor quantization for traditional camera pipelines, and new event-based sensors for low-power computer vision.

The first contribution in this thesis is an evaluation of varying levels of linear and logarithmic sensor quantization on the task of visual simultaneous localization and mapping (SLAM). This unconventional method can provide efficiency benefits with a trade-off between task accuracy and energy efficiency. The second contribution is a new sensor quantization method, gradient-based quantization, introduced to improve the accuracy of the task. This method lowers the bit level only in parts of the image that are less likely to be important to the SLAM algorithm, since lower bit levels yield better energy efficiency but worse task accuracy. The third contribution is an evaluation of the efficiency and accuracy of event-based camera intensity representations for the task of optical flow. The results of performing learning-based optical flow are provided for each of five different reconstruction methods, along with ablation studies. Lastly, the challenges of an event feature-based SLAM system are presented, with results demonstrating the necessity for high-quality and high-resolution event data. The work in this thesis provides studies useful for examining trade-offs in an efficient visual navigation system with traditional and event vision sensors. The results of this thesis also suggest multiple directions for future work.
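A minimal sketch of the quantization ideas described above; the bit levels and gradient threshold are assumed values, and the logarithmic variant is omitted for brevity:

```python
import numpy as np

def linear_quantize(img, bits):
    """Uniformly quantize a [0, 1] float image to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

def gradient_based_quantize(img, hi_bits=8, lo_bits=4, grad_thresh=0.05):
    """Sketch of gradient-based quantization: keep more bits where the local
    gradient is strong (likely SLAM feature regions), fewer bits elsewhere."""
    gy, gx = np.gradient(img)
    strong = np.hypot(gx, gy) > grad_thresh
    return np.where(strong,
                    linear_quantize(img, hi_bits),
                    linear_quantize(img, lo_bits))
```

The energy saving comes from reading and moving fewer bits per pixel in low-gradient regions, at the cost of precision where the SLAM front end is unlikely to extract features anyway.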
Contributors
Christie, Olivia Catherine (Author) / Jayasuriya, Suren (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created
2022
Description
Finding Flow was inspired by a previous research project, Zen and the Art of STEAM. The concept of flow was developed by Mihaly Csikszentmihalyi and can be described as "being in the zone." The previous research project focused on digital culture students and whether they could find states of flow within their coursework. This thesis project aimed to develop a website prototype that could be used to help students who struggled to find flow.
Contributors
Dredd, Dominique (Author) / Jayasuriya, Suren (Thesis director) / Barnett, Jessica (Committee member) / Barrett, The Honors College (Contributor) / Arts, Media and Engineering Sch T (Contributor)
Created
2023-05