Matching Items (23)
Filtering by

Clear all filters

130884-Thumbnail Image.png
Description
Commonly, image processing is handled on a CPU that is connected to the image sensor by a wire. In these far-sensor processing architectures, there is energy loss associated with sending data across an interconnect from the sensor to the CPU. In an effort to increase energy efficiency, near-sensor processing architectures

Commonly, image processing is handled on a CPU that is connected to the image sensor by a wire. In these far-sensor processing architectures, there is energy loss associated with sending data across an interconnect from the sensor to the CPU. In an effort to increase energy efficiency, near-sensor processing architectures have been developed, in which the sensor and processor are stacked directly on top of each other. This reduces energy loss associated with sending data off-sensor. However, processing near the image sensor causes the sensor to heat up. Reports of thermal noise in near-sensor processing architectures motivated us to study how temperature affects image quality on a commercial image sensor and how thermal noise affects computer vision task accuracy. We analyzed image noise across nine different temperatures and three sensor configurations to determine how image noise responds to an increase in temperature. Ultimately, our team used this information, along with transient analysis of a stacked image sensor’s thermal behavior, to advise thermal management strategies that leverage the benefits of near-sensor processing and prevent accuracy loss at problematic temperatures.
ContributorsJones, Britton Steele (Author) / LiKamWa, Robert (Thesis director) / Jayasuriya, Suren (Committee member) / Watts College of Public Service & Community Solut (Contributor) / Electrical Engineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)
Created2020-12
190757-Thumbnail Image.png
Description
Huge advancements have been made over the years in terms of modern image-sensing hardware and visual computing algorithms (e.g. computer vision, image processing, computational photography). However, to this day, there still exists a current gap between the hardware and software design in an imaging system, which silos one research domain

Huge advancements have been made over the years in terms of modern image-sensing hardware and visual computing algorithms (e.g. computer vision, image processing, computational photography). However, to this day, there still exists a current gap between the hardware and software design in an imaging system, which silos one research domain from another. Bridging this gap is the key to unlocking new visual computing capabilities for end applications in commercial photography, industrial inspection, and robotics. This thesis explores avenues where hardware-software co-design of image sensors can be leveraged to replace conventional hardware components in an imaging system with software for enhanced reconfigurability. As a result, the user can program the image sensor in a way best suited to the end application. This is referred to as software-defined imaging (SDI), where image sensor behavior can be altered by the system software depending on the user's needs. The scope of this thesis covers the development and deployment of SDI algorithms for low-power computer vision. Strategies for sparse spatial sampling have been developed in this thesis for power optimization of the vision sensor. This dissertation shows how a hardware-compatible state-of-the-art object tracker can be coupled with a Kalman filter for energy gains at the sensor level. Extensive experiments reveal how adaptive spatial sampling of image frames with this hardware-friendly framework offers attractive energy-accuracy tradeoffs. Another thrust of this thesis is to demonstrate the benefits of reinforcement learning in this research avenue. A major finding reported in this dissertation shows how neural-network-based reinforcement learning can be exploited for the adaptive subsampling framework to achieve improved sampling performance, thereby optimizing the energy efficiency of the image sensor. The last thrust of this thesis is to leverage emerging event-based SDI technology for building a low-power navigation system. A homography estimation pipeline has been proposed in this thesis which couples the right data representation with a differential scale-invariant feature transform (SIFT) module to extract rich visual cues from event streams. Positional encoding is leveraged with a multilayer perceptron (MLP) network to get robust homography estimation from event data.
ContributorsIqbal, Odrika (Author) / Jayasuriya, Suren (Thesis advisor) / Spanias, Andreas (Thesis advisor) / LiKamWa, Robert (Committee member) / Owens, Chris (Committee member) / Arizona State University (Publisher)
Created2023
161987-Thumbnail Image.png
Description
Machine learning (ML) and deep learning (DL) has become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improvised machine learning models. But this comes with

Machine learning (ML) and deep learning (DL) has become an intrinsic part of multiple fields. The ability to solve complex problems makes machine learning a panacea. In the last few years, there has been an explosion of data generation, which has greatly improvised machine learning models. But this comes with a cost of high computation, which invariably increases power usage and cost of the hardware. In this thesis we explore applications of ML techniques, applied to two completely different fields - arts, media and theater and urban climate research using low-cost and low-powered edge devices. The multi-modal chatbot uses different machine learning techniques: natural language processing (NLP) and computer vision (CV) to understand inputs of the user and accordingly perform in the play and interact with the audience. This system is also equipped with other interactive hardware setups like movable LED systems, together they provide an experiential theatrical play tailored to each user. I will discuss how I used edge devices to achieve this AI system which has created a new genre in theatrical play. I will then discuss MaRTiny, which is an AI-based bio-meteorological system that calculates mean radiant temperature (MRT), which is an important parameter for urban climate research. It is also equipped with a vision system that performs different machine learning tasks like pedestrian and shade detection. The entire system costs around $200 which can potentially replace the existing setup worth $20,000. I will further discuss how I overcame the inaccuracies in MRT value caused by the system, using machine learning methods. These projects although belonging to two very different fields, are implemented using edge devices and use similar ML techniques. In this thesis I will detail out different techniques that are shared between these two projects and how they can be used in several other applications using edge devices.
ContributorsKulkarni, Karthik Kashinath (Author) / Jayasuriya, Suren (Thesis advisor) / Middel, Ariane (Thesis advisor) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)
Created2021
168739-Thumbnail Image.png
Description
Visual navigation is a useful and important task for a variety of applications. As the preva­lence of robots increase, there is an increasing need for energy-­efficient navigation methods as well. Many aspects of efficient visual navigation algorithms have been implemented in the lit­erature, but there is a lack of work

Visual navigation is a useful and important task for a variety of applications. As the preva­lence of robots increase, there is an increasing need for energy-­efficient navigation methods as well. Many aspects of efficient visual navigation algorithms have been implemented in the lit­erature, but there is a lack of work on evaluation of the efficiency of the image sensors. In this thesis, two methods are evaluated: adaptive image sensor quantization for traditional camera pipelines as well as new event­-based sensors for low­-power computer vision.The first contribution in this thesis is an evaluation of performing varying levels of sen­sor linear and logarithmic quantization with the task of visual simultaneous localization and mapping (SLAM). This unconventional method can provide efficiency benefits with a trade­ off between accuracy of the task and energy-­efficiency. A new sensor quantization method, gradient­-based quantization, is introduced to improve the accuracy of the task. This method only lowers the bit level of parts of the image that are less likely to be important in the SLAM algorithm since lower bit levels signify better energy­-efficiency, but worse task accuracy. The third contribution is an evaluation of the efficiency and accuracy of event­-based camera inten­sity representations for the task of optical flow. The results of performing a learning based optical flow are provided for each of five different reconstruction methods along with ablation studies. Lastly, the challenges of an event feature­-based SLAM system are presented with re­sults demonstrating the necessity for high quality and high­ resolution event data. The work in this thesis provides studies useful for examining trade­offs for an efficient visual navigation system with traditional and event vision sensors. The results of this thesis also provide multiple directions for future work.
ContributorsChristie, Olivia Catherine (Author) / Jayasuriya, Suren (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2022
165564-Thumbnail Image.png
Description

Video playback is currently the primary method coaches and athletes use in sports training to give feedback on the athlete’s form and timing. Athletes will commonly record themselves using a phone or camera when practicing a sports movement, such as shooting a basketball, to then send to their coach for

Video playback is currently the primary method coaches and athletes use in sports training to give feedback on the athlete’s form and timing. Athletes will commonly record themselves using a phone or camera when practicing a sports movement, such as shooting a basketball, to then send to their coach for feedback on how to improve. In this work, we present Augmented Coach, an augmented reality tool for coaches to give spatiotemporal feedback through a 3-dimensional point cloud of the athlete. The system allows coaches to view a pre-recorded video of their athlete in point cloud form, and provides them with the proper tools in order to go frame by frame to both analyze the athlete’s form and correct it. The result is a fundamentally new concept of an interactive video player, where the coach can remotely view the athlete in a 3-dimensional form and create annotations to help improve their form. We then conduct a user study with subject matter experts to evaluate the usability and capabilities of our system. As indicated by the results, Augmented Coach successfully acts as a supplement to in-person coaching, since it allows coaches to break down the video recording in a 3-dimensional space and provide feedback spatiotemporally. The results also indicate that Augmented Coach can be a complete coaching solution in a remote setting. This technology will be extremely relevant in the future as coaches look for new ways to improve their feedback methods, especially in a remote setting.

ContributorsChannar, Sameer (Author) / Dbeis, Yasser (Co-author) / Richards, Connor (Co-author) / LiKamWa, Robert (Thesis director) / Jayasuriya, Suren (Committee member) / Barrett, The Honors College (Contributor) / Dean, W.P. Carey School of Business (Contributor) / Computer Science and Engineering Program (Contributor)
Created2022-05
190765-Thumbnail Image.png
Description
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on

Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on conventional acoustic feature changes in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals. At the same time, it highlights some of the limitations of conventional acoustic and linguistic features, such as low repeatability and generalizability. Two critical characteristics of speech features are (1) good robustness, as speech features need to generalize across different corpora, and (2) high repeatability, as speech features need to be invariant to all confounding factors except the pathological state of targets. This thesis presents two research thrusts in the context of speech signals in clinical applications that focus on improving the robustness and repeatability of speech features, respectively. The first thrust introduces a deep learning framework to generate acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and generates good embeddings sensitive to voice quality and robust across different corpora. The second thrust introduces using the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. A novel regularizer, the ICC regularizer, is proposed to regularize deep neural networks to produce embeddings with higher repeatability. This ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results reveal that the ICC regularizer improves the repeatability of learned embeddings compared to the contrastive loss, leading to enhanced performance in downstream tasks.
ContributorsZhang, Jianwei (Author) / Jayasuriya, Suren (Thesis advisor) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created2023
193509-Thumbnail Image.png
Description
In the rapidly evolving field of computer vision, propelled by advancements in deeplearning, the integration of hardware-software co-design has become crucial to overcome the limitations of traditional imaging systems. This dissertation explores the integration of hardware-software co-design in computational imaging, particularly in light transport acquisition and Non-Line-of-Sight (NLOS) imaging. By leveraging projector-camera systems and

In the rapidly evolving field of computer vision, propelled by advancements in deeplearning, the integration of hardware-software co-design has become crucial to overcome the limitations of traditional imaging systems. This dissertation explores the integration of hardware-software co-design in computational imaging, particularly in light transport acquisition and Non-Line-of-Sight (NLOS) imaging. By leveraging projector-camera systems and computational techniques, this thesis address critical challenges in imaging complex environments, such as adverse weather conditions, low-light scenarios, and the imaging of reflective or transparent objects. The first contribution in this thesis is the theory, design, and implementation of a slope disparity gating system, which is a vertically aligned configuration of a synchronized raster scanning projector and rolling-shutter camera, facilitating selective imaging through disparity-based triangulation. This system introduces a novel, hardware-oriented approach to selective imaging, circumventing the limitations of post-capture processing. The second contribution of this thesis is the realization of two innovative approaches for spotlight optimization to improve localization and tracking for NLOS imaging. The first approach utilizes radiosity-based optimization to improve 3D localization and object identification for small-scale laboratory settings. The second approach introduces a learningbased illumination network along with a differentiable renderer and NLOS estimation network to optimize human 2D localization and activity recognition. This approach is validated on a large, room-scale scene with complex line-of-sight geometries and occluders. The third contribution of this thesis is an attention-based neural network for passive NLOS settings where there is no controllable illumination. The thesis demonstrates realtime, dynamic NLOS human tracking where the camera is moving on a mobile robotic platform. In addition, this thesis contains an appendix featuring temporally consistent relighting for portrait videos with applications in computer graphics and vision.
ContributorsChandran, Sreenithy (Author) / Jayasuriya, Suren (Thesis advisor) / Turaga, Pavan (Committee member) / Dasarathy, Gautam (Committee member) / Kubo, Hiroyuki (Committee member) / Arizona State University (Publisher)
Created2024
193693-Thumbnail Image.png
Description
Virtual reality (VR) provides significant opportunities for students to experience immersive education. In VR, students can travel to the international space station, or go through a science experiment at home. However, the current tactile feedback provided by these systems do not feel real. Controllers do not provide the same tactile

Virtual reality (VR) provides significant opportunities for students to experience immersive education. In VR, students can travel to the international space station, or go through a science experiment at home. However, the current tactile feedback provided by these systems do not feel real. Controllers do not provide the same tactile feedback experienced in the physical world. This dissertation aims to bridge the gap between the virtual and physical learning environments through the development of novel haptic devices capable of emulating tactile sensations found in physical science labs. My research explores haptic devices that can emulate the sensations of fluids in vessels within the virtual environment. Fluid handling is a cornerstone experience of science labs. I also explore how to emulate the handling of other science equipment. I describe and research on four novel devices. These are 1) SWISH: A shifting-weight interface of simulated hydrodynamics for haptic perception of virtual fluid vessels, 2) Geppetteau, 3) Vibr-eau, and 4) Pneutouch. SWISH simulates the sensation of virtual fluids in vessels using a rack and pinion mechanism, while Geppetteau employs a string-driven mechanism to provide haptic feedback for a variety of vessel shapes. Vibr-eau utilizes vibrotactile actuators in the vessel’s interior to emulate the behavior of virtual liquids. Finally, Pneutouch enables users to interact with virtual objects through pneumatic inflatables. Through systematic evaluations and comparisons with baseline comparisons, the usability and effectiveness of these haptic devices in enhancing virtual experiences is demonstrated. The development of these haptic mechanisms and interfaces represents a significant step towards creating transformative educational tools that provide customizable, hands-on learning environments in both Mixed (MR) and Virtual Reality (VR) - now called XR. This dissertation contributes to advancing the field of haptics for virtual education and lays the foundation for future research in immersive learning technologies.
ContributorsLiu, Frank (Author) / LiKamWa, Robert (Thesis advisor) / Lahey, Byron (Committee member) / Johnson-Glenberg, Mina (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2024
193470-Thumbnail Image.png
Description
This thesis explores the development and integration of a wrist-worn pneumatic haptic interface, Pneutouch, into multiplayer virtual reality (VR) environments. The study investigates the impact of haptics on multiplayer experiences, with a specific focus on presence, collaboration, and communication. Evaluation and investigation were performed using three mini-games, each targeting specific

This thesis explores the development and integration of a wrist-worn pneumatic haptic interface, Pneutouch, into multiplayer virtual reality (VR) environments. The study investigates the impact of haptics on multiplayer experiences, with a specific focus on presence, collaboration, and communication. Evaluation and investigation were performed using three mini-games, each targeting specific interactions and investigating presence, collaboration, and communication. It was found that haptics enhanced user presence and object realism, increased user seriousness towards tasks, and shifted the focus of interactions from user-user to user-object. In collaborative tasks, haptics increased realism but did not improve efficiency for simple tasks. In communication tasks, a unique interaction modality, termed "haptic mirroring," was introduced, which explored a new form of communication that could be implemented with haptic devices. It was found that with new communication modalities, users experience an associated learning curve. Together, these findings suggest a new set of multiplayer haptic design considerations, such as how haptics increase seriousness, shift focus from social to physical interactions, generally increase realism but decrease task efficiency, and have associated learning curves. These findings contribute to the growing body of research on haptics in VR, particularly in multiplayer settings, and provide insights that can be further investigated or utilized in the implementation of VR experiences.
ContributorsManetta, Mason (Author) / LiKamWa, Robert (Thesis advisor) / Lahey, Byron (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2024
187685-Thumbnail Image.png
Description
Computed tomography (CT) and synthetic aperture sonar (SAS) are tomographic imaging techniques that are fundamental for applications within medical and remote sensing. Despite their successes, a number of factors constrain their image quality. For example, a time-varying scene during measurement acquisition yields image artifacts. Additionally, factors such as bandlimited or

Computed tomography (CT) and synthetic aperture sonar (SAS) are tomographic imaging techniques that are fundamental for applications within medical and remote sensing. Despite their successes, a number of factors constrain their image quality. For example, a time-varying scene during measurement acquisition yields image artifacts. Additionally, factors such as bandlimited or sparse measurements limit image resolution. This thesis presents novel algorithms and techniques to account for these factors during image formation and outperform traditional reconstruction methods. In particular, this thesis formulates analysis-by-synthesis optimizations that leverage neural fields to predict the scene and differentiable physics models that incorporate prior knowledge about image formation. The specific contributions include: (1) a method for reconstructing CT measurements from time-varying (non-stationary) scenes; (2) a method for deconvolving SAS images, which benefits image quality; (3) a method that couples neural fields and a differentiable acoustic model for 3D SAS reconstructions.
ContributorsReed, Albert William (Author) / Jayasuriya, Suren (Thesis advisor) / Brown, Daniel C (Committee member) / Dasarathy, Gautam (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created2023