Matching Items (61)
Filtering by

Clear all filters

149503-Thumbnail Image.png
Description
The exponential rise in unmanned aerial vehicles has necessitated the need for accurate pose estimation under any extreme conditions. Visual Odometry (VO) is the estimation of position and orientation of a vehicle based on analysis of a sequence of images captured from a camera mounted on it. VO offers a

The exponential rise in unmanned aerial vehicles has necessitated the need for accurate pose estimation under any extreme conditions. Visual Odometry (VO) is the estimation of position and orientation of a vehicle based on analysis of a sequence of images captured from a camera mounted on it. VO offers a cheap and relatively accurate alternative to conventional odometry techniques like wheel odometry, inertial measurement systems and global positioning system (GPS). This thesis implements and analyzes the performance of a two camera based VO called Stereo based visual odometry (SVO) in presence of various deterrent factors like shadows, extremely bright outdoors, wet conditions etc... To allow the implementation of VO on any generic vehicle, a discussion on porting of the VO algorithm to android handsets is presented too. The SVO is implemented in three steps. In the first step, a dense disparity map for a scene is computed. To achieve this we utilize sum of absolute differences technique for stereo matching on rectified and pre-filtered stereo frames. Epipolar geometry is used to simplify the matching problem. The second step involves feature detection and temporal matching. Feature detection is carried out by Harris corner detector. These features are matched between two consecutive frames using the Lucas-Kanade feature tracker. The 3D co-ordinates of these matched set of features are computed from the disparity map obtained from the first step and are mapped into each other by a translation and a rotation. The rotation and translation is computed using least squares minimization with the aid of Singular Value Decomposition. Random Sample Consensus (RANSAC) is used for outlier detection. This comprises the third step. The accuracy of the algorithm is quantified based on the final position error, which is the difference between the final position computed by the SVO algorithm and the final ground truth position as obtained from the GPS. The SVO showed an error of around 1% under normal conditions for a path length of 60 m and around 3% in bright conditions for a path length of 130 m. The algorithm suffered in presence of shadows and vibrations, with errors of around 15% and path lengths of 20 m and 100 m respectively.
ContributorsDhar, Anchit (Author) / Saripalli, Srikanth (Thesis advisor) / Li, Baoxin (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created2010
171997-Thumbnail Image.png
Description
In the recent years, deep learning has gained popularity for its ability to be utilized for several computer vision applications without any apriori knowledge. However, to introduce better inductive bias incorporating prior knowledge along with learnedinformation is critical. To that end, human intervention including choice of algorithm, data and model

In the recent years, deep learning has gained popularity for its ability to be utilized for several computer vision applications without any apriori knowledge. However, to introduce better inductive bias incorporating prior knowledge along with learnedinformation is critical. To that end, human intervention including choice of algorithm, data and model in deep learning pipelines can be considered a prior. Thus, it is extremely important to select effective priors for a given application. This dissertation explores different aspects of a deep learning pipeline and provides insights as to why a particular prior is effective for the corresponding application. For analyzing the effect of model priors, three applications which involvesequential modelling problems i.e. Audio Source Separation, Clinical Time-series (Electroencephalogram (EEG)/Electrocardiogram(ECG)) based Differential Diagnosis and Global Horizontal Irradiance Forecasting for Photovoltaic (PV) Applications are chosen. For data priors, the application of image classification is chosen and a new algorithm titled,“Invenio” that can effectively use data semantics for both task and distribution shift scenarios is proposed. Finally, the effectiveness of a data selection prior is shown using the application of object tracking wherein the aim is to maintain the tracking performance while prolonging the battery usage of image sensors by optimizing the data selected for reading from the environment. For every research contribution of this dissertation, several empirical studies are conducted on benchmark datasets. The proposed design choices demonstrate significant performance improvements in comparison to the existing application specific state-of-the-art deep learning strategies.
ContributorsKatoch, Sameeksha (Author) / Spanias, Andreas (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Thiagarajan, Jayaraman J. (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created2022
164535-Thumbnail Image.png
Description

Speedsolving, the art of solving twisty puzzles like the Rubik's Cube as fast as possible, has recently benefitted from the arrival of smartcubes which have special hardware for tracking the cube's face turns and transmitting them via Bluetooth. However, due to their embedded electronics, existing smartcubes cannot be used in

Speedsolving, the art of solving twisty puzzles like the Rubik's Cube as fast as possible, has recently benefitted from the arrival of smartcubes which have special hardware for tracking the cube's face turns and transmitting them via Bluetooth. However, due to their embedded electronics, existing smartcubes cannot be used in competition, reducing their utility in personal speedcubing practice. This thesis proposes a sound-based design for tracking the face turns of a standard, non-smart speedcube consisting of an audio processing receiver in software and a small physical speaker configured as a transmitter. Special attention has been given to ensuring that installing the transmitter requires only a reversible centercap replacement on the original cube. This allows the cube to benefit from smartcube features during practice, while still maintaining compliance with competition regulations. Within a controlled test environment, the software receiver perfectly detected a variety of transmitted move sequences. Furthermore, all components required for the physical transmitter were demonstrated to fit within the centercap of a Gans 356 speedcube.

ContributorsHale, Joseph (Author) / Heinrichs, Robert (Thesis director) / Li, Baoxin (Committee member) / Barrett, The Honors College (Contributor) / Software Engineering (Contributor) / School of International Letters and Cultures (Contributor)
Created2022-05
193509-Thumbnail Image.png
Description
In the rapidly evolving field of computer vision, propelled by advancements in deeplearning, the integration of hardware-software co-design has become crucial to overcome the limitations of traditional imaging systems. This dissertation explores the integration of hardware-software co-design in computational imaging, particularly in light transport acquisition and Non-Line-of-Sight (NLOS) imaging. By leveraging projector-camera systems and

In the rapidly evolving field of computer vision, propelled by advancements in deeplearning, the integration of hardware-software co-design has become crucial to overcome the limitations of traditional imaging systems. This dissertation explores the integration of hardware-software co-design in computational imaging, particularly in light transport acquisition and Non-Line-of-Sight (NLOS) imaging. By leveraging projector-camera systems and computational techniques, this thesis address critical challenges in imaging complex environments, such as adverse weather conditions, low-light scenarios, and the imaging of reflective or transparent objects. The first contribution in this thesis is the theory, design, and implementation of a slope disparity gating system, which is a vertically aligned configuration of a synchronized raster scanning projector and rolling-shutter camera, facilitating selective imaging through disparity-based triangulation. This system introduces a novel, hardware-oriented approach to selective imaging, circumventing the limitations of post-capture processing. The second contribution of this thesis is the realization of two innovative approaches for spotlight optimization to improve localization and tracking for NLOS imaging. The first approach utilizes radiosity-based optimization to improve 3D localization and object identification for small-scale laboratory settings. The second approach introduces a learningbased illumination network along with a differentiable renderer and NLOS estimation network to optimize human 2D localization and activity recognition. This approach is validated on a large, room-scale scene with complex line-of-sight geometries and occluders. The third contribution of this thesis is an attention-based neural network for passive NLOS settings where there is no controllable illumination. The thesis demonstrates realtime, dynamic NLOS human tracking where the camera is moving on a mobile robotic platform. In addition, this thesis contains an appendix featuring temporally consistent relighting for portrait videos with applications in computer graphics and vision.
ContributorsChandran, Sreenithy (Author) / Jayasuriya, Suren (Thesis advisor) / Turaga, Pavan (Committee member) / Dasarathy, Gautam (Committee member) / Kubo, Hiroyuki (Committee member) / Arizona State University (Publisher)
Created2024
189305-Thumbnail Image.png
Description
Quantum computing has the potential to revolutionize the signal-processing field by providing more efficient methods for analyzing signals. This thesis explores the application of quantum computing in signal analysis synthesis for compression applications. More specifically, the study focuses on two key approaches: quantum Fourier transform (QFT) and quantum linear prediction

Quantum computing has the potential to revolutionize the signal-processing field by providing more efficient methods for analyzing signals. This thesis explores the application of quantum computing in signal analysis synthesis for compression applications. More specifically, the study focuses on two key approaches: quantum Fourier transform (QFT) and quantum linear prediction (QLP). The research is motivated by the potential advantages offered by quantum computing in massive signal processing tasks and presents novel quantum circuit designs for QFT, quantum autocorrelation, and QLP, enabling signal analysis synthesis using quantum algorithms. The two approaches are explained as follows. The Quantum Fourier transform (QFT) demonstrates the potential for improved speed in quantum computing compared to classical methods. This thesis focuses on quantum encoding of signals and designing quantum algorithms for signal analysis synthesis, and signal compression using QFTs. Comparative studies are conducted to evaluate quantum computations for Fourier transform applications, considering Signal-to-Noise-Ratio results. The effects of qubit precision and quantum noise are also analyzed. The QFT algorithm is also developed in the J-DSP simulation environment, providing hands-on laboratory experiences for signal-processing students. User-friendly simulation programs on QFT-based signal analysis synthesis using peak picking, and perceptual selection using psychoacoustics in the J-DSP are developed. Further, this research is extended to analyze the autocorrelation of the signal using QFTs and develop a quantum linear prediction (QLP) algorithm for speech processing applications. QFTs and IQFTs are used to compute the quantum autocorrelation of the signal, and the HHL algorithm is modified and used to compute the solutions of the linear equations using quantum computing. The performance of the QLP algorithm is evaluated for system identification, spectral estimation, and speech analysis synthesis, and comparisons are performed for QLP and CLP results. The results demonstrate the following: effective quantum circuits for accurate QFT-based speech analysis synthesis, evaluation of performance with quantum noise, design of accurate quantum autocorrelation, and development of a modified HHL algorithm for efficient QLP. Overall, this thesis contributes to the research on quantum computing for signal processing applications and provides a foundation for further exploration of quantum algorithms for signal analysis synthesis.
ContributorsSharma, Aradhita (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2023
191748-Thumbnail Image.png
Description
Millimeter-wave (mmWave) and sub-terahertz (sub-THz) systems aim to utilize the large bandwidth available at these frequencies. This has the potential to enable several future applications that require high data rates, such as autonomous vehicles and digital twins. These systems, however, have several challenges that need to be addressed to realize

Millimeter-wave (mmWave) and sub-terahertz (sub-THz) systems aim to utilize the large bandwidth available at these frequencies. This has the potential to enable several future applications that require high data rates, such as autonomous vehicles and digital twins. These systems, however, have several challenges that need to be addressed to realize their gains in practice. First, they need to deploy large antenna arrays and use narrow beams to guarantee sufficient receive power. Adjusting the narrow beams of the large antenna arrays incurs massive beam training overhead. Second, the sensitivity to blockages is a key challenge for mmWave and THz networks. Since these networks mainly rely on line-of-sight (LOS) links, sudden link blockages highly threaten the reliability of the networks. Further, when the LOS link is blocked, the network typically needs to hand off the user to another LOS basestation, which may incur critical time latency, especially if a search over a large codebook of narrow beams is needed. A promising way to tackle both these challenges lies in leveraging additional side information such as visual, LiDAR, radar, and position data. These sensors provide rich information about the wireless environment, which can be utilized for fast beam and blockage prediction. This dissertation presents a machine-learning framework for sensing-aided beam and blockage prediction. In particular, for beam prediction, this work proposes to utilize visual and positional data to predict the optimal beam indices. For the first time, this work investigates the sensing-aided beam prediction task in a real-world vehicle-to-infrastructure and drone communication scenario. Similarly, for blockage prediction, this dissertation proposes a multi-modal wireless communication solution that utilizes bimodal machine learning to perform proactive blockage prediction and user hand-off. Evaluations on both real-world and synthetic datasets illustrate the promising performance of the proposed solutions and highlight their potential for next-generation communication and sensing systems.
ContributorsCharan, Gouranga (Author) / Alkhateeb, Ahmed (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Turaga, Pavan (Committee member) / Michelusi, Nicolò (Committee member) / Arizona State University (Publisher)
Created2024
156802-Thumbnail Image.png
Description
Human movement is a complex process influenced by physiological and psychological factors. The execution of movement is varied from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees

Human movement is a complex process influenced by physiological and psychological factors. The execution of movement is varied from person to person, and the number of possible strategies for completing a specific movement task is almost infinite. Different choices of strategies can be perceived by humans as having different degrees of quality, and the quality can be defined with regard to aesthetic, athletic, or health-related ratings. It is useful to measure and track the quality of a person's movements, for various applications, especially with the prevalence of low-cost and portable cameras and sensors today. Furthermore, based on such measurements, feedback systems can be designed for people to practice their movements towards certain goals. In this dissertation, I introduce symmetry as a family of measures for movement quality, and utilize recent advances in computer vision and differential geometry to model and analyze different types of symmetry in human movements. Movements are modeled as trajectories on different types of manifolds, according to the representations of movements from sensor data. The benefit of such a universal framework is that it can accommodate different existing and future features that describe human movements. The theory and tools developed in this dissertation will also be useful in other scientific areas to analyze symmetry from high-dimensional signals.
ContributorsWang, Qiao (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Srivastava, Anuj (Committee member) / Sha, Xin Wei (Committee member) / Arizona State University (Publisher)
Created2018
Description
Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame-rates. The content needs to represent a truly immersive experience, where the user can look around in 360-degree view and perceive the depth of the scene. The existing solutions only capture and

Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame-rates. The content needs to represent a truly immersive experience, where the user can look around in 360-degree view and perceive the depth of the scene. The existing solutions only capture and offload the compute load to the server. But offloading large amounts of raw camera feeds takes longer latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving the traditional stitching algorithms to battery constrained device needs at least three orders of magnitude reduction in power. We believe that close integration of capture and compute stages will lead to reduced overall system power.

We approach the problem by building a hardware prototype and characterize the end-to-end system bottlenecks of power and performance. The prototype has 6 IMX274 cameras and uses Nvidia Jetson TX2 development board for capture and computation. We found that capturing is bottlenecked by sensor power and data-rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computations lead to high power, huge memory footprint, and high latency. The existing systems lack hardware-software co-design aspects, leading to excessive data transfers across the interfaces and expensive computations within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-design of different subsystems to reduce and reuse the data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on custom FPGA can achieve real time stitching.
ContributorsGunnam, Sridhar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2018
157215-Thumbnail Image.png
Description
Non-line-of-sight (NLOS) imaging of objects not visible to either the camera or illumina-

tion source is a challenging task with vital applications including surveillance and robotics.

Recent NLOS reconstruction advances have been achieved using time-resolved measure-

ments. Acquiring these time-resolved measurements requires expensive and specialized

detectors and laser sources. In work proposes a data-driven

Non-line-of-sight (NLOS) imaging of objects not visible to either the camera or illumina-

tion source is a challenging task with vital applications including surveillance and robotics.

Recent NLOS reconstruction advances have been achieved using time-resolved measure-

ments. Acquiring these time-resolved measurements requires expensive and specialized

detectors and laser sources. In work proposes a data-driven approach for NLOS 3D local-

ization requiring only a conventional camera and projector. The localisation is performed

using a voxelisation and a regression problem. Accuracy of greater than 90% is achieved

in localizing a NLOS object to a 5cm × 5cm × 5cm volume in real data. By adopting

the regression approach an object of width 10cm to localised to approximately 1.5cm. To

generalize to line-of-sight (LOS) scenes with non-planar surfaces, an adaptive lighting al-

gorithm is adopted. This algorithm, based on radiosity, identifies and illuminates scene

patches in the LOS which most contribute to the NLOS light paths, and can factor in sys-

tem power constraints. Improvements ranging from 6%-15% in accuracy with a non-planar

LOS wall using adaptive lighting is reported, demonstrating the advantage of combining

the physics of light transport with active illumination for data-driven NLOS imaging.
ContributorsChandran, Sreenithy (Author) / Jayasuriya, Suren (Thesis advisor) / Turaga, Pavan (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)
Created2019
157645-Thumbnail Image.png
Description
Disentangling latent spaces is an important research direction in the interpretability of unsupervised machine learning. Several recent works using deep learning are very effective at producing disentangled representations. However, in the unsupervised setting, there is no way to pre-specify which part of the latent space captures specific factors of

Disentangling latent spaces is an important research direction in the interpretability of unsupervised machine learning. Several recent works using deep learning are very effective at producing disentangled representations. However, in the unsupervised setting, there is no way to pre-specify which part of the latent space captures specific factors of variations. While this is generally a hard problem because of the non-existence of analytical expressions to capture these variations, there are certain factors like geometric

transforms that can be expressed analytically. Furthermore, in existing frameworks, the disentangled values are also not interpretable. The focus of this work is to disentangle these geometric factors of variations (which turn out to be nuisance factors for many applications) from the semantic content of the signal in an interpretable manner which in turn makes the features more discriminative. Experiments are designed to show the modularity of the approach with other disentangling strategies as well as on multiple one-dimensional (1D) and two-dimensional (2D) datasets, clearly indicating the efficacy of the proposed approach.
ContributorsKoneripalli Seetharam, Kaushik (Author) / Turaga, Pavan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2019