Description
Significant advances have been made over the years in modern image-sensing hardware and visual computing algorithms (e.g., computer vision, image processing, computational photography). However, a gap still exists between the hardware and software design of an imaging system, which silos one research domain from another. Bridging this gap is the key to unlocking new visual computing capabilities for end applications in commercial photography, industrial inspection, and robotics. This thesis explores avenues where hardware-software co-design of image sensors can be leveraged to replace conventional hardware components in an imaging system with software for enhanced reconfigurability. As a result, the user can program the image sensor in the way best suited to the end application. This is referred to as software-defined imaging (SDI), where image sensor behavior can be altered by the system software depending on the user's needs. The scope of this thesis covers the development and deployment of SDI algorithms for low-power computer vision. Strategies for sparse spatial sampling have been developed in this thesis to optimize the power consumption of the vision sensor. This dissertation shows how a hardware-compatible, state-of-the-art object tracker can be coupled with a Kalman filter for energy gains at the sensor level. Extensive experiments reveal how adaptive spatial sampling of image frames with this hardware-friendly framework offers attractive energy-accuracy tradeoffs. Another thrust of this thesis is to demonstrate the benefits of reinforcement learning in this research avenue. A major finding reported in this dissertation shows how neural-network-based reinforcement learning can be exploited in the adaptive subsampling framework to achieve improved sampling performance, thereby optimizing the energy efficiency of the image sensor.
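The tracker-plus-Kalman-filter idea can be illustrated with a minimal constant-velocity filter that predicts where the object will appear next, so that only a region of interest (ROI) around the prediction needs full-resolution sampling. This is a sketch under illustrative assumptions: the state model, noise covariances, and the `KalmanROI` name are not the dissertation's implementation.

```python
import numpy as np

# Minimal constant-velocity Kalman filter: predict the object's next
# position so only a small ROI around it needs full sampling.
# State vector is [x, y, vx, vy]; matrices and noise levels are illustrative.
class KalmanROI:
    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # we observe (x, y)
        self.Q = q * np.eye(4)   # process noise covariance
        self.R = r * np.eye(2)   # measurement noise covariance
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):
        """Advance the state; returns the predicted (x, y) ROI centre."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the state with the tracker's measured position z = (x, y)."""
        innov = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innov
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Sampling only the predicted ROI densely, and the rest of the frame sparsely, is where the sensor-level energy saving comes from.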
The last thrust of this thesis is to leverage emerging event-based SDI technology to build a low-power navigation system. A homography estimation pipeline is proposed that couples the right data representation with a differential scale-invariant feature transform (SIFT) module to extract rich visual cues from event streams. Positional encoding is leveraged with a multilayer perceptron (MLP) network to obtain robust homography estimates from event data.
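For intuition, the classical direct linear transform (DLT) shows how a homography can be recovered from point correspondences such as matched SIFT features. This plain least-squares sketch is illustrative background, not the learned, event-based pipeline described above.

```python
import numpy as np

def dlt_homography(src, dst):
    # Direct linear transform: given >= 4 correspondences (x, y) -> (u, v),
    # solve for the 3x3 homography H with dst ~ H @ src (homogeneous coords).
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, i.e. the last right singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # normalise so H[2, 2] == 1
```

In a full pipeline this step would typically be wrapped in a robust estimator (e.g. RANSAC) to reject bad feature matches.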
Contributors: Iqbal, Odrika (Author) / Jayasuriya, Suren (Thesis advisor) / Spanias, Andreas (Thesis advisor) / LiKamWa, Robert (Committee member) / Owens, Chris (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
This thesis encompasses a comprehensive research effort dedicated to overcoming the critical bottlenecks that hinder the current generation of neural networks, thereby significantly advancing their reliability and performance. Deep neural networks, with their millions of parameters, suffer from over-parameterization and lack of constraints, leading to limited generalization capabilities. In other words, the complex architecture and millions of parameters present challenges in finding the right balance between capturing useful patterns and avoiding noise in the data. To address these issues, this thesis explores novel solutions based on knowledge distillation, enabling the learning of robust representations. Leveraging the capabilities of large-scale networks, effective learning strategies are developed. Moreover, the limitations of dependency on external networks in the distillation process, which often require large-scale models, are effectively overcome by proposing a self-distillation strategy. The proposed approach empowers the model to generate high-level knowledge within a single network, pushing the boundaries of knowledge distillation. The effectiveness of the proposed method is demonstrated not only across diverse applications, including image classification, object detection, and semantic segmentation, but also in practical considerations such as handling data scarcity and assessing the transferability of the model to other learning tasks. Another major obstacle hindering the development of reliable and robust models lies in their black-box nature, impeding clear insights into the contributions toward the final predictions and yielding uninterpretable feature representations. To address this challenge, this thesis introduces techniques that incorporate simple yet powerful deep constraints rooted in Riemannian geometry.
These constraints confer geometric qualities upon the latent representation, thereby fostering a more interpretable and insightful representation. In addition to its primary focus on general tasks like image classification and activity recognition, this strategy offers significant benefits in real-world applications where data scarcity is prevalent. Moreover, its robustness to feature removal showcases its potential for edge applications. By successfully tackling these challenges, this research contributes to advancing the field of machine learning and provides a foundation for building more reliable and robust systems across various application domains.
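The distillation idea underlying this work can be illustrated with the standard temperature-softened objective; in the self-distillation setting described above, the "teacher" logits would come from a deeper stage of the same network rather than an external model. The function names and temperature below are illustrative assumptions.

```python
import numpy as np

def softened(logits, T):
    # Temperature-softened softmax: higher T flattens the distribution,
    # exposing the "dark knowledge" in the non-target class scores.
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on softened outputs, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures (standard practice).
    p = softened(teacher_logits, T)
    q = softened(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

In training, this term is typically added to the ordinary cross-entropy loss on ground-truth labels.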
Contributors: Choi, Hongjun (Author) / Turaga, Pavan (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Wenwen (Committee member) / Fazli, Pooyan (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Speech analysis for clinical applications has emerged as a burgeoning field, providing valuable insights into an individual's physical and physiological state. Researchers have explored speech features for clinical applications, such as diagnosing, predicting, and monitoring various pathologies. Before presenting the new deep learning frameworks, this thesis introduces a study on conventional acoustic feature changes in subjects with post-traumatic headache (PTH) attributed to mild traumatic brain injury (mTBI). This work demonstrates the effectiveness of using speech signals to assess the pathological status of individuals. At the same time, it highlights some of the limitations of conventional acoustic and linguistic features, such as low repeatability and generalizability. Two critical characteristics of speech features are (1) good robustness, as speech features need to generalize across different corpora, and (2) high repeatability, as speech features need to be invariant to all confounding factors except the pathological state of the target subjects. This thesis presents two research thrusts in the context of speech signals in clinical applications that focus on improving the robustness and repeatability of speech features, respectively. The first thrust introduces a deep learning framework to generate acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss combined with a classification loss is used to train the model jointly, and data-warping techniques are employed to improve the robustness of embeddings. Empirical results demonstrate that the proposed method achieves high in-corpus and cross-corpus classification accuracy and generates good embeddings sensitive to voice quality and robust across different corpora. The second thrust introduces the intra-class correlation coefficient (ICC) as a means to evaluate the repeatability of embeddings.
A novel regularizer, the ICC regularizer, is proposed to regularize deep neural networks to produce embeddings with higher repeatability. This ICC regularizer is implemented and applied to three speech applications: a clinical application, speaker verification, and voice style conversion. The experimental results reveal that the ICC regularizer improves the repeatability of learned embeddings compared to the contrastive loss, leading to enhanced performance in downstream tasks.
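A one-way random-effects ICC, one common form of the coefficient, can be computed as below for an (n_subjects, k_repeats) matrix of embedding scores. This is an illustrative sketch; the exact ICC variant and its use inside the regularizer may differ from the thesis's formulation.

```python
import numpy as np

def icc_1way(X):
    # One-way random-effects ICC(1,1) for an (n_subjects, k_repeats) matrix:
    # fraction of total variance attributable to between-subject differences.
    # ICC near 1 means repeated measurements of a subject agree closely.
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    grand = X.mean()
    row_means = X.mean(axis=1)
    msb = k * np.sum((row_means - grand) ** 2) / (n - 1)          # between-subject mean square
    msw = np.sum((X - row_means[:, None]) ** 2) / (n * (k - 1))   # within-subject mean square
    return (msb - msw) / (msb + (k - 1) * msw)
```

An ICC-based regularizer would penalize the network when repeated recordings of the same speaker map to dissimilar embeddings, pushing this coefficient toward 1.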
Contributors: Zhang, Jianwei (Author) / Jayasuriya, Suren (Thesis advisor) / Berisha, Visar (Thesis advisor) / Liss, Julie (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Mixture of experts is a machine learning ensemble approach that consists of individual models trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture of experts models do not currently see wide use due to the difficulty of training diverse experts and their high computational requirements. This work presents modifications of the mixture of experts formulation that use domain knowledge to improve training and incorporate parameter sharing among experts to reduce computational requirements.
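The basic formulation can be sketched as a softmax gate weighting the expert outputs. The single linear-layer gate below is an illustrative assumption; in the thesis the gating network is itself a trained neural network.

```python
import numpy as np

def moe_predict(x, experts, gate_w):
    # Mixture of experts: the gate scores the input, a softmax turns the
    # scores into mixture weights, and the output is the weighted sum of
    # the expert predictions.
    scores = gate_w @ x                      # one score per expert
    w = np.exp(scores - scores.max())
    w /= w.sum()                             # softmax mixture weights
    preds = np.stack([expert(x) for expert in experts])
    return w @ preds                         # weighted combination
```

Usage: with a gate that strongly prefers one expert, the mixture output approaches that expert's prediction.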

First, this work presents an application of mixture of experts models to quality-robust visual recognition. It is first shown that human subjects outperform deep neural networks on classification of distorted images; a model, MixQualNet, is then proposed that is more robust to distortions. The proposed model consists of "experts", each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models, where the weights are determined by a separate gating network. The proposed model also incorporates weight sharing to reduce the number of parameters as well as to increase performance.

Second, an application of mixture of experts to predict visual saliency is presented. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all, subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.
Contributors: Dodge, Samuel Fuller (Author) / Karam, Lina (Thesis advisor) / Jayasuriya, Suren (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Rapid intraoperative diagnosis of brain tumors is of great importance for planning treatment and guiding the surgeon about the extent of resection. Currently, the standard for preliminary intraoperative tissue analysis is frozen section biopsy, which has major limitations such as tissue freezing and cutting artifacts, sampling errors, lack of immediate interaction between the pathologist and the surgeon, and long turnaround time.

Handheld, portable confocal laser endomicroscopy (CLE) is being explored in neurosurgery for its ability to image histopathological features of tissue at cellular resolution in real time during brain tumor surgery. Over the course of a surgical tumor resection, hundreds to thousands of images may be collected. The large number of images requires significant time and storage for subsequent review, which motivated several research groups to employ deep convolutional neural networks (DCNNs) to improve CLE's utility during surgery. DCNNs have proven to be useful in natural and medical image analysis tasks such as classification, object detection, and image segmentation.

This thesis proposes using DCNNs for analyzing CLE images of brain tumors. In particular, it explores the practicality of DCNNs in three main tasks. First, off-the-shelf DCNNs were used to classify images into diagnostic and non-diagnostic. Further experiments showed that both ensemble modeling and transfer learning improved the classifier's accuracy in evaluating the diagnostic quality of new images at the test stage. Second, a weakly-supervised learning pipeline was developed for localizing key features of diagnostic CLE images from gliomas. Third, image style transfer was used to improve the diagnostic quality of CLE images from glioma tumors by transforming the histology patterns in CLE images of fluorescein sodium-stained tissue into the ones in conventional hematoxylin and eosin-stained tissue slides.
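The ensemble step in the first task amounts to averaging the class probabilities of several independently trained networks and thresholding the result. The function names and threshold below are illustrative assumptions, not the thesis's exact scheme.

```python
import numpy as np

def ensemble_probability(model_probs):
    # Average the (diagnostic, non-diagnostic) class probabilities produced
    # by several fine-tuned CNNs into a single ensemble score.
    return np.mean(np.asarray(model_probs, dtype=float), axis=0)

def is_diagnostic(model_probs, threshold=0.5):
    # A frame is kept as "diagnostic" if the ensemble score for the
    # diagnostic class (index 0 here, by convention) passes the threshold.
    return ensemble_probability(model_probs)[0] >= threshold
```

Averaging softens the occasional confident mistake of any single model, which is why the ensemble outperforms its members.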

These studies suggest that DCNNs are well suited to the analysis of CLE images. They may assist surgeons by sorting out the non-diagnostic images, highlighting the key regions, and enhancing their appearance through pattern transformation in real time. With recent advances in deep learning such as generative adversarial networks and semi-supervised learning, new research directions can be pursued to further explore the promise of DCNNs in CLE image analysis.
Contributors: Izady Yazdanabadi, Mohammadhassan (Author) / Preul, Mark (Thesis advisor) / Yang, Yezhou (Thesis advisor) / Nakaji, Peter (Committee member) / Vernon, Brent (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Aortic pathologies such as coarctation, dissection, and aneurysm represent a particularly emergent class of cardiovascular diseases and account for significant cardiovascular morbidity and mortality worldwide. Computational simulations of aortic flows are growing increasingly important as tools for gaining understanding of these pathologies and for planning their surgical repair. In vitro experiments are required to validate these simulations against real world data, and a pulsatile flow pump system can provide physiologic flow conditions characteristic of the aorta.

This dissertation presents improved experimental techniques for in vitro studies of aortic blood flow and, increasingly, of larger portions of the human cardiovascular system. Specifically, this work develops new flow management and measurement techniques for cardiovascular flow experiments with the aim of improving clinical evaluation and treatment planning of aortic diseases.

The hypothesis of this research is that transient flow driven by a step change in volume flux in a piston-based pulsatile flow pump system behaves differently from transient flow driven by a step change in pressure gradient, the development time being substantially reduced in the former. Due to this difference in behavior, the response to a piston-driven pump can be predicted in order to establish inlet velocity and flow waveforms at a downstream phantom model.

The main objectives of this dissertation were: 1) to design, construct, and validate a piston-based flow pump system for aortic flow experiments, 2) to characterize temporal and spatial development of start-up flows driven by a piston pump that produces a step change from zero flow to a constant volume flux in realistic (finite) tube geometries for physiologic Reynolds numbers, and 3) to develop a method to predict downstream velocity and flow waveforms at the inlet of an aortic phantom model and determine the input waveform needed to achieve the intended waveform at the test section. Application of these newly improved flow management tools and measurement techniques was then demonstrated through in vitro experiments in patient-specific coarctation of aorta flow phantom models manufactured in-house and compared to computational simulations to inform and execute future experiments and simulations.
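As a small worked example of the physiologic matching involved, the Reynolds number a phantom experiment must reproduce follows from Re = ρUD/μ. The property values below are typical textbook figures for blood and the aorta, not the dissertation's measured conditions.

```python
def reynolds_number(density, velocity, diameter, viscosity):
    # Re = rho * U * D / mu -- the dimensionless parameter that must be
    # matched for phantom flow to be dynamically similar to aortic flow.
    return density * velocity * diameter / viscosity

# Illustrative aortic values: blood density ~1060 kg/m^3, mean velocity
# ~0.3 m/s, lumen diameter ~2.5 cm, dynamic viscosity ~3.5e-3 Pa*s.
re_aorta = reynolds_number(1060.0, 0.3, 0.025, 3.5e-3)
```

Peak-systolic velocities several times higher push the instantaneous Reynolds number well into the transitional range, which is why pulsatility matters in these experiments.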
Contributors: Chaudhury, Rafeed Ahmed (Author) / Frakes, David (Thesis advisor) / Adrian, Ronald J (Thesis advisor) / Vernon, Brent (Committee member) / Pizziconi, Vincent (Committee member) / Caplan, Michael (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Cerebral aneurysms are pathological balloonings of blood vessels in the brain, commonly found in the arterial network at the base of the brain. Cerebral aneurysm rupture can lead to a dangerous medical condition, subarachnoid hemorrhage, that is associated with high rates of morbidity and mortality. Effective evaluation and management of cerebral aneurysms is therefore essential to public health. The goal of treating an aneurysm is to isolate the aneurysm from its surrounding circulation, thereby preventing further growth and rupture. Endovascular treatment for cerebral aneurysms has gained popularity over traditional surgical techniques due to its minimally invasive nature and shorter associated recovery time. The hemodynamic modifications effected by treatment can promote thrombus formation within the aneurysm, leading to eventual isolation. However, different treatment devices can effect very different hemodynamic outcomes in aneurysms with different geometries.

Currently, cerebral aneurysm risk evaluation and treatment planning in clinical practice is largely based on geometric features of the aneurysm including the dome size, dome-to-neck ratio, and parent vessel geometry. Hemodynamics, on the other hand, although known to be deeply involved in cerebral aneurysm initiation and progression, are considered to a lesser degree. Previous work in the field of biofluid mechanics has demonstrated that geometry is a driving factor behind aneurysmal hemodynamics.

The goal of this research is to develop a more combined geometric/hemodynamic basis for informing clinical decisions. Geometric main effects were analyzed to quantify contributions made by geometric factors that describe cerebral aneurysms (i.e., dome size, dome-to-neck ratio, and inflow angle) to clinically relevant hemodynamic responses (i.e., wall shear stress, root mean square velocity magnitude and cross-neck flow). Computational templates of idealized bifurcation and sidewall aneurysms were created to satisfy a two-level full factorial design, and examined using computational fluid dynamics. A subset of the computational bifurcation templates was also translated into physical models for experimental validation using particle image velocimetry. The effects of geometry on treatment were analyzed by virtually treating the aneurysm templates with endovascular devices. The statistical relationships between geometry, treatment, and flow that emerged have the potential to play a valuable role in clinical practice.
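A two-level full factorial design over the three geometric factors enumerates all 2^3 = 8 template configurations. The factor names and level values below are illustrative placeholders, not the study's actual settings.

```python
from itertools import product

def two_level_factorial(factors):
    # Two-level full factorial design: every combination of the low/high
    # setting of each factor, one dict per experimental run.
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

# Illustrative geometric factors for the idealized aneurysm templates.
design = two_level_factorial({
    "dome_size_mm": (4.0, 8.0),
    "dome_to_neck_ratio": (1.0, 2.0),
    "inflow_angle_deg": (0.0, 45.0),
})
```

Each run in `design` then corresponds to one computational template examined with CFD, allowing main effects of each factor on the hemodynamic responses to be estimated.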
Contributors: Nair, Priya (Author) / Frakes, David (Thesis advisor) / Vernon, Brent (Committee member) / Chong, Brian (Committee member) / Pizziconi, Vincent (Committee member) / Adrian, Ronald (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The application of novel visualization and modeling methods to the study of cardiovascular disease is vital to the development of innovative diagnostic techniques, including those that may aid in the early detection and prevention of cardiovascular disorders. This dissertation focuses on the application of particle image velocimetry (PIV) to the study of intracardiac hemodynamics. This is accomplished primarily through the use of ultrasound-based PIV, which allows for in vivo visualization of intracardiac flow without the requirement for optical access, as is required with traditional camera-based PIV methods.

The fundamentals of ultrasound PIV are introduced, including experimental methods for its implementation as well as a discussion of estimating and mitigating measurement error. Ultrasound PIV is then compared to optical PIV, a highly developed technique with proven accuracy that, through rigorous examination, has become the "gold standard" of two-dimensional flow visualization. Results show good agreement between the two methods.
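At its core, any PIV method estimates displacement by locating the cross-correlation peak between paired interrogation windows from consecutive images. A minimal FFT-based sketch (integer-pixel accuracy only, with no sub-pixel peak fit or outlier validation as a full PIV code would have):

```python
import numpy as np

def piv_displacement(win_a, win_b):
    # Estimate the displacement between two interrogation windows via
    # circular cross-correlation computed with FFTs.
    a = win_a - win_a.mean()   # remove mean intensity so the correlation
    b = win_b - win_b.mean()   # peak reflects particle pattern shift only
    corr = np.fft.irfft2(np.conj(np.fft.rfft2(a)) * np.fft.rfft2(b), s=a.shape)
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap circular indices to signed displacements
    dy = peak[0] if peak[0] <= a.shape[0] // 2 else peak[0] - a.shape[0]
    dx = peak[1] if peak[1] <= a.shape[1] // 2 else peak[1] - a.shape[1]
    return dy, dx
```

Repeating this over a grid of windows yields the instantaneous velocity vector field, whether the image pairs come from a camera or an ultrasound scanner.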

Using a mechanical left heart model, a multi-plane ultrasound PIV technique is introduced and applied to quantify a complex, three-dimensional flow that is analogous to the left intraventricular flow. Changes in ventricular flow dynamics due to the rotational orientation of mechanical heart valves are studied; the results demonstrate the importance of multi-plane imaging techniques when trying to assess the strongly three-dimensional intraventricular flow.

The potential use of ultrasound PIV as an early diagnosis technique is demonstrated through the development of a novel elasticity estimation technique. A finite element analysis routine is coupled with an ensemble Kalman filter to allow for the estimation of material elasticity using forcing and displacement data derived from PIV. Results demonstrate that it is possible to estimate elasticity using forcing data derived from a PIV vector field, provided vector density is sufficient.
Contributors: Westerdale, John Curtis (Author) / Adrian, Ronald (Thesis advisor) / Belohlavek, Marek (Committee member) / Squires, Kyle (Committee member) / Trimble, Steve (Committee member) / Frakes, David (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
In recent years, the widespread use of deep neural networks (DNNs) has facilitated great improvements in performance for computer vision tasks like image classification and object recognition. In most realistic computer vision applications, an input image undergoes some form of image distortion such as blur and additive noise during image acquisition or transmission. Deep networks trained on pristine images perform poorly when tested on such distortions. DNN predictions have also been shown to be vulnerable to carefully crafted adversarial perturbations. Specifically, so-called universal adversarial perturbations are image-agnostic perturbations that can be added to any image and can fool a target network into making erroneous predictions. This work proposes selective DNN feature regeneration to improve the robustness of existing DNNs to image distortions and universal adversarial perturbations.

In the context of common naturally occurring image distortions, a metric is proposed to identify the most susceptible DNN convolutional filters and rank them in order of the highest gain in classification accuracy upon correction. The proposed approach, called DeepCorrect, applies small stacks of convolutional layers with residual connections at the output of these ranked filters and trains them to correct the most distortion-affected filter activations, whilst leaving the rest of the pre-trained filter outputs in the network unchanged. Performance results show that applying DeepCorrect models for common vision tasks significantly improves the robustness of DNNs against distorted images and outperforms other alternative approaches.

In the context of universal adversarial perturbations, departing from existing defense strategies that work mostly in the image domain, a novel and effective defense which operates only in the DNN feature domain is presented. This approach identifies pre-trained convolutional features that are most vulnerable to adversarial perturbations and deploys trainable feature regeneration units which transform these DNN filter activations into resilient features that are robust to universal perturbations. Regenerating only the top 50% of adversarially susceptible activations in at most 6 DNN layers, while leaving all remaining DNN activations unchanged, can outperform existing defense strategies across different network architectures and across various universal attacks.
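The rank-and-regenerate idea can be sketched as ordering filters by how much their activations deviate under a perturbation and selecting the top fraction for regeneration. The mean-absolute-deviation metric below is an illustrative stand-in for the dissertation's actual susceptibility measure.

```python
import numpy as np

def susceptibility_ranking(clean_acts, perturbed_acts):
    # Rank convolutional filters by mean absolute activation deviation
    # between clean and perturbed inputs; acts have shape (n_filters, h, w).
    diff = np.abs(np.asarray(perturbed_acts, float) - np.asarray(clean_acts, float))
    dev = diff.mean(axis=(1, 2))
    return np.argsort(dev)[::-1]   # most susceptible filters first

def filters_to_regenerate(clean_acts, perturbed_acts, fraction=0.5):
    # Select the top fraction of filters whose activations a trainable
    # regeneration unit would transform; the rest pass through unchanged.
    ranking = susceptibility_ranking(clean_acts, perturbed_acts)
    return ranking[: max(1, int(len(ranking) * fraction))]
```

Only the selected activations are routed through regeneration units, which keeps the defense cheap relative to retraining the whole network.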
Contributors: Borkar, Tejas Shyam (Author) / Karam, Lina J (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Over the last decade, deep neural networks, also known as deep learning, combined with large databases and specialized hardware for computation, have made major strides in important areas such as computer vision, computational imaging, and natural language processing. However, such frameworks currently suffer from some drawbacks. For example, it is generally not clear how the architectures are to be designed for different applications, or how the neural networks behave under different input perturbations, and it is not easy to make the internal representations and parameters more interpretable. In this dissertation, I propose building constraints into feature maps, parameters, and the design of algorithms involving neural networks for applications in low-level vision problems such as compressive imaging and multi-spectral image fusion, and high-level inference problems including activity and face recognition. Depending on the application, such constraints can be used to design architectures which are invariant or robust to certain nuisance factors, more efficient and, in some cases, more interpretable. Through extensive experiments on real-world datasets, I demonstrate these advantages of the proposed methods over conventional frameworks.
Contributors: Lohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2019