Matching Items (44)
Filtering by

Clear all filters

171940-Thumbnail Image.png
Description
In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the classes of interest. From this, the algorithm developer must decide

In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the classes of interest. From this, the algorithm developer must decide which features to extract, which hypothesis class to condition on, which hyperparameters to select, and how to train the model. The design process is iterative with the developer trying different classifiers, feature sets, and hyper-parameters and using cross-validation to pick the model with the lowest error. As there are no guidelines for when to stop searching, developers can continue "optimizing" the model to the point where they begin to "fit to the dataset". These problems are amplified in the active learning setting, where the initial dataset may be unlabeled and label acquisition is costly. The aim in this dissertation is to develop algorithms that provide ML developers with additional information about the complexity of the underlying problem to guide downstream model development. I introduce the concept of "meta-features" - features extracted from a dataset that characterize the complexity of the underlying data generating process. In the context of classification, the complexity of the problem can be characterized by understanding two complementary meta-features: (a) the amount of overlap between classes, and (b) the geometry/topology of the decision boundary. Across three complementary works, I present a series of estimators for the meta-features that characterize overlap and geometry/topology of the decision boundary, and demonstrate how they can be used in algorithm development.
ContributorsLi, Weizhi (Author) / Berisha, Visar (Thesis advisor) / Dasarathy, Gautam (Thesis advisor) / Natesan Ramamurthy, Karthikeyan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2022
Description
Realistic lighting is important to improve immersion and make mixed reality applications seem more plausible. To properly blend the AR objects in the real scene, it is important to study the lighting of the environment. The existing illuminationframeworks proposed by Google’s ARCore (Google’s Augmented Reality Software Development Kit) and Apple’s

Realistic lighting is important to improve immersion and make mixed reality applications seem more plausible. To properly blend the AR objects in the real scene, it is important to study the lighting of the environment. The existing illuminationframeworks proposed by Google’s ARCore (Google’s Augmented Reality Software Development Kit) and Apple’s ARKit (Apple’s Augmented Reality Software Development Kit) are computationally expensive and have very slow refresh rates, which make them incompatible for dynamic environments and low-end mobile devices. Recently, there have been other illumination estimation frameworks such as GLEAM, Xihe, which aim at providing better illumination with faster refresh rates. GLEAM is an illumination estimation framework that understands the real scene by collecting pixel data from a reflecting spherical light probe. GLEAM uses this data to form environment cubemaps which are later mapped onto a reflection probe to generate illumination for AR objects. It is noticed that from a single viewpoint only one half of the light probe can be observed at a time which does not give complete information about the environment. This leads to the idea of having a multi-viewpoint estimation for better performance. This thesis work analyzes the multi-viewpoint capabilities of AR illumination frameworks that use physical light probes to understand the environment. The current work builds networking using TCP and UDP protocols on GLEAM. This thesis work also documents how processor load sharing has been done while networking devices and how that benefits the performance of GLEAM on mobile devices. Some enhancements using multi-threading have also been made to the already existing GLEAM model to improve its performance.
ContributorsGurram, Sahithi (Author) / LiKamWa, Robert (Thesis advisor) / Jayasuriya, Suren (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2022
161945-Thumbnail Image.png
Description
Statistical Shape Modeling is widely used to study the morphometrics of deformable objects in computer vision and biomedical studies. There are mainly two viewpoints to understand the shapes. On one hand, the outer surface of the shape can be taken as a two-dimensional embedding in space. On the other hand,

Statistical Shape Modeling is widely used to study the morphometrics of deformable objects in computer vision and biomedical studies. There are mainly two viewpoints to understand the shapes. On one hand, the outer surface of the shape can be taken as a two-dimensional embedding in space. On the other hand, the outer surface along with its enclosed internal volume can be taken as a three-dimensional embedding of interests. Most studies focus on the surface-based perspective by leveraging the intrinsic features on the tangent plane. But a two-dimensional model may fail to fully represent the realistic properties of shapes with both intrinsic and extrinsic properties. In this thesis, severalStochastic Partial Differential Equations (SPDEs) are thoroughly investigated and several methods are originated from these SPDEs to try to solve the problem of both two-dimensional and three-dimensional shape analyses. The unique physical meanings of these SPDEs inspired the findings of features, shape descriptors, metrics, and kernels in this series of works. Initially, the data generation of high-dimensional shapes, here, the tetrahedral meshes, is introduced. The cerebral cortex is taken as the study target and an automatic pipeline of generating the gray matter tetrahedral mesh is introduced. Then, a discretized Laplace-Beltrami operator (LBO) and a Hamiltonian operator (HO) in tetrahedral domain with Finite Element Method (FEM) are derived. Two high-dimensional shape descriptors are defined based on the solution of the heat equation and Schrödinger’s equation. Considering the fact that high-dimensional shape models usually contain massive redundancies, and the demands on effective landmarks in many applications, a Gaussian process landmarking on tetrahedral meshes is further studied. A SIWKS-based metric space is used to define a geometry-aware Gaussian process. The study of the periodic potential diffusion process further inspired the idea of a new kernel call the geometry-aware convolutional kernel. A series of Bayesian learning methods are then introduced to tackle the problem of shape retrieval and classification. Experiments of every single item are demonstrated. From the popular SPDE such as the heat equation and Schrödinger’s equation to the general potential diffusion equation and the specific periodic potential diffusion equation, it clearly shows that classical SPDEs play an important role in discovering new features, metrics, shape descriptors and kernels. I hope this thesis could be an example of using interdisciplinary knowledge to solve problems.
ContributorsFan, Yonghui (Author) / Wang, Yalin (Thesis advisor) / Lepore, Natasha (Committee member) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2021
162001-Thumbnail Image.png
Description
Floating trash objects are very commonly seen on water bodies such as lakes, canals and rivers. With the increase of plastic goods and human activities near the water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out these

Floating trash objects are very commonly seen on water bodies such as lakes, canals and rivers. With the increase of plastic goods and human activities near the water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out these trash is a hazardous and time-consuming task. Employing autonomous robots for these tasks is a better approach since it is more efficient and faster than humans. However, for a robot to clean the trash objects, a good detection algorithm is required. Real-time object detection on water surfaces is a challenging issue due to nature of the environment and the volatility of the water surface. In addition to this, running an object detection algorithm on an on-board processor of a robot limits the amount of CPU consumption that the algorithm can utilize. In this thesis, a computationally low cost object detection approach for robust detection of trash objects that was run on an on-board processor of a multirotor is presented. To account for specular reflections on the water surface, we use a polarization filter and integrate a specularity removal algorithm on our approach as well. The challenges faced during testing and the means taken to eliminate those challenges are also discussed. The algorithm was compared with two other object detectors using 4 different metrics. The testing was carried out using videos of 5 different objects collected at different illumination conditions over a lake using a multirotor. The results indicate that our algorithm is much suitable to be employed in real-time since it had the highest processing speed of 21 FPS, the lowest CPU consumption of 37.5\% and considerably high precision and recall values in detecting the object.
ContributorsSyed, Danish Faraaz (Author) / Zhang, Wenlong (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2021
161804-Thumbnail Image.png
Description
The field of Computer Vision has seen great accomplishments in the last decade due to the advancements in Deep Learning. With the advent of Convolutional Neural Networks, the task of image classification has achieved unimaginable success when perceived through the traditional Computer Vision lens. With that being said, the

The field of Computer Vision has seen great accomplishments in the last decade due to the advancements in Deep Learning. With the advent of Convolutional Neural Networks, the task of image classification has achieved unimaginable success when perceived through the traditional Computer Vision lens. With that being said, the state-of-the-art results in the image classification task were produced under a closed set assumption i.e. the input samples and the target datasets have knowledge of class labels in the testing phase. When any real-world scenario is considered, the model encounters unknown instances in the data. The task of identifying these unknown instances is called Open-Set Classification. This dissertation talks about the detection of unknown classes and the classification of the known classes. The problem is approached by using a neural network architecture called Deep Hierarchical Reconstruction Nets (DHRNets). It is dealt with by leveraging the reconstruction part of the DHRNets to identify the known class labels from the data. Experiments were also conducted on Convolutional Neural Networks (CNN) on the basis of softmax probability, Autoencoders on the basis of reconstruction loss, and Mahalanobis distance on CNN's to approach this problem.
ContributorsAinala, Kalyan (Author) / Turaga, Pavan (Thesis advisor) / Moraffah, Bahman (Committee member) / Demakethepalli Venkateswara, Hemanth Kumar (Committee member) / Arizona State University (Publisher)
Created2021
193482-Thumbnail Image.png
Description
The field of unmanned aerial vehicle, or UAV, navigation has been moving towards collision inclusive path planning, yet work has not been done to consider what a UAV is colliding with, and if it should or not. Therefore, there is a need for a framework that allows a UAV to

The field of unmanned aerial vehicle, or UAV, navigation has been moving towards collision inclusive path planning, yet work has not been done to consider what a UAV is colliding with, and if it should or not. Therefore, there is a need for a framework that allows a UAV to consider what is around it and find the best collision candidate. The following work presents a framework that allows UAVs to do so, by considering what an object is and the properties associated with it. Specifically, it considers an object’s material and monetary value to decide if it is good to collide with or not. This information is then published on a binary occupancy map that contains the objects’ size and location with respect to the current position of the UAV. The intent is that the generated binary occupancy map can be used with a path planner to decide what the UAV should collide with. The framework was designed to be as modular as possible and to work with conventional UAV's that have some degree of crash resistance incorporated into their design. The framework was tested by using it to identify various objects that could be collision candidates or not, and then carrying out collisions with some of the objects to test the framework’s accuracy. The purpose of this research was to further the field of collision inclusive path planning by allowing UAVs to know, in a way, what they are intending to collide with and decide if they should or not in order to make safer and more efficient collisions.
ContributorsMolnar, Madelyn Helena (Author) / Zhang, Wenlong (Thesis advisor) / Sugar, Thomas (Committee member) / Guo, Shenghan (Committee member) / Arizona State University (Publisher)
Created2024
187454-Thumbnail Image.png
Description
This dissertation presents novel solutions for improving the generalization capabilities of deep learning based computer vision models. Neural networks are known to suffer a large drop in performance when tested on samples from a different distribution than the one on which they were trained. The proposed solutions, based on latent

This dissertation presents novel solutions for improving the generalization capabilities of deep learning based computer vision models. Neural networks are known to suffer a large drop in performance when tested on samples from a different distribution than the one on which they were trained. The proposed solutions, based on latent space geometry and meta-learning, address this issue by improving the robustness of these models to distribution shifts. Through the use of geometrical alignment, state-of-the-art domain adaptation and source-free test-time adaptation strategies are developed. Additionally, geometrical alignment can allow classifiers to be progressively adapted to new, unseen test domains without requiring retraining of the feature extractors. The dissertation also presents algorithms for enabling in-the-wild generalization without needing access to any samples from the target domain. Other causes of poor generalization, such as data scarcity in critical applications and training data with high levels of noise and variance, are also explored. To address data scarcity in fine-grained computer vision tasks such as object detection, novel context-aware augmentations are suggested. While the first four chapters focus on general-purpose computer vision models, strategies are also developed to improve robustness in specific applications. The efficiency of training autonomous agents for visual navigation is improved by incorporating semantic knowledge, and the integration of domain experts' knowledge allows for the realization of a low-cost, minimally invasive generalizable automated rehabilitation system. Lastly, new tools for explainability and model introspection using counter-factual explainers trained through interval-based uncertainty calibration objectives are presented.
ContributorsThopalli, Kowshik (Author) / Turaga, Pavan (Thesis advisor) / Thiagarajan, Jayaraman J (Committee member) / Li, Baoxin (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2023
187459-Thumbnail Image.png
Description
In the era of data explosion, massive data is generated from various sources at an unprecedented speed. The ever-growing amount of data reveals enormous opportunities for developing novel data-driven solutions to unsolved problems. In recent years, benefiting from numerous public datasets and advances in deep learning, data-driven approaches in the

In the era of data explosion, massive data is generated from various sources at an unprecedented speed. The ever-growing amount of data reveals enormous opportunities for developing novel data-driven solutions to unsolved problems. In recent years, benefiting from numerous public datasets and advances in deep learning, data-driven approaches in the computer vision domain have demonstrated superior performance with high adaptability on various data and tasks. Meanwhile, signal processing has long been dominated by techniques derived from rigorous mathematical models built upon prior knowledge of signals. Due to the lack of adaptability to real data and applications, model-based methods often suffer from performance degradation and engineering difficulties. In this dissertation, multiple signal processing problems are studied from vision-inspired data representation and learning perspectives to address the major limitation on adaptability. Corresponding data-driven solutions are proposed to achieve significantly improved performance over conventional solutions. Specifically, in the compressive sensing domain, an open-source image compressive sensing toolbox and benchmark to standardize the implementation and evaluation of reconstruction methods are first proposed. Then a plug-and-play compression ratio adapter is proposed to enable the adaptability of end-to-end data-driven reconstruction methods to variable compression ratios. Lastly, the problem of transfer learning from images to bioelectric signals is experimentally studied to demonstrate the improved performance of data-driven reconstruction. In the image subsampling domain, task-adaptive data-driven image subsampling is studied to reduce data redundancy and retain information of interest simultaneously. In the semiconductor analysis domain, the data-driven automatic error detection problem is studied in the context of integrated circuit segmentation for the first time. In the light detection and ranging(LiDAR) camera calibration domain, the calibration accuracy degradation problem in low-resolution LiDAR scenarios is addressed with data-driven techniques.
ContributorsZhang, Zhikang (Author) / Ren, Fengbo (Thesis advisor) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2023
187693-Thumbnail Image.png
Description
Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these features-based SLAM methods often struggle with feature-less or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world.

Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these features-based SLAM methods often struggle with feature-less or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world. Advanced information is necessary to address these limitations. Fortunately, recent developments in learning-based 3D reconstruction allow robots to not only detect semantic meanings, but also recognize the 3D structure of objects from a few images. By combining this 3D structural information, SLAM can be improved from a low-level approach to a structure-aware approach. This work propose a novel approach for multi-view 3D reconstruction using recurrent transformer. This approach allows robots to accumulate information from multiple views and encode them into a compact latent space. The resulting latent representations are then decoded to produce 3D structural landmarks, which can be used to improve robot localization and mapping.
ContributorsHuang, Chi-Yao (Author) / Yang, Yezhou (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created2023
193546-Thumbnail Image.png
Description
In the age of artificial intelligence, Machine Learning (ML) has become a pervasive force, impacting countless aspects of our lives. As ML’s influence expands, concerns about its reliability and trustworthiness have intensified, with security and robustness emerging as significant challenges. For instance, it has been demonstrated that slight perturbations to

In the age of artificial intelligence, Machine Learning (ML) has become a pervasive force, impacting countless aspects of our lives. As ML’s influence expands, concerns about its reliability and trustworthiness have intensified, with security and robustness emerging as significant challenges. For instance, it has been demonstrated that slight perturbations to a stop sign can cause ML classifiers to misidentify it as a speed limit sign, raising concerns about whether ML algorithms are suitable for real-world deployments. To tackle these issues, Responsible Machine Learning (Responsible ML) has emerged with a clear mission: to develop secure and robust ML algorithms. This dissertation aims to develop Responsible Machine Learning algorithms under real-world constraints. Specifically, recognizing the role of adversarial attacks in exposing security vulnerabilities and robustifying the ML methods, it lays down the foundation of Responsible ML by outlining a novel taxonomy of adversarial attacks within real-world settings, categorizing them into black-box target-specific, and target-agnostic attacks. Subsequently, it proposes potent adversarial attacks in each category, aiming to obtain effectiveness and efficiency. Transcending conventional boundaries, it then introduces the notion of causality into Responsible ML (a.k.a., Causal Responsible ML), presenting the causal adversarial attack. This represents the first principled framework to explain the transferability of adversarial attacks to unknown models by identifying their common source of vulnerabilities, thereby exposing the pinnacle of threat and vulnerability: conducting successful attacks on any model with no prior knowledge. Finally, acknowledging the surge of Generative AI, this dissertation explores Responsible ML for Generative AI. It introduces a novel adversarial attack that unveils their adversarial vulnerabilities and devises a strong defense mechanism to bolster the models’ robustness against potential attacks.
ContributorsMoraffah, Raha (Author) / Liu, Huan (Thesis advisor) / Yang, Yezhou (Committee member) / Xiao, Chaowei (Committee member) / Turaga, Pavan (Committee member) / Carley, Kathleen (Committee member) / Arizona State University (Publisher)
Created2024