Bayesian Methods for Tuning Hyperparameters of Loss Functions in Machine Learning

Description

The introduction of parameterized loss functions for robustness in machine learning has led to questions as to how hyperparameter(s) of the loss functions can be tuned. This thesis explores how Bayesian methods can be leveraged to tune such hyperparameters. Specifically, a modified Gibbs sampling scheme is used to generate a distribution of loss parameters of tunable loss functions. The modified Gibbs sampler is a two-block sampler that alternates between sampling the loss parameter and optimizing the other model parameters. The sampling step is performed using slice sampling, while the optimization step is performed using gradient descent. This thesis explores the application of the modified Gibbs sampler to alpha-loss, a tunable loss function with a single parameter $\alpha \in (0,\infty]$, that is designed for the classification setting. Theoretically, it is shown that the Markov chain generated by a modified Gibbs sampling scheme is ergodic; that is, the chain has, and converges to, a unique stationary (posterior) distribution. Further, the modified Gibbs sampler is implemented in two experiments: a synthetic dataset and a canonical image dataset. The results show that the modified Gibbs sampler performs well under label noise, generating a distribution indicating preference for larger values of alpha, matching the outcomes of previous experiments.
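For readers unfamiliar with alpha-loss, the following is a minimal Python sketch of the loss as it is commonly defined in the literature, not code taken from the thesis. It assumes the standard form $\ell_\alpha(p) = \frac{\alpha}{\alpha-1}\left(1 - p^{(\alpha-1)/\alpha}\right)$, where $p$ is the probability the model assigns to the true label. The function name `alpha_loss` and its argument names are hypothetical.

```python
import math


def alpha_loss(p_true: float, alpha: float) -> float:
    """Alpha-loss of the probability p_true assigned to the correct label.

    Assumed (standard) definition, not taken from the thesis:
      alpha = 1    -> log-loss, -log(p_true)
      alpha = inf  -> 1 - p_true
      otherwise    -> alpha / (alpha - 1) * (1 - p_true ** ((alpha - 1) / alpha))
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    if alpha <= 0.0:
        raise ValueError("alpha must lie in (0, inf]")
    if math.isinf(alpha):
        return 1.0 - p_true
    if math.isclose(alpha, 1.0):
        # Limiting case as alpha -> 1
        return -math.log(p_true)
    return (alpha / (alpha - 1.0)) * (1.0 - p_true ** ((alpha - 1.0) / alpha))
```

Under this definition, larger values of alpha penalize confident mistakes less severely than log-loss does, which is consistent with the abstract's observation that larger alpha is preferred under label noise.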

Contributors: Cole, Erika Lingo (Author) / Sankar, Lalitha (Thesis advisor) / Lan, Shiwei (Thesis advisor) / Pedrielli, Giulia (Committee member) / Hahn, Paul (Committee member) / Arizona State University (Publisher)

Created: 2022

Recursive Bayesian Estimation on Projective Spaces: Theoretical Foundations and Practical Algorithms

Description

This thesis develops geometrically and statistically rigorous foundations for multivariate analysis and Bayesian inference posed on Grassmannian manifolds. Requisite to the development of key elements of statistical theory in a geometric realm are closed-form, analytic expressions for many differential geometric objects, e.g., tangent vectors, metrics, geodesics, volume forms. The first part of this thesis is devoted to a mathematical exposition of these. In particular, it leverages the classical work of Alan James to derive the exterior calculus of differential forms on special Grassmannians for invariant measures with respect to which integration is permissible. Motivated by various multi-sensor remote sensing applications, the second part of this thesis describes the problem of recursively estimating the state of a dynamical system propagating on the Grassmann manifold. Fundamental to the Bayesian treatment of this problem is the choice of a suitable probability distribution to model the state a priori. Using the Method of Maximum Entropy and the developed geometric theory, maximum-entropy probability distributions on the state space are derived and characterized. Statistical analyses of these distributions, including parameter estimation, are also presented. These probability distributions and the statistical analysis thereof are original contributions. Using the Bayesian framework, two recursive estimation algorithms, both of which rely on noisy measurements on (special cases of) the Grassmann manifold, are then devised and implemented numerically. The first is applied to an idealized scenario, the second to a more practically motivated scenario. The novelty of both of these algorithms lies in the use of the derived maximum-entropy probability measures as models for the priors. Numerical simulations demonstrate that, under mild assumptions, both estimation algorithms produce accurate and statistically meaningful outputs.
This thesis aims to chart the interface between differential geometry and statistical signal processing. It is my deepest hope that the geometric-statistical approach underlying this work facilitates and encourages the development of new theories and new computational methods in geometry. Application of these, in turn, will bring new insights and better solutions to a number of extant and emerging problems in signal processing.

Contributors: Crider, Lauren N (Author) / Cochran, Douglas (Thesis advisor) / Kotschwar, Brett (Committee member) / Scharf, Louis (Committee member) / Taylor, Thomas (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2021

Harnessing Structure in Discrete and Non-convex optimization with applications in online learning, multi-agent systems, and phase retrieval

Description

This thesis examines the critical relationship between data, complex models, and other methods to measure and analyze them. As models grow larger and more intricate, they require more data, making it vital to use that data effectively. The document starts with a deep dive into nonconvex functions, a fundamental element of modern complex systems, identifying key conditions that ensure these systems can be analyzed efficiently, a crucial consideration in an era of vast numbers of variables. Loss functions, traditionally seen as mere optimization tools, are analyzed and recast as measures of how accurately a model reflects reality. This redefined perspective permits the refinement of data-sourcing strategies for a better data economy. The aim of the investigation is the model itself, which is used to understand and harness the underlying patterns of complex systems. By incorporating structure both implicitly (through periodic patterns) and explicitly (using graphs), the model's ability to make sense of the data is enhanced. Moreover, online learning principles are applied to a crucial practical scenario: robotic resource monitoring. The results established in this thesis, backed by simulations and theoretical proofs, highlight the advantages of online learning methods over traditional ones commonly used in robotics. In sum, this thesis presents an integrated approach to measuring complex systems, providing new insights and methods that push forward the capabilities of machine learning.

Contributors: Thaker, Parth Kashyap (Author) / Dasarathy, Gautam (Thesis advisor) / Sankar, Lalitha (Committee member) / Nedich, Angelia (Committee member) / Michelusi, Nicolò (Committee member) / Arizona State University (Publisher)

Created: 2024

Development and analysis of stochastic boundary coverage strategies for multi-robot systems

Description

Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions or features of interest and achieve target objectives that derive from their resulting spatial configurations, such as forming a connected communication network or acquiring sensor data around the entire boundary. We refer to this spatial allocation problem as boundary coverage. Possible swarm tasks that will involve boundary coverage include cooperative load manipulation for applications in construction, manufacturing, and disaster response.

In this work, I address the challenges of controlling a swarm of resource-constrained robots to achieve boundary coverage, which I refer to as the problem of stochastic boundary coverage. I first examined an instance of this behavior in the biological phenomenon of group food retrieval by desert ants, and developed a hybrid dynamical system model of this process from experimental data. Subsequently, with the aid of collaborators, I used a continuum abstraction of swarm population dynamics, adapted from a modeling framework used in chemical kinetics, to derive stochastic robot control policies that drive a swarm to target steady-state allocations around multiple boundaries in a way that is robust to environmental variations.

Next, I determined the statistical properties of the random graph that is formed by a group of robots, each with the same capabilities, that have attached to a boundary at random locations. I also computed the probability density functions (pdfs) of the robot positions and inter-robot distances for this case.
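To make the random-graph setting concrete, here is a minimal Monte Carlo sketch in Python. It is not the author's method. It assumes the boundary is a straight segment of length `length`, that robots attach at independent, uniformly distributed positions, and that two robots are linked when they are within a common communication radius `radius`. All function and parameter names are hypothetical. Under these assumptions the graph is connected exactly when no gap between neighboring robots exceeds the radius.

```python
import random


def is_connected(positions, radius):
    """Return True if robots on a line form a connected communication graph.

    On a line, the graph is connected iff every gap between consecutive
    (sorted) robot positions is at most the communication radius.
    """
    pts = sorted(positions)
    return all(b - a <= radius for a, b in zip(pts, pts[1:]))


def estimate_connectivity(n_robots, radius, length=1.0, trials=10000, seed=0):
    """Monte Carlo estimate of the probability that the graph is connected.

    Assumes i.i.d. uniform attachment positions on [0, length]; this is an
    illustrative assumption, not a statement about the thesis's model.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = [rng.uniform(0.0, length) for _ in range(n_robots)]
        hits += is_connected(positions, radius)
    return hits / trials
```

For example, `estimate_connectivity(10, 0.2)` returns an estimate of the connectivity probability for ten robots with radius 0.2 on a unit segment. Simulation of this kind gives only an approximation and is shown here purely for intuition.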

I then extended this analysis to cases in which the robots have heterogeneous communication/sensing radii and attach to a boundary according to non-uniform, non-identical pdfs. I proved that these more general coverage strategies generate random graphs whose probability of connectivity is #P-hard to compute. Finally, I investigated possible approaches to validating our boundary coverage strategies in multi-robot simulations with realistic Wi-Fi communication.

Contributors: Peruvemba Kumar, Ganesh (Author) / Berman, Spring M (Thesis advisor) / Fainekos, Georgios (Thesis advisor) / Bazzi, Rida (Committee member) / Syrotiuk, Violet (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)

Created: 2016

Statistical and dynamical modeling of Riemannian trajectories with application to human movement analysis

Description

The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e., the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated because the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.

Contributors: Anirudh, Rushil (Author) / Turaga, Pavan (Thesis advisor) / Cochran, Douglas (Committee member) / Runger, George C. (Committee member) / Taylor, Thomas (Committee member) / Arizona State University (Publisher)

Created: 2016

ASU Electronic Theses and Dissertations
