Modeling and Exploiting the Structure of Data via Meta-Features for Robust and Efficient Machine Learning

Description

In the standard pipeline for machine learning model development, several design decisions are made largely based on trial and error. Take the classification problem as an example. The starting point for classifier design is a dataset with samples from the classes of interest. From this, the algorithm developer must decide which features to extract, which hypothesis class to condition on, which hyperparameters to select, and how to train the model. The design process is iterative, with the developer trying different classifiers, feature sets, and hyperparameters and using cross-validation to pick the model with the lowest error. As there are no guidelines for when to stop searching, developers can continue "optimizing" the model to the point where they begin to "fit to the dataset". These problems are amplified in the active learning setting, where the initial dataset may be unlabeled and label acquisition is costly. The aim of this dissertation is to develop algorithms that provide ML developers with additional information about the complexity of the underlying problem to guide downstream model development. I introduce the concept of "meta-features": features extracted from a dataset that characterize the complexity of the underlying data generating process. In the context of classification, the complexity of the problem can be characterized by understanding two complementary meta-features: (a) the amount of overlap between classes, and (b) the geometry/topology of the decision boundary. Across three complementary works, I present a series of estimators for the meta-features that characterize overlap and geometry/topology of the decision boundary, and demonstrate how they can be used in algorithm development.

Contributors: Li, Weizhi (Author) / Berisha, Visar (Thesis advisor) / Dasarathy, Gautam (Thesis advisor) / Natesan Ramamurthy, Karthikeyan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2022

Quantum Scattering and Machine Learning in Dirac Materials

Description

A remarkable phenomenon in contemporary physics is quantum scarring in classically chaotic systems, where the wave functions tend to concentrate on classical periodic orbits. Quantum scarring has been studied for more than four decades, but efficiently detecting quantum scars has remained challenging, relying mostly on human visualization of wave function patterns. This paper develops a machine learning approach to detecting quantum scars in an automated and highly efficient manner. In particular, this paper exploits meta-learning. The first step is to construct a few-shot classification algorithm, under the requirement that the one-shot classification accuracy be larger than 90%. A scheme based on a combination of neural networks is then proposed to improve the accuracy. This paper shows that the machine learning scheme can find the correct quantum scars from thousands of images of wave functions, without any human intervention, regardless of the symmetry of the underlying classical system. This is the first application of meta-learning to quantum systems.

Interacting spin networks are fundamental to quantum computing. Data-based tomography of time-independent spin networks has been achieved, but an open challenge is to ascertain the structures of time-dependent spin networks using time series measurements taken locally from a small subset of the spins. Physically, the dynamical evolution of a spin network under time-dependent driving or perturbation is described by the Heisenberg equation of motion. Motivated by this basic fact, this paper articulates a physics-enhanced machine learning framework whose core is Heisenberg neural networks. This paper demonstrates that, from local measurements, not only can the local Hamiltonian be recovered, but the Hamiltonian reflecting the interacting structure of the whole system can also be faithfully reconstructed. When the Heisenberg neural machine is applied to spin networks of a variety of structures, the tomography fidelity can reach about 90% even in the extreme case where measurements are taken from only one spin. The developed machine learning framework is applicable to any time-dependent system whose quantum dynamical evolution is governed by the Heisenberg equation of motion.

Contributors: Han, Chendi (Author) / Lai, Ying-Cheng (Thesis advisor) / Yu, Hongbin (Committee member) / Dasarathy, Gautam (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created: 2022

Text to Speech: Extension to Text to Braille Project

Description

Visual impairment is a significant challenge that affects millions of people worldwide. Access to written text, such as books, documents, and other printed materials, can be particularly difficult for individuals with visual impairments. In order to address this issue, our project aims to develop a text-to-Braille and speech translating device that will help people with visual impairments to access written text more easily and independently.

Contributors: Nguyen, Vu (Author) / Yu, Hongbin (Thesis director) / Dasarathy, Gautam (Committee member) / Barrett, The Honors College (Contributor) / Electrical Engineering Program (Contributor)

Created: 2023-05

Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient

Description

Deep neural networks (DNNs) have had tremendous success in a variety of statistical learning applications due to their vast expressive power. Most applications run DNNs on the cloud on parallelized architectures. There is a need for efficient DNN inference on the edge with low-precision hardware and analog accelerators. To make trained models more robust for this setting, quantization and analog compute noise are modeled as weight space perturbations to DNNs, and an information-theoretic regularization scheme is used to penalize the KL-divergence between perturbed and unperturbed models. This regularizer has similarities to both natural gradient descent and knowledge distillation, but has the advantage of explicitly encouraging the network to find a broader minimum that is robust to weight space perturbations. In addition to the proposed regularization, KL-divergence is directly minimized using knowledge distillation. Initial validation on FashionMNIST and CIFAR10 shows that the information-theoretic regularizer and knowledge distillation outperform existing quantization schemes based on the straight-through estimator or L2-constrained quantization.
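The idea of penalizing the KL-divergence between a model and its weight-perturbed copy can be illustrated with a small sketch. This is not the thesis implementation: the linear softmax classifier, the Gaussian noise model, and every name below (`kl_regularizer`, `perturb`, `sigma`) are assumptions chosen only to show the shape of such a penalty.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, x):
    """Logits of a linear classifier; `weights` holds one row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def perturb(weights, sigma, rng):
    """Assumed noise model: i.i.d. Gaussian noise added to every weight."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]


def kl_regularizer(weights, inputs, sigma, rng):
    """Average KL between clean and weight-perturbed predictions."""
    noisy = perturb(weights, sigma, rng)
    total = 0.0
    for x in inputs:
        p = softmax(linear_logits(weights, x))  # unperturbed model
        q = softmax(linear_logits(noisy, x))    # perturbed model
        total += kl_divergence(p, q)
    return total / len(inputs)


# Hypothetical toy values, for illustration only.
rng = random.Random(0)
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
inputs = [[1.0, 2.0], [-1.0, 0.5]]
penalty = kl_regularizer(weights, inputs, sigma=0.1, rng=rng)
```

In training, a term like `penalty` would be added to the task loss with some weighting, so that weights whose predictions change little under perturbation are preferred.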

Contributors: Kadambi, Pradyumna (Author) / Berisha, Visar (Thesis advisor) / Dasarathy, Gautam (Committee member) / Seo, Jae-Sun (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)

Created: 2019

Quantifying Information Leakage via Adversarial Loss Functions: Theory and Practice

Description

Modern digital applications have significantly increased the leakage of private and sensitive personal data. While worst-case measures of leakage such as Differential Privacy (DP) provide the strongest guarantees, when utility matters, average-case information-theoretic measures can be more relevant. However, most such information-theoretic measures do not have clear operational meanings. This dissertation addresses this challenge.

This work introduces a tunable leakage measure called maximal $\alpha$-leakage which quantifies the maximal gain of an adversary in inferring any function of a data set. The inferential capability of the adversary is modeled by a class of loss functions, namely, $\alpha$-loss. The choice of $\alpha$ determines specific adversarial actions ranging from refining a belief for $\alpha =1$ to guessing the best posterior for $\alpha = \infty$, and for the two specific values maximal $\alpha$-leakage simplifies to mutual information and maximal leakage, respectively. Maximal $\alpha$-leakage is proved to have a composition property and be robust to side information.
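The abstract does not state the form of $\alpha$-loss. The sketch below assumes the definition commonly used in the literature on this measure, $\ell_\alpha(p) = \frac{\alpha}{\alpha-1}\left(1 - p^{(\alpha-1)/\alpha}\right)$ for $\alpha > 1$, where $p$ is the probability the adversary assigns to the true value. Under that assumption, the two limits match the behaviors described above: log-loss at $\alpha = 1$ and a probability-of-error loss at $\alpha = \infty$. The function name `alpha_loss` is illustrative.

```python
import math


def alpha_loss(p_true, alpha):
    """Assumed alpha-loss for the probability assigned to the true outcome.

    alpha = 1   -> log-loss, -log(p)   (refining a belief)
    alpha = inf -> 1 - p               (guessing the most likely value)
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    if alpha < 1:
        raise ValueError("this sketch only covers alpha >= 1")
    if alpha == 1:
        return -math.log(p_true)
    if math.isinf(alpha):
        return 1.0 - p_true
    exponent = (alpha - 1.0) / alpha
    return alpha / (alpha - 1.0) * (1.0 - p_true ** exponent)


log_loss_value = alpha_loss(0.5, 1)          # equals log 2
hard_loss_value = alpha_loss(0.5, math.inf)  # equals 0.5
mid_value = alpha_loss(0.5, 2)               # between the two regimes
```

Values of `alpha` between the two extremes interpolate between the two adversarial behaviors, which is what makes the leakage measure tunable.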

There is a fundamental disconnect between theoretical measures of information leakage and their applications in practice. This issue is addressed in the second part of this dissertation by proposing a data-driven framework for learning Censored and Fair Universal Representations (CFUR) of data. This framework is formulated as a constrained minimax optimization of the expected $\alpha$-loss where the constraint ensures a measure of the usefulness of the representation. The performance of the CFUR framework with $\alpha=1$ is evaluated on publicly accessible data sets; it is shown that multiple sensitive features can be effectively censored to achieve group fairness via demographic parity while ensuring accuracy for several a priori unknown downstream tasks.

Finally, focusing on worst-case measures, novel information-theoretic tools are used to refine the existing relationship between two such measures, $(\epsilon,\delta)$-DP and Rényi-DP. Applying these tools to the moments accountant framework, one can track the privacy guarantee achieved by adding Gaussian noise to Stochastic Gradient Descent (SGD) algorithms. Relative to the state of the art, for the same privacy budget, this method allows about 100 more SGD rounds for training deep learning models.

Contributors: Liao, Jiachun (Author) / Sankar, Lalitha (Thesis advisor) / Kosut, Oliver (Committee member) / Zhang, Junshan (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)

Created: 2020

Machine Learning-based Analysis of the Relationship Between the Human Gut Microbiome and Bone Health

Description

The Human Gut Microbiome (GM) modulates a variety of structural, metabolic, and protective functions to benefit the host. A few recent studies also support the role of the gut microbiome in the regulation of bone health. The relationship between GM and bone health was analyzed based on the data collected from a group of twenty-three adolescent boys and girls who participated in a controlled feeding study, during which two different doses (0 g/d fiber and 12 g/d fiber) of Soluble Corn Fiber (SCF) were added to their diet. This analysis was performed by predicting measures of Bone Mineral Density (BMD) and Bone Mineral Content (BMC) which are indicators of bone strength, using the GM sequence of proportions of 178 microbes collected from 23 subjects, by building a machine learning regression model. The model developed was evaluated by calculating performance metrics such as Root Mean Squared Error, Pearson’s correlation coefficient, and Spearman’s rank correlation coefficient, using cross-validation. A noticeable correlation was observed between the GM and bone health, and it was observed that the overall prediction correlation was higher with SCF intervention (r ~ 0.51). The genera of microbes that played an important role in this relationship were identified. Eubacterium (g), Bacteroides (g), Megamonas (g), Acetivibrio (g), Faecalibacterium (g), and Paraprevotella (g) were some of the microbes that showed an increase in proportion with SCF intervention.
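The three performance metrics named above can be computed from held-out predictions as in the following sketch. The numbers are made up for illustration and are not data from the study; the helper names (`rmse`, `pearson`, `average_ranks`, `spearman`) are likewise assumptions, and ties are handled with average ranks.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical held-out targets and predictions from one validation fold.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.5, 3.9]
fold_rmse = rmse(y_true, y_pred)
fold_pearson = pearson(y_true, y_pred)
fold_spearman = spearman(y_true, y_pred)
```

With only 23 subjects, metrics like these would typically be computed over predictions pooled across cross-validation folds rather than on a single small fold.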

Contributors: Ketha Hazarath, Pravallika Reddy (Author) / Bliss, Daniel (Thesis advisor) / Whisner, Corrie (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)

Created: 2020

Differentiable Programming for Physics-based Hyperspectral Unmixing

Description

Hyperspectral unmixing is an important remote sensing task with applications including material identification and analysis. Characteristic spectral features make many pure materials identifiable from their visible-to-infrared spectra, but quantifying their presence within a mixture is a challenging task due to nonlinearities and factors of variation. In this thesis, physics-based approaches are incorporated into an end-to-end spectral unmixing algorithm via differentiable programming. First, sparse regularization and constraints are implemented by adding differentiable penalty terms to a cost function to avoid unrealistic predictions. Second, a physics-based dispersion model is introduced to simulate realistic spectral variation, and an efficient method to fit the parameters is presented. Then, this dispersion model is utilized as a generative model within an analysis-by-synthesis spectral unmixing algorithm. Further, a technique for inverse rendering using a convolutional neural network to predict parameters of the generative model is introduced to enhance performance and speed when training data are available. The results achieve state-of-the-art performance on both infrared and visible-to-near-infrared (VNIR) datasets as compared to baselines, and show promise for the synergy between physics-based models and deep learning in hyperspectral unmixing.

Contributors: Janiczek, John (Author) / Jayasuriya, Suren (Thesis advisor) / Dasarathy, Gautam (Thesis advisor) / Christensen, Phil (Committee member) / Arizona State University (Publisher)

Created: 2020

Bayesian nonparametric modeling and inference for multiple object tracking

Description

The problem of multiple object tracking seeks to jointly estimate the time-varying cardinality and trajectory of each object. Numerous challenges are encountered in tracking multiple objects, including a time-varying number of measurements, varying constraints, and varying environmental conditions. In this thesis, the proposed statistical methods integrate the use of physics-based models with Bayesian nonparametric methods to address the main challenges in a tracking problem. In particular, Bayesian nonparametric methods are exploited to efficiently and robustly infer object identity and learn time-dependent cardinality; together with Bayesian inference methods, they are also used to associate measurements to objects and estimate the trajectory of objects. These methods differ fundamentally from existing methods, which are mainly based on random finite set theory.

The first contribution proposes dependent nonparametric models such as the dependent Dirichlet process and the dependent Pitman-Yor process to capture the inherent time-dependency in the problem at hand. These processes are used as priors for object state distributions to learn dependent information between previous and current time steps. Markov chain Monte Carlo sampling methods exploit the learned information to sample from posterior distributions and update the estimated object parameters.

The second contribution proposes a novel, robust, and fast nonparametric approach based on a diffusion process over infinite random trees to infer information on object cardinality and trajectory. This method follows the hierarchy induced by objects entering and leaving a scene and the time-dependency between unknown object parameters. Markov chain Monte Carlo sampling methods integrate the prior distributions over the infinite random trees with time-dependent diffusion processes to update object states.

The third contribution develops the use of hierarchical models to form a prior for statistically dependent measurements in a single object tracking setup. Dependency among the sensor measurements provides extra information, which is incorporated to improve tracking performance. The hierarchical Dirichlet process as a prior provides the required flexibility to do inference. A Bayesian tracker is integrated with the hierarchical Dirichlet process prior to accurately estimate the object trajectory.

The fourth contribution proposes an approach to model both the multiple dependent objects and multiple dependent measurements. This approach integrates the dependent Dirichlet process modeling over the dependent object with the hierarchical Dirichlet process modeling of the measurements to fully capture the dependency among both object and measurements. Bayesian nonparametric models can successfully associate each measurement to the corresponding object and exploit dependency among them to more accurately infer the trajectory of objects. Markov chain Monte Carlo methods amalgamate the dependent Dirichlet process with the hierarchical Dirichlet process to infer the object identity and object cardinality.

Simulations are exploited to demonstrate the improvement in multiple object tracking performance when compared to approaches that are developed based on random finite set theory.

Contributors: Moraffah, Bahman (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Bliss, Daniel W. (Committee member) / Richmond, Christ D. (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)

Created: 2019

Mean Field Games for Continuous Time Density Dependent Markov Chains

Description

The seminal work of Lasry and Lions showed the existence of Nash equilibria in the continuum limit of agents who try to optimize their own utility functions. However, a lot of work in this area is predicated on strong assumptions on the asymptotic independence of the agents and their homogeneity. This work explores the existence of equilibria under the limit for Markov Decision Processes (MDPs) for density dependent continuous time Markov chains. Under suitable conditions it is possible to show that the empirical measure of the agents converges in finite time to a time invariant distribution, which makes the solution of the MDP tractable. This key step allows one to show not only the existence of equilibria for these MDPs without asymptotic independence but also a tractable means to find said equilibria. Finally, this work shows that a fixed point does exist in the infinite state limit. However, showing that such a limit is indeed a Nash equilibrium remains an open problem.

Contributors: Narasimha, Dheeraj (Author) / Ying, Lei (Thesis advisor) / Dasarathy, Gautam (Thesis advisor) / Liu, Yongmin (Committee member) / Shakkottai, Srinivas (Committee member) / Arizona State University (Publisher)

Created: 2021

Bayesian Nonparametric Reinforcement Learning in LTE and Wi-Fi Coexistence

Description

With the emergence of next-generation wireless communication, a growing number of new applications such as the internet of things, autonomous cars, and drones are crowding the unlicensed spectrum. Licensed networks such as LTE are also moving into the unlicensed spectrum to deliver high-capacity content at low cost. However, LTE was not designed for sharing spectrum with others. A cooperation center for these networks is costly and challenging to design because the networks have heterogeneous properties and any of them can enter and leave the spectrum without restriction. Since it is infeasible to incorporate potentially infinite scenarios into one unified design, an alternative solution is to let each network learn its own coexistence policy. Previous solutions only work on fixed scenarios. In this work we present a reinforcement learning algorithm to cope with the coexistence between Wi-Fi and LTE-LAA agents in the 5 GHz unlicensed spectrum. The coexistence problem is modeled as a Dec-POMDP, and a Bayesian approach with a nonparametric prior is adopted for policy learning to accommodate the uncertainty of the policy for different agents. A fairness measure is introduced in the reward function to encourage fair sharing between agents. The reinforcement learning problem is turned into an optimization problem by treating the value function as a likelihood and using variational inference for posterior approximation. Simulation results demonstrate that this algorithm can reach high value with compact policy representations and remain computationally efficient when applied to the agent set.

Contributors: SHIH, PO-KAN (Author) / Moraffah, Bahman (Thesis advisor) / Papandreou-Suppappola, Antonia (Thesis advisor) / Dasarathy, Gautam (Committee member) / Shih, YiChang (Committee member) / Arizona State University (Publisher)

Created: 2021
