Algorithms for Bayesian Conditional Density Estimation on a Large Dataset

Description
This dissertation considers algorithmic techniques for non- or semi-parametric Bayesian estimation of density functions or conditional density functions. Specifically, computational methods are developed for performing Markov chain Monte Carlo (MCMC) simulation from posterior distributions over density functions when the dataset being analyzed is large, on the order of millions of observations on dozens or hundreds of covariates. The motivating scientific problem is the relationship between low birth weight and factors such as maternal attributes and prenatal circumstances.

Low birth weight is a critical public health concern, contributing to infant mortality and childhood disabilities. The dissertation uses birth records to investigate the impact of maternal attributes and prenatal circumstances on birth weight through a statistical method called density regression. The challenges lie in estimating the density function, in handling outliers and irregular structures in the data, and in selecting an appropriate method. To address these challenges, the study employs a Bayesian Gaussian mixture model inspired by kernel density methods. It also develops a fast MCMC algorithm tailored to the computational demands of large datasets, and introduces a targeted sample selection procedure to overcome difficulties in analyzing weakly informative data.

In addition, the study incorporates a clustering methodology. Clusters of different sizes are constructed, with emphasis on the scalability and complexity of cluster formation within a dichotomous state space. Valid clusters, representing unique combinations of data points from distinct feature states, offer a granular understanding of patterns in the dataset.
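As a rough illustration of how density regression can be read off a Gaussian mixture, the sketch below computes the conditional density of a response y given a single covariate x. It is not the dissertation's model or algorithm: the function names, the dictionary keys, and the assumption that x and y are independent within each component are all hypothetical simplifications, and the parameter values are arbitrary. In a Bayesian analysis the component parameters would be drawn by MCMC and the resulting conditional densities averaged over draws; here they are simply fixed.

```python
import math


def normal_pdf(v, mean, sd):
    """Density of a univariate normal distribution evaluated at v."""
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def conditional_density(y, x, components):
    """Conditional density f(y | x) implied by a joint Gaussian mixture.

    Each component is a dict with the (hypothetical) keys
    weight, mean_x, sd_x, mean_y, sd_y. Assuming x and y are independent
    within a component,
        f(y | x) = sum_k w_k(x) * N(y; mean_y_k, sd_y_k),
    where w_k(x) is proportional to weight_k * N(x; mean_x_k, sd_x_k).
    """
    # Unnormalized covariate-dependent weights, one per component.
    local = [c["weight"] * normal_pdf(x, c["mean_x"], c["sd_x"])
             for c in components]
    total = sum(local)
    if total == 0.0:
        raise ValueError("x is numerically too far from every component")
    # Mix the per-component response densities with the normalized weights.
    return sum((w / total) * normal_pdf(y, c["mean_y"], c["sd_y"])
               for w, c in zip(local, components))


# Arbitrary illustrative parameters, not estimates from any dataset.
example_components = [
    {"weight": 0.5, "mean_x": 0.0, "sd_x": 1.0, "mean_y": 0.0, "sd_y": 1.0},
    {"weight": 0.5, "mean_x": 4.0, "sd_x": 1.0, "mean_y": 10.0, "sd_y": 1.0},
]

if __name__ == "__main__":
    for x in (0.0, 2.0, 4.0):
        print(x, conditional_density(0.0, x, example_components))
```

Because the mixing weights depend on x, the whole shape of the response density can change with the covariate, not just its mean, which is the property that makes mixtures attractive for density regression.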

Details

Contributors
Date Created
2024
Resource Type
Language
  • eng
Note
  • Partial requirement for: Ph.D., Arizona State University, 2024
  • Field of study: Applied Mathematics

Additional Information

Extent
  • 64 pages
Open Access
Peer-reviewed