Learning Causality with Networked Observational Data

Description

This dissertation considers the question of how convenient access to copious networked observational data impacts our ability to learn causal knowledge. It investigates how learning causality from such data differs from, or resembles, traditional causal inference, which often deals with small-scale i.i.d. data collected from randomized controlled trials. For example, how can we exploit network information for a series of tasks in the area of learning causality? To answer this question, the dissertation develops a suite of novel causal learning algorithms that offer actionable insights for a series of causal inference tasks with networked observational data. The work aims to benefit real-world decision-making across a variety of highly influential applications. The first part of the dissertation investigates the task of inferring individual-level causal effects from networked observational data. First, it presents a representation balancing-based framework for handling the influence of hidden confounders to achieve accurate estimates of causal effects. Second, it extends the framework with an adversarial learning approach to properly combine two types of existing heuristics: representation balancing and treatment prediction. The second part of the dissertation describes a framework for counterfactual evaluation of treatment assignment policies with networked observational data. A novel framework that captures patterns of hidden confounders is developed to provide more informative input for downstream counterfactual evaluation methods. The third part presents a framework for debiasing two-dimensional grid-based e-commerce search with observational search log data, where an implicit network connects neighboring products in a search result page. A novel inverse propensity scoring framework that models user behavior patterns for two-dimensional display in e-commerce websites is developed, which aims to optimize online performance of ranking algorithms with offline log data.

Contributors: Guo, Ruocheng (Author) / Liu, Huan (Thesis advisor) / Candan, K. Selcuk (Committee member) / Xue, Guoliang (Committee member) / Kiciman, Emre (Committee member) / Arizona State University (Publisher)

Created: 2021

Tree Ensemble Algorithms for Causal Machine Learning

Description

This dissertation centers on treatment effect estimation in the field of causal inference, and aims to expand the toolkit for effect estimation when the treatment variable is binary. Two new stochastic tree-ensemble methods for treatment effect estimation in the continuous outcome setting are presented. The Accelerated Bayesian Causal Forest (XBCF) model handles variance via a group-specific parameter, and the heteroskedastic version of XBCF (H-XBCF) uses a separate tree ensemble to learn covariate-dependent variance. This work also contributes to the field of survival analysis by proposing a new framework for estimating survival probabilities via density regression. Within this framework, the Heteroskedastic Accelerated Bayesian Additive Regression Trees (H-XBART) model, which is also developed as part of this work, is utilized in treatment effect estimation for right-censored survival outcomes. All models have been implemented as part of the XBART R package, and their performance is evaluated via extensive simulation studies with appropriate sets of comparators. The contributed methods achieve levels of performance similar to those of comparator state-of-the-art methods while being orders of magnitude (sometimes as much as 100x) faster, thus offering an exciting opportunity for treatment effect estimation in the large data setting.

Contributors: Krantsevich, Nikolay (Author) / Hahn, P Richard (Thesis advisor) / McCulloch, Robert (Committee member) / Zhou, Shuang (Committee member) / Lan, Shiwei (Committee member) / He, Jingyu (Committee member) / Arizona State University (Publisher)

Created: 2023

Machine Learning and Causal Inference: Theory, Examples, and Computational Results

Description

This dissertation covers several topics in machine learning and causal inference. First, the question of “feature selection,” a common byproduct of regularized machine learning methods, is investigated theoretically in the context of treatment effect estimation. This involves a detailed review and extension of frameworks for estimating causal effects and in-depth theoretical study. Next, various computational approaches to estimating causal effects with machine learning methods are compared with these theoretical desiderata in mind. Several improvements to current methods for causal machine learning are identified and compelling angles for further study are pinpointed. Finally, a common method used for “explaining” predictions of machine learning algorithms, SHAP, is evaluated critically through a statistical lens.

Contributors: Herren, Andrew (Author) / Hahn, P Richard (Thesis advisor) / Kao, Ming-Hung (Committee member) / Lopes, Hedibert (Committee member) / McCulloch, Robert (Committee member) / Zhou, Shuang (Committee member) / Arizona State University (Publisher)

Created: 2023

Case Studies in Machine Learning of Reduced Form Models for Causal Inference

Description

This dissertation develops versatile modeling tools to estimate causal effects when conditional unconfoundedness is not immediately satisfied. Chapter 2 provides a brief overview of common techniques in causal inference, with a focus on models relevant to the data explored in later chapters. The rest of the dissertation focuses on the development of novel “reduced form” models which are designed to assess the particular challenges of different datasets. Chapter 3 explores the question of whether or not forecasts of bankruptcy cause bankruptcy. The question arises from the observation that companies issued going concern opinions were more likely to go bankrupt in the following year, leading people to speculate that the opinions themselves caused the bankruptcy via a “self-fulfilling prophecy”. A Bayesian machine learning sensitivity analysis is developed to answer this question. In exchange for additional flexibility and fewer assumptions, this approach loses point identification of causal effects, and thus a sensitivity analysis is developed to study a wide range of plausible scenarios of the causal effect of going concern opinions on bankruptcy. The simulations report several performance metrics for the model in comparison with other popular methods, along with a robust analysis of the model's sensitivity to mis-specification. Results on empirical data indicate that forecasts of bankruptcies likely do have a small causal effect. Chapter 4 studies the effects of vaccination on COVID-19 mortality at the state level in the United States. The dynamic nature of the pandemic complicates more straightforward regression adjustments and invalidates many alternative models. The chapter comments on the limitations of mechanistic approaches as well as traditional statistical methods when applied to epidemiological data. Instead, a state space model is developed that allows the study of the ever-changing dynamics of the pandemic’s progression. In the first stage, the model decomposes the observed mortality data into component surges, and later uses this information in a semi-parametric regression model for causal analysis. Results are investigated thoroughly for empirical justification and stress-tested in simulated settings.

Contributors: Papakostas, Demetrios (Author) / Hahn, Paul (Thesis advisor) / McCulloch, Robert (Committee member) / Zhou, Shuang (Committee member) / Kao, Ming-Hung (Committee member) / Lan, Shiwei (Committee member) / Arizona State University (Publisher)

Created: 2023

Computational Challenges in BART Modeling: Extrapolation, Classification, and Causal Inference

Description

This dissertation centers on Bayesian Additive Regression Trees (BART) and Accelerated BART (XBART) and presents a series of models that tackle extrapolation, classification, and causal inference challenges. To improve extrapolation in tree-based models, I propose a method called local Gaussian Process (GP) that combines Gaussian process regression with trained BART trees. This allows for extrapolation based on the most relevant data points and covariates, as determined by the trees' structure. The local GP technique is extended to the Bayesian causal forest (BCF) models to address the positivity violation issue in causal inference. Additionally, I introduce the LongBet model to estimate time-varying, heterogeneous treatment effects in panel data. Furthermore, I present a Poisson-based model with a modified likelihood for XBART, aimed at the multi-class classification problem.

Contributors: Wang, Meijia (Author) / Hahn, Paul (Thesis advisor) / He, Jingyu (Committee member) / Lan, Shiwei (Committee member) / McCulloch, Robert (Committee member) / Zhou, Shuang (Committee member) / Arizona State University (Publisher)

Created: 2024
