Matching Items (6)

Filtering by

Clear all filters

151957-Thumbnail Image.png

Propensity score estimation with random forests

Description

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of data, optimal specification of (1) decision rules to select the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) methods of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity scores weighting by the odds adjustment. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing unbiased estimated standard error and correct statistical inference of the average treatment effect. The relationship between the balance on the covariates' means and the bias of average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.

Contributors

Agent

Created

Date Created
2013

149971-Thumbnail Image.png

The sensitivity of confirmatory factor analytic fit indices to violations of factorial invariance across latent classes: a simulation study

Description

Although the issue of factorial invariance has received increasing attention in the literature, the focus is typically on differences in factor structure across groups that are directly observed, such as those denoted by sex or ethnicity. While establishing factorial invariance

Although the issue of factorial invariance has received increasing attention in the literature, the focus is typically on differences in factor structure across groups that are directly observed, such as those denoted by sex or ethnicity. While establishing factorial invariance across observed groups is a requisite step in making meaningful cross-group comparisons, failure to attend to possible sources of latent class heterogeneity in the form of class-based differences in factor structure has the potential to compromise conclusions with respect to observed groups and may result in misguided attempts at instrument development and theory refinement. The present studies examined the sensitivity of two widely used confirmatory factor analytic model fit indices, the chi-square test of model fit and RMSEA, to latent class differences in factor structure. Two primary questions were addressed. The first of these concerned the impact of latent class differences in factor loadings with respect to model fit in a single sample reflecting a mixture of classes. The second question concerned the impact of latent class differences in configural structure on tests of factorial invariance across observed groups. The results suggest that both indices are highly insensitive to class-based differences in factor loadings. Across sample size conditions, models with medium (0.2) sized loading differences were rejected by the chi-square test of model fit at rates just slightly higher than the nominal .05 rate of rejection that would be expected under a true null hypothesis. While rates of rejection increased somewhat when the magnitude of loading difference increased, even the largest sample size with equal class representation and the most extreme violations of loading invariance only had rejection rates of approximately 60%. RMSEA was also insensitive to class-based differences in factor loadings, with mean values across conditions suggesting a degree of fit that would generally be regarded as exceptionally good in practice. In contrast, both indices were sensitive to class-based differences in configural structure in the context of a multiple group analysis in which each observed group was a mixture of classes. However, preliminary evidence suggests that this sensitivity may contingent on the form of the cross-group model misspecification.

Contributors

Agent

Created

Date Created
2011

150618-Thumbnail Image.png

Regression analysis of grouped counts and frequencies using the generalized linear model

Description

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates for these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear relationship or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing) or heteroscedastic (increasing then decreasing) variance. Small to medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient. Based on the statistical performance of the four models, ordinal logistic regression seems to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is necessary to determine which statistical model best captures the unique properties of GCGF outcomes.

Contributors

Agent

Created

Date Created
2012

154939-Thumbnail Image.png

Performance of contextual multilevel models for comparing between-person and within-person effects

Description

The comparison of between- versus within-person relations addresses a central issue in psychological research regarding whether group-level relations among variables generalize to individual group members. Between- and within-person effects may differ in magnitude as well as direction, and contextual multilevel

The comparison of between- versus within-person relations addresses a central issue in psychological research regarding whether group-level relations among variables generalize to individual group members. Between- and within-person effects may differ in magnitude as well as direction, and contextual multilevel models can accommodate this difference. Contextual multilevel models have been explicated mostly for cross-sectional data, but they can also be applied to longitudinal data where level-1 effects represent within-person relations and level-2 effects represent between-person relations. With longitudinal data, estimating the contextual effect allows direct evaluation of whether between-person and within-person effects differ. Furthermore, these models, unlike single-level models, permit individual differences by allowing within-person slopes to vary across individuals. This study examined the statistical performance of the contextual model with a random slope for longitudinal within-person fluctuation data.

A Monte Carlo simulation was used to generate data based on the contextual multilevel model, where sample size, effect size, and intraclass correlation (ICC) of the predictor variable were varied. The effects of simulation factors on parameter bias, parameter variability, and standard error accuracy were assessed. Parameter estimates were in general unbiased. Power to detect the slope variance and contextual effect was over 80% for most conditions, except some of the smaller sample size conditions. Type I error rates for the contextual effect were also high for some of the smaller sample size conditions. Conclusions and future directions are discussed.

Contributors

Agent

Created

Date Created
2016

156112-Thumbnail Image.png

Examining dose-response effects in randomized experiments with partial adherence

Description

Understanding how adherence affects outcomes is crucial when developing and assigning interventions. However, interventions are often evaluated by conducting randomized experiments and estimating intent-to-treat effects, which ignore actual treatment received. Dose-response effects can supplement intent-to-treat effects when participants

Understanding how adherence affects outcomes is crucial when developing and assigning interventions. However, interventions are often evaluated by conducting randomized experiments and estimating intent-to-treat effects, which ignore actual treatment received. Dose-response effects can supplement intent-to-treat effects when participants are offered the full dose but many only receive a partial dose due to nonadherence. Using these data, we can estimate the magnitude of the treatment effect at different levels of adherence, which serve as a proxy for different levels of treatment. In this dissertation, I conducted Monte Carlo simulations to evaluate when linear dose-response effects can be accurately and precisely estimated in randomized experiments comparing a no-treatment control condition to a treatment condition with partial adherence. Specifically, I evaluated the performance of confounder adjustment and instrumental variable methods when their assumptions were met (Study 1) and when their assumptions were violated (Study 2). In Study 1, the confounder adjustment and instrumental variable methods provided unbiased estimates of the dose-response effect across sample sizes (200, 500, 2,000) and adherence distributions (uniform, right skewed, left skewed). The adherence distribution affected power for the instrumental variable method. In Study 2, the confounder adjustment method provided unbiased or minimally biased estimates of the dose-response effect under no or weak (but not moderate or strong) unobserved confounding. The instrumental variable method provided extremely biased estimates of the dose-response effect under violations of the exclusion restriction (no direct effect of treatment assignment on the outcome), though less severe violations of the exclusion restriction should be investigated.

Contributors

Agent

Created

Date Created
2018

156631-Thumbnail Image.png

Comparison of methods for estimating longitudinal indirect effects

Description

Mediation analysis is used to investigate how an independent variable, X, is related to an outcome variable, Y, through a mediator variable, M (MacKinnon, 2008). If X represents a randomized intervention it is difficult to make a cause and effect

Mediation analysis is used to investigate how an independent variable, X, is related to an outcome variable, Y, through a mediator variable, M (MacKinnon, 2008). If X represents a randomized intervention it is difficult to make a cause and effect inference regarding indirect effects without making no unmeasured confounding assumptions using the potential outcomes framework (Holland, 1988; MacKinnon, 2008; Robins & Greenland, 1992; VanderWeele, 2015), using longitudinal data to determine the temporal order of M and Y (MacKinnon, 2008), or both. The goals of this dissertation were to (1) define all indirect and direct effects in a three-wave longitudinal mediation model using the causal mediation formula (Pearl, 2012), (2) analytically compare traditional estimators (ANCOVA, difference score, and residualized change score) to the potential outcomes-defined indirect effects, and (3) use a Monte Carlo simulation to compare the performance of regression and potential outcomes-based methods for estimating longitudinal indirect effects and apply the methods to an empirical dataset. The results of the causal mediation formula revealed the potential outcomes definitions of indirect effects are equivalent to the product of coefficient estimators in a three-wave longitudinal mediation model with linear and additive relations. It was demonstrated with analytical comparisons that the ANCOVA, difference score, and residualized change score models’ estimates of two time-specific indirect effects differ as a function of the respective mediator-outcome relations at each time point. The traditional model that performed the best in terms of the evaluation criteria in the Monte Carlo study was the ANCOVA model and the potential outcomes model that performed the best in terms of the evaluation criteria was sequential G-estimation. Implications and future directions are discussed.

Contributors

Agent

Created

Date Created
2018