Impact of violations of longitudinal measurement invariance in latent growth models and autoregressive quasi-simplex models

Description

In order to analyze data from an instrument administered at multiple time points, it is common practice to form composites of the items at each wave and to fit a longitudinal model to the composites. The advantage of using composites of items is that they require smaller sample sizes than second-order models, which include both the measurement and the structural relationships among the variables. However, the use of composites assumes that longitudinal measurement invariance holds; that is, it is assumed that the relationships between the items and the latent variables remain constant over time. Previous studies of latent growth models (LGM) have shown that when longitudinal metric invariance is violated, the parameter estimates are biased and mistaken conclusions about growth can be drawn. The purpose of the current study was to examine the impact of non-invariant loadings and non-invariant intercepts on two longitudinal models: the LGM and the autoregressive quasi-simplex model (AR quasi-simplex). A second purpose was to determine whether there are conditions under which researchers can reach adequate conclusions about stability and growth even in the presence of violations of invariance. A Monte Carlo simulation study was conducted to address these purposes. Items were generated under either a linear curve-of-factors model (COFM) or an AR quasi-simplex model. Composites of the items were formed at each time point and analyzed with a linear LGM or an AR quasi-simplex model. The results showed that the AR quasi-simplex model yielded biased path coefficients only in the conditions with large violations of invariance, and its fit was not affected by violations of invariance. In general, the growth parameter estimates of the LGM were biased under violations of invariance.
Further, in the presence of non-invariant loadings, the rejection rates of the hypothesis of linear growth increased as the proportion of non-invariant items and the magnitude of the violations of invariance increased. The results and limitations of the study are discussed, and general recommendations are provided.
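The core issue the abstract describes, that a composite silently assumes invariant loadings, can be illustrated with expected values alone. The sketch below is a minimal illustration and not the study's simulation design: it assumes one latent factor per wave with linear mean growth (0, 1, 2, 3), four items with zero intercepts, and a hypothetical upward drift in one item's loading. Because an item's expected mean is its intercept plus its loading times the latent mean, the composite mean tracks the latent mean up to a constant factor only when the loadings are invariant. All names and numbers are hypothetical.

```python
import math

# Minimal sketch with hypothetical values: one latent factor per wave whose
# mean grows linearly, four items, and zero intercepts at every wave.


def composite_means(latent_means, loadings_by_wave, intercepts_by_wave):
    """Expected mean of the item composite (item average) at each wave.

    For item j at wave t: E[y_jt] = intercept_jt + loading_jt * E[eta_t].
    """
    means = []
    for eta, lam, tau in zip(latent_means, loadings_by_wave, intercepts_by_wave):
        item_means = [tau_j + lam_j * eta for lam_j, tau_j in zip(lam, tau)]
        means.append(sum(item_means) / len(item_means))
    return means


def successive_differences(values):
    """Wave-to-wave changes in a trajectory."""
    return [b - a for a, b in zip(values, values[1:])]


def is_linear(values, tol=1e-9):
    """True if all wave-to-wave changes are equal (equally spaced waves)."""
    diffs = successive_differences(values)
    return all(math.isclose(d, diffs[0], abs_tol=tol) for d in diffs)


latent = [0.0, 1.0, 2.0, 3.0]            # linear latent growth, slope 1
zero_tau = [[0.0] * 4 for _ in latent]    # invariant (zero) intercepts

# Invariant loadings: every item loads 0.8 at every wave.
invariant = [[0.8, 0.8, 0.8, 0.8] for _ in latent]

# Hypothetical metric-invariance violation: item 4's loading drifts upward.
drifting = [
    [0.8, 0.8, 0.8, 0.8],
    [0.8, 0.8, 0.8, 1.2],
    [0.8, 0.8, 0.8, 1.6],
    [0.8, 0.8, 0.8, 2.0],
]

print(successive_differences(composite_means(latent, invariant, zero_tau)))
print(successive_differences(composite_means(latent, drifting, zero_tau)))
```

With the drifting loading, the composite's wave-to-wave changes are about 0.9, 1.1 and 1.3, so items whose latent growth is exactly linear produce a composite that looks accelerating. This is consistent with the abstract's finding that non-invariant loadings raise the rejection rate of the linear-growth hypothesis in the LGM.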

Contributors: Olivera-Aguilar, Margarita (Author) / Millsap, Roger E. (Thesis advisor) / Levy, Roy (Committee member) / MacKinnon, David (Committee member) / West, Stephen G. (Committee member) / Arizona State University (Publisher)

Created: 2013

Propensity score estimation with random forests

Description

Random Forests is a statistical learning method that has been proposed for propensity score estimation when the model involves complex interactions among the covariates, nonlinear relationships, or both. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of the data, an optimal specification of (1) the decision rules used to select the covariate and its split value in a classification tree, (2) the number of covariates randomly sampled for selection, and (3) the method of estimating Random Forests propensity scores could potentially produce an unbiased estimate of the average treatment effect after propensity score weighting by the odds adjustment. Compared with the logistic regression estimation model using the true propensity score model, Random Forests had the additional advantage of producing unbiased standard error estimates and correct statistical inference for the average treatment effect. The relationship between balance on the covariates' means and bias in the average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples, there was no noticeable correlation between the covariates' mean differences and the magnitude of bias in the average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.
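The final weighting step can be sketched as follows. This is a minimal illustration under stated assumptions: the propensity scores are taken as already estimated (for example, as predicted treatment probabilities from a fitted forest), and the odds adjustment uses the common convention of weight 1 for treated units and e/(1 - e) for comparison units, which targets the effect on the treated. The function names and data are hypothetical. In scikit-learn, the split criterion and the number of covariates sampled at each split are exposed as the `criterion` and `max_features` parameters of `RandomForestClassifier`, which roughly correspond to specifications (1) and (2) in the abstract.

```python
# Minimal sketch of odds weighting with precomputed propensity scores.
# Function names and example data are hypothetical.


def odds_weights(treated, propensity):
    """Weight 1 for treated units, e / (1 - e) for comparison units."""
    weights = []
    for z, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if z == 1 else e / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_group_difference(values, treated, weights):
    """Weighted treated mean minus weighted comparison mean.

    Applied to the outcome, this is the odds-weighted effect estimate; applied
    to a covariate, it is a raw (unstandardized) balance check.
    """
    t_vals = [v for v, z in zip(values, treated) if z == 1]
    t_wts = [w for w, z in zip(weights, treated) if z == 1]
    c_vals = [v for v, z in zip(values, treated) if z == 0]
    c_wts = [w for w, z in zip(weights, treated) if z == 0]
    return weighted_mean(t_vals, t_wts) - weighted_mean(c_vals, c_wts)


outcome = [5.0, 7.0, 3.0, 4.0]
treated = [1, 1, 0, 0]
propensity = [0.5, 0.8, 0.5, 0.2]   # e.g., predicted probabilities from a forest

w = odds_weights(treated, propensity)
print(w)
print(weighted_group_difference(outcome, treated, w))
```

Reusing one difference function for both the outcome and the covariates makes it easy to compare post-weighting balance with the effect estimate, the relationship the abstract reports as weak.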

Contributors: Cham, Hei Ning (Author) / Tein, Jenn-Yun (Thesis advisor) / West, Stephen G. (Thesis advisor) / Enders, Craig K. (Committee member) / Mackinnon, David P (Committee member) / Arizona State University (Publisher)

Created: 2013

Regression analysis of grouped counts and frequencies using the generalized linear model

Description

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) variables used as outcomes often violate the assumptions of linear regression as well as those of models designed for categorical outcomes, yet no analytic model has been designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates for these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing), or heteroscedastic (increasing then decreasing) variance. Small-to-medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient.
Based on the statistical performance of the four models, ordinal logistic regression seems to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is necessary to determine which statistical model best captures the unique properties of GCGF outcomes.
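To make the grouped-count setting concrete, the sketch below codes a raw count into ordered bins and computes category probabilities under a proportional-odds (cumulative logit) model, the usual form of ordinal logistic regression. The bin boundaries, thresholds, and slope are hypothetical, and the parameterization `P(Y <= k | x) = logistic(tau_k - beta * x)` is one common convention; software packages differ in the sign placed on the slope.

```python
import math

# Hypothetical grouping of a count: 0, 1-2, 3-5, 6 or more.
UPPER_BOUNDS = [0, 2, 5]


def group_count(count, upper_bounds=UPPER_BOUNDS):
    """Map a raw count to an ordered category index 0..len(upper_bounds)."""
    for k, upper in enumerate(upper_bounds):
        if count <= upper:
            return k
    return len(upper_bounds)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(x, beta, thresholds):
    """Proportional-odds model: P(Y <= k | x) = logistic(tau_k - beta * x).

    Returns P(Y = k | x) for k = 0..len(thresholds); thresholds must increase.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(tau - beta * x) for tau in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


print([group_count(c) for c in [0, 1, 2, 3, 5, 6, 12]])
print(category_probabilities(x=1.0, beta=0.7, thresholds=[-0.5, 0.8, 2.0]))
```

With a positive slope, larger values of the predictor shift probability toward the higher count categories, which is how the model represents a positive effect on a grouped count without assuming equal spacing between the bins.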

Contributors: Coxe, Stefany (Author) / Aiken, Leona S. (Thesis advisor) / West, Stephen G. (Thesis advisor) / Mackinnon, David P (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2012

Multilevel mediation analysis: statistical assumptions and centering

Description

Mediation analysis is a statistical approach that examines the effect of a treatment (e.g., a prevention program) on an outcome (e.g., substance use) that is achieved by targeting and changing one or more intervening variables (e.g., peer drug use norms). The increasing use of prevention intervention programs with outcomes measured at multiple time points following the intervention requires multilevel modeling techniques that account for clustering in the data. Estimating multilevel mediation models in which all the variables are measured at the individual level (Level 1) poses several challenges to researchers. The first challenge is to conceptualize a multilevel mediation model by clarifying the underlying statistical assumptions and their implications for the cluster-level (Level 2) covariance structure. A second challenge is that variables measured at Level 1 potentially contain both between- and within-cluster variation, which makes multilevel analyses difficult to interpret. As a result, if proper centering is not used, multilevel mediation analyses may yield coefficient estimates that are composites of the coefficients at different levels. This dissertation addresses these two challenges. Study 1 discusses the concept of a correctly specified multilevel mediation model by examining the underlying statistical assumptions and their implications for the Level-2 covariance structure. Further, Study 1 presents analytical results showing the algebraic relationships among the population parameters in a correctly specified multilevel mediation model. Study 2 extends previous work on centering in multilevel mediation analysis. First, different centering methods in multilevel analysis are discussed, including centering within cluster with the cluster mean entered as a Level-2 predictor of the intercept (CWC2). Next, the application of the CWC2 strategy to multilevel mediation models is explained.
It is shown that the CWC2 centering strategy separates the between- and within-cluster mediated effects. Next, Study 2 discusses the assumptions underlying a correctly specified CWC2 multilevel mediation model and defines the between- and within-cluster mediated effects. In addition, analytical results for the algebraic relationships among the population parameters in a CWC2 multilevel mediation model are presented. Finally, Study 2 presents the results of a simulation study conducted to verify the derived algebraic relationships empirically.
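The CWC2 decomposition itself is simple to compute. The sketch below splits a Level-1 variable into a within-cluster deviation (the observation minus its cluster mean, used as the Level-1 predictor) and the cluster mean (entered as a Level-2 predictor of the intercept), then forms within- and between-cluster mediated effects as products of the corresponding a and b paths. The product-of-coefficients definitions, the function names, and the example values are assumptions for illustration; the dissertation's own definitions and derivations may differ in detail, and fitting the multilevel models themselves is left to dedicated software.

```python
from collections import defaultdict


def cwc2_decompose(cluster_ids, values):
    """Return (within, between) for a Level-1 variable.

    within[i]  = x_ij - mean_j   (centered within cluster; Level-1 predictor)
    between[i] = mean_j          (cluster mean; Level-2 predictor of intercept)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for j, x in zip(cluster_ids, values):
        sums[j] += x
        counts[j] += 1
    cluster_means = {j: sums[j] / counts[j] for j in sums}
    between = [cluster_means[j] for j in cluster_ids]
    within = [x - m for x, m in zip(values, between)]
    return within, between


def mediated_effects(a_within, b_within, a_between, b_between):
    """Product-of-coefficients mediated effects at each level (assumed form)."""
    return {"within": a_within * b_within, "between": a_between * b_between}


clusters = ["A", "A", "B", "B", "B"]
mediator = [1.0, 3.0, 2.0, 4.0, 6.0]
within, between = cwc2_decompose(clusters, mediator)
print(within)    # deviations from each cluster's mean
print(between)   # each observation's cluster mean
print(mediated_effects(0.5, 0.4, 0.8, 0.3))
```

Within each cluster the deviations sum to zero, so the within part carries no between-cluster variation; this separation is what keeps the within- and between-cluster mediated effects from blending into a single composite coefficient.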

Contributors: Tofighi, Davood (Author) / West, Stephen G. (Thesis advisor) / Mackinnon, David P (Thesis advisor) / Enders, Craig K. (Committee member) / Millsap, Roger E (Committee member) / Arizona State University (Publisher)

Created: 2010
