Matching Items (22)

151719-Thumbnail Image.png

Mediation as a novel method for increasing statistical power

Description

Including a covariate can increase power to detect an effect between two variables. Although previous research has studied power in mediation models, the extent to which the inclusion of a mediator will increase the power to detect a relation between

Including a covariate can increase power to detect an effect between two variables. Although previous research has studied power in mediation models, the extent to which the inclusion of a mediator will increase the power to detect a relation between two variables has not been investigated. The first study identified situations where empirical and analytical power of two tests of significance for a single mediator model was greater than power of a bivariate significance test. Results from the first study indicated that including a mediator increased statistical power in small samples with large effects and in large samples with small effects. Next, a study was conducted to assess when power was greater for a significance test for a two mediator model as compared with power of a bivariate significance test. Results indicated that including two mediators increased power in small samples when both specific mediated effects were large and in large samples when both specific mediated effects were small. Implications of the results and directions for future research are then discussed.

Contributors

Agent

Created

Date Created
2013

152217-Thumbnail Image.png

Estimating causal direct and indirect effects in the presence of post-treatment confounders: a simulation study

Description

In investigating mediating processes, researchers usually use randomized experiments and linear regression or structural equation modeling to determine if the treatment affects the hypothesized mediator and if the mediator affects the targeted outcome. However, randomizing the treatment will not yield

In investigating mediating processes, researchers usually use randomized experiments and linear regression or structural equation modeling to determine if the treatment affects the hypothesized mediator and if the mediator affects the targeted outcome. However, randomizing the treatment will not yield accurate causal path estimates unless certain assumptions are satisfied. Since randomization of the mediator may not be plausible for most studies (i.e., the mediator status is not randomly assigned, but self-selected by participants), both the direct and indirect effects may be biased by confounding variables. The purpose of this dissertation is (1) to investigate the extent to which traditional mediation methods are affected by confounding variables and (2) to assess the statistical performance of several modern methods to address confounding variable effects in mediation analysis. This dissertation first reviewed the theoretical foundations of causal inference in statistical mediation analysis, modern statistical analysis for causal inference, and then described different methods to estimate causal direct and indirect effects in the presence of two post-treatment confounders. A large simulation study was designed to evaluate the extent to which ordinary regression and modern causal inference methods are able to obtain correct estimates of the direct and indirect effects when confounding variables that are present in the population are not included in the analysis. Five methods were compared in terms of bias, relative bias, mean square error, statistical power, Type I error rates, and confidence interval coverage to test how robust the methods are to the violation of the no unmeasured confounders assumption and confounder effect sizes. The methods explored were linear regression with adjustment, inverse propensity weighting, inverse propensity weighting with truncated weights, sequential g-estimation, and a doubly robust sequential g-estimation. Results showed that in estimating the direct and indirect effects, in general, sequential g-estimation performed the best in terms of bias, Type I error rates, power, and coverage across different confounder effect, direct effect, and sample sizes when all confounders were included in the estimation. When one of the two confounders were omitted from the estimation process, in general, none of the methods had acceptable relative bias in the simulation study. Omitting one of the confounders from estimation corresponded to the common case in mediation studies where no measure of a confounder is available but a confounder may affect the analysis. Failing to measure potential post-treatment confounder variables in a mediation model leads to biased estimates regardless of the analysis method used and emphasizes the importance of sensitivity analysis for causal mediation analysis.

Contributors

Agent

Created

Date Created
2013

154498-Thumbnail Image.png

The impact of partial measurement invariance on between-group comparisons of latent means for a second-order factor

Description

A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated

A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both levels for a second-order CFA model. The numbers and directions of differences in noninvariant loadings or intercepts were also manipulated, along with total sample size and effect size of the second-order factor mean difference. Data were analyzed using correct and incorrect specifications of noninvariant loadings and intercepts. Results summarized across the 5,000 replications in each condition included Type I error rates and powers for the chi-square difference test and the Wald test of the second-order factor mean difference, estimation bias and efficiency for this latent mean difference, and means of the standardized root mean square residual (SRMR) and the root mean square error of approximation (RMSEA).

When the model was correctly specified, no obvious estimation bias was observed; when the model was misspecified by constraining noninvariant loadings or intercepts to be equal, the latent mean difference was overestimated if the direction of the difference in loadings or intercepts of was consistent with the direction of the latent mean difference, and vice versa. Increasing the number of noninvariant loadings or intercepts resulted in larger estimation bias if these noninvariant loadings or intercepts were constrained to be equal. Power to detect the latent mean difference was influenced by estimation bias and the estimated variance of the difference in the second-order factor mean, in addition to sample size and effect size. Constraining more parameters to be equal between groups—even when unequal in the population—led to a decrease in the variance of the estimated latent mean difference, which increased power somewhat. Finally, RMSEA was very sensitive for detecting misspecification due to improper equality constraints in all conditions in the current scenario, including the nonzero latent mean difference, but SRMR did not increase as expected when noninvariant parameters were constrained.

Contributors

Agent

Created

Date Created
2016

154396-Thumbnail Image.png

Approaches to studying measurement invariance in multilevel data with a level-1 grouping variable

Description

Measurement invariance exists when a scale functions equivalently across people and is therefore essential for making meaningful group comparisons. Often, measurement invariance is examined with independent and identically distributed data; however, there are times when the participants are clustered within

Measurement invariance exists when a scale functions equivalently across people and is therefore essential for making meaningful group comparisons. Often, measurement invariance is examined with independent and identically distributed data; however, there are times when the participants are clustered within units, creating dependency in the data. Researchers have taken different approaches to address this dependency when studying measurement invariance (e.g., Kim, Kwok, & Yoon, 2012; Ryu, 2014; Kim, Yoon, Wen, Luo, & Kwok, 2015), but there are no comparisons of the various approaches. The purpose of this master's thesis was to investigate measurement invariance in multilevel data when the grouping variable was a level-1 variable using five different approaches. Publicly available data from the Early Childhood Longitudinal Study-Kindergarten Cohort (ECLS-K) was used as an illustrative example. The construct of early behavior, which was made up of four teacher-rated behavior scales, was evaluated for measurement invariance in relation to gender. In the specific case of this illustrative example, the statistical conclusions of the five approaches were in agreement (i.e., the loading of the externalizing item and the intercept of the approaches to learning item were not invariant). Simulation work should be done to investigate in which situations the conclusions of these approaches diverge.

Contributors

Agent

Created

Date Created
2016

151957-Thumbnail Image.png

Propensity score estimation with random forests

Description

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of data, optimal specification of (1) decision rules to select the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) methods of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity scores weighting by the odds adjustment. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing unbiased estimated standard error and correct statistical inference of the average treatment effect. The relationship between the balance on the covariates' means and the bias of average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.

Contributors

Agent

Created

Date Created
2013

152032-Thumbnail Image.png

Impact of violations of longitudinal measurement invariance in latent growth models and autoregressive quasi-simplex models

Description

In order to analyze data from an instrument administered at multiple time points it is a common practice to form composites of the items at each wave and to fit a longitudinal model to the composites. The advantage of using

In order to analyze data from an instrument administered at multiple time points it is a common practice to form composites of the items at each wave and to fit a longitudinal model to the composites. The advantage of using composites of items is that smaller sample sizes are required in contrast to second order models that include the measurement and the structural relationships among the variables. However, the use of composites assumes that longitudinal measurement invariance holds; that is, it is assumed that that the relationships among the items and the latent variables remain constant over time. Previous studies conducted on latent growth models (LGM) have shown that when longitudinal metric invariance is violated, the parameter estimates are biased and that mistaken conclusions about growth can be made. The purpose of the current study was to examine the impact of non-invariant loadings and non-invariant intercepts on two longitudinal models: the LGM and the autoregressive quasi-simplex model (AR quasi-simplex). A second purpose was to determine if there are conditions in which researchers can reach adequate conclusions about stability and growth even in the presence of violations of invariance. A Monte Carlo simulation study was conducted to achieve the purposes. The method consisted of generating items under a linear curve of factors model (COFM) or under the AR quasi-simplex. Composites of the items were formed at each time point and analyzed with a linear LGM or an AR quasi-simplex model. The results showed that AR quasi-simplex model yielded biased path coefficients only in the conditions with large violations of invariance. The fit of the AR quasi-simplex was not affected by violations of invariance. In general, the growth parameter estimates of the LGM were biased under violations of invariance. Further, in the presence of non-invariant loadings the rejection rates of the hypothesis of linear growth increased as the proportion of non-invariant items and as the magnitude of violations of invariance increased. A discussion of the results and limitations of the study are provided as well as general recommendations.

Contributors

Agent

Created

Date Created
2013

149971-Thumbnail Image.png

The sensitivity of confirmatory factor analytic fit indices to violations of factorial invariance across latent classes: a simulation study

Description

Although the issue of factorial invariance has received increasing attention in the literature, the focus is typically on differences in factor structure across groups that are directly observed, such as those denoted by sex or ethnicity. While establishing factorial invariance

Although the issue of factorial invariance has received increasing attention in the literature, the focus is typically on differences in factor structure across groups that are directly observed, such as those denoted by sex or ethnicity. While establishing factorial invariance across observed groups is a requisite step in making meaningful cross-group comparisons, failure to attend to possible sources of latent class heterogeneity in the form of class-based differences in factor structure has the potential to compromise conclusions with respect to observed groups and may result in misguided attempts at instrument development and theory refinement. The present studies examined the sensitivity of two widely used confirmatory factor analytic model fit indices, the chi-square test of model fit and RMSEA, to latent class differences in factor structure. Two primary questions were addressed. The first of these concerned the impact of latent class differences in factor loadings with respect to model fit in a single sample reflecting a mixture of classes. The second question concerned the impact of latent class differences in configural structure on tests of factorial invariance across observed groups. The results suggest that both indices are highly insensitive to class-based differences in factor loadings. Across sample size conditions, models with medium (0.2) sized loading differences were rejected by the chi-square test of model fit at rates just slightly higher than the nominal .05 rate of rejection that would be expected under a true null hypothesis. While rates of rejection increased somewhat when the magnitude of loading difference increased, even the largest sample size with equal class representation and the most extreme violations of loading invariance only had rejection rates of approximately 60%. RMSEA was also insensitive to class-based differences in factor loadings, with mean values across conditions suggesting a degree of fit that would generally be regarded as exceptionally good in practice. In contrast, both indices were sensitive to class-based differences in configural structure in the context of a multiple group analysis in which each observed group was a mixture of classes. However, preliminary evidence suggests that this sensitivity may contingent on the form of the cross-group model misspecification.

Contributors

Agent

Created

Date Created
2011

150618-Thumbnail Image.png

Regression analysis of grouped counts and frequencies using the generalized linear model

Description

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates for these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear relationship or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing) or heteroscedastic (increasing then decreasing) variance. Small to medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient. Based on the statistical performance of the four models, ordinal logistic regression seems to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is necessary to determine which statistical model best captures the unique properties of GCGF outcomes.

Contributors

Agent

Created

Date Created
2012

149352-Thumbnail Image.png

Robustness of Latent variable interaction methods to nonnormal exogenous indicators

Description

For this thesis a Monte Carlo simulation was conducted to investigate the robustness of three latent interaction modeling approaches (constrained product indicator, generalized appended product indicator (GAPI), and latent moderated structural equations (LMS)) under high degrees of nonnormality of the

For this thesis a Monte Carlo simulation was conducted to investigate the robustness of three latent interaction modeling approaches (constrained product indicator, generalized appended product indicator (GAPI), and latent moderated structural equations (LMS)) under high degrees of nonnormality of the exogenous indicators, which have not been investigated in previous literature. Results showed that the constrained product indicator and LMS approaches yielded biased estimates of the interaction effect when the exogenous indicators were highly nonnormal. When the violation of nonnormality was not severe (symmetric with excess kurtosis < 1), the LMS approach with ML estimation yielded the most precise latent interaction effect estimates. The LMS approach with ML estimation also had the highest statistical power among the three approaches, given that the actual Type-I error rates of the Wald and likelihood ratio test of interaction effect were acceptable. In highly nonnormal conditions, only the GAPI approach with ML estimation yielded unbiased latent interaction effect estimates, with an acceptable actual Type-I error rate of both the Wald test and likelihood ratio test of interaction effect. No support for the use of the Satorra-Bentler or Yuan-Bentler ML corrections was found across all three methods.

Contributors

Agent

Created

Date Created
2010

149409-Thumbnail Image.png

Multilevel mediation analysis: statistical assumptions and centering

Description

Mediation analysis is a statistical approach that examines the effect of a treatment (e.g., prevention program) on an outcome (e.g., substance use) achieved by targeting and changing one or more intervening variables (e.g., peer drug use norms). The increased use

Mediation analysis is a statistical approach that examines the effect of a treatment (e.g., prevention program) on an outcome (e.g., substance use) achieved by targeting and changing one or more intervening variables (e.g., peer drug use norms). The increased use of prevention intervention programs with outcomes measured at multiple time points following the intervention requires multilevel modeling techniques to account for clustering in the data. Estimating multilevel mediation models, in which all the variables are measured at individual level (Level 1), poses several challenges to researchers. The first challenge is to conceptualize a multilevel mediation model by clarifying the underlying statistical assumptions and implications of those assumptions on cluster-level (Level-2) covariance structure. A second challenge is that variables measured at Level 1 potentially contain both between- and within-cluster variation making interpretation of multilevel analysis difficult. As a result, multilevel mediation analyses may yield coefficient estimates that are composites of coefficient estimates at different levels if proper centering is not used. This dissertation addresses these two challenges. Study 1 discusses the concept of a correctly specified multilevel mediation model by examining the underlying statistical assumptions and implication of those assumptions on Level-2 covariance structure. Further, Study 1 presents analytical results showing algebraic relationships between the population parameters in a correctly specified multilevel mediation model. Study 2 extends previous work on centering in multilevel mediation analysis. First, different centering methods in multilevel analysis including centering within cluster with the cluster mean as a Level-2 predictor of intercept (CWC2) are discussed. Next, application of the CWC2 strategy to accommodate multilevel mediation models is explained. It is shown that the CWC2 centering strategy separates the between- and within-cluster mediated effects. Next, Study 2 discusses assumptions underlying a correctly specified CWC2 multilevel mediation model and defines between- and within-cluster mediated effects. In addition, analytical results for the algebraic relationships between the population parameters in a CWC2 multilevel mediation model are presented. Finally, Study 2 shows results of a simulation study conducted to verify derived algebraic relationships empirically.

Contributors

Agent

Created

Date Created
2010