Matching Items (7)

Three-level multiple imputation: a fully conditional specification approach

Description

Currently, there is a clear gap in the missing data literature for three-level models. To date, the literature has focused only on the theoretical and algorithmic work required to implement three-level imputation using the joint model (JM) method of imputation, leaving almost no work done on the fully conditional specification (FCS) method. Moreover, the literature lacks any methodological evaluation of three-level imputation. Thus, this thesis serves two purposes: (1) to develop an algorithm in order to implement FCS in the context of a three-level model and (2) to evaluate both imputation methods. The simulation investigated a random intercept model under both 20% and 40% missing data rates. The findings of this thesis suggest that the estimates for both JM and FCS were largely unbiased, gave good coverage, and produced similar results. The sole exception for both methods was the slope for the level-3 variable, which was modestly biased. The bias exhibited by the methods could be due to the small number of clusters used. This finding suggests that future research ought to investigate and establish clear recommendations for the number of clusters required by these imputation methods. To conclude, this thesis serves as a preliminary start in tackling a much larger issue and gap in the current missing data literature.

Date Created
2015

The sensitivity of confirmatory factor analytic fit indices to violations of factorial invariance across latent classes: a simulation study

Description

Although the issue of factorial invariance has received increasing attention in the literature, the focus is typically on differences in factor structure across groups that are directly observed, such as those denoted by sex or ethnicity. While establishing factorial invariance across observed groups is a requisite step in making meaningful cross-group comparisons, failure to attend to possible sources of latent class heterogeneity in the form of class-based differences in factor structure has the potential to compromise conclusions with respect to observed groups and may result in misguided attempts at instrument development and theory refinement. The present studies examined the sensitivity of two widely used confirmatory factor analytic model fit indices, the chi-square test of model fit and RMSEA, to latent class differences in factor structure. Two primary questions were addressed. The first of these concerned the impact of latent class differences in factor loadings with respect to model fit in a single sample reflecting a mixture of classes. The second question concerned the impact of latent class differences in configural structure on tests of factorial invariance across observed groups. The results suggest that both indices are highly insensitive to class-based differences in factor loadings. Across sample size conditions, models with medium (0.2) sized loading differences were rejected by the chi-square test of model fit at rates just slightly higher than the nominal .05 rate of rejection that would be expected under a true null hypothesis. While rates of rejection increased somewhat when the magnitude of loading difference increased, even the largest sample size with equal class representation and the most extreme violations of loading invariance only had rejection rates of approximately 60%. 
RMSEA was also insensitive to class-based differences in factor loadings, with mean values across conditions suggesting a degree of fit that would generally be regarded as exceptionally good in practice. In contrast, both indices were sensitive to class-based differences in configural structure in the context of a multiple group analysis in which each observed group was a mixture of classes. However, preliminary evidence suggests that this sensitivity may be contingent on the form of the cross-group model misspecification.

Date Created
2011

An investigation of power analysis approaches for latent growth modeling

Description

Designing studies that use latent growth modeling to investigate change over time calls for optimal approaches for conducting power analysis for a priori determination of required sample size. This investigation (1) studied the impacts of variations in specified parameters, design features, and model misspecification in simulation-based power analyses and (2) compared power estimates across three common power analysis techniques: the Monte Carlo method; the Satorra-Saris method; and the method developed by MacCallum, Browne, and Cai (MBC). Choice of sample size, effect size, and slope variance parameters markedly influenced power estimates; however, level-1 error variance and number of repeated measures (3 vs. 6) when study length was held constant had little impact on resulting power. Under some conditions, having a moderate versus small effect size or using a sample size of 800 versus 200 increased power by approximately .40, and a slope variance of 10 versus 20 increased power by up to .24. Decreasing error variance from 100 to 50, however, increased power by no more than .09 and increasing measurement occasions from 3 to 6 increased power by no more than .04. Misspecification in level-1 error structure had little influence on power, whereas misspecifying the form of the growth model as linear rather than quadratic dramatically reduced power for detecting differences in slopes. Additionally, power estimates based on the Monte Carlo and Satorra-Saris techniques never differed by more than .03, even with small sample sizes, whereas power estimates for the MBC technique appeared quite discrepant from the other two techniques. Results suggest the choice between using the Satorra-Saris or Monte Carlo technique in a priori power analyses for slope differences in latent growth models is a matter of preference, although features such as missing data can only be considered within the Monte Carlo approach. 
Further, researchers conducting power analyses for slope differences in latent growth models should pay greatest attention to estimating slope difference, slope variance, and sample size. Arguments are also made for examining model-implied covariance matrices based on estimated parameters and graphic depictions of slope variance to help ensure parameter estimates are reasonable in a priori power analysis.
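
The Monte Carlo idea referred to in this abstract (power estimated as the share of simulated datasets in which the null hypothesis is rejected) can be illustrated with a deliberately simplified sketch. It is not the latent growth model from the thesis: it assumes each person's slope is observed directly, compares two group means with a large-sample z test, and uses hypothetical names (`mc_power`, `n_per_group`, `slope_diff`, `slope_var`) chosen only for illustration.

```python
import random
from statistics import mean, variance


def mc_power(n_per_group, slope_diff, slope_var, reps=300, seed=0):
    """Estimate power to detect a group difference in mean slope.

    Simplifying assumption: individual slopes are observed without error,
    so this is a two-group mean comparison, not a fitted growth model.
    """
    rng = random.Random(seed)
    sd = slope_var ** 0.5
    rejections = 0
    for _ in range(reps):
        # Simulate one dataset under the alternative hypothesis.
        g0 = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        g1 = [rng.gauss(slope_diff, sd) for _ in range(n_per_group)]
        se = (variance(g0) / n_per_group + variance(g1) / n_per_group) ** 0.5
        z = (mean(g1) - mean(g0)) / se
        # Two-sided test at the .05 level using the normal critical value.
        if abs(z) > 1.96:
            rejections += 1
    return rejections / reps


power_small = mc_power(n_per_group=50, slope_diff=1.0, slope_var=10.0)
power_large = mc_power(n_per_group=400, slope_diff=1.0, slope_var=10.0)
print(power_small, power_large)
```

Rerunning `mc_power` with a different `slope_var` or `slope_diff` gives a rough feel for why the abstract singles out slope difference, slope variance, and sample size as the inputs that matter most.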

Date Created
2011

Multilevel multiple imputation: an examination of competing methods

Description

Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers.
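
The "one variable at a time" logic of FCS described in this abstract can be sketched for the simplest possible case: two continuous, single-level variables. This is an illustrative assumption-laden sketch, not any of the five multilevel methods the thesis examines. It plugs in least-squares estimates instead of drawing regression parameters from their posterior, so it is closer to iterated stochastic regression imputation than to proper multiple imputation. The helper names (`ols`, `draw_missing`, `fcs_impute`) are hypothetical.

```python
import random
from statistics import mean


def ols(xs, ys):
    """Least-squares intercept, slope, and residual SD for y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def draw_missing(target, predictor, missing, rng):
    """Redraw the missing entries of target from a regression on predictor."""
    # Fit only on cases whose target value was actually observed.
    px = [p for p, m in zip(predictor, missing) if not m]
    ty = [t for t, m in zip(target, missing) if not m]
    b0, b1, sd = ols(px, ty)
    return [b0 + b1 * p + rng.gauss(0.0, sd) if m else t
            for t, p, m in zip(target, predictor, missing)]


def fcs_impute(x, y, n_iter=10, seed=0):
    """Cycle through x and y, imputing each from its univariate conditional."""
    rng = random.Random(seed)
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    # Starting values: fill each gap with that variable's observed mean.
    x_start = mean([v for v in x if v is not None])
    y_start = mean([v for v in y if v is not None])
    x = [x_start if m else v for v, m in zip(x, miss_x)]
    y = [y_start if m else v for v, m in zip(y, miss_y)]
    for _ in range(n_iter):
        x = draw_missing(x, y, miss_x, rng)  # impute x given current y
        y = draw_missing(y, x, miss_y, rng)  # impute y given current x
    return x, y


# Small synthetic example with values deleted at fixed positions.
data_rng = random.Random(1)
x_full = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
y_full = [0.5 * v + data_rng.gauss(0.0, 1.0) for v in x_full]
x_obs = [None if i % 5 == 0 else v for i, v in enumerate(x_full)]
y_obs = [None if i % 7 == 3 else v for i, v in enumerate(y_full)]
x_imp, y_imp = fcs_impute(x_obs, y_obs)
```

A JM approach would instead draw the missing `x` and `y` values together from a single bivariate distribution; with multilevel data, either approach would also need to model cluster-level variation, which this sketch omits.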

Date Created
2015

Interaction effects in multilevel models

Description

Researchers are often interested in estimating interactions in multilevel models, but many researchers assume that the same procedures and interpretations for interactions in single-level models apply to multilevel models. However, estimating interactions in multilevel models is much more complex than in single-level models. Because uncentered (RAS) or grand mean centered (CGM) level-1 predictors in two-level models contain two sources of variability (i.e., within-cluster variability and between-cluster variability), interactions involving RAS or CGM level-1 predictors also contain more than one source of variability. In this Master’s thesis, I use simulations to demonstrate that ignoring the four sources of variability in a total level-1 interaction effect can lead to erroneous conclusions. I explain how to parse a total level-1 interaction effect into four specific interaction effects, derive equivalencies between CGM and centering within context (CWC) for this model, and describe how the interpretations of the fixed effects change under CGM and CWC. Finally, I provide an empirical example using diary data collected from working adults with chronic pain.
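
The claim that a grand mean centered level-1 predictor mixes within-cluster and between-cluster variability can be checked numerically. The sketch below uses a made-up six-observation dataset and a hypothetical helper, `center_level1`; it shows only the centering step, not the interaction models from the thesis.

```python
from statistics import mean


def center_level1(values, clusters):
    """Return CGM scores, CWC scores, and centered cluster means."""
    grand = mean(values)
    by_cluster = {}
    for x, c in zip(values, clusters):
        by_cluster.setdefault(c, []).append(x)
    cluster_means = {c: mean(xs) for c, xs in by_cluster.items()}
    # CGM: deviation from the grand mean (within + between variability).
    cgm = [x - grand for x in values]
    # CWC: deviation from the observation's own cluster mean (within only).
    cwc = [x - cluster_means[c] for x, c in zip(values, clusters)]
    # Between part: cluster mean expressed as a deviation from the grand mean.
    between = [cluster_means[c] - grand for c in clusters]
    return cgm, cwc, between


x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
g = ["a", "a", "a", "b", "b", "b"]
cgm, cwc, between = center_level1(x, g)
print(cgm)      # [-4.0, -3.0, -2.0, 2.0, 3.0, 4.0]
print(cwc)      # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
print(between)  # [-3.0, -3.0, -3.0, 3.0, 3.0, 3.0]
```

Each CGM score equals the CWC score plus the between part. On that basis, the product of two CGM predictors expands algebraically into four cross-products (within by within, within by between, between by within, between by between), which is one plausible reading of the four sources of variability the abstract mentions.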

Date Created
2015

Multiple imputation for two-level hierarchical models with categorical variables and missing at random data

Description

Accurate data analysis and interpretation of results may be influenced by many potential factors. The factors of interest in the current work are the chosen analysis model(s), the presence of missing data, and the type(s) of data collected. If analysis models are used which a) do not accurately capture the structure of relationships in the data such as clustered/hierarchical data, b) do not allow or control for missing values present in the data, or c) do not accurately compensate for different data types such as categorical data, then the assumptions associated with the model have not been met and the results of the analysis may be inaccurate. In the presence of clustered/nested data, hierarchical linear modeling or multilevel modeling (MLM; Raudenbush & Bryk, 2002) has the ability to predict outcomes for each level of analysis and across multiple levels (accounting for relationships between levels), providing a significant advantage over single-level analyses. When multilevel data contain missingness, multilevel multiple imputation (MLMI) techniques may be used to model both the missingness and the clustered nature of the data. With categorical multilevel data with missingness, categorical MLMI must be used. Two such routines for MLMI with continuous and categorical data were explored with missing at random (MAR) data: a formal Bayesian imputation and analysis routine in JAGS (R/JAGS) and a common MLM procedure of imputation via Bayesian estimation in BLImP with frequentist analysis of the multilevel model in Mplus (BLImP/Mplus). Manipulated variables included intraclass correlations, number of clusters, and the rate of missingness. Results showed that with continuous data, R/JAGS returned more accurate parameter estimates than BLImP/Mplus for almost all parameters of interest across levels of the manipulated variables. Both R/JAGS and BLImP/Mplus encountered convergence issues and returned inaccurate parameter estimates when imputing and analyzing dichotomous data. Follow-up studies showed that JAGS and BLImP returned similar imputed datasets but the choice of analysis software for MLM impacted the recovery of accurate parameter estimates. Implications of these findings and recommendations for further research will be discussed.

Date Created
2016

Spatial Regression and Gaussian Process BART

Description

Spatial regression is one of the central topics in spatial statistics. Based on the goals, interpretation or prediction, spatial regression models can be classified into two categories, linear mixed regression models and nonlinear regression models. This dissertation explored these models and their real world applications. New methods and models were proposed to overcome the challenges in practice. There are three major parts in the dissertation.

In the first part, nonlinear regression models were embedded into a multistage workflow to predict the spatial abundance of reef fish species in the Gulf of Mexico. There were two challenges: zero-inflated data and out-of-sample prediction. The methods and models in the workflow could effectively handle the zero-inflated sampling data without strong assumptions. Three strategies were proposed to solve the out-of-sample prediction problem. The results and discussions showed that the nonlinear prediction had the advantages of high accuracy, low bias, and good performance across multiple resolutions.

In the second part, a two-stage spatial regression model was proposed for analyzing soil carbon stock (SOC) data. In the first stage, a spatial linear mixed model captured the linear and stationary effects. In the second stage, a generalized additive model was used to explain the nonlinear and nonstationary effects. The results illustrated that the two-stage model offered good interpretability for understanding the effects of covariates while maintaining high prediction accuracy, competitive with popular machine learning models such as random forest, XGBoost, and support vector machines.

A new nonlinear regression model, Gaussian process BART (Bayesian additive regression tree), was proposed in the third part. Combining the advantages of both BART and Gaussian processes, the model could capture the nonlinear effects of both observed and latent covariates. To develop the model, first, the traditional BART was generalized to accommodate correlated errors. Then, the failure of likelihood-based Markov chain Monte Carlo (MCMC) in parameter estimation was discussed. Based on the idea of analysis of variation, two approaches, back comparing and tuning range, were proposed to tackle this failure. Finally, the effectiveness of the new model was examined through experiments on both simulated and real data.

Date Created
2020