Matching Items (21)
Filtering by

Clear all filters

153391-Thumbnail Image.png
Description
Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS)

Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers.
ContributorsMistler, Stephen (Author) / Enders, Craig K. (Thesis advisor) / Aiken, Leona (Committee member) / Levy, Roy (Committee member) / West, Stephen G. (Committee member) / Arizona State University (Publisher)
Created2015
150618-Thumbnail Image.png
Description
Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates for these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear relationship or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing) or heteroscedastic (increasing then decreasing) variance. Small to medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient. Based on the statistical performance of the four models, ordinal logistic regression seems to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is necessary to determine which statistical model best captures the unique properties of GCGF outcomes.
ContributorsCoxe, Stefany (Author) / Aiken, Leona S. (Thesis advisor) / West, Stephen G. (Thesis advisor) / Mackinnon, David P (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created2012
154088-Thumbnail Image.png
Description
Researchers are often interested in estimating interactions in multilevel models, but many researchers assume that the same procedures and interpretations for interactions in single-level models apply to multilevel models. However, estimating interactions in multilevel models is much more complex than in single-level models. Because uncentered (RAS) or grand

Researchers are often interested in estimating interactions in multilevel models, but many researchers assume that the same procedures and interpretations for interactions in single-level models apply to multilevel models. However, estimating interactions in multilevel models is much more complex than in single-level models. Because uncentered (RAS) or grand mean centered (CGM) level-1 predictors in two-level models contain two sources of variability (i.e., within-cluster variability and between-cluster variability), interactions involving RAS or CGM level-1 predictors also contain more than one source of variability. In this Master’s thesis, I use simulations to demonstrate that ignoring the four sources of variability in a total level-1 interaction effect can lead to erroneous conclusions. I explain how to parse a total level-1 interaction effect into four specific interaction effects, derive equivalencies between CGM and centering within context (CWC) for this model, and describe how the interpretations of the fixed effects change under CGM and CWC. Finally, I provide an empirical example using diary data collected from working adults with chronic pain.
ContributorsMazza, Gina L (Author) / Enders, Craig K. (Thesis advisor) / Aiken, Leona S. (Thesis advisor) / West, Stephen G. (Committee member) / Arizona State University (Publisher)
Created2015
156112-Thumbnail Image.png
Description
Understanding how adherence affects outcomes is crucial when developing and assigning interventions. However, interventions are often evaluated by conducting randomized experiments and estimating intent-to-treat effects, which ignore actual treatment received. Dose-response effects can supplement intent-to-treat effects when participants are offered the full dose but many only receive a

Understanding how adherence affects outcomes is crucial when developing and assigning interventions. However, interventions are often evaluated by conducting randomized experiments and estimating intent-to-treat effects, which ignore actual treatment received. Dose-response effects can supplement intent-to-treat effects when participants are offered the full dose but many only receive a partial dose due to nonadherence. Using these data, we can estimate the magnitude of the treatment effect at different levels of adherence, which serve as a proxy for different levels of treatment. In this dissertation, I conducted Monte Carlo simulations to evaluate when linear dose-response effects can be accurately and precisely estimated in randomized experiments comparing a no-treatment control condition to a treatment condition with partial adherence. Specifically, I evaluated the performance of confounder adjustment and instrumental variable methods when their assumptions were met (Study 1) and when their assumptions were violated (Study 2). In Study 1, the confounder adjustment and instrumental variable methods provided unbiased estimates of the dose-response effect across sample sizes (200, 500, 2,000) and adherence distributions (uniform, right skewed, left skewed). The adherence distribution affected power for the instrumental variable method. In Study 2, the confounder adjustment method provided unbiased or minimally biased estimates of the dose-response effect under no or weak (but not moderate or strong) unobserved confounding. The instrumental variable method provided extremely biased estimates of the dose-response effect under violations of the exclusion restriction (no direct effect of treatment assignment on the outcome), though less severe violations of the exclusion restriction should be investigated.
ContributorsMazza, Gina L (Author) / Grimm, Kevin J. (Thesis advisor) / West, Stephen G. (Thesis advisor) / Mackinnon, David P (Committee member) / Tein, Jenn-Yun (Committee member) / Arizona State University (Publisher)
Created2018
156576-Thumbnail Image.png
Description
The primary objective in time series analysis is forecasting. Raw data often exhibits nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After data is transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. Estimation of ARMA can be performed

The primary objective in time series analysis is forecasting. Raw data often exhibits nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After data is transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. Estimation of ARMA can be performed through regressing current values on previous realizations and proxy innovations. The classic paradigm fails when dynamics are nonlinear; in this case, parametric, regime-switching specifications model changes in level, ARMA dynamics, and volatility, using a finite number of latent states. If the states can be identified using past endogenous or exogenous information, a threshold autoregressive (TAR) or logistic smooth transition autoregressive (LSTAR) model may simplify complex nonlinear associations to conditional weakly stationary processes. For ARMA, TAR, and STAR, order parameters quantify the extent past information is associated with the future. Unfortunately, even if model orders are known a priori, the possibility of over-fitting can lead to sub-optimal forecasting performance. By intentionally overestimating these orders, a linear representation of the full model is exploited and Bayesian regularization can be used to achieve sparsity. Global-local shrinkage priors for AR, MA, and exogenous coefficients are adopted to pull posterior means toward 0 without over-shrinking relevant effects. This dissertation introduces, evaluates, and compares Bayesian techniques that automatically perform model selection and coefficient estimation of ARMA, TAR, and STAR models. Multiple Monte Carlo experiments illustrate the accuracy of these methods in finding the "true" data generating process. Practical applications demonstrate their efficacy in forecasting.
ContributorsGiacomazzo, Mario (Author) / Kamarianakis, Yiannis (Thesis advisor) / Reiser, Mark R. (Committee member) / McCulloch, Robert (Committee member) / Hahn, Richard (Committee member) / Fricks, John (Committee member) / Arizona State University (Publisher)
Created2018
156631-Thumbnail Image.png
Description
Mediation analysis is used to investigate how an independent variable, X, is related to an outcome variable, Y, through a mediator variable, M (MacKinnon, 2008). If X represents a randomized intervention it is difficult to make a cause and effect inference regarding indirect effects without making no unmeasured confounding assumptions

Mediation analysis is used to investigate how an independent variable, X, is related to an outcome variable, Y, through a mediator variable, M (MacKinnon, 2008). If X represents a randomized intervention it is difficult to make a cause and effect inference regarding indirect effects without making no unmeasured confounding assumptions using the potential outcomes framework (Holland, 1988; MacKinnon, 2008; Robins & Greenland, 1992; VanderWeele, 2015), using longitudinal data to determine the temporal order of M and Y (MacKinnon, 2008), or both. The goals of this dissertation were to (1) define all indirect and direct effects in a three-wave longitudinal mediation model using the causal mediation formula (Pearl, 2012), (2) analytically compare traditional estimators (ANCOVA, difference score, and residualized change score) to the potential outcomes-defined indirect effects, and (3) use a Monte Carlo simulation to compare the performance of regression and potential outcomes-based methods for estimating longitudinal indirect effects and apply the methods to an empirical dataset. The results of the causal mediation formula revealed the potential outcomes definitions of indirect effects are equivalent to the product of coefficient estimators in a three-wave longitudinal mediation model with linear and additive relations. It was demonstrated with analytical comparisons that the ANCOVA, difference score, and residualized change score models’ estimates of two time-specific indirect effects differ as a function of the respective mediator-outcome relations at each time point. The traditional model that performed the best in terms of the evaluation criteria in the Monte Carlo study was the ANCOVA model and the potential outcomes model that performed the best in terms of the evaluation criteria was sequential G-estimation. Implications and future directions are discussed.
ContributorsValente, Matthew J (Author) / Mackinnon, David P (Thesis advisor) / West, Stephen G. (Committee member) / Grimm, Keving (Committee member) / Chassin, Laurie (Committee member) / Arizona State University (Publisher)
Created2018
157026-Thumbnail Image.png
Description
Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which has made it increasingly popular in the natural and social sciences. De- spite their common usage, model selection methods are

Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which has made it increasingly popular in the natural and social sciences. De- spite their common usage, model selection methods are not driven by a notion of statistical confidence, so their results entail an unknown de- gree of uncertainty. This paper introduces a general framework which extends notions of Type-I and Type-II error to model selection. A theo- retical method for controlling Type-I error using Difference of Goodness of Fit (DGOF) distributions is given, along with a bootstrap approach that approximates the procedure. Results are presented for simulated experiments using normal distributions, random walk models, nested linear regression, and nonnested regression including nonlinear mod- els. Tests are performed using an R package developed by the author which will be made publicly available on journal publication of research results.
ContributorsCullan, Michael J (Author) / Sterner, Beckett (Thesis advisor) / Fricks, John (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)
Created2018
154939-Thumbnail Image.png
Description
The comparison of between- versus within-person relations addresses a central issue in psychological research regarding whether group-level relations among variables generalize to individual group members. Between- and within-person effects may differ in magnitude as well as direction, and contextual multilevel models can accommodate this difference. Contextual multilevel models have been

The comparison of between- versus within-person relations addresses a central issue in psychological research regarding whether group-level relations among variables generalize to individual group members. Between- and within-person effects may differ in magnitude as well as direction, and contextual multilevel models can accommodate this difference. Contextual multilevel models have been explicated mostly for cross-sectional data, but they can also be applied to longitudinal data where level-1 effects represent within-person relations and level-2 effects represent between-person relations. With longitudinal data, estimating the contextual effect allows direct evaluation of whether between-person and within-person effects differ. Furthermore, these models, unlike single-level models, permit individual differences by allowing within-person slopes to vary across individuals. This study examined the statistical performance of the contextual model with a random slope for longitudinal within-person fluctuation data.

A Monte Carlo simulation was used to generate data based on the contextual multilevel model, where sample size, effect size, and intraclass correlation (ICC) of the predictor variable were varied. The effects of simulation factors on parameter bias, parameter variability, and standard error accuracy were assessed. Parameter estimates were in general unbiased. Power to detect the slope variance and contextual effect was over 80% for most conditions, except some of the smaller sample size conditions. Type I error rates for the contextual effect were also high for some of the smaller sample size conditions. Conclusions and future directions are discussed.
ContributorsWurpts, Ingrid Carlson (Author) / Mackinnon, David P (Thesis advisor) / West, Stephen G. (Committee member) / Grimm, Kevin J. (Committee member) / Suk, Hye Won (Committee member) / Arizona State University (Publisher)
Created2016
155855-Thumbnail Image.png
Description
Time-to-event analysis or equivalently, survival analysis deals with two variables simultaneously: when (time information) an event occurs and whether an event occurrence is observed or not during the observation period (censoring information). In behavioral and social sciences, the event of interest usually does not lead to a terminal state

Time-to-event analysis or equivalently, survival analysis deals with two variables simultaneously: when (time information) an event occurs and whether an event occurrence is observed or not during the observation period (censoring information). In behavioral and social sciences, the event of interest usually does not lead to a terminal state such as death. Other outcomes after the event can be collected and thus, the survival variable can be considered as a predictor as well as an outcome in a study. One example of a case where the survival variable serves as a predictor as well as an outcome is a survival-mediator model. In a single survival-mediator model an independent variable, X predicts a survival variable, M which in turn, predicts a continuous outcome, Y. The survival-mediator model consists of two regression equations: X predicting M (M-regression), and M and X simultaneously predicting Y (Y-regression). To estimate the regression coefficients of the survival-mediator model, Cox regression is used for the M-regression. Ordinary least squares regression is used for the Y-regression using complete case analysis assuming censored data in M are missing completely at random so that the Y-regression is unbiased. In this dissertation research, different measures for the indirect effect were proposed and a simulation study was conducted to compare performance of different indirect effect test methods. Bias-corrected bootstrapping produced high Type I error rates as well as low parameter coverage rates in some conditions. In contrast, the Sobel test produced low Type I error rates as well as high parameter coverage rates in some conditions. The bootstrap of the natural indirect effect produced low Type I error and low statistical power when the censoring proportion was non-zero. Percentile bootstrapping, distribution of the product and the joint-significance test showed best performance. Statistical analysis of the survival-mediator model is discussed. Two indirect effect measures, the ab-product and the natural indirect effect are compared and discussed. Limitations and future directions of the simulation study are discussed. Last, interpretation of the survival-mediator model for a made-up empirical data set is provided to clarify the meaning of the quantities in the survival-mediator model.
ContributorsKim, Han Joe (Author) / Mackinnon, David P. (Thesis advisor) / Tein, Jenn-Yun (Thesis advisor) / West, Stephen G. (Committee member) / Grimm, Kevin J. (Committee member) / Arizona State University (Publisher)
Created2017
189356-Thumbnail Image.png
Description
This dissertation comprises two projects: (i) Multiple testing of local maxima for detection of peaks and change points with non-stationary noise, and (ii) Height distributions of critical points of smooth isotropic Gaussian fields: computations, simulations and asymptotics. The first project introduces a topological multiple testing method for one-dimensional domains to

This dissertation comprises two projects: (i) Multiple testing of local maxima for detection of peaks and change points with non-stationary noise, and (ii) Height distributions of critical points of smooth isotropic Gaussian fields: computations, simulations and asymptotics. The first project introduces a topological multiple testing method for one-dimensional domains to detect signals in the presence of non-stationary Gaussian noise. The approach involves conducting tests at local maxima based on two observation conditions: (i) the noise is smooth with unit variance and (ii) the noise is not smooth where kernel smoothing is applied to increase the signal-to-noise ratio (SNR). The smoothed signals are then standardized, which ensures that the variance of the new sequence's noise becomes one, making it possible to calculate $p$-values for all local maxima using random field theory. Assuming unimodal true signals with finite support and non-stationary Gaussian noise that can be repeatedly observed. The algorithm introduced in this work, demonstrates asymptotic strong control of the False Discovery Rate (FDR) and power consistency as the number of sequence repetitions and signal strength increase. Simulations indicate that FDR levels can also be controlled under non-asymptotic conditions with finite repetitions. The application of this algorithm to change point detection also guarantees FDR control and power consistency. The second project focuses on investigating the explicit and asymptotic height densities of critical points of smooth isotropic Gaussian random fields on both Euclidean space and spheres.The formulae are based on characterizing the distribution of the Hessian of the Gaussian field using the Gaussian orthogonally invariant (GOI) matrices and the Gaussian orthogonal ensemble (GOE) matrices, which are special cases of GOI matrices. However, as the dimension increases, calculating explicit formulae becomes computationally challenging. The project includes two simulation methods for these distributions. Additionally, asymptotic distributions are obtained by utilizing the asymptotic distribution of the eigenvalues (excluding the maximum eigenvalues) of the GOE matrix for large dimensions. However, when it comes to the maximum eigenvalue, the Tracy-Widom distribution is utilized. Simulation results demonstrate the close approximation between the asymptotic distribution and the real distribution when $N$ is sufficiently large.
Contributorsgu, shuang (Author) / Cheng, Dan (Thesis advisor) / Lopes, Hedibert (Committee member) / Fricks, John (Committee member) / Lan, Shiwei (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)
Created2023