Matching Items (19)

An Evaluation of Statistical Tests of Suppression

Description

This research explores tests for statistical suppression. Suppression is a statistical phenomenon whereby the magnitude of an effect becomes larger when another variable is added to the regression equation. From a causal perspective, suppression occurs when there is inconsistent mediation or negative confounding. Several different estimators for suppression are evaluated conceptually and in a statistical simulation study in which we impose suppression and non-suppression conditions. For each estimator without an existing standard error formula, one was derived in order to conduct significance tests and build confidence intervals. Overall, two of the estimators were biased and had poor coverage; a third worked well but had inflated Type I error rates when the population model was complete mediation. As a result of analyzing these three tests, a fourth was considered in the late stages of the project and showed promising results that address the concerns raised by the other tests. When the tests were applied to real data, they gave similar and consistent results.
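
A toy simulation makes the phenomenon concrete. The sketch below is a hypothetical illustration (the variable names, effect sizes, and plain-OLS setup are assumptions, not taken from the dissertation): adding a suppressor Z to the model increases the magnitude of the coefficient on X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative suppression setup: Z is correlated with X but, net of X,
# relates negatively to Y ("negative confounding"). All values hypothetical.
z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)
y = 0.4 * x - 0.5 * z + rng.normal(size=n)

def ols_slopes(predictors, y):
    """Return OLS coefficients (intercept first) for the given predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_simple = ols_slopes([x], y)[1]        # Y ~ X
b_adjusted = ols_slopes([x, z], y)[1]   # Y ~ X + Z

# The X coefficient grows in magnitude once Z enters the equation
# (roughly 0.18 -> 0.40 with these population values):
print(f"Y ~ X     : b_x = {b_simple:.3f}")
print(f"Y ~ X + Z : b_x = {b_adjusted:.3f}")
```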

Date Created
  • 2020

Modeling relationships between cycles in psychology: potential limitations of sinusoidal and mass-spring models

Description

With improvements in technology, intensive longitudinal studies that permit the investigation of daily and weekly cycles in behavior have increased exponentially over the past few decades. Traditionally, when data have been collected on two variables over time, multivariate time series approaches that remove trends, cycles, and serial dependency have been used. These analyses permit the study of the relationship between random shocks (perturbations) in the presumed causal series and changes in the outcome series, but do not permit the study of the relationships between cycles. Liu and West (2016) proposed a multilevel approach that permitted the study of potential between-subject relationships between features of the cycles in two series (e.g., amplitude). However, I show that the application of the Liu and West approach is restricted to a small set of features and types of relationships between the series. Several authors (e.g., Boker & Graham, 1998) proposed a connected mass-spring model that appears to permit modeling of more general cyclic relationships. I show that the undamped connected mass-spring model is also limited and may be unidentified. To test the severity of the restrictions on the motion trajectories producible by the undamped connected mass-spring model, I mathematically derived their connection to the force equations of the undamped connected mass-spring system. The mathematical solution describes the domain of the trajectory pairs that are producible by the undamped connected mass-spring model. The set of producible trajectory pairs is highly restricted, and this restriction sets major limitations on the application of the connected mass-spring model to psychological data. I used a simulation to demonstrate that even if a pair of psychological time-varying variables behaved exactly like two masses in an undamped connected mass-spring system, the connected mass-spring model would not yield adequate parameter estimates. My simulation probed the performance of the connected mass-spring model as a function of several aspects of data quality, including number of subjects, series length, sampling rate relative to the cycle, and measurement error in the data. The findings can be extended to damped and nonlinear connected mass-spring systems.
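
For readers unfamiliar with the system being analyzed, the sketch below numerically integrates the force equations of an undamped connected (coupled) two-mass spring system; the spring constants and initial conditions are arbitrary illustrative choices, not values from the simulation study.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Spring constants are illustrative; unit masses are assumed.
k1, k2, kc = 1.0, 2.0, 0.5

def force_equations(t, state):
    """Undamped connected mass-spring system: two masses tied to walls by
    outer springs k1 and k2, linked to each other by a coupling spring kc."""
    x1, v1, x2, v2 = state
    a1 = -k1 * x1 + kc * (x2 - x1)
    a2 = -k2 * x2 - kc * (x2 - x1)
    return [v1, a1, v2, a2]

t = np.linspace(0, 40, 400)
sol = solve_ivp(force_equations, (0, 40), [1.0, 0.0, 0.0, 0.0], t_eval=t)
x1_traj, x2_traj = sol.y[0], sol.y[2]  # a coupled pair of cyclic trajectories
print(x1_traj[:5], x2_traj[:5])
```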

Date Created
  • 2019


Addressing the Variable Selection Bias and Local Optimum Limitations of Longitudinal Recursive Partitioning with Time-Efficient Approximations
Addressing the Variable Selection Bias and Local Optimum Limitations of Longitudinal Recursive Partitioning with Time-Efficient Approximations

Description

Longitudinal recursive partitioning (LRP) is a tree-based method for longitudinal data. It takes a sample of individuals who were each measured repeatedly across time, and it splits them based on a set of covariates such that individuals with similar trajectories become grouped together into nodes. LRP does this by fitting a mixed-effects model to each node every time the node is partitioned and extracting the deviance, which serves as the measure of node purity. LRP is implemented using the classification and regression tree algorithm, which suffers from a variable selection bias and does not guarantee reaching a global optimum. Additionally, fitting mixed-effects models to each potential split only to extract the deviance and discard the rest of the information is a computationally intensive procedure. Therefore, in this dissertation, I address the high computational demand, the variable selection bias, and the local optimum solution. I propose three approximation methods that reduce the computational demand of LRP and, at the same time, allow for a straightforward extension to recursive partitioning algorithms that do not have a variable selection bias and can reach the global optimum solution. In the three proposed approximations, a mixed-effects model is fit to the full data, and the growth curve coefficients for each individual are extracted. Then, (1) a principal component analysis is fit to the set of coefficients and the principal component score is extracted for each individual, (2) a one-factor model is fit to the coefficients and the factor score is extracted, or (3) the coefficients are summed. The three methods result in each individual having a single score that represents the growth curve trajectory. Therefore, now that the outcome is a single score for each individual, any tree-based method may be used to partition the data and group the individuals together. Once the individuals are assigned to their final nodes, a mixed-effects model is fit to each terminal node with the individuals belonging to it.

I conduct a simulation study in which I show that the approximation methods achieve the proposed goals while maintaining a level of out-of-sample prediction accuracy similar to LRP. I then illustrate and compare the methods using an applied data set.
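
A minimal sketch of approximation (1) follows. The per-person growth coefficients are simulated stand-ins for what a mixed-effects model fit to the full data would produce, and sklearn's CART implementation is used purely for illustration, since the point of the approximations is that any tree method can consume the single score.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 500

# Stand-in for step 1: per-person (intercept, slope) growth coefficients
# that would normally be extracted from a mixed-effects model. Simulated here.
group = rng.integers(0, 2, size=n)          # covariate that separates trajectories
coefs = np.column_stack([
    50 + 5 * group + rng.normal(0, 2, n),   # intercepts
    1 + 2 * group + rng.normal(0, 0.5, n),  # slopes
])

# Approximation (1): collapse each person's coefficients to one PCA score.
score = PCA(n_components=1).fit_transform(coefs).ravel()

# With a single outcome per person, any standard tree algorithm can now
# partition the individuals; CART is used here only as an example.
X = np.column_stack([group, rng.normal(size=n)])  # second covariate is noise
tree = DecisionTreeRegressor(max_depth=2).fit(X, score)
leaf = tree.apply(X)  # terminal-node assignments; refit a mixed model per leaf
```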

Date Created
  • 2019

Model criticism for growth curve models via posterior predictive model checking

Description

Although models for describing longitudinal data have become increasingly sophisticated, the criticism of even foundational growth curve models remains challenging. The challenge arises from the need to disentangle data-model misfit at multiple and interrelated levels of analysis. Using posterior predictive model checking (PPMC), a popular Bayesian framework for model criticism, the performance of several discrepancy functions was investigated in a Monte Carlo simulation study. The discrepancy functions of interest included two types of conditional concordance correlation (CCC) functions, two types of R² functions, two types of standardized generalized dimensionality discrepancy (SGDDM) functions, the likelihood ratio (LR), and the likelihood ratio difference test (LRT). Key outcomes included effect sizes of the design factors on the realized values of discrepancy functions, distributions of posterior predictive p-values (PPP-values), and the proportion of extreme PPP-values.

In terms of the realized values, the behavior of the CCC and R² functions was generally consistent with prior research. However, as diagnostics, these functions were extremely conservative even when some aspect of the data was unaccounted for. In contrast, the conditional SGDDM (SGDDMC), LR, and LRT were generally sensitive to the underspecifications investigated in this work on all outcomes considered. Although the proportions of extreme PPP-values for these functions tended to increase in null situations for non-normal data, this behavior may have reflected the true misfit that resulted from the specification of normal prior distributions. Importantly, the LR and, to an even greater extent, the SGDDMC exhibited some potential for untangling the sources of data-model misfit. Owing to the connections of growth curve models to the more fundamental frameworks of multilevel modeling, structural equation models with a mean structure, and Bayesian hierarchical models, the results of the current work may have broader implications that warrant further research.
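
The PPMC logic underlying all of these discrepancy functions can be sketched generically. The toy example below uses a normal-mean model and a chi-square-like realized discrepancy, not the growth curve models or the specific functions studied here; it only shows how a PPP-value is computed from posterior draws.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: y_i ~ Normal(mu, 1); posterior draws of mu assumed available.
y_obs = rng.normal(0.3, 1.0, size=50)
mu_draws = rng.normal(y_obs.mean(), 1 / np.sqrt(len(y_obs)), size=2000)

def discrepancy(y, mu):
    """A realized discrepancy function D(y; theta); chi-square-like misfit."""
    return np.sum((y - mu) ** 2)

exceed = 0
for mu in mu_draws:
    y_rep = rng.normal(mu, 1.0, size=len(y_obs))         # replicated data
    if discrepancy(y_rep, mu) >= discrepancy(y_obs, mu):  # compare misfits
        exceed += 1

ppp = exceed / len(mu_draws)  # PPP-values near 0 or 1 flag data-model misfit
print(f"PPP-value: {ppp:.3f}")
```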

Date Created
  • 2015

A Bayesian Synthesis approach to data fusion using augmented data-dependent priors

Description

The process of combining data is one in which information from disjoint datasets sharing at least some common variables is merged. This process is commonly referred to as data fusion, with the main objective of creating a new dataset permitting more flexible analyses than the separate analysis of each individual dataset. Many data fusion methods have been proposed in the literature, although most utilize the frequentist framework. This dissertation investigates a new approach called Bayesian Synthesis, in which information obtained from one dataset acts as the prior for the analysis of the next. This process continues sequentially until a single posterior distribution is created using all available data. These informative augmented data-dependent priors provide an extra source of information that may aid in the accuracy of estimation. To examine the performance of the proposed Bayesian Synthesis approach, results from simulated data with known population values were first examined under a variety of conditions. Next, these results were compared to those from the traditional maximum likelihood approach to data fusion, as well as the data fusion approach analyzed via Bayesian estimation. Parameter recovery under the proposed Bayesian Synthesis approach was evaluated using four criteria reflecting raw bias, relative bias, accuracy, and efficiency. Subsequently, empirical analyses with real data were conducted, fusing data from five longitudinal studies of mathematics ability that varied in their assessment of ability and in the timing of measurement occasions. Results from the Bayesian Synthesis and data fusion approaches with combined data using Bayesian and maximum likelihood estimation methods were reported. The results illustrate that Bayesian Synthesis with data-driven priors is a highly effective approach, provided that the sample sizes for the fused data are large enough to provide unbiased estimates. Bayesian Synthesis provides another beneficial approach to data fusion that can effectively be used to enhance the validity of conclusions obtained from the merging of data from different studies.
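
The sequential posterior-becomes-prior idea can be sketched with a conjugate toy model (a normal mean with known variance); the dissertation's models are far richer, so this is only a schematic of the chaining.

```python
import numpy as np

rng = np.random.default_rng(3)

# Three disjoint datasets measuring the same quantity (simulated here).
datasets = [rng.normal(2.0, 1.0, size=n) for n in (40, 60, 80)]

prior_mean, prior_var = 0.0, 100.0  # diffuse starting prior
sigma2 = 1.0                        # known data variance

for y in datasets:
    n = len(y)
    # Conjugate normal-normal update for the mean.
    post_var = 1 / (1 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + y.sum() / sigma2)
    # The posterior becomes the augmented data-dependent prior
    # for the analysis of the next dataset.
    prior_mean, prior_var = post_mean, post_var

print(f"Final posterior: mean={prior_mean:.3f}, sd={prior_var ** 0.5:.3f}")
```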

Date Created
  • 2017

Time metric in latent difference score models

Description

Time metric is an important consideration for all longitudinal models because it can influence the interpretation of estimates, parameter estimate accuracy, and model convergence in longitudinal models with latent variables. Currently, the literature on latent difference score (LDS) models does not discuss the importance of time metric. Furthermore, there is little research using simulations to investigate LDS models. This study examined the influence of time metric on model estimation, interpretation, parameter estimate accuracy, and convergence in LDS models using empirical simulations. Results indicated that for a time structure with a true time metric where participants had different starting points and unequally spaced intervals, LDS models fit with a restructured and less informative time metric resulted in biased parameter estimates. However, models examined using the true time metric were less likely to converge than models using the restructured time metric, likely due to missing data. Where participants had different starting points but equally spaced intervals, LDS models fit with a restructured time metric resulted in biased estimates of intercept means, but all other parameter estimates were unbiased; models examined using the true time metric again converged less often than models using the restructured time metric, due to missing data. The findings of this study support prior research on time metric in longitudinal models, and further research should examine these findings under alternative conditions. The importance of these findings for substantive researchers is discussed.
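
For reference, a common univariate latent difference score specification (the standard dual change score form associated with McArdle, not necessarily the exact models simulated here) shows why occasion spacing matters:

```latex
% Person i, occasion t: observed score = latent true score + error,
% true scores accumulate latent changes, and each change has a constant
% component (alpha * s_i) plus a proportional component (beta * prior level).
\begin{aligned}
  y_{it}          &= \eta_{it} + \varepsilon_{it} \\
  \eta_{it}       &= \eta_{i,t-1} + \Delta\eta_{it} \\
  \Delta\eta_{it} &= \alpha\, s_i + \beta\, \eta_{i,t-1}
\end{aligned}
```

Because the change equation links adjacent occasions, collapsing individually varying or unequally spaced measurement times into equally spaced waves alters what each latent difference represents, which is one route by which a restructured time metric can bias estimates.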

Date Created
  • 2016

Evaluating Person-Oriented Methods for Mediation

Description

Statistical inference from mediation analysis applies to populations; however, researchers and clinicians may be interested in making inferences about individual clients or small, localized groups of people. Person-oriented approaches focus on the differences between people, or latent groups of people, to ask how individuals differ across variables, and can help researchers avoid ecological fallacies when making inferences about individuals. Traditional variable-oriented mediation assumes the population undergoes a homogeneous reaction to the mediating process. However, mediation is also described as an intra-individual process in which each person passes from a predictor, through a mediator, to an outcome (Collins, Graham, & Flaherty, 1998). Configural frequency mediation (CFM) is a person-oriented analysis of contingency tables that has not been well studied or implemented since its introduction in the literature (von Eye, Mair, & Mun, 2010; von Eye, Mun, & Mair, 2009). The purpose of this study is to describe CFM and investigate its statistical properties while comparing it to traditional and causal inference mediation methods. The results of this study show that joint significance mediation tests result in better Type I error rates but limit the person-oriented interpretations of CFM. Although the estimators for logistic regression and causal mediation differ, both performed well in terms of Type I error and power, although the causal estimator had higher bias than expected, which is discussed in the limitations section.
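
The joint significance test referenced in the results can be sketched in a few lines: mediation is declared only when the a path (X to M) and the b path (M to Y, adjusting for X) are each significant. The data and effect sizes below are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300

# Toy single-mediator data (illustrative effect sizes, not from the study).
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)
y = 0.3 * m + rng.normal(size=n)

# Joint significance test: require BOTH paths to be significant.
a_fit = sm.OLS(m, sm.add_constant(x)).fit()                          # X -> M
b_fit = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit()    # M -> Y | X

p_a = a_fit.pvalues[1]  # a-path p-value
p_b = b_fit.pvalues[1]  # b-path p-value (m is the first non-constant column)
print(f"p(a)={p_a:.4f}, p(b)={p_b:.4f}, mediated: {max(p_a, p_b) < .05}")
```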

Date Created
  • 2019

Statistical properties of the single mediator model with latent variables in the Bayesian framework

Description

Statistical mediation analysis has been widely used in the social sciences in order to examine the indirect effects of an independent variable on a dependent variable. The statistical properties of the single mediator model with manifest and latent variables have been studied using simulation studies. However, the single mediator model with latent variables in the Bayesian framework with various accurate and inaccurate priors for structural and measurement model parameters has yet to be evaluated in a statistical simulation. This dissertation outlines the steps in the estimation of a single mediator model with latent variables as a Bayesian structural equation model (SEM). A Monte Carlo study is carried out in order to examine the statistical properties of point and interval summaries for the mediated effect in the Bayesian latent variable single mediator model with prior distributions of varying degrees of accuracy and informativeness. Bayesian methods with diffuse priors have statistical properties as good as those of maximum likelihood (ML) and the distribution of the product. With accurate informative priors, Bayesian methods can increase power up to 25% and decrease interval width up to 24%. With inaccurate informative priors, the point summaries of the mediated effect are more biased than ML estimates, and the bias is higher if the inaccuracy occurs in priors for structural parameters than in priors for measurement model parameters. Findings from the Monte Carlo study are generalizable to Bayesian analyses with priors of the same distributional forms that have comparable amounts of (in)accuracy and informativeness to the priors evaluated in the Monte Carlo study.
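
As a point of reference for the distribution-of-the-product comparison, a Monte Carlo interval for the mediated effect ab can be sketched as follows; the path estimates and standard errors are placeholders, not results from the study.

```python
import numpy as np

rng = np.random.default_rng(5)

# Placeholder path estimates and standard errors for a (X -> M) and
# b (M -> Y | X); in practice these come from the fitted mediation model.
a_hat, se_a = 0.35, 0.08
b_hat, se_b = 0.25, 0.07

# Monte Carlo approximation to the distribution of the product ab:
draws = rng.normal(a_hat, se_a, 100_000) * rng.normal(b_hat, se_b, 100_000)
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"ab = {a_hat * b_hat:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```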

Date Created
  • 2017

Determining appropriate sample sizes and their effects on key parameters in longitudinal three-level models

Description

Through a two-study simulation design with different design conditions (the sample size at level 1 (L1) was set to 3; the level 2 (L2) sample size ranged from 10 to 75; the level 3 (L3) sample size ranged from 30 to 150; the intraclass correlation (ICC) ranged from 0.10 to 0.50; and model complexity ranged from one to three predictors), this study intends to provide general guidelines about adequate sample sizes at all three levels under varying ICC conditions for a viable three-level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). In this study, the data-generating parameters were obtained using a large-scale longitudinal data set from North Carolina, provided by the National Center on Assessment and Accountability for Special Education (NCAASE). I discuss ranges of sample sizes that are inadequate or adequate for convergence, absolute bias, relative bias, root mean squared error (RMSE), and coverage of individual parameter estimates. The current study, with the help of a detailed two-part simulation design covering various sample sizes, levels of model complexity, and ICCs, provides various options for adequate sample sizes under different conditions. This study emphasizes that adequate sample sizes at L1, L2, and L3 can be adjusted according to different interests in parameter estimates and different ranges of acceptable absolute bias, relative bias, RMSE, and coverage. Under different model complexity and varying ICC conditions, this study aims to help researchers identify the L1, L2, or L3 sample size, or some combination of them, as the source of variation in absolute bias, relative bias, RMSE, or coverage proportions for a given parameter estimate. This assists researchers in making better decisions when selecting adequate sample sizes for a three-level HLM analysis. A limitation of the study was the use of only a single distribution for the dependent and explanatory variables; different types of distributions and their effects might result in different sample size recommendations.
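
The four outcome criteria can be computed from a stack of simulation replicates as sketched below; the replicate estimates and standard errors are simulated placeholders, and the function name is my own.

```python
import numpy as np

def evaluate(estimates, ses, true_value, z=1.96):
    """Absolute bias, relative bias, RMSE, and CI coverage across replicates."""
    estimates, ses = np.asarray(estimates), np.asarray(ses)
    abs_bias = np.mean(estimates - true_value)
    rel_bias = abs_bias / true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    covered = ((estimates - z * ses <= true_value)
               & (true_value <= estimates + z * ses))
    return {"abs_bias": abs_bias, "rel_bias": rel_bias,
            "rmse": rmse, "coverage": covered.mean()}

rng = np.random.default_rng(6)
est = rng.normal(0.30, 0.05, size=1000)  # e.g., 1,000 replicate ICC estimates
se = np.full(1000, 0.05)                 # their estimated standard errors
print(evaluate(est, se, true_value=0.30))
```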

Date Created
  • 2016

Multiple imputation for two-level hierarchical models with categorical variables and missing at random data

Description

Accurate data analysis and interpretation of results may be influenced by many potential factors. The factors of interest in the current work are the chosen analysis model(s), the presence of missing data, and the type(s) of data collected. If analysis models are used that a) do not accurately capture the structure of relationships in the data, such as clustered/hierarchical data, b) do not allow or control for missing values present in the data, or c) do not accurately accommodate different data types, such as categorical data, then the assumptions associated with the model have not been met and the results of the analysis may be inaccurate. In the presence of clustered/nested data, hierarchical linear modeling or multilevel modeling (MLM; Raudenbush & Bryk, 2002) has the ability to predict outcomes for each level of analysis and across multiple levels (accounting for relationships between levels), providing a significant advantage over single-level analyses. When multilevel data contain missingness, multilevel multiple imputation (MLMI) techniques may be used to model both the missingness and the clustered nature of the data. With categorical multilevel data with missingness, categorical MLMI must be used. Two such routines for MLMI with continuous and categorical data were explored with missing at random (MAR) data: a formal Bayesian imputation and analysis routine in JAGS (R/JAGS) and a common MLM procedure of imputation via Bayesian estimation in BLImP with frequentist analysis of the multilevel model in Mplus (BLImP/Mplus). Manipulated variables included intraclass correlations, number of clusters, and the rate of missingness. Results showed that with continuous data, R/JAGS returned more accurate parameter estimates than BLImP/Mplus for almost all parameters of interest across levels of the manipulated variables. Both R/JAGS and BLImP/Mplus encountered convergence issues and returned inaccurate parameter estimates when imputing and analyzing dichotomous data. Follow-up studies showed that JAGS and BLImP returned similar imputed datasets, but the choice of analysis software for MLM impacted the recovery of accurate parameter estimates. Implications of these findings and recommendations for further research are discussed.
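
Whatever software performs the imputation, the per-imputation estimates are typically pooled with Rubin's rules; a minimal sketch (with placeholder inputs, not values from this study) follows.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool a parameter across m imputed datasets via Rubin's rules."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    qbar = estimates.mean()        # pooled point estimate
    w = variances.mean()           # within-imputation variance
    b = estimates.var(ddof=1)      # between-imputation variance
    t = w + (1 + 1 / m) * b        # total variance
    return qbar, np.sqrt(t)

# Placeholder: one fixed effect estimated in each of 5 imputed datasets,
# with its squared standard error from each analysis.
est = [0.52, 0.48, 0.55, 0.50, 0.47]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
qbar, se = pool_rubin(est, var)
print(f"pooled = {qbar:.3f} (SE {se:.3f})")
```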

Date Created
  • 2016