Matching Items (14)

Description

In order to analyze data from an instrument administered at multiple time points, it is common practice to form composites of the items at each wave and to fit a longitudinal model to the composites. The advantage of using composites of items is that smaller sample sizes are required, in contrast to second-order models that include both the measurement and the structural relationships among the variables. However, the use of composites assumes that longitudinal measurement invariance holds; that is, it is assumed that the relationships among the items and the latent variables remain constant over time. Previous studies conducted on latent growth models (LGM) have shown that when longitudinal metric invariance is violated, the parameter estimates are biased and mistaken conclusions about growth can be made. The purpose of the current study was to examine the impact of non-invariant loadings and non-invariant intercepts on two longitudinal models: the LGM and the autoregressive quasi-simplex model (AR quasi-simplex). A second purpose was to determine whether there are conditions in which researchers can reach adequate conclusions about stability and growth even in the presence of violations of invariance. A Monte Carlo simulation study was conducted to achieve these purposes. The method consisted of generating items under a linear curve of factors model (COFM) or under the AR quasi-simplex. Composites of the items were formed at each time point and analyzed with a linear LGM or an AR quasi-simplex model. The results showed that the AR quasi-simplex model yielded biased path coefficients only in the conditions with large violations of invariance, and its fit was not affected by violations of invariance. In general, the growth parameter estimates of the LGM were biased under violations of invariance. Further, in the presence of non-invariant loadings, the rejection rates of the hypothesis of linear growth increased as the proportion of non-invariant items and the magnitude of the violations increased. A discussion of the results and limitations of the study is provided, along with general recommendations.
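To make the design above concrete, here is a minimal sketch, not the dissertation's code, of the data-generating step it describes: items generated under a linear curve-of-factors model, composites formed at each wave, and growth estimated from the composites. All parameter values and the single drifting loading are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, waves, items = 200, 4, 6

# Latent linear growth: individual intercepts and slopes
intercepts = rng.normal(0.0, 1.0, n)
slopes = rng.normal(0.5, 0.3, n)
eta = intercepts[:, None] + slopes[:, None] * np.arange(waves)  # n x waves factor scores

# Wave-specific loadings: one hypothetical item violates metric invariance after wave 0
loadings = np.ones((waves, items))
loadings[1:, 0] = 0.6

composites = np.empty((n, waves))
for t in range(waves):
    y = eta[:, [t]] * loadings[t] + rng.normal(0.0, 0.5, (n, items))  # item responses
    composites[:, t] = y.mean(axis=1)  # composite score at wave t

# Crude stand-in for the LGM slope: per-person OLS slopes on the composites
b = np.polyfit(np.arange(waves), composites.T, 1)[0]
print(b.mean())  # drifts away from the true 0.5 when loadings are non-invariant
```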
ContributorsOlivera-Aguilar, Margarita (Author) / Millsap, Roger E. (Thesis advisor) / Levy, Roy (Committee member) / MacKinnon, David (Committee member) / West, Stephen G. (Committee member) / Arizona State University (Publisher)
Created2013
Description

Random Forests is a statistical learning method that has been proposed for propensity score estimation models involving complex interactions among the covariates, nonlinear relationships, or both. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of the data, optimal specification of (1) the decision rules used to select the covariate and its split value in a classification tree, (2) the number of covariates randomly sampled for selection, and (3) the method of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity score weighting by the odds adjustment. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing an unbiased estimated standard error and correct statistical inference for the average treatment effect. The relationship between the balance on the covariates' means and the bias of the average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of the average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.
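The core workflow the abstract describes can be sketched as follows; the hyperparameters and the use of out-of-bag probabilities are illustrative choices rather than the dissertation's exact specifications, and the data are simulated.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
p_treat = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] ** 2)))  # nonlinear selection
treat = rng.binomial(1, p_treat)
y = X[:, 0] + 2 * treat + rng.normal(size=1000)        # true treatment effect = 2

rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=20,
                            oob_score=True, random_state=0).fit(X, treat)
ps = rf.oob_decision_function_[:, 1].clip(0.01, 0.99)  # out-of-bag propensity scores

# Weighting by the odds: treated units get weight 1, controls get ps / (1 - ps)
w = ps / (1 - ps)
att = y[treat == 1].mean() - np.average(y[treat == 0], weights=w[treat == 0])
print(att)  # near 2 if the estimated scores balance the covariates
```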
ContributorsCham, Hei Ning (Author) / Tein, Jenn-Yun (Thesis advisor) / Enders, Stephen G (Thesis advisor) / Enders, Craig K. (Committee member) / Mackinnon, David P (Committee member) / Arizona State University (Publisher)
Created2013
Description

Recent advances in hierarchical or multilevel statistical models and in causal inference using the potential outcomes framework hold tremendous promise for mock and real jury research. These advances enable researchers to explore how individual jurors can exert a bottom-up effect on the jury's verdict and how case-level features can exert a top-down effect on a juror's perception of the parties at trial. This dissertation explains and then applies these technical advances to a pre-existing mock jury dataset to provide worked examples, in an effort to spur the adoption of these techniques. In particular, the paper introduces two new cross-level mediated effects and then describes how to conduct ecological validity tests with these mediated effects. The first cross-level mediated effect, the a1b1 mediated effect, is the juror-level mediated effect of a jury-level manipulation. The second cross-level mediated effect, the a2bc mediated effect, is the unique contextual effect that being in a jury has on the individual juror. When a mock jury study includes a deliberation versus non-deliberation manipulation, the a1b1 can be compared across the two conditions, enabling a general test of ecological validity: if deliberating in a group generally influences the individual, then the two indirect effects should be significantly different. The a2bc can also be interpreted as a specific test of how much changes in the jury-level mean of a specific mediator affect juror-level decision-making.
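As a rough illustration of the a1b1 idea, the sketch below fits a simplified 2-1-1 mediation with hypothetical variable names: a jury-level manipulation x affects a juror-level mediator m (a-path), which affects the juror-level outcome y (b-path), with jurors nested in juries. It omits the within/between decomposition needed for the a2bc effect, so it is a conceptual sketch only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
juries, size = 80, 6
jury = np.repeat(np.arange(juries), size)
x = np.repeat(rng.binomial(1, 0.5, juries), size)   # jury-level manipulation
u = np.repeat(rng.normal(0, 0.5, juries), size)     # shared jury random effect
m = 0.6 * x + u + rng.normal(0, 1, juries * size)   # juror-level mediator
y = 0.5 * m + u + rng.normal(0, 1, juries * size)   # juror-level outcome
df = pd.DataFrame({"jury": jury, "x": x, "m": m, "y": y})

a = smf.mixedlm("m ~ x", df, groups=df["jury"]).fit().params["x"]      # a-path
b = smf.mixedlm("y ~ m + x", df, groups=df["jury"]).fit().params["m"]  # b-path
print(a * b)  # a1b1-style indirect effect; true value here is 0.6 * 0.5 = 0.3
```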
ContributorsLovis-McMahon, David (Author) / Schweitzer, Nicholas (Thesis advisor) / Saks, Michael (Thesis advisor) / Salerno, Jessica (Committee member) / MacKinnon, David (Committee member) / Arizona State University (Publisher)
Created2015
Description

The goal of diagnostic assessment is to discriminate between groups. In many cases, a binary decision is made conditional on a cut score from a continuous scale. Psychometric methods can improve assessment by modeling a latent variable using item response theory (IRT), and IRT scores can subsequently be used to determine a cut score using receiver operating characteristic (ROC) curves. Psychometric methods provide reliable and interpretable scores, but the prediction of the diagnosis is not the primary product of the measurement process. In contrast, machine learning methods, such as regularization or binary recursive partitioning, can build a model from the assessment items to predict the probability of diagnosis. Machine learning predicts the diagnosis directly, but does not provide an inferential framework to explain why item responses are related to the diagnosis. It remains unclear whether psychometric and machine learning methods have comparable accuracy or if one method is preferable in some situations. In this study, Monte Carlo simulation methods were used to compare psychometric and machine learning methods on diagnostic classification accuracy. Results suggest that classification accuracy of psychometric models depends on the diagnostic-test correlation and prevalence of diagnosis. Also, machine learning methods that reduce prediction error have inflated specificity and very low sensitivity compared to the data-generating model, especially when prevalence is low. Finally, machine learning methods that use ROC curves to determine probability thresholds have comparable classification accuracy to the psychometric models as sample size, number of items, and number of item categories increase. Therefore, results suggest that machine learning models could provide a viable alternative for classification in diagnostic assessments. Strengths and limitations for each of the methods are discussed, and future directions are considered.
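The ROC-based cut-score step mentioned above can be sketched briefly; the continuous scores below are simulated stand-ins for IRT scale scores, and maximizing Youden's J is one common threshold rule rather than necessarily the study's.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(3)
diagnosis = rng.binomial(1, 0.2, 2000)               # 20% prevalence
scores = rng.normal(loc=1.0 * diagnosis, scale=1.0)  # higher scores if diagnosed

fpr, tpr, thresholds = roc_curve(diagnosis, scores)
cut = thresholds[np.argmax(tpr - fpr)]  # maximize Youden's J = sens + spec - 1

pred = (scores >= cut).astype(int)
sensitivity = pred[diagnosis == 1].mean()
specificity = 1 - pred[diagnosis == 0].mean()
print(cut, sensitivity, specificity)
```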
ContributorsGonzález, Oscar (Author) / Mackinnon, David P (Thesis advisor) / Edwards, Michael C (Thesis advisor) / Grimm, Kevin J. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)
Created2018
Description

To make meaningful comparisons on a construct of interest across groups or over time, measurement invariance needs to exist for at least a subset of the observed variables that define the construct. Often, chi-square difference tests are used to test for measurement invariance. However, these statistics are affected by sample size, such that larger sample sizes are associated with a greater prevalence of significant tests. Thus, using other measures of non-invariance to aid in the decision process would be beneficial. For this dissertation project, I proposed four new effect size measures of measurement non-invariance and conducted a Monte Carlo simulation study to evaluate their properties and behavior, along with those of an already existing effect size measure of non-invariance. The effect size measures were evaluated based on bias, variability, and consistency. Additionally, the factors that affected the value of the effect size measures were analyzed. All studied effect sizes were consistent, but three were biased under certain conditions. Further work is needed to establish benchmarks for the unbiased effect sizes.
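For readers unfamiliar with the chi-square difference test the abstract refers to, the arithmetic is simple; the fit statistics below are made-up placeholders standing in for output from SEM software.

```python
from scipy.stats import chi2

# Hypothetical fit statistics: configural (unconstrained) vs. metric (loadings constrained)
chisq_configural, df_configural = 210.4, 96
chisq_metric, df_metric = 228.9, 104

diff = chisq_metric - chisq_configural
df_diff = df_metric - df_configural
p = chi2.sf(diff, df_diff)  # small p suggests the loadings are not invariant
print(diff, df_diff, p)
```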
ContributorsGunn, Heather J (Author) / Grimm, Kevin J. (Thesis advisor) / Edwards, Michael C (Thesis advisor) / Tein, Jenn-Yun (Committee member) / Anderson, Samantha F. (Committee member) / Arizona State University (Publisher)
Created2019
Description

Data fusion is the process of merging information from disjoint datasets that share at least some common variables, with the main objective of creating a new dataset permitting more flexible analyses than the separate analysis of each individual dataset. Many data fusion methods have been proposed in the literature, although most utilize the frequentist framework. This dissertation investigates a new approach, called Bayesian Synthesis, in which information obtained from one dataset acts as the prior for the analysis of the next. This process continues sequentially until a single posterior distribution is created using all available data. These informative, augmented data-dependent priors provide an extra source of information that may aid the accuracy of estimation. To examine the performance of the proposed Bayesian Synthesis approach, results from simulated data with known population values were first examined under a variety of conditions. Next, these results were compared to those from the traditional maximum likelihood approach to data fusion, as well as to data fusion analyzed via Bayes. Parameter recovery under the proposed Bayesian Synthesis approach was evaluated using four criteria reflecting raw bias, relative bias, accuracy, and efficiency. Subsequently, empirical analyses with real data were conducted, fusing data from five longitudinal studies of mathematics ability that varied in their assessment of ability and in the timing of measurement occasions. Results from Bayesian Synthesis and from data fusion of the combined data using Bayesian and maximum likelihood estimation methods are reported. The results illustrate that Bayesian Synthesis with data-driven priors is a highly effective approach, provided that the sample sizes for the fused data are large enough to provide unbiased estimates. Bayesian Synthesis offers another beneficial approach to data fusion that can effectively be used to enhance the validity of conclusions obtained from merging data from different studies.
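The sequential logic of Bayesian Synthesis can be shown in a minimal conjugate sketch: the posterior from one dataset becomes the prior for the next. A normal mean with known variance keeps the updates closed-form; the dissertation's actual models are far richer, and these datasets are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
datasets = [rng.normal(1.5, 1.0, n) for n in (40, 60, 80)]  # three hypothetical studies

mu, var, sigma2 = 0.0, 100.0, 1.0  # diffuse initial prior; known data variance
for y in datasets:
    n = len(y)
    var_post = 1.0 / (1.0 / var + n / sigma2)      # posterior variance
    mu = var_post * (mu / var + y.sum() / sigma2)  # posterior mean
    var = var_post                                 # posterior becomes the next prior
print(mu, var)  # final posterior across all studies; mean should be near 1.5
```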
ContributorsMarcoulides, Katerina M (Author) / Grimm, Kevin (Thesis advisor) / Levy, Roy (Thesis advisor) / MacKinnon, David (Committee member) / Suk, Hye Won (Committee member) / Arizona State University (Publisher)
Created2017
Description

Collider effects pose a major problem in psychological research. Colliders are third variables that bias the relationship between an independent and a dependent variable when (1) the composition of a research sample is restricted by scores on the collider variable or (2) researchers adjust for the collider variable in their statistical analyses. Both cases interfere with the accuracy and generalizability of statistical results. Despite their importance, collider effects remain relatively unknown in the social sciences. This research introduces the conceptual and mathematical foundations of collider effects and demonstrates how to calculate a collider effect and test it for statistical significance. Simulation studies examined the efficiency and accuracy of the collider estimation methods and tested the viability of Thorndike's Case III equation as a potential correction for collider bias in cases of biased sample selection.
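The selection case described in (1) is easy to demonstrate by simulation; the coefficients below are illustrative. X and Y are generated independently, yet restricting the sample on a collider they both cause induces a spurious correlation between them.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)          # independent of x by construction
c = x + y + rng.normal(size=n)  # collider: caused by both x and y

selected = c > np.median(c)     # sample restricted by scores on the collider
print(np.corrcoef(x, y)[0, 1])                      # ~0 in the full sample
print(np.corrcoef(x[selected], y[selected])[0, 1])  # clearly negative: collider bias
```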
ContributorsLamp, Sophia Josephine (Author) / Mackinnon, David P (Thesis advisor) / Anderson, Samantha F (Committee member) / Edwards, Michael C (Committee member) / Arizona State University (Publisher)
Created2021
Description

Latent profile analysis (LPA), a type of finite mixture model, has grown in popularity due to its ability to detect latent classes, or unobserved subgroups, within a sample. Though numerous methods exist to determine the correct number of classes, past research has repeatedly demonstrated that no one method is consistently best, as each tends to struggle under specific conditions. Recently, the likelihood incremental percentage per parameter (LI3P), a method taking a new approach, was proposed and tested, yielding promising initial results. To evaluate this new method more thoroughly, this study simulated 50,000 datasets, manipulating factors such as sample size, class distance, number of items, and number of classes. After evaluating the performance of the LI3P on simulated data, the LI3P is applied to LPA models fit to an empirical dataset to illustrate the method's application. Results indicate that the LI3P performs in line with standard class enumeration techniques and primarily reflects class separation and the number of classes.
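The class-enumeration loop the abstract describes might look like the sketch below, using Gaussian mixtures as a stand-in for LPA. The LI3P quantity computed here, the percentage gain in log-likelihood divided by the number of added parameters, is one plausible reading of the name; consult the dissertation for the exact definition.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
data = np.vstack([rng.normal(0, 1, (300, 4)),
                  rng.normal(2, 1, (300, 4))])  # two true classes, four items
d = data.shape[1]

prev_ll = prev_par = None
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         n_init=5, random_state=0).fit(data)
    ll = gm.score(data) * len(data)  # total log-likelihood
    n_par = k * d + k * d + (k - 1)  # means + variances + mixing weights
    if prev_ll is not None:
        li3p = 100 * (ll - prev_ll) / abs(prev_ll) / (n_par - prev_par)
        print(k, round(li3p, 3))     # increment shrinks past the true class count
    prev_ll, prev_par = ll, n_par
```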
ContributorsHoupt, Russell Paul (Author) / Grimm, Kevin J (Thesis advisor) / McNeish, Daniel (Committee member) / Edwards, Michael C (Committee member) / Arizona State University (Publisher)
Created2022
Description

Decision trees are a machine learning technique that searches the predictor space for the variable and observed value whose split of the data into two nodes leads to the best prediction. Conditional Inference Trees (CTREEs) are a non-parametric class of decision trees that use statistical theory to select variables for splitting. Missing data can be problematic in decision trees because an observation with a missing value on the chosen splitting variable cannot be placed into a node, and this inability can also alter the variable selection process itself. Simple missing data approaches (e.g., deletion, majority rule, and surrogate splits) have been implemented in decision tree algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. In addition to these approaches, this dissertation proposed a modified multiple imputation approach to handling missing data in CTREEs. A simulation was conducted to compare this approach with the simple missing data approaches as well as with single imputation and multiple imputation with prediction averaging. Results revealed that simple approaches (i.e., majority rule, treating missingness as its own category, and listwise deletion) were effective in handling missing data in CTREEs. The modified multiple imputation approach did not perform well against the simple approaches in most conditions, but it did seem best suited for small sample sizes and extreme missingness.
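Two of the simple approaches compared in the study, listwise deletion and single imputation, are sketched below with simulated data. sklearn's CART trees stand in for conditional inference trees, which sklearn does not implement, so this illustrates the missing-data handling rather than CTREE itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_miss = X.copy()
X_miss[rng.random((500, 3)) < 0.2] = np.nan  # 20% values missing completely at random

# Listwise deletion: drop every row with any missing value
keep = ~np.isnan(X_miss).any(axis=1)
tree_lw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_miss[keep], y[keep])

# Single imputation: fill missing values with column means, keep all rows
X_imp = SimpleImputer(strategy="mean").fit_transform(X_miss)
tree_si = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_imp, y)

print(tree_lw.score(X, y), tree_si.score(X, y))  # accuracy on the complete data
```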
ContributorsManapat, Danielle Marie (Author) / Grimm, Kevin J (Thesis advisor) / Edwards, Michael C (Thesis advisor) / McNeish, Daniel (Committee member) / Anderson, Samantha F (Committee member) / Arizona State University (Publisher)
Created2023
Description

Mediation analysis is integral to psychology, investigating the causal mechanisms of human behavior. The diversity of explanations for human behavior has implications for the estimation and interpretation of statistical mediation models: individuals can have similar observed outcomes while undergoing different causal processes, or different observed outcomes while receiving the same treatment. Researchers can employ diverse strategies when studying individual differences in multiple mediation pathways, including individual fit measures and analysis of residuals. This dissertation investigates the use of individual residuals and fit measures to identify individual differences in multiple mediation pathways. More specifically, this study focuses on mediation model residuals in a heterogeneous population in which some people experience indirect effects through one mediator and others experience indirect effects through a different mediator. A simulation study investigates 162 conditions defined by effect size and sample size for three proposed methods: residual differences, delta z, and generalized Cook's distance (gCd). Results indicate that analogs of Type 1 error rates are generally acceptable for the method of residual differences, but statistical power is limited. Likewise, neither delta z nor gCd could reliably distinguish between contrasts that had true effects and those that did not. The outcomes of this study reveal the potential for statistical measures of individual mediation. However, limitations related to unequal subpopulation variances, multiple dependent variables, the inherent relationship between direct effects and unestimated indirect effects, and minimal contrast effects require more research to develop a simple method that researchers can use on single datasets.
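The case-deletion logic behind influence measures such as generalized Cook's distance can be sketched by brute force: refit the mediation paths with each case removed and track how much the indirect effect shifts. This is an illustration of the general idea, not the dissertation's delta z or gCd formulas, and the b-path regression here omits x for brevity.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)   # a-path
y = 0.5 * m + rng.normal(size=n)   # b-path

def indirect(idx):
    a = np.polyfit(x[idx], m[idx], 1)[0]  # a-path slope
    b = np.polyfit(m[idx], y[idx], 1)[0]  # b-path slope (x omitted for brevity)
    return a * b

full = indirect(np.arange(n))
shift = np.array([full - indirect(np.delete(np.arange(n), i)) for i in range(n)])
print(np.argsort(np.abs(shift))[-5:])  # indices of the five most influential cases
```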
ContributorsSmyth, Heather Lynn (Author) / MacKinnon, David (Thesis advisor) / Tein, Jenn-Yun (Committee member) / McNeish, Daniel (Committee member) / Grimm, Kevin (Committee member) / Arizona State University (Publisher)
Created2022