Matching Items (12)
Description
Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools; see e.g., Froelich & Habing, 2007). It remains to be seen how such procedures perform in the context of small-scale assessments characterized by relatively small sample sizes and/or short tests. The fact that some procedures come with minimum allowable values for characteristics of the data, such as the number of items, may even render them unusable for some small-scale assessments. Other measures designed to assess dimensionality do not come with such limitations and, as such, may perform better under conditions that do not lend themselves to evaluation via statistics that rely on asymptotic theory. The current work aimed to evaluate the performance of one such metric, the standardized generalized dimensionality discrepancy measure (SGDDM; Levy & Svetina, 2011; Levy, Xu, Yel, & Svetina, 2012), under both large- and small-scale testing conditions. A Monte Carlo study was conducted to compare the performance of DIMTEST and the SGDDM statistic in terms of evaluating assumptions of unidimensionality in item response data under a variety of conditions, with an emphasis on the examination of these procedures in small-scale assessments. Similar to previous research, increases in either test length or sample size resulted in increased power. The DIMTEST procedure appeared to be a conservative test of the null hypothesis of unidimensionality. The SGDDM statistic exhibited rejection rates near the nominal rate of .05 under unidimensional conditions, though the reliability of these results may have been less than optimal due to high sampling variability resulting from a relatively limited number of replications. 
Power values were at or near 1.0 for many of the multidimensional conditions. It was only when the sample size was reduced to N = 100 that the two approaches diverged in performance. Results suggested that both procedures may be appropriate for sample sizes as low as N = 250 and tests as short as J = 12 (SGDDM) or J = 19 (DIMTEST). When used as a diagnostic tool, SGDDM may be appropriate with as few as N = 100 cases combined with J = 12 items. The study was somewhat limited in that it did not include any complex factorial designs, nor did it manipulate the strength of item discrimination parameters or the correlation between factors. It is recommended that further research include these factors and increase the number of replications when using the SGDDM procedure.
Contributors: Reichenberg, Ray E (Author) / Levy, Roy (Thesis advisor) / Thompson, Marilyn S. (Thesis advisor) / Green, Samuel B. (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
This simulation study compared the utility of various discrepancy measures within a posterior predictive model checking (PPMC) framework for detecting different types of data-model misfit in multidimensional Bayesian network (BN) models. The investigated conditions were motivated by an applied research program utilizing an operational complex performance assessment within a digital-simulation educational context grounded in theories of cognition and learning. BN models were manipulated along two factors: latent variable dependency structure and number of latent classes. Distributions of posterior predicted p-values (PPP-values) served as the primary outcome measure and were summarized in graphical presentations, by median values across replications, and by proportions of replications in which the PPP-values were extreme. An effect size measure for PPMC was introduced as a supplemental numerical summary to the PPP-value. Consistent with previous PPMC research, all investigated fit functions tended to perform conservatively, but Standardized Generalized Dimensionality Discrepancy Measure (SGDDM), Yen's Q3, and Hierarchy Consistency Index (HCI) only mildly so. Adequate power to detect at least some types of misfit was demonstrated by SGDDM, Q3, HCI, Item Consistency Index (ICI), and to a lesser extent Deviance, while proportion correct (PC), a chi-square-type item-fit measure, Ranked Probability Score (RPS), and Good's Logarithmic Scale (GLS) were powerless across all investigated factors. Bivariate SGDDM and Q3 were found to provide powerful and detailed feedback for all investigated types of misfit.
Contributors: Crawford, Aaron (Author) / Levy, Roy (Thesis advisor) / Green, Samuel (Committee member) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers.
Contributors: Mistler, Stephen (Author) / Enders, Craig K. (Thesis advisor) / Aiken, Leona (Committee member) / Levy, Roy (Committee member) / West, Stephen G. (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Many methodological approaches have been utilized to predict student retention and persistence over the years, yet few have utilized a Bayesian framework. It is believed this is due in part to the absence of an established process for guiding educational researchers reared in a frequentist perspective into the realms of Bayesian analysis and educational data mining. The current study aimed to address this by providing a model-building process for developing a Bayesian network (BN) that leveraged educational data mining, Bayesian analysis, and traditional iterative model-building techniques in order to predict whether community college students will stop out at the completion of each of their first six terms. The study utilized exploratory and confirmatory techniques to reduce an initial pool of more than 50 potential predictor variables to a parsimonious final BN with only four predictor variables. The average in-sample classification accuracy rate for the model was 80% (Cohen's κ = 53%). The model was shown to be generalizable across samples with an average out-of-sample classification accuracy rate of 78% (Cohen's κ = 49%). The classification rates for the BN were also found to be superior to the classification rates produced by an analog frequentist discrete-time survival analysis model.
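For readers unfamiliar with the agreement statistic reported alongside the accuracy rates, the following is a minimal sketch of how classification accuracy and Cohen's κ can be computed from observed and predicted class labels. It is not the author's code; the function name `cohens_kappa` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Return Cohen's kappa for two equal-length sequences of class labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (classification accuracy) and p_e is the agreement expected by chance
    given the marginal class frequencies.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: proportion of cases classified correctly.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: sum over classes of the product of marginal proportions.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels: 1 = stopped out, 0 = persisted.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy = sum(o == p for o, p in zip(observed, predicted)) / len(observed)
kappa = cohens_kappa(observed, predicted)
```

Because κ discounts the agreement expected by chance, it is lower than the raw accuracy whenever chance agreement is above zero, which is consistent with the pattern of values reported in the abstract.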
Contributors: Arcuria, Philip (Author) / Levy, Roy (Thesis advisor) / Green, Samuel B (Committee member) / Thompson, Marilyn S (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can cause bias in model parameter estimation for single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of mismatch in dimensionality between data and analysis models with multiple-group analyses at the population and sample levels. Datasets were generated using a bifactor model with different factor structures and were analyzed with bifactor and single-factor models to assess misspecification effects on assessments of MI and latent mean differences. As baseline models, the bifactor models fit data well and had minimal bias in latent mean estimation. However, the low convergence rates of fitting bifactor models to data with complex structures and small sample sizes caused concern. On the other hand, effects of fitting the misspecified single-factor models on the assessments of MI and latent means differed by the bifactor structures underlying data. For data following one general factor and one group factor affecting a small set of indicators, the effects of ignoring the group factor in analysis models on the tests of MI and latent mean differences were mild. In contrast, for data following one general factor and several group factors, oversimplifications of analysis models can lead to inaccurate conclusions regarding MI assessment and latent mean estimation.
Contributors: Xu, Yuning (Author) / Green, Samuel (Thesis advisor) / Levy, Roy (Committee member) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
A simulation study was conducted to explore the robustness of general factor mean difference estimation in bifactor ordered-categorical data. In the No Differential Item Functioning (DIF) conditions, the data generation conditions that varied were sample size, the number of categories per item, effect size of the general factor mean difference, and the size of specific factor loadings; in data analysis, misspecification conditions were introduced in which the generated bifactor data were fit using a unidimensional model, and/or ordered-categorical data were treated as continuous data. In the DIF conditions, the data generation conditions that varied were sample size, the number of categories per item, effect size of latent mean difference for the general factor, the type of item parameters that had DIF, and the magnitude of DIF; the data analysis conditions varied in whether equality constraints were set on the noninvariant item parameters.

Results showed that incorrectly fitting bifactor data using unidimensional models or failing to account for DIF in item parameters resulted in estimation bias in the general factor mean difference, while treating ordinal data as continuous had little influence on the estimation bias as long as there was no severe model misspecification. The extent of estimation bias produced by misspecification of bifactor datasets with unidimensional models was mainly determined by the degree of unidimensionality (i.e., size of specific factor loadings) and the general factor mean difference size. When DIF was present, the estimation accuracy of the general factor mean difference was completely robust to ignoring noninvariance in specific factor loadings, while it was very sensitive to failing to account for DIF in threshold parameters. With respect to ignoring DIF in general factor loadings, the estimation bias of the general factor mean difference was substantial when the DIF was -0.15, and it could be negligible for smaller sizes of DIF. Despite the impact of model misspecification on estimation accuracy, the power to detect the general factor mean difference was mainly influenced by the sample size and effect size. Serious Type I error rate inflation occurred only when DIF was present in threshold parameters.
Contributors: Liu, Yixing (Author) / Thompson, Marilyn (Thesis advisor) / Levy, Roy (Committee member) / O’Rourke, Holly (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both levels for a second-order CFA model. The numbers and directions of differences in noninvariant loadings or intercepts were also manipulated, along with total sample size and effect size of the second-order factor mean difference. Data were analyzed using correct and incorrect specifications of noninvariant loadings and intercepts. Results summarized across the 5,000 replications in each condition included Type I error rates and powers for the chi-square difference test and the Wald test of the second-order factor mean difference, estimation bias and efficiency for this latent mean difference, and means of the standardized root mean square residual (SRMR) and the root mean square error of approximation (RMSEA).

When the model was correctly specified, no obvious estimation bias was observed; when the model was misspecified by constraining noninvariant loadings or intercepts to be equal, the latent mean difference was overestimated if the direction of the difference in loadings or intercepts was consistent with the direction of the latent mean difference, and vice versa. Increasing the number of noninvariant loadings or intercepts resulted in larger estimation bias if these noninvariant loadings or intercepts were constrained to be equal. Power to detect the latent mean difference was influenced by estimation bias and the estimated variance of the difference in the second-order factor mean, in addition to sample size and effect size. Constraining more parameters to be equal between groups, even when unequal in the population, led to a decrease in the variance of the estimated latent mean difference, which increased power somewhat. Finally, RMSEA was very sensitive for detecting misspecification due to improper equality constraints in all conditions in the current scenario, including the nonzero latent mean difference, but SRMR did not increase as expected when noninvariant parameters were constrained.
Contributors: Liu, Yixing (Author) / Thompson, Marilyn (Thesis advisor) / Green, Samuel (Committee member) / Levy, Roy (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Although models for describing longitudinal data have become increasingly sophisticated, the criticism of even foundational growth curve models remains challenging. The challenge arises from the need to disentangle data-model misfit at multiple and interrelated levels of analysis. Using posterior predictive model checking (PPMC)—a popular Bayesian framework for model criticism—the performance of several discrepancy functions was investigated in a Monte Carlo simulation study. The discrepancy functions of interest included two types of conditional concordance correlation (CCC) functions, two types of R2 functions, two types of standardized generalized dimensionality discrepancy (SGDDM) functions, the likelihood ratio (LR), and the likelihood ratio difference test (LRT). Key outcomes included effect sizes of the design factors on the realized values of discrepancy functions, distributions of posterior predictive p-values (PPP-values), and the proportion of extreme PPP-values.

In terms of the realized values, the behavior of the CCC and R2 functions was generally consistent with prior research. However, as diagnostics, these functions were extremely conservative even when some aspect of the data was unaccounted for. In contrast, the conditional SGDDM (SGDDMC), LR, and LRT were generally sensitive to the underspecifications investigated in this work on all outcomes considered. Although the proportions of extreme PPP-values for these functions tended to increase in null situations for non-normal data, this behavior may have reflected the true misfit that resulted from the specification of normal prior distributions. Importantly, the LR and, to a greater extent, the SGDDMC exhibited some potential for untangling the sources of data-model misfit. Owing to connections of growth curve models to the more fundamental frameworks of multilevel modeling, structural equation models with a mean structure, and Bayesian hierarchical models, the results of the current work may have broader implications that warrant further research.
Contributors: Fay, Derek (Author) / Levy, Roy (Thesis advisor) / Thompson, Marilyn (Committee member) / Enders, Craig (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Through a two-study simulation design with different design conditions (sample size at level 1 (L1) was set to 3, level 2 (L2) sample size ranged from 10 to 75, level 3 (L3) sample size ranged from 30 to 150, intraclass correlation (ICC) ranged from 0.10 to 0.50, and model complexity ranged from one predictor to three predictors), this study intends to provide general guidelines about adequate sample sizes at three levels under varying ICC conditions for a viable three-level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). In this study, the data-generating parameters were obtained using a large-scale longitudinal data set from North Carolina, provided by the National Center on Assessment and Accountability for Special Education (NCAASE). I discuss ranges of sample sizes that are inadequate or adequate for convergence, absolute bias, relative bias, root mean squared error (RMSE), and coverage of individual parameter estimates. The current study, with the help of a detailed two-part simulation design for various sample sizes, model complexity, and ICCs, provides various options of adequate sample sizes under different conditions. This study emphasizes that adequate sample sizes at L1, L2, and L3 can be adjusted according to different interests in parameter estimates and different ranges of acceptable absolute bias, relative bias, RMSE, and coverage. Under different model complexity and varying ICC conditions, this study aims to help researchers identify the L1, L2, or L3 sample size, or a combination of them, as the source of variation in absolute bias, relative bias, RMSE, or coverage proportions for a certain parameter estimate. This assists researchers in making better decisions for selecting adequate sample sizes in a three-level HLM analysis. A limitation of the study was the use of only a single distribution for the dependent and explanatory variables; different types of distributions and their effects might result in different sample size recommendations.
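The outcome measures named above (absolute bias, relative bias, RMSE, and coverage) can be summarized across simulation replications roughly as follows. This is a minimal sketch under stated assumptions, not the study's code: the function name `summarize_replications` is hypothetical, absolute bias is taken here as the absolute value of the mean deviation from the generating value, and other definitions are in use in the literature.

```python
import math


def summarize_replications(true_value, estimates, intervals):
    """Summarize one parameter across simulation replications.

    true_value -- the data-generating parameter value (must be nonzero
                  for relative bias to be defined)
    estimates  -- one point estimate per replication
    intervals  -- one (lower, upper) confidence interval per replication
    """
    if not estimates or len(estimates) != len(intervals):
        raise ValueError("need one interval per estimate")
    if true_value == 0:
        raise ValueError("relative bias is undefined for a true value of 0")
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    return {
        # Assumed definition: magnitude of the mean deviation from the truth.
        "absolute_bias": abs(bias),
        # Bias expressed as a proportion of the generating value.
        "relative_bias": bias / true_value,
        # Root mean squared error around the generating value.
        "rmse": math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n),
        # Proportion of intervals that contain the generating value.
        "coverage": sum(lo <= true_value <= hi for lo, hi in intervals) / n,
    }


# Hypothetical replications for a parameter whose generating value is 2.0.
summary = summarize_replications(
    2.0,
    [1.5, 2.5, 2.0, 2.4],
    [(1.0, 2.2), (2.1, 3.0), (1.5, 2.5), (1.9, 2.9)],
)
```

In a study of this kind, one such summary would typically be produced for each parameter in each design cell (combination of L2 size, L3 size, ICC, and model complexity) and then compared against chosen acceptability thresholds.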
Contributors: Yel, Nedim (Author) / Levy, Roy (Thesis advisor) / Elliott, Stephen N. (Thesis advisor) / Schulte, Ann C (Committee member) / Iida, Masumi (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Accurate data analysis and interpretation of results may be influenced by many potential factors. The factors of interest in the current work are the chosen analysis model(s), the presence of missing data, and the type(s) of data collected. If analysis models are used that (a) do not accurately capture the structure of relationships in the data such as clustered/hierarchical data, (b) do not allow or control for missing values present in the data, or (c) do not accurately compensate for different data types such as categorical data, then the assumptions associated with the model have not been met and the results of the analysis may be inaccurate. In the presence of clustered/nested data, hierarchical linear modeling or multilevel modeling (MLM; Raudenbush & Bryk, 2002) has the ability to predict outcomes for each level of analysis and across multiple levels (accounting for relationships between levels), providing a significant advantage over single-level analyses. When multilevel data contain missingness, multilevel multiple imputation (MLMI) techniques may be used to model both the missingness and the clustered nature of the data. With categorical multilevel data with missingness, categorical MLMI must be used. Two such routines for MLMI with continuous and categorical data were explored with missing at random (MAR) data: a formal Bayesian imputation and analysis routine in JAGS (R/JAGS) and a common MLM procedure of imputation via Bayesian estimation in BLImP with frequentist analysis of the multilevel model in Mplus (BLImP/Mplus). Manipulated variables included interclass correlations, number of clusters, and the rate of missingness. Results showed that with continuous data, R/JAGS returned more accurate parameter estimates than BLImP/Mplus for almost all parameters of interest across levels of the manipulated variables. Both R/JAGS and BLImP/Mplus encountered convergence issues and returned inaccurate parameter estimates when imputing and analyzing dichotomous data. Follow-up studies showed that JAGS and BLImP returned similar imputed datasets but the choice of analysis software for MLM impacted the recovery of accurate parameter estimates. Implications of these findings and recommendations for further research will be discussed.
Contributors: Kunze, Katie L (Author) / Levy, Roy (Thesis advisor) / Enders, Craig K. (Committee member) / Thompson, Marilyn S (Committee member) / Arizona State University (Publisher)
Created: 2016