Matching Items (24)
Description
In order to analyze data from an instrument administered at multiple time points, it is common practice to form composites of the items at each wave and to fit a longitudinal model to the composites. The advantage of using composites of items is that smaller sample sizes are required than for second-order models that include both the measurement and the structural relationships among the variables. However, the use of composites assumes that longitudinal measurement invariance holds; that is, it is assumed that the relationships among the items and the latent variables remain constant over time. Previous studies of latent growth models (LGM) have shown that when longitudinal metric invariance is violated, the parameter estimates are biased and mistaken conclusions about growth can be made. The purpose of the current study was to examine the impact of non-invariant loadings and non-invariant intercepts on two longitudinal models: the LGM and the autoregressive quasi-simplex model (AR quasi-simplex). A second purpose was to determine whether there are conditions in which researchers can reach adequate conclusions about stability and growth even in the presence of violations of invariance. A Monte Carlo simulation study was conducted to achieve these purposes. The method consisted of generating items under a linear curve of factors model (COFM) or under the AR quasi-simplex. Composites of the items were formed at each time point and analyzed with a linear LGM or an AR quasi-simplex model. The results showed that the AR quasi-simplex model yielded biased path coefficients only in the conditions with large violations of invariance, and its fit was not affected by violations of invariance. In general, the growth parameter estimates of the LGM were biased under violations of invariance. Further, in the presence of non-invariant loadings, the rejection rates of the hypothesis of linear growth increased as the proportion of non-invariant items and the magnitude of the violations increased. A discussion of the results and limitations of the study is provided, along with general recommendations.
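As context for the invariance assumption at issue, a minimal sketch in LaTeX notation (symbols are illustrative, not taken from the dissertation): each item follows a common-factor model at every wave, metric invariance constrains its loading to be equal across waves, and the wave-specific composites are then modeled with a linear latent growth model.

\[ y_{jt} = \tau_{jt} + \lambda_{jt}\,\eta_t + \varepsilon_{jt}, \qquad C_t = \sum_{j=1}^{J} y_{jt} \]
\[ \text{metric invariance:}\quad \lambda_{j1} = \lambda_{j2} = \cdots = \lambda_{jT} \ \text{ for all } j \]
\[ \text{linear LGM for the composites:}\quad C_t = \alpha + \beta\, t + e_t, \qquad (\alpha, \beta) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Psi}) \]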
Contributors: Olivera-Aguilar, Margarita (Author) / Millsap, Roger E. (Thesis advisor) / Levy, Roy (Committee member) / MacKinnon, David (Committee member) / West, Stephen G. (Committee member) / Arizona State University (Publisher)
Created: 2013

Description
This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students' test scores as outcome variables and teachers' contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained from three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures could be used within, but are not restricted to, the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions.
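As a rough illustration of reading terminal-node structure from a fitted random forest, the Python sketch below (using scikit-learn) counts, for each covariate, the proportion of terminal-node paths that used that covariate. This is a hypothetical analogue for intuition only; it is not the node-proportion or covariate-proportion measure defined in the dissertation, and the toy data have no connection to the VAM simulations.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy data and forest (all settings are illustrative assumptions).
X, y = make_regression(n_samples=500, n_features=6, noise=1.0, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

counts = np.zeros(X.shape[1])   # how many terminal-node paths used covariate j
total_leaves = 0
for est in forest.estimators_:
    tree = est.tree_
    stack = [(0, frozenset())]                      # (node id, covariates used so far)
    while stack:
        node, used = stack.pop()
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:                           # terminal node
            total_leaves += 1
            for j in used:
                counts[j] += 1
        else:
            used = used | {tree.feature[node]}      # covariate split on at this node
            stack.append((left, used))
            stack.append((right, used))

print(np.round(counts / total_leaves, 3))           # per-covariate proportion of leaves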
Contributors: Valdivia, Arturo (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Broatch, Jennifer (Committee member) / Arizona State University (Publisher)
Created: 2013

Description
Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools; see e.g., Froelich & Habing, 2007). It remains to be seen how such procedures perform in the context of small-scale assessments characterized by relatively small sample sizes and/or short tests. The fact that some procedures come with minimum allowable values for characteristics of the data, such as the number of items, may even render them unusable for some small-scale assessments. Other measures designed to assess dimensionality do not come with such limitations and, as such, may perform better under conditions that do not lend themselves to evaluation via statistics that rely on asymptotic theory. The current work aimed to evaluate the performance of one such metric, the standardized generalized dimensionality discrepancy measure (SGDDM; Levy & Svetina, 2011; Levy, Xu, Yel, & Svetina, 2012), under both large- and small-scale testing conditions. A Monte Carlo study was conducted to compare the performance of DIMTEST and the SGDDM statistic in terms of evaluating assumptions of unidimensionality in item response data under a variety of conditions, with an emphasis on the examination of these procedures in small-scale assessments. Similar to previous research, increases in either test length or sample size resulted in increased power. The DIMTEST procedure appeared to be a conservative test of the null hypothesis of unidimensionality. The SGDDM statistic exhibited rejection rates near the nominal rate of .05 under unidimensional conditions, though the reliability of these results may have been less than optimal due to high sampling variability resulting from a relatively limited number of replications. Power values were at or near 1.0 for many of the multidimensional conditions. It was only when the sample size was reduced to N = 100 that the two approaches diverged in performance. Results suggested that both procedures may be appropriate for sample sizes as low as N = 250 and tests as short as J = 12 (SGDDM) or J = 19 (DIMTEST). When used as a diagnostic tool, SGDDM may be appropriate with as few as N = 100 cases combined with J = 12 items. The study was somewhat limited in that it did not include any complex factorial designs, nor were the strength of item discrimination parameters or correlation between factors manipulated. It is recommended that further research be conducted with the inclusion of these factors, as well as an increase in the number of replications when using the SGDDM procedure.
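The data-generation step of such a Monte Carlo study can be sketched in Python as follows; the compensatory two-dimensional 2PL setup, parameter ranges, and seed are illustrative assumptions, not the conditions used in the study. Here rho is the correlation between the two latent traits, so rho = 1.0 approximates a unidimensional (null) condition while smaller values create multidimensional misfit.

import numpy as np

rng = np.random.default_rng(2013)

def simulate_2pl(n_persons=250, n_items=12, rho=0.4):
    """Dichotomous responses from a two-dimensional 2PL with simple structure."""
    theta = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_persons)
    a = rng.uniform(0.8, 2.0, size=(n_items, 2))
    a[: n_items // 2, 1] = 0.0          # first half of the items measure trait 1 ...
    a[n_items // 2 :, 0] = 0.0          # ... second half measure trait 2
    b = rng.normal(0.0, 1.0, size=n_items)
    logits = theta @ a.T - b            # n_persons x n_items matrix of logits
    prob = 1.0 / (1.0 + np.exp(-logits))
    return (rng.uniform(size=prob.shape) < prob).astype(int)

responses = simulate_2pl(n_persons=250, n_items=12, rho=0.4)
print(responses.shape, responses.mean(axis=0).round(2))   # item proportions correct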
Contributors: Reichenberg, Ray E. (Author) / Levy, Roy (Thesis advisor) / Thompson, Marilyn S. (Thesis advisor) / Green, Samuel B. (Committee member) / Arizona State University (Publisher)
Created: 2013

Description
Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting p-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation-motivated multivariate tests and tests based on vector time series. These methods have been implemented in an extension to TestU01 built in C++, and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suite of tests.
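To make the concern concrete, here is a small stand-alone Python sketch (not part of the TestU01 extension described above) in which each stream is marginally uniform yet the streams are perfectly dependent; a per-stream test evaluates each stream in isolation, while a simple lag-1 cross-correlation exposes the dependence.

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(7)
n = 100_000
stream0 = rng.random(n)
stream1 = np.roll(stream0, 1)     # stream1[t] = stream0[t-1]: marginally uniform, fully dependent

# Per-stream Kolmogorov-Smirnov checks against Uniform(0, 1) look at each stream in isolation ...
print(kstest(stream0, "uniform").pvalue, kstest(stream1, "uniform").pvalue)

# ... while the lag-1 cross-correlation between the streams exposes the dependence.
lag1_corr = np.corrcoef(stream0[:-1], stream1[1:])[0, 1]   # compare stream0[t] with stream1[t+1]
print(round(lag1_corr, 4))                                  # ~1.0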
Contributors: Ismay, Chester (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Kao, Ming-Hung (Committee member) / Lanchier, Nicolas (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created: 2013

Description
This simulation study compared the utility of various discrepancy measures within a posterior predictive model checking (PPMC) framework for detecting different types of data-model misfit in multidimensional Bayesian network (BN) models. The investigated conditions were motivated by an applied research program utilizing an operational complex performance assessment within a digital-simulation educational context grounded in theories of cognition and learning. BN models were manipulated along two factors: latent variable dependency structure and number of latent classes. Distributions of posterior predictive p-values (PPP-values) served as the primary outcome measure and were summarized in graphical presentations, by median values across replications, and by proportions of replications in which the PPP-values were extreme. An effect size measure for PPMC was introduced as a supplemental numerical summary to the PPP-value. Consistent with previous PPMC research, all investigated fit functions tended to perform conservatively, but the standardized generalized dimensionality discrepancy measure (SGDDM), Yen's Q3, and the hierarchy consistency index (HCI) only mildly so. Adequate power to detect at least some types of misfit was demonstrated by SGDDM, Q3, HCI, the item consistency index (ICI), and, to a lesser extent, deviance, while proportion correct (PC), a chi-square-type item-fit measure, the ranked probability score (RPS), and Good's logarithmic scale (GLS) were powerless across all investigated factors. Bivariate SGDDM and Q3 were found to provide powerful and detailed feedback for all investigated types of misfit.
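For reference, a PPP-value in PPMC is simply the posterior proportion of replicated-data discrepancies that meet or exceed the realized (observed-data) discrepancies. A minimal Python sketch with fabricated stand-in values follows; the function and numbers are illustrative and not tied to the BN models or fit functions above.

import numpy as np

def ppp_value(realized_discrepancy, predicted_discrepancy):
    """Proportion of posterior draws in which the discrepancy computed on
    replicated data meets or exceeds the discrepancy computed on the observed
    data.  Values near 0 or 1 signal misfit; values near .5 indicate adequate
    fit with respect to that discrepancy measure."""
    realized = np.asarray(realized_discrepancy)
    predicted = np.asarray(predicted_discrepancy)
    return np.mean(predicted >= realized)

# Hypothetical example: one discrepancy value per posterior draw, computed
# once on the observed data and once on data replicated from that draw.
rng = np.random.default_rng(0)
realized = rng.normal(loc=1.2, scale=0.3, size=2000)    # stand-in values only
predicted = rng.normal(loc=1.0, scale=0.3, size=2000)
print(round(ppp_value(realized, predicted), 3))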
Contributors: Crawford, Aaron (Author) / Levy, Roy (Thesis advisor) / Green, Samuel (Committee member) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2014

Description
Research methods based on the frequentist philosophy use prior information in a priori power calculations and when determining the necessary sample size for the detection of an effect, but not in statistical analyses. Bayesian methods incorporate prior knowledge into the statistical analysis in the form of a prior distribution. When prior information about a relationship is available, the estimates obtained could differ drastically depending on the choice of Bayesian or frequentist method. Study 1 in this project compared the performance of five methods for obtaining interval estimates of the mediated effect in terms of coverage, Type I error rate, empirical power, interval imbalance, and interval width at N = 20, 40, 60, 100, and 500. In Study 1, Bayesian methods with informative prior distributions performed almost identically to Bayesian methods with diffuse prior distributions, and had more power than normal theory confidence limits, lower Type I error rates than the percentile bootstrap, and coverage, interval width, and imbalance comparable to normal theory, percentile bootstrap, and bias-corrected bootstrap confidence limits. Study 2 evaluated whether a Bayesian method with true parameter values as prior information outperforms the other methods. The findings indicate that with true values of the parameters as the prior information, Bayesian credibility intervals with informative prior distributions have more power, less imbalance, and narrower intervals than Bayesian credibility intervals with diffuse prior distributions, normal theory, percentile bootstrap, and bias-corrected bootstrap confidence limits. Study 3 examined how much power increases when the precision of the prior distribution for either the action path or the conceptual path in mediation analysis is increased by a factor of ten. Power generally increases with increases in precision, but there are many sample size and parameter value combinations for which a tenfold increase in precision does not lead to a substantial increase in power.
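For orientation, two of the frequentist interval methods compared above can be sketched in a few lines of Python for a single-mediator model; the sample size, path values, and number of bootstrap draws below are illustrative assumptions, not the study's design values.

import numpy as np

rng = np.random.default_rng(42)

# Simulate a single-mediator model X -> M -> Y with illustrative path values.
n, a, b = 100, 0.4, 0.4
X = rng.normal(size=n)
M = a * X + rng.normal(size=n)
Y = b * M + rng.normal(size=n)

def paths(X, M, Y):
    a_hat = np.polyfit(X, M, 1)[0]     # slope of M regressed on X (the a path)
    design = np.column_stack([np.ones_like(X), X, M])
    b_hat = np.linalg.lstsq(design, Y, rcond=None)[0][2]   # slope of M in Y on X and M (the b path)
    return a_hat, b_hat

a_hat, b_hat = paths(X, M, Y)

# Percentile bootstrap limits for the mediated effect a*b.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(np.prod(paths(X[idx], M[idx], Y[idx])))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"ab = {a_hat * b_hat:.3f}, 95% percentile bootstrap CI = ({lo:.3f}, {hi:.3f})")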
Contributors: Miocevic, Milica (Author) / MacKinnon, David P. (Thesis advisor) / Levy, Roy (Committee member) / West, Stephen G. (Committee member) / Enders, Craig (Committee member) / Arizona State University (Publisher)
Created: 2014

Description
Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers.
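The level-specific covariance issue can be stated compactly. In illustrative notation (not the dissertation's), a two-level random-intercept model splits the total covariance into between-cluster and within-cluster parts; an imputation model that collapses these into a single matrix cannot preserve both, which is the failure mode the simulations flag.

\[ \mathbf{y}_{ij} = \boldsymbol{\mu} + \mathbf{u}_j + \mathbf{e}_{ij}, \qquad \mathbf{u}_j \sim N(\mathbf{0}, \boldsymbol{\Sigma}_B), \quad \mathbf{e}_{ij} \sim N(\mathbf{0}, \boldsymbol{\Sigma}_W) \]
\[ \operatorname{Cov}(\mathbf{y}_{ij}) = \boldsymbol{\Sigma}_B + \boldsymbol{\Sigma}_W \]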
Contributors: Mistler, Stephen (Author) / Enders, Craig K. (Thesis advisor) / Aiken, Leona (Committee member) / Levy, Roy (Committee member) / West, Stephen G. (Committee member) / Arizona State University (Publisher)
Created: 2015

Description
Many methodological approaches have been utilized to predict student retention and persistence over the years, yet few have utilized a Bayesian framework. It is believed this is due in part to the absence of an established process for guiding educational researchers reared in a frequentist perspective into the realms of Bayesian analysis and educational data mining. The current study aimed to address this by providing a model-building process for developing a Bayesian network (BN) that leveraged educational data mining, Bayesian analysis, and traditional iterative model-building techniques in order to predict whether community college students will stop out at the completion of each of their first six terms. The study utilized exploratory and confirmatory techniques to reduce an initial pool of more than 50 potential predictor variables to a parsimonious final BN with only four predictor variables. The average in-sample classification accuracy rate for the model was 80% (Cohen's κ = 53%). The model was shown to be generalizable across samples with an average out-of-sample classification accuracy rate of 78% (Cohen's κ = 49%). The classification rates for the BN were also found to be superior to the classification rates produced by an analog frequentist discrete-time survival analysis model.
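The two reported classification metrics are straightforward to compute; a toy Python sketch with fabricated labels (no connection to the study's student data) is shown below.

import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(1)
actual = rng.integers(0, 2, size=1000)                  # 1 = stopped out (fabricated)
predicted = np.where(rng.random(1000) < 0.85, actual,   # mostly correct predictions ...
                     1 - actual)                        # ... with some errors flipped

print(f"accuracy = {accuracy_score(actual, predicted):.2f}")
print(f"Cohen's kappa = {cohen_kappa_score(actual, predicted):.2f}")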
Contributors: Arcuria, Philip (Author) / Levy, Roy (Thesis advisor) / Green, Samuel B. (Committee member) / Thompson, Marilyn S. (Committee member) / Arizona State University (Publisher)
Created: 2015

Description
The current study employs item difficulty modeling procedures to evaluate the feasibility of potential generative item features for nonword repetition. Specifically, the extent to which the manipulated item features affect the theoretical mechanisms that underlie nonword repetition accuracy was estimated. Generative item features were based on the phonological loop component of Baddeley's model of working memory, which addresses phonological short-term memory (Baddeley, 2000, 2003; Baddeley & Hitch, 1974). Using researcher-developed software, nonwords were generated to adhere to the phonological constraints of Spanish. Thirty-six nonwords were chosen based on the set of item features identified by the proposed cognitive processing model. Using a planned missing data design, two hundred fifteen Spanish-English bilingual children were administered 24 of the 36 generated nonwords. Multiple regression and explanatory item response modeling techniques (e.g., the linear logistic test model, LLTM; Fischer, 1973) were used to estimate the impact of item features on item difficulty. The final LLTM included three item radicals and two item incidentals. Results indicated that the LLTM-predicted item difficulties were highly correlated with the Rasch item difficulties (r = .89) and accounted for a substantial amount of the variance in item difficulty (R² = .79). The findings are discussed in terms of validity evidence in support of using the phonological loop component of Baddeley's (2000) model as a cognitive processing model for nonword repetition items and the feasibility of using the proposed radical structure as an item blueprint for the future generation of nonword repetition items.
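For readers unfamiliar with the LLTM (Fischer, 1973), it is a Rasch model in which each item's difficulty is decomposed into weighted contributions of its design features; in standard, illustrative notation:

\[ P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}, \qquad b_i = \sum_{k} q_{ik}\,\eta_k + c \]

where q_{ik} indicates whether design feature k is present in item i and \eta_k is that feature's estimated contribution to difficulty.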
Contributors: Morgan, Gareth Philip (Author) / Gorin, Joanna (Thesis advisor) / Levy, Roy (Committee member) / Gray, Shelley (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational, or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well known in goodness-of-fit testing, but it is sometimes considered an omnibus test because it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components are used to develop goodness-of-fit tests for models fit to the counts obtained from the cross-classification of multi-category dependent variables. Ordinal categories are assumed. Orthogonal components defined on the marginals are obtained when analyzing multi-dimensional contingency tables through the use of the QR decomposition. A subset of these orthogonal components can be used to construct limited-information tests that allow one to identify the source of lack of fit and provide an increase in power compared to Pearson's test. These tests can address the adverse effects that arise when data are sparse. The tests rely on the set of first- and second-order marginals jointly, the set of second-order marginals only, and the random forest method, a popular algorithm for modeling large complex data sets. The performance of these tests is compared to the likelihood ratio test as well as to tests based on orthogonal polynomial components. The derived goodness-of-fit tests are evaluated in studies for detecting two- and three-way associations that are not accounted for by a categorical-variable factor model with a single latent variable. In addition, the tests are used to investigate the case where the model misspecification involves parameter constraints for large and sparse contingency tables. The methodology proposed here is applied to data from the 38th round of the State Survey conducted by the Institute for Public Policy and Social Research at Michigan State University (2005). The results illustrate the use of the proposed techniques in the context of a sparse data set.
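One common way to write the underlying decomposition, in illustrative notation (the dissertation's specific components are built from a QR decomposition defined on the marginals): Pearson's statistic for a table with C cells sums squared standardized residuals and can be re-expressed as a sum of asymptotically independent squared components, a subset of which yields a limited-information, directional test.

\[ X^2 = \sum_{c=1}^{C} \frac{(n_c - m_c)^2}{m_c} = \sum_{k=1}^{C-1} \gamma_k^2 \]

where n_c and m_c are the observed and model-implied expected counts, and each \gamma_k is an orthogonal component; restricting attention to the components associated with first- and second-order marginals gives the kind of limited-information statistic described above.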
Contributors: Milovanovic, Jelena (Author) / Young, Dennis (Thesis advisor) / Reiser, Mark R. (Thesis advisor) / Wilson, Jeffrey (Committee member) / Eubank, Randall (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)
Created: 2011