Description
The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in compensatory and noncompensatory multidimensional item response theory (MIRT) models of assessment data, using dimensionality assessment procedures based on conditional covariances (i.e., DETECT) and a factor-analytic approach (i.e., NOHARM). The DETECT-based methods typically outperformed the NOHARM-based methods in both two-dimensional (2D) and three-dimensional (3D) compensatory MIRT conditions. The DETECT-based methods yielded a high proportion correct, especially when correlations were .60 or smaller, data exhibited 30% or less complexity, and sample sizes were larger. As complexity increased and sample size decreased, performance typically diminished. As complexity increased, it also became more difficult to label the resulting sets of items from DETECT in terms of the dimensions. DETECT was consistent in classifying simple items but less consistent in classifying complex items. Of the three NOHARM-based methods, χ²G/D and ALR generally outperformed RMSR. χ²G/D was more accurate when N = 500 and complexity levels were 30% or lower. As the number of items increased, ALR performance improved at a correlation of .60 and 30% or less complexity. When the data followed a noncompensatory MIRT model, the NOHARM-based methods, specifically χ²G/D and ALR, were the most accurate of all five methods. The marginal proportions for labeling sets of items as dimension-like were typically low, suggesting that the methods generally failed to label two (three) sets of items as dimension-like in 2D (3D) noncompensatory situations. The DETECT-based methods were more consistent in classifying simple items across complexity levels, sample sizes, and correlations. However, as complexity and correlation levels increased, the classification rates for all methods decreased.
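The compensatory/noncompensatory distinction that drives these results can be sketched with two standard item response functions. This is a minimal illustration, not the study's simulation code: it assumes the multidimensional two-parameter logistic (M2PL) form for the compensatory case and a product of per-dimension logistic terms (the partially compensatory form often attributed to Sympson) for the noncompensatory case. The function names and parameter values are hypothetical. In the usual MIRT sense, a complex item is one with nonzero discriminations on more than one dimension.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def compensatory_prob(theta, a, d):
    """M2PL: one logistic of a weighted sum, so strength on one dimension
    can offset weakness on another."""
    return logistic(sum(ak * tk for ak, tk in zip(a, theta)) + d)


def noncompensatory_prob(theta, a, b):
    """Product of per-dimension logistic terms: a low value on any one
    dimension caps the probability of a correct response."""
    p = 1.0
    for ak, tk, bk in zip(a, theta, b):
        p *= logistic(ak * (tk - bk))
    return p


if __name__ == "__main__":
    theta = (3.0, -3.0)   # strong on dimension 1, weak on dimension 2
    a_complex = (1.0, 1.0)  # hypothetical complex item loading on both dimensions
    print(compensatory_prob(theta, a_complex, 0.0))
    print(noncompensatory_prob(theta, a_complex, (0.0, 0.0)))
```

With these illustrative parameters, an examinee at θ = (3, -3) has a probability of 0.5 under the compensatory model but roughly 0.045 under the noncompensatory one, because the weak second dimension cannot be offset by the strong first dimension.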
In most conditions, the DETECT-based methods classified complex items as consistently as or more consistently than the NOHARM-based methods. In particular, as complexity, the number of items, and the true dimensionality increased, the DETECT-based methods were notably more consistent than any NOHARM-based method. Despite DETECT's consistency, when data follow a noncompensatory MIRT model, the NOHARM-based methods should be preferred over the DETECT-based methods for assessing dimensionality, because DETECT performs poorly at identifying the true dimensionality.
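DETECT rests on conditional covariances: for items in the same dimension-like cluster, the covariance conditional on a composite score tends to be positive, and for items in different clusters it tends to be negative. The index for a partition of items is 2/(n(n-1)) times the sum over item pairs of δ·Cov, where δ is +1 for pairs in the same cluster and -1 otherwise. The sketch below is a simplified illustration under stated assumptions, not the DETECT program: it conditions only on the rest score (total score minus the two items in the pair), evaluates the index for one user-supplied partition instead of searching over partitions, and applies no scaling. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def conditional_covariance(data, i, j):
    """Covariance of items i and j within rest-score groups, averaged with
    group-size weights. data: list of 0/1 response vectors.
    Conditioning only on the rest score is a simplification."""
    groups = defaultdict(list)
    for row in data:
        rest = sum(row) - row[i] - row[j]
        groups[rest].append((row[i], row[j]))
    ccov = 0.0
    for pairs in groups.values():
        n = len(pairs)
        mean_i = sum(x for x, _ in pairs) / n
        mean_j = sum(y for _, y in pairs) / n
        cov = sum((x - mean_i) * (y - mean_j) for x, y in pairs) / n
        ccov += (n / len(data)) * cov
    return ccov


def detect_index(data, partition):
    """Unscaled DETECT-style index for one partition
    (partition[k] is the cluster label of item k)."""
    n_items = len(data[0])
    total = 0.0
    for i, j in combinations(range(n_items), 2):
        delta = 1 if partition[i] == partition[j] else -1
        total += delta * conditional_covariance(data, i, j)
    return 2.0 * total / (n_items * (n_items - 1))


if __name__ == "__main__":
    # Toy data: items 0-1 move together, items 2-3 move together.
    data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
    print(detect_index(data, [0, 0, 1, 1]))  # partition matching the structure
    print(detect_index(data, [0, 1, 0, 1]))  # mismatched partition
```

On this toy data the matching partition scores 1/6 and the mismatched one -1/12; a full procedure would search many partitions and report the one that maximizes the index as the dimension-like clusters.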
Contributors: Svetina, Dubravka (Author) / Levy, Roy (Thesis advisor) / Gorin, Joanna S. (Committee member) / Millsap, Roger (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
This paper presents a Bayesian framework for evaluative classification. Current education policy debates center on arguments about whether and how to use student test score data in school and personnel evaluation. Proponents of such use argue that refusing to use data violates both the public’s need to hold schools accountable when they use taxpayer dollars and students’ right to educational opportunities. Opponents of formulaic use of test score data argue that most standardized test data are susceptible to fatal technical flaws, provide only a partial picture of student achievement, and lead to behavior that corrupts the measures.

A Bayesian perspective on summative ordinal classification offers a possible framework for combining quantitative outcome data for students with the qualitative types of evaluation that critics of high-stakes testing advocate. This paper describes the key characteristics of a Bayesian perspective on classification, describes a method for translating a naïve Bayesian classifier into a point-based evaluation system, and draws conclusions from that comparison about how to construct algorithmic (including point-based) systems that could capture the political and practical benefits of a Bayesian approach. The most important practical conclusion is that point-based systems with fixed components and weights cannot capture the dynamic and political benefits of a reciprocal relationship between professional judgment and quantitative student outcome data.
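One generic way to turn a naïve Bayesian classifier into points can be shown in the two-class case: because naïve Bayes treats indicators as conditionally independent, the log posterior odds equal the log prior odds plus a sum of per-indicator log likelihood ratios, and scaling and rounding those ratios yields fixed point values. This is a sketch under that assumption, not the paper's own method; the indicator names, probabilities, prior, and scale factor are all hypothetical, and an ordinal scheme with more than two categories would need a multiclass formulation.

```python
import math

# Hypothetical indicators: (P(present | adequate), P(present | inadequate)).
LIKELIHOODS = {
    "met_growth_target": (0.7, 0.3),
    "positive_inspection": (0.8, 0.4),
}
PRIOR_ADEQUATE = 0.5  # hypothetical prior probability of "adequate"


def log_lr(p_adequate, p_inadequate, present):
    """Log likelihood ratio contributed by one binary indicator."""
    if present:
        return math.log(p_adequate / p_inadequate)
    return math.log((1 - p_adequate) / (1 - p_inadequate))


def log_posterior_odds(evidence):
    """Naive Bayes: log prior odds plus per-indicator log likelihood ratios."""
    odds = math.log(PRIOR_ADEQUATE / (1 - PRIOR_ADEQUATE))
    for name, present in evidence.items():
        odds += log_lr(*LIKELIHOODS[name], present)
    return odds


def point_table(scale=10):
    """Freeze each indicator's log likelihood ratios as integer points
    (points if present, points if absent)."""
    return {
        name: (round(scale * log_lr(pa, pi, True)),
               round(scale * log_lr(pa, pi, False)))
        for name, (pa, pi) in LIKELIHOODS.items()
    }


def total_points(evidence, table, scale=10):
    """Point total; a positive total favours the "adequate" class."""
    total = round(scale * math.log(PRIOR_ADEQUATE / (1 - PRIOR_ADEQUATE)))
    for name, present in evidence.items():
        if_present, if_absent = table[name]
        total += if_present if present else if_absent
    return total


if __name__ == "__main__":
    evidence = {"met_growth_target": True, "positive_inspection": False}
    table = point_table()
    print(table)
    print(log_posterior_odds(evidence), total_points(evidence, table))
```

Once the table is frozen, revising any likelihood (for example, in light of professional judgment) changes the log likelihood ratios, so the point values would have to be recomputed; this is one concrete way to see why a scheme with fixed components and weights cannot track ongoing Bayesian updating.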

Contributors: Dorn, Sherman (Author) / Mary Lou Fulton Teachers College (Contributor)
Created: 2009

Description
The spread of academic testing for accountability purposes in multiple countries has obscured at least two historical purposes of academic testing: community ritual and management of the social structure. Testing for accountability differs sharply from the academic challenges visible in 19th-century North American community “examinations” and from the exams that controlled access to the civil service in Imperial China. Rather than serving ritual or controlling access to mobility, modern uses of testing are much closer to the state-building project of a tax census, such as the Domesday Book of medieval Britain after the Norman Invasion, the social engineering projects described in James Scott's Seeing Like a State (1998), or the “mapping the world” project that David Nye described in America as Second Creation (2004). This paper will explore both the instrumental and cultural differences among testing as ritual, testing as mobility control, and testing as state-building.

Contributors: Dorn, Sherman (Author) / Mary Lou Fulton Teachers College (Contributor)
Created: 2014-12-08