Matching Items (22)

The Political Legacy of School Accountability Systems

Description

The recent battle reported from Washington about a proposed national testing program does not tell the most important political story about high-stakes tests. Politically popular school accountability systems in many states already revolve around the statistical results of testing in high-stakes environments. The future of high-stakes tests thus does not depend on what happens on Capitol Hill. Rather, the existence of tests depends largely on the political culture of published test results. Most critics of high-stakes testing do not talk about that culture, however. They typically focus on the practical legacy of testing: the ways in which testing creates perverse incentives against good teaching.

More important may be the political legacy, or how testing defines legitimate discussion about school politics. The consequences of statistical accountability systems will be a narrowing of purpose for schools, impatience with reform, and the continuing erosion of political support for publicly funded schools. Dissent from the high-stakes accountability regime that has developed around standardized testing, including proposals for professionalism and performance assessment, commonly fails to consider these political legacies. Alternatives to standardized testing that do not also connect schooling with the public at large will not be politically viable.

Date Created
  • 1998

Accountability in a Postdesegregation Era: The Continuing Significance of Racial Segregation

Description

In the wake of both the end of court-ordered school desegregation and the growing popularity of accountability as a mechanism to maximize student achievement, the authors explore the association between racial segregation and the percentage of students passing high-stakes tests in Florida's schools. Results suggest that segregation matters in predicting school-level performance on the Florida Comprehensive Assessment Test after controlling for other known and purported predictors of standardized test performance. These results also suggest that neither recent efforts by the state of Florida to equalize the funding of education nor current efforts involving high-stakes testing will close the Black-White achievement gap without consideration of the racial distribution of students across schools.

Date Created
  • 2004

'Use it or Lose It' Professional Judgment: Educational Evaluation and Bayesian Reasoning

Description

This paper presents a Bayesian framework for evaluative classification. Current education policy debates center on arguments about whether and how to use student test score data in school and personnel evaluation. Proponents of such use argue that refusing to use the data violates both the public's need to hold schools accountable for their use of taxpayer dollars and students' right to educational opportunities. Opponents of the formulaic use of test-score data argue that most standardized test data is susceptible to fatal technical flaws, offers only a partial picture of student achievement, and leads to behavior that corrupts the measures.

A Bayesian perspective on summative ordinal classification is a possible framework for combining quantitative outcome data for students with the qualitative types of evaluation that critics of high-stakes testing advocate. This paper describes the key characteristics of a Bayesian perspective on classification, describes a method to translate a naïve Bayesian classifier into a point-based system for evaluation, and draws conclusions from the comparison about the construction of algorithmic (including point-based) systems that could capture the political and practical benefits of a Bayesian approach. The most important practical conclusion is that point-based systems with fixed components and weights cannot capture the dynamic and political benefits of a reciprocal relationship between professional judgment and quantitative student outcome data.
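
As a rough illustration of the kind of classifier this abstract describes (with made-up probabilities, not the paper's actual model), a naïve Bayes classifier that combines a quantitative score band with a qualitative observation rating into a summative ordinal rating might look like:

```python
import math

# Toy naive Bayes classifier for a summative ordinal rating
# ("low" / "high"), combining one quantitative indicator (a test
# score band) with one qualitative indicator (an observation
# rating). All probabilities here are illustrative assumptions.

priors = {"low": 0.5, "high": 0.5}

# P(evidence value | class) for each feature, treated as
# conditionally independent given the class (the naive assumption).
likelihoods = {
    "score_band": {
        "low":  {"below": 0.6, "above": 0.4},
        "high": {"below": 0.2, "above": 0.8},
    },
    "observation": {
        "low":  {"weak": 0.7, "strong": 0.3},
        "high": {"weak": 0.3, "strong": 0.7},
    },
}

def classify(evidence):
    """Return posterior P(class | evidence) for each ordinal class."""
    log_scores = {}
    for cls, prior in priors.items():
        log_p = math.log(prior)
        for feature, value in evidence.items():
            log_p += math.log(likelihoods[feature][cls][value])
        log_scores[cls] = log_p
    # Normalize back to probabilities.
    total = sum(math.exp(s) for s in log_scores.values())
    return {cls: math.exp(s) / total for cls, s in log_scores.items()}

posterior = classify({"score_band": "above", "observation": "strong"})
```

Taking logarithms turns each conditional probability into an additive term, which is why a fixed point table can mimic a naïve Bayes classifier; the paper's argument is that freezing those components and weights forfeits the dynamic updating a genuinely Bayesian system allows.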

Date Created
  • 2009

Testing like William the Conqueror

Description

The spread of academic testing for accountability purposes in multiple countries has obscured at least two historical purposes of academic testing: community ritual and management of the social structure. Testing for accountability is very different from the purpose of the academic challenges one can identify in community “examinations” in 19th-century North America, or in the exams controlling access to the civil service in Imperial China. Rather than testing for ritual or access to mobility, the modern uses of testing are much closer to the state-building project of a tax census, such as the Domesday Book of medieval England after the Norman Conquest, the social engineering projects described in James Scott's Seeing like a State (1998), or the “mapping the world” project that David Nye described in America as Second Creation (2004). This paper will explore both the instrumental and cultural differences among testing as ritual, testing as mobility control, and testing as state-building.

Date Created
  • 2014-12-08

Native American students' perceptions of high-stakes testing in New Mexico

Description

Given the political and public demands for accountability, this study used the voices of students on the front lines to investigate whether students perceived New Mexico's high-stakes testing program as taking public schools in the right direction. Did the students perceive the program as having an impact on retention, dropouts, or graduation requirements? What were the perceptions of Navajo students in Navajo reservation schools as to the impact of high-stakes testing on their emotional, physical, social, and academic well-being? The specific tests examined were the New Mexico High School Competency Exam (NMHSCE) and the New Mexico Standards Based Assessment (SBA/High School Graduation Assessment) as they affected Native American students.

Based on interviews published by the Daily Times of Farmington, New Mexico, our local newspaper, some of the students reported that the testing program was not taking schools in the right direction, that the test was used improperly, and that one-time test scores were not an accurate assessment of student learning. They also cited negative and positive effects on the curriculum, teaching and learning, and student and teacher motivation. Based on the survey results, the students' positive and negative concerns and praise of high-stakes testing were categorized into themes. Among the positive effects cited was that testing held students, educators, and parents accountable for their actions. The students were not opposed to accountability, but rather to the manner in which it was currently implemented. Several implications of these findings were examined: (a) the requirements to pass the New Mexico High School Competency Exam; (b) what high-stakes testing meant for the emotional well-being of the students; (c) the impact of sanctions under New Mexico's high-stakes testing proficiency requirements; and (d) the effects of high-stakes tests on students' perceptions, experiences, and attitudes.

Student voices are not commonly heard in meetings and discussions about K-12 education policy. Yet the adults who control policy could learn much from listening to what students have to say about their experiences.

Date Created
  • 2012

Assessment of item parameter drift of known items in a university placement exam

Description

This study investigated the possibility of item parameter drift (IPD) in a calculus placement examination administered to approximately 3,000 students at a large university in the United States. A single form of the exam was administered continuously for a period of two years, possibly allowing later examinees to have prior knowledge of specific items on the exam. An analysis of IPD was conducted to explore evidence of possible item exposure. Two assumptions concerning item exposure were made: 1) item recall and item exposure are positively correlated, and 2) item exposure results in items becoming easier over time. Special consideration was given to two contextual item characteristics: 1) item location within the test, specifically items at the beginning and end of the exam, and 2) the use of an associated diagram. The hypotheses stated that these item characteristics would make the items easier to recall and, therefore, more likely to be exposed, resulting in item drift. BILOG-MG 3 was used to calibrate the items and assess for IPD. No evidence was found to support the hypotheses that items located at the beginning of the test or items with an associated diagram drifted as a result of item exposure. Three items among the last ten on the exam drifted significantly and became easier, consistent with item exposure. However, in this study, the possible effects of item exposure could not be separated from the effects of other potential factors such as speededness, curriculum changes, better test preparation on the part of subsequent examinees, or guessing.
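
The study's calibration relied on BILOG-MG's IRT machinery, but the underlying intuition (an exposed item becomes easier in later administrations) can be sketched with a simple two-proportion z test on hypothetical counts:

```python
import math

def two_proportion_z(correct_early, n_early, correct_late, n_late):
    """z statistic for H0: equal proportion-correct in the early
    and late administration windows, using a pooled standard error."""
    p_early = correct_early / n_early
    p_late = correct_late / n_late
    pooled = (correct_early + correct_late) / (n_early + n_late)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_early + 1 / n_late))
    return (p_late - p_early) / se

# Hypothetical counts for one end-of-test item: 55% correct in
# year 1 vs 63% correct in year 2, the direction consistent with
# the item becoming easier (one signature of exposure).
z = two_proportion_z(550, 1000, 630, 1000)
```

A large positive z flags the item for closer IRT-based review; as the study notes, drift alone cannot distinguish exposure from speededness, curriculum change, or better preparation.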

Date Created
  • 2012

Norming at scale: faculty perceptions of assessment culture and student learning outcomes assessment

Description

To foster both external and internal accountability, universities seek more effective models for student learning outcomes assessment (SLOA). Meaningful and authentic measurement of program-level student learning outcomes requires engagement with an institution’s faculty members, especially to gather student performance assessment data using common scoring instruments, or rubrics, across a university’s many colleges and programs. Too often, however, institutions rely on faculty engagement for SLOA initiatives like this without providing necessary support, communication, and training. The resulting data may lack sufficient reliability and reflect deficiencies in an institution’s culture of assessment.

This mixed methods action research study gauged how well one form of SLOA training – a rubric-norming workshop – could affect both inter-rater reliability for faculty scorers and faculty perceptions of SLOA while exploring the nature of faculty collaboration toward a shared understanding of student learning outcomes. The study participants, ten part-time faculty members at the institution, each held primary careers in the health care industry, apart from their secondary role teaching university courses. Accordingly, each contributed expertise and experience to the rubric-norming discussions, surveys of assessment-related perceptions, and individual scoring of student performance with a common rubric. Drawing on sociocultural learning principles and the specific lens of activity theory, influences on faculty SLOA were arranged and analyzed within the heuristic framework of an activity system to discern effects of collaboration and perceptions toward SLOA on consistent rubric-scoring by faculty participants.

Findings suggest participation in the study did not correlate with increased inter-rater reliability for faculty scorers using the common rubric. Constraints within the assessment tools and unclear institutional leadership prevented more reliable use of common rubrics. Instead, faculty participants resorted to individual assessment approaches to meaningfully guide students toward classroom achievement and preparation for careers in the health care field. Despite this, faculty participants valued SLOA, collaborated readily with colleagues on shared assessment goals, and worked hard to teach and assess students meaningfully.
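
The inter-rater reliability at issue can be illustrated, for the simplest case of two raters, with Cohen's kappa; the rubric scores below are hypothetical, and the study's own reliability statistic is not specified in this abstract:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical rubric
    levels to the same set of student artifacts: observed agreement
    corrected for the agreement expected by chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Chance agreement: probability both raters independently pick
    # the same level, from their marginal frequencies.
    expected = sum(
        (freq_a[level] / n) * (freq_b[level] / n)
        for level in set(rater_a) | set(rater_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical rubric levels (1 = emerging ... 4 = exemplary)
# assigned by two faculty scorers to ten student artifacts.
scores_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
scores_b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
kappa = cohens_kappa(scores_a, scores_b)
```

Norming workshops aim to push kappa (or a multi-rater analogue such as Fleiss's kappa or an intraclass correlation) upward by building a shared interpretation of each rubric level.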

Date Created
  • 2018

It makes me sad because I think-- I can never be good enough: what students are saying about high-stakes testing

Description

Much research has been conducted regarding the current state of public education in the United States, and very little of it bodes well for the system's current circumstances or for the direction the system is headed. The debate stems from two opposing ideologies. One holds that there needs to be more accountability via high-stakes testing and a continuation of the status quo the country has maintained, regardless of the effect it may be having on students' well-being. The opposing view sees high-stakes testing as a contributing factor to an unproductive, chaotic, and even harmful conundrum of bias and hegemony that correlates with deleterious effects on student well-being. Although this paper references the research of highly esteemed scholars, it asserts that the voices most often undervalued and ignored are precisely the voices that are most relevant. This paper's purpose is to hear what the 'experts' in the field of education, the students themselves, have to say.

Date Created
  • 2018

A correlated random effects model for nonignorable missing data in value-added assessment of teacher effects

Description

Value-added models (VAMs) are used by many states to assess the contributions of individual teachers and schools to students' academic growth. The generalized persistence VAM, one of the most flexible in the literature, estimates the "value added" by individual teachers to their students' current and future test scores by employing a mixed model with a longitudinal database of test scores. There is concern, however, that the missing values common in longitudinal student scores can bias value-added assessments, especially when the models serve as a basis for personnel decisions, such as promoting or dismissing teachers, as they do in some states. Certain types of missing data require that the VAM be modeled jointly with the missingness process in order to obtain unbiased parameter estimates. This dissertation studies two problems. First, the flexibility and multimembership random effects structure of the generalized persistence model lead to computational challenges that have limited the model's availability. To this point, no methods had been developed for scalable maximum likelihood estimation of the model. An EM algorithm to compute maximum likelihood estimates efficiently is developed, making use of the sparse structure of the random effects and error covariance matrices. The algorithm is implemented in the package GPvam in the R statistical software. Illustrations of the gains in computational efficiency achieved by the estimation procedure are given. Second, to address the presence of potentially nonignorable missing data, a flexible correlated random effects model is developed that extends the generalized persistence model to jointly model the test scores and the missingness process, allowing the process to depend on both students and teachers. The joint model gives the ability to test the sensitivity of the VAM to the presence of nonignorable missing data.
Estimation of the model is challenging due to the non-hierarchical dependence structure and the resulting intractable high-dimensional integrals. Maximum likelihood estimation of the model is performed using an EM algorithm with fully exponential Laplace approximations for the E step. The methods are illustrated with data from university calculus classes and with data from standardized test scores from an urban school district.
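
The generalized persistence model itself is far too involved for a snippet, but the core mixed-model idea, predicting a teacher random effect by shrinking the mean of student score residuals toward zero, can be sketched with assumed (not estimated) variance components:

```python
def shrunken_value_added(residuals, var_teacher, var_student):
    """Best linear unbiased predictor of a teacher random effect
    from the mean residual of n students: the raw mean is shrunk
    toward zero by a reliability factor that grows with class size.
    A toy analogue of the predictions a mixed-model VAM produces."""
    n = len(residuals)
    mean_resid = sum(residuals) / n
    shrinkage = var_teacher / (var_teacher + var_student / n)
    return shrinkage * mean_resid

# Hypothetical score residuals (observed minus expected) for one
# teacher's five students; the variance components are assumed
# here, whereas GPvam's EM algorithm would estimate them.
effect = shrunken_value_added([4.0, 6.0, 5.0, 3.0, 7.0],
                              var_teacher=4.0, var_student=16.0)
```

Smaller classes are shrunk more heavily toward zero, which hints at why missing student scores, and whether they are missing at random, matter for personnel decisions based on these predictions.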

Date Created
  • 2012

The factor structure of the English language development assessment: a confirmatory factor analysis

Description

This study investigated the internal factor structure of the English Language Development Assessment (ELDA) using confirmatory factor analysis. ELDA is an English language proficiency test developed by a consortium of multiple states and is used to identify and reclassify English language learners in kindergarten through grade 12. Scores on item parcels based on the standards tested in the four domains of reading, writing, listening, and speaking were used for the analyses. Five factor models were tested: a single-factor model, a correlated two-factor model, a correlated four-factor model, a second-order factor model, and a bifactor model. The results indicate that the four-factor, second-order, and bifactor models fit the data well. The four-factor model hypothesized constructs for reading, writing, listening, and speaking. The second-order model hypothesized a second-order English language proficiency factor as well as the four lower-order factors of reading, writing, listening, and speaking. The bifactor model hypothesized a general English language proficiency factor as well as the four domain-specific factors of reading, writing, listening, and speaking. Chi-square difference tests indicated that the bifactor model best explains the factor structure of the ELDA. The results from this study are consistent with findings in the literature about the multifactorial nature of language but differ from conclusions about the factor structures reported in previous studies. The overall proficiency levels on the ELDA give more weight to the reading and writing sections of the test than to the speaking and listening sections. This study has implications for the rules used in determining proficiency levels and recommends the use of conjunctive scoring, in which all constructs are weighted equally, contrary to current practice.
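
The model comparison this abstract mentions rests on chi-square difference tests for nested models (the second-order model is nested within the bifactor model). A minimal sketch, with hypothetical fit statistics rather than the ELDA results:

```python
# Hedged sketch of a chi-square difference test for nested CFA
# models fit by maximum likelihood. The fit statistics below are
# hypothetical, not the study's values.

# Standard upper-tail chi-square critical values at alpha = .05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chisq_difference(chisq_restricted, df_restricted,
                     chisq_general, df_general):
    """Return (delta chi-square, delta df, significant), where
    'significant' means the more general model fits significantly
    better at alpha = .05."""
    delta = chisq_restricted - chisq_general
    delta_df = df_restricted - df_general
    return delta, delta_df, delta > CRITICAL_05[delta_df]

# E.g., second-order model (restricted) vs bifactor model (general):
delta, delta_df, bifactor_better = chisq_difference(312.4, 100,
                                                    290.1, 98)
```

With mean-adjusted (robust) estimators, a scaled difference test would be needed instead of this raw subtraction; the sketch shows only the maximum-likelihood case.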

Date Created
  • 2011