Subjects: Evaluation; Teacher evaluation
Creators: Amrein-Beardsley, Audrey; Armfield, Jessica Ann
Status: Published
This study examines validity evidence for a state policy-directed teacher evaluation system implemented in Arizona during the 2012-2013 school year. The purpose was to evaluate the warrant for making high-stakes, consequential judgments of teacher competence based on value-added model (VAM) estimates of instructional impact and observations of professional practice (PP). The research also explores educator influence (voice) in evaluation design and the role information brokers play in local decision making. Findings are situated in an evidentiary and policy context at both the LEA and state policy levels.
The study employs a single-phase, concurrent, mixed-methods research design triangulating multiple sources of qualitative and quantitative evidence onto a single (unified) validation construct: Teacher Instructional Quality. It focuses on assessing the characteristics of metrics used to construct quantitative ratings of instructional competence and the alignment of stakeholder perspectives to facets implicit in the evaluation framework. Validity examinations include assembly of criterion, content, reliability, consequential, and construct articulation evidence. Perceptual perspectives were obtained from teachers, principals, district leadership, and state policy decision makers. Data for this study came from a large suburban public school district in metropolitan Phoenix, Arizona.
Study findings suggest that the evaluation framework is insufficient for supporting high-stakes, consequential inferences of teacher instructional quality. This is based, in part, on the following: (1) Weak associations between VAM and PP metrics; (2) Unstable VAM measures across time and between tested content areas; (3) Less than adequate scale reliabilities; (4) Lack of coherence between theorized and empirical PP factor structures; (5) Omission/underrepresentation of important instructional attributes/effects; (6) Stakeholder concerns over rater consistency, bias, and the inability of test scores to adequately represent instructional competence; (7) Negative sentiments regarding the system's ability to improve instructional competence and/or student learning; (8) Concerns regarding unintended consequences including increased stress, lower morale, harm to professional identity, and restricted learning opportunities; and (9) The general lack of empowerment and educator exclusion from the decision-making process. Study findings also highlight the value of information brokers in policy decision making and the importance of having access to unbiased empirical information during the design and implementation phases of important change initiatives.
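The "scale reliabilities" referenced in finding (3) are conventionally estimated with internal-consistency statistics such as Cronbach's alpha. As a minimal sketch of the technique (the rating matrix below is invented for illustration and is not the study's data):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical ratings: 6 teachers scored on 4 rubric items (1-5 scale)
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
], dtype=float)

alpha = cronbach_alpha(items)
```

Values near 1 indicate that the rubric items rank teachers consistently; "less than adequate" reliability usually means alpha falls below a conventional threshold such as 0.7.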
Preventing heat-associated morbidity and mortality is a public health priority in Maricopa County, Arizona (United States). The objective of this project was to evaluate Maricopa County cooling centers and gain insight into their capacity to provide relief for the public during extreme heat events. During the summer of 2014, 53 cooling centers were evaluated to assess facility and visitor characteristics. Maricopa County staff collected data by directly observing daily operations and by surveying managers and visitors. The cooling centers in Maricopa County were often housed within community, senior, or religious centers, which offered various services for at least 1500 individuals daily. Many visitors were unemployed and/or homeless. Many learned about a cooling center by word of mouth or by having seen the cooling center’s location. The cooling centers provide a valuable service and reach some of the region’s most vulnerable populations. This project is among the first to systematically evaluate cooling centers from a public health perspective and provides helpful insight to community leaders who are implementing or improving their own network of cooling centers.
For the first part of the analysis, the author collected and analyzed documents and field notes related to the teacher evaluation system at one urban middle school. The analysis included official policy documents, official White House speeches and press releases, evaluation system promotional materials, evaluator training materials, and the like. For the second part of the analysis, she interviewed teachers and their evaluators at the local middle school in order to understand how the participants had embodied the market-based discourse to define themselves as teachers and qualify their practice, quality, and worth accordingly.
The findings of the study suggest that teacher evaluation policies, practices, and instruments make possible a variety of techniques, such as numericization, hierarchical surveillance, normalizing judgments, and audit, in order to first make teachers objects of knowledge and then act upon that knowledge to manage teachers' conduct. The author also found that teachers and their evaluators have taken up this discourse in order to think about and act upon themselves as responsibilized subjects. Ultimately, the author argues that while much of the attention related to teacher evaluations has focused on the instruments used to measure the construct of teacher quality, those instruments also work in mutually constitutive ways to discursively shape the construct of teacher quality.
shift in how teacher evaluation policies govern the evaluation of teacher performance. Spurred by federal mandates, teachers have been increasingly held accountable for their students' academic achievement, most notably through the use of value-added models (VAMs): statistically complex tools that aim to isolate and then quantify the effect of teachers on their students' achievement. This increased focus on accountability ultimately resulted in numerous lawsuits across the U.S. in which teachers protested what they felt were unfair evaluations informed by invalid, unreliable, and biased measures, most notably VAMs.
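The core idea behind a value-added model can be sketched in a few lines: predict each student's current score from prior achievement, then attribute the average unexplained gain to the teacher. This is a deliberately minimal illustration with invented scores, not the specification any state actually uses (real VAMs add many covariates and shrinkage):

```python
import numpy as np

# Hypothetical data: prior- and current-year test scores for 8 students
prior = np.array([50.0, 55, 60, 45, 70, 65, 52, 58])
current = np.array([54.0, 58, 66, 46, 76, 70, 53, 60])
teacher = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Step 1: predict current scores from prior scores (OLS with intercept)
X = np.column_stack([np.ones_like(prior), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
residuals = current - X @ beta

# Step 2: a teacher's "value added" is the mean residual of their students,
# i.e., how much their students out- or under-performed the prediction
vam = {t: residuals[teacher == t].mean() for t in np.unique(teacher)}
```

Even in this toy form the model's fragility is visible: the estimates inherit every flaw of the test scores and of the prediction equation, which is what the lawsuits described above contested.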
While New Mexico's teacher evaluation system was labeled a "gold standard" due to its purported ability to objectively and accurately differentiate between effective and ineffective teachers, in 2015 teachers filed suit contesting the fairness and accuracy of their evaluations. Amrein-Beardsley and Geiger's (revise and resubmit) initial analyses of the state's teacher evaluation data revealed that the four individual measures comprising teachers' overall evaluation scores showed evidence of bias; specifically, teachers who taught in schools with different student body compositions (e.g., special education students, poorer students, gifted students) had significantly different scores than their peers. The purpose of this study was to expand upon these prior analyses by investigating whether those conclusions still held true when controlling for a variety of confounding factors at the school, class, and teacher levels, as such covariates were not included in the prior analyses.
Results from multiple linear regression analyses indicated that, overall, the measures used to inform New Mexico teachers' overall evaluation scores still showed evidence of bias by school-level student demographic factors, with VAMs potentially being the most susceptible and classroom observations the least. This study is unique given the juxtaposition of such a highly touted evaluation system also being one whose constitutionality teachers contested. Study findings are important for all education stakeholders to consider, especially as teacher evaluation systems and related policies continue to be transformed.
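The bias test described above amounts to regressing an evaluation measure on school-level demographic covariates and asking whether those coefficients differ from zero after controls. A minimal sketch with simulated data (the covariates, coefficients, and scores below are invented, not New Mexico's data):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200  # hypothetical teachers

# School-level demographic covariate (e.g., proportion of special education
# students) and a class-size control, both invented for illustration
pct_sped = rng.uniform(0.0, 0.3, n)
class_size = rng.uniform(18.0, 32.0, n)

# Simulate a biased measure: scores drift downward as pct_sped rises
score = 4.0 - 2.0 * pct_sped + 0.01 * class_size + rng.normal(0.0, 0.2, n)

# Multiple linear regression: intercept + demographic factor + control
X = np.column_stack([np.ones(n), pct_sped, class_size])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# beta[1] estimates the association between school demographics and scores
# after controlling for class size; a reliably nonzero value is the kind of
# evidence of bias the study reports
```

An unbiased measure would yield a demographic coefficient statistically indistinguishable from zero once confounders are controlled; the study's finding is that New Mexico's measures did not.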