Houston, we have a problem: studying the SAS Education Value-Added Assessment System (EVAAS) from teachers' perspectives in the Houston Independent School District (HISD)

Description

This study examined the intended and unintended consequences associated with the Education Value-Added Assessment System (EVAAS) as perceived and experienced by teachers in the Houston Independent School District (HISD). To evaluate teacher effectiveness, HISD is using EVAAS for high-stakes consequences more than any other district or state in the country. A large-scale electronic survey was used to investigate the model's reliability and validity; to determine whether teachers used the EVAAS data in formative ways as intended; to gather teachers' opinions on EVAAS's claimed benefits and statements; and to understand the unintended consequences that occurred as a result of EVAAS use in HISD. Mixed methods data collection and analyses were used to present the findings in user-friendly ways, particularly when using the words and experiences of the teachers themselves. Results revealed that the reliability of the EVAAS model produced split and inconsistent results among teacher participants, and teachers indicated that students biased the EVAAS results. The majority of teachers did not report similar EVAAS and principal observation scores, reducing the criterion-related validity of both measures of teacher quality. Teachers revealed discrepancies in the distribution of EVAAS reports, in the awareness of trainings offered, and in principals' understanding of EVAAS across the district. This resulted in an underwhelming number of teachers who reportedly used EVAAS data for formative purposes. Teachers disagreed with EVAAS marketing claims, implying the majority did not believe EVAAS worked as intended and promoted. Additionally, many unintended consequences associated with the high-stakes use of EVAAS emerged through teachers' responses, which revealed, among other things, that teachers felt heightened pressure and competition, which reduced morale and collaboration and encouraged cheating or teaching to the test in an attempt to raise EVAAS scores.
This study is one of the first to investigate how the EVAAS model works in practice and provides a glimpse of whether value-added models might produce desired outcomes and encourage best teacher practices. This is information of which policymakers, researchers, and districts should be aware and consider when implementing the EVAAS, or any value-added model for teacher evaluation, as many of the reported issues are not specific to the EVAAS model.

Contributors: Collins, Clarin (Author) / Amrein-Beardsley, Audrey (Thesis advisor) / Berliner, David C. (Committee member) / Fischman, Gustavo E (Committee member) / Arizona State University (Publisher)

Created: 2012

Examining the validity of a state policy-directed framework for evaluating teacher instructional quality: informing policy, impacting practice

Description

ABSTRACT

This study examines validity evidence of a state policy-directed teacher evaluation system implemented in Arizona during school year 2012-2013. The purpose was to evaluate the warrant for making high stakes, consequential judgments of teacher competence based on value-added (VAM) estimates of instructional impact and observations of professional practice (PP). The research also explores educator influence (voice) in evaluation design and the role information brokers have in local decision making. Findings are situated in an evidentiary and policy context at both the LEA and state policy levels.

The study employs a single-phase, concurrent, mixed-methods research design triangulating multiple sources of qualitative and quantitative evidence onto a single (unified) validation construct: Teacher Instructional Quality. It focuses on assessing the characteristics of metrics used to construct quantitative ratings of instructional competence and the alignment of stakeholder perspectives to facets implicit in the evaluation framework. Validity examinations include assembly of criterion, content, reliability, consequential and construct articulation evidences. Perceptual perspectives were obtained from teachers, principals, district leadership, and state policy decision makers. Data for this study came from a large suburban public school district in metropolitan Phoenix, Arizona.

Study findings suggest that the evaluation framework is insufficient for supporting high stakes, consequential inferences of teacher instructional quality. This is based, in part on the following: (1) Weak associations between VAM and PP metrics; (2) Unstable VAM measures across time and between tested content areas; (3) Less than adequate scale reliabilities; (4) Lack of coherence between theorized and empirical PP factor structures; (5) Omission/underrepresentation of important instructional attributes/effects; (6) Stakeholder concerns over rater consistency, bias, and the inability of test scores to adequately represent instructional competence; (7) Negative sentiments regarding the system's ability to improve instructional competence and/or student learning; (8) Concerns regarding unintended consequences including increased stress, lower morale, harm to professional identity, and restricted learning opportunities; and (9) The general lack of empowerment and educator exclusion from the decision making process. Study findings also highlight the value of information brokers in policy decision making and the importance of having access to unbiased empirical information during the design and implementation phases of important change initiatives.

Contributors: Sloat, Edward F. (Author) / Wetzel, Keith (Thesis advisor) / Amrein-Beardsley, Audrey (Thesis advisor) / Ewbank, Ann (Committee member) / Shough, Lori (Committee member) / Arizona State University (Publisher)

Created: 2015

Three essays on comparative simulation in three-level hierarchical data structure

Description

Though the likelihood is a useful tool for obtaining estimates of regression parameters, it is not readily available in the fit of hierarchical binary data models. The correlated observations negate the opportunity to have a joint likelihood when fitting hierarchical logistic regression models. Through conditional likelihood, inferences for the regression and covariance parameters as well as the intraclass correlation coefficients are usually obtained. In those cases, I resort to the Laplace approximation and a large sample theory approach for point and interval estimates, such as Wald-type and profile likelihood confidence intervals. These methods rely on distributional assumptions and large sample theory. However, when dealing with small hierarchical datasets they often result in severe bias or non-convergence. I present a generalized quasi-likelihood approach and a generalized method of moments approach; neither relies on any distributional assumptions, only on moments of the response. As an alternative to the typical large sample theory approach, I present bootstrapped hierarchical logistic regression models, which provide more accurate interval estimates for small binary hierarchical data. These models use computation in place of the traditional Wald-type and profile likelihood confidence intervals. I use a latent variable approach with a new split bootstrap method for estimating intraclass correlation coefficients when analyzing binary data obtained from a three-level hierarchical structure. It is especially useful with small sample sizes and is easily extended to multilevel settings. Comparisons are made to existing approaches through both theoretical justification and simulation studies.
Further, I demonstrate my findings through an analysis of three numerical examples: one based on cancer remission data, one related to a study of antibiotic abuse in China, and a third related to teacher effectiveness in schools in a state in the southwestern US.

Contributors: Wang, Bei (Author) / Wilson, Jeffrey R (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Reiser, Mark R. (Committee member) / St Louis, Robert (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2017

Locally D-optimal designs for generalized linear models

Description

Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design could greatly facilitate the work of collecting and analyzing data. In fact, many theoretical results are obtained on a case-by-case basis, while in other situations, researchers also rely heavily on computational tools for design selection.

Three topics are investigated in this dissertation, each focusing on one type of GLM. Topic I considers GLMs with factorial effects and one continuous covariate. Factors can interact with each other and there is no restriction on the possible values of the continuous covariate. The locally D-optimal design structures for such models are identified and results for obtaining smaller optimal designs using orthogonal arrays (OAs) are presented. Topic II considers GLMs with multiple covariates under the assumptions that all but one covariate are bounded within specified intervals and interaction effects among those bounded covariates may also exist. An explicit formula for D-optimal designs is derived and OA-based smaller D-optimal designs for models with one or two two-factor interactions are also constructed. Topic III considers multiple-covariate logistic models. All covariates are nonnegative and there is no interaction among them. Two types of D-optimal design structures are identified and their global D-optimality is proved using the celebrated equivalence theorem.

Contributors: Wang, Zhongsheng (Author) / Stufken, John (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2018

Perceptions of New Adjuncts on the Optional Professional Development at University Of California, Los Angeles Extension

Description

This mixed-methods study explored the perceptions of new adjuncts on various trainings with regard to satisfying their professional and aspirational needs. Three trainings were offered in the fall 2018 quarter as optional professional development: a workshop and two roundtable sessions. These trainings assisted adjuncts with their teaching skills, educational technology, and pedagogy. Guidance was provided by experienced adjuncts and staff.

Surveys and interviews with adjuncts, along with a focus group with staff, were the sources of data for this study. A repeated measures Analysis of Covariance (ANCOVA) model was utilized. Analysis of the data showed a positive and statistically significant change in the perceptions of adjuncts who participated in all trainings regarding the fulfillment of their needs, as compared to those who did not participate in any trainings. Adjuncts perceived an improvement in their professional growth based on Herzberg’s motivation-hygiene theory, and the trainings also fulfilled their higher-level growth needs based on Maslow’s hierarchical needs theory. A large practical significance was also found, which measures the practical impact of such trainings at local communities of practice.

Contributors: Sreekaram, Siddhartha (Author) / Marsh, Josephine (Thesis advisor) / Amrein-Beardsley, Audrey (Committee member) / Kim, Jeongeun (Committee member) / Arizona State University (Publisher)

Created: 2019

Psychometric and Machine Learning Approaches to Diagnostic Classification

Description

The goal of diagnostic assessment is to discriminate between groups. In many cases, a binary decision is made conditional on a cut score from a continuous scale. Psychometric methods can improve assessment by modeling a latent variable using item response theory (IRT), and IRT scores can subsequently be used to determine a cut score using receiver operating characteristic (ROC) curves. Psychometric methods provide reliable and interpretable scores, but the prediction of the diagnosis is not the primary product of the measurement process. In contrast, machine learning methods, such as regularization or binary recursive partitioning, can build a model from the assessment items to predict the probability of diagnosis. Machine learning predicts the diagnosis directly, but does not provide an inferential framework to explain why item responses are related to the diagnosis. It remains unclear whether psychometric and machine learning methods have comparable accuracy or if one method is preferable in some situations. In this study, Monte Carlo simulation methods were used to compare psychometric and machine learning methods on diagnostic classification accuracy. Results suggest that classification accuracy of psychometric models depends on the diagnostic-test correlation and prevalence of diagnosis. Also, machine learning methods that reduce prediction error have inflated specificity and very low sensitivity compared to the data-generating model, especially when prevalence is low. Finally, machine learning methods that use ROC curves to determine probability thresholds have comparable classification accuracy to the psychometric models as sample size, number of items, and number of item categories increase. Therefore, results suggest that machine learning models could provide a viable alternative for classification in diagnostic assessments. Strengths and limitations for each of the methods are discussed, and future directions are considered.
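As a rough illustration of how a cut score can be read off an ROC curve, the sketch below scans every observed score as a candidate threshold and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). The choice of Youden's J, the function names, and the toy scores are assumptions made for this example; the abstract does not say which ROC criterion the study used.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each observed score.

    A case is classified positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        sensitivity = true_pos / positives
        specificity = 1 - false_pos / negatives
        points.append((threshold, sensitivity, specificity))
    return points


def youden_cut(scores, labels):
    """Pick the threshold that maximizes Youden's J (an assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


# Hypothetical latent-trait scores and diagnoses (1 = diagnosed), for illustration only.
scores = [-1.2, -0.5, 0.1, 0.4, 0.9, 1.5, 2.0]
labels = [0, 0, 1, 0, 1, 1, 1]

cut, sensitivity, specificity = youden_cut(scores, labels)
print(f"cut score = {cut}, sensitivity = {sensitivity}, specificity = {specificity}")
```

The same scan applies whether the scores are IRT estimates or predicted probabilities from a machine learning model, which is the sense in which the two families of methods can share an ROC-based thresholding step.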

Contributors: González, Oscar (Author) / Mackinnon, David P (Thesis advisor) / Edwards, Michael C (Thesis advisor) / Grimm, Kevin J. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2018

Toward more inclusive large-enrollment undergraduate biology classrooms: identifying inequities and possible underlying mechanisms

Description

Guided by Tinto’s Theory of College Student Departure, I conducted a set of five studies to identify factors that influence students’ social integration in college science active learning classes. These studies were conducted in large-enrollment college science courses and some were specifically conducted in undergraduate active learning biology courses. Using qualitative and quantitative methodologies, I identified how students’ identities, such as their gender and LGBTQIA identity, and students’ perceptions of their own intelligence influence their experience in active learning science classes and consequently their social integration in college. I also determined factors of active learning classrooms and instructor behaviors that can affect whether students experience positive or negative social integration in the context of active learning. I found that students’ hidden identities, such as the LGBTQIA identity, are more relevant in active learning classes where students work together and that the increased relevance of one’s identity can have both positive and negative impacts on their social integration. I also found that students’ identities can predict their academic self-concept, or their perception of their intelligence as it compares to others’ intelligence in biology, which in turn predicts their participation in small-group discussion. While many students express a fear of negative evaluation, or dread being evaluated negatively by others when speaking out in active learning classes, I identified that how instructors structure group work can cause students to feel more or less integrated into the college science classroom. Lastly, I identified tools that instructors can use, such as name tents and humor, which can positively affect students’ social integration into the college science classroom. In sum, I highlight inequities in students’ experiences in active learning science classrooms and the mechanisms that underlie some of these inequities.
I hope this work can be used to create more inclusive undergraduate active learning science courses.

Contributors: Cooper, Katelyn M (Author) / Brownell, Sara E (Thesis advisor) / Stout, Valerie (Committee member) / Collins, James (Committee member) / Orchinik, Miles (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2018

Ready or not: student perceptions of the college readiness binary and Arizona Move on When Ready

Description

In 2010, the Arizona Legislature established a performance-based diploma initiative known as Move On When Ready (MOWR). The policy relies on an education model designed to evaluate students' college and career readiness by measuring their academic ability to succeed in the first credit-bearing course in community college. Move On When Ready is a structurally oriented, qualification system that attempts to attain a relatively narrow goal: increase the number of students able to successfully perform at a college-level academic standard. By relying on a set of benchmarked assessments to measure success and failure, MOWR propagates a categorical binary. The binary establishes explicit performance criteria on a set of examinations students are required to meet in order to earn a high school qualification that, by design, certifies whether students are ready or not ready for college.

This study sought to reveal how students’ perceptions of the policy and schooling in general affect their understanding of the concept of college readiness and the college readiness binary and to identify factors that help formulate those perceptions. This interpretivist, qualitative study relied on analysis of multiple face-to-face interviews with students to better understand how they think and act within the context of Move On When Ready, paying particular attention to students from historically vulnerable minority subgroups (e.g., the Latina(o)/Hispanic sub-population) enrolled in two schools deploying the MOWR strategy.

Findings suggest that interviewed students understand little about MOWR's design, intent or implications for their future educational trajectories. Moreover, what they believe is generally misinformed, regardless of aspiration, socio-cultural background, or academic standing. School-based sources of messaging (e.g., teachers and administrators) supply the bulk of information to students about MOWR. However, in these two schools, the flow of information is constricted. In addition, the information conveyed is either distorted by message mediators or misinterpreted by the students. The data reveal that formal and informal mediators of policy messages influence students’ engagement with the policy and affect students’ capacity to play an active role in determining the policy’s effect on their educational outcomes.

Contributors: Silver, Michael Greg (Author) / Berliner, David C. (Thesis advisor) / Fischman, Gustavo (Committee member) / Amrein-Beardsley, Audrey (Committee member) / Arizona State University (Publisher)

Created: 2015

Modeling multifaceted constructs in statistical mediation analysis: a bifactor approach

Description

Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Information about the mediating processes can be used to make interventions more powerful by enhancing successful program components and by not implementing components that did not significantly change the outcome. Identifying mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to external criteria. However, current methods do not allow researchers to study the relationships between general and specific aspects of a construct and an external criterion simultaneously. This study proposes a bifactor measurement model for the mediating construct as a way to represent the general aspect and specific facets of a construct simultaneously. Monte Carlo simulation results are presented to help determine under what conditions researchers can detect the mediated effect when one of the facets of the mediating construct is the true mediator, but the mediator is treated as unidimensional. Results indicate that parameter bias and detection of the mediated effect depend on the facet variance represented in the mediation model. This study contributes to the largely unexplored area of measurement issues in statistical mediation analysis.

Contributors: González, Oscar (Author) / Mackinnon, David P (Thesis advisor) / Grimm, Kevin J. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2016

A power study of GFfit statistics as components of Pearson chi-square

Description

The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data are from a table formed by cross-classification of a large number of variables, the common statistics may have low power and inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit by using a new version of GFfit statistic based on orthogonal components of Pearson chi-square as a diagnostic to examine the fit on two-way subtables. However, due to variables with a large number of categories and small sample size, even the GFfit statistic may have low power and inaccurate Type I error level due to sparseness in the two-way subtable. In this dissertation, the theoretical power and empirical power of the GFfit statistic are studied. A method based on subsets of orthogonal components for the GFfit statistic on the subtables is developed to improve the performance of the GFfit statistic. Simulation results for power and type I error rate for several different cases along with comparisons to other diagnostics are presented.
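For readers unfamiliar with the two baseline statistics named above, the following minimal sketch computes the Pearson chi-square and the likelihood ratio statistic for a single multinomial table against fitted expected counts. It does not implement the GFfit statistic or its orthogonal components; the function names and the toy counts are assumptions for illustration only.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio(observed, expected):
    """Likelihood ratio statistic: 2 * sum of O * ln(O / E).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical four-cell table with equal expected counts under the model.
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]

x2 = pearson_chi_square(observed, expected)
g2 = likelihood_ratio(observed, expected)
print(f"Pearson X^2 = {x2:.2f}, likelihood ratio G^2 = {g2:.2f}")
```

Both statistics divide by or take logarithms against the expected counts, so cells with very small expected counts can dominate or destabilize them, which is one way to see why sparseness affects their Type I error and power.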

Contributors: Zhu, Junfei (Author) / Reiser, Mark R. (Thesis advisor) / Stufken, John (Committee member) / Zheng, Yi (Committee member) / St Louis, Robert (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)

Created: 2017
