Matching Items (36)
Description

To foster both external and internal accountability, universities seek more effective models for student learning outcomes assessment (SLOA). Meaningful and authentic measurement of program-level student learning outcomes requires engagement with an institution’s faculty members, especially to gather student performance assessment data using common scoring instruments, or rubrics, across a university’s many colleges and programs. Too often, however, institutions rely on faculty engagement for SLOA initiatives like this without providing necessary support, communication, and training. The resulting data may lack sufficient reliability and reflect deficiencies in an institution’s culture of assessment.

This mixed methods action research study gauged how well one form of SLOA training – a rubric-norming workshop – could affect both inter-rater reliability among faculty scorers and faculty perceptions of SLOA, while exploring the nature of faculty collaboration toward a shared understanding of student learning outcomes. The ten study participants, all part-time faculty members at the institution, held primary careers in the health care industry apart from their secondary role teaching university courses. Accordingly, each contributed expertise and experience to the rubric-norming discussions, surveys of assessment-related perceptions, and individual scoring of student performance with a common rubric. Drawing on sociocultural learning principles and the specific lens of activity theory, the study arranged and analyzed influences on faculty SLOA within the heuristic framework of an activity system to discern the effects of collaboration and of perceptions toward SLOA on consistent rubric scoring by faculty participants.

Findings suggest that participation in the study did not correlate with increased inter-rater reliability for faculty scorers when using the common rubric. Constraints within the assessment tools and unclear institutional leadership prevented more reliable use of common rubrics. Instead, faculty participants resorted to individual assessment approaches to meaningfully guide students toward classroom achievement and preparation for careers in the health care field. Despite this, faculty participants valued SLOA, collaborated readily with colleagues on shared assessment goals, and worked hard to teach and assess students meaningfully.
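As an illustration of the inter-rater reliability question at the center of this abstract, the following minimal Python sketch computes Fleiss' kappa for a panel of raters applying a common rubric. The data, the four rubric levels, and the ten-rater setup are hypothetical stand-ins, not the study's instrument or analysis.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an (artifacts x rubric levels) matrix of rating counts.

    counts[i, j] = number of raters who placed artifact i at rubric level j.
    Every row must sum to the same number of raters.
    """
    n_raters = counts.sum(axis=1)[0]
    # Per-artifact agreement: proportion of rater pairs that agree.
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    # Chance agreement from the marginal distribution of rubric levels.
    p_j = counts.sum(axis=0) / counts.sum()
    p_e = (p_j ** 2).sum()
    return (p_i.mean() - p_e) / (1 - p_e)

# Hypothetical data: 10 raters scoring 6 artifacts on a 4-level rubric.
rng = np.random.default_rng(0)
scores = rng.integers(1, 5, size=(6, 10))  # rows = artifacts, columns = raters
counts = np.stack([np.bincount(row, minlength=5)[1:] for row in scores])
print(f"Fleiss' kappa: {fleiss_kappa(counts):.3f}")
```

Values near 0 indicate agreement no better than chance, while values approaching 1 indicate strong consistency across scorers.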
Contributors: Williams, Nicholas (Author) / Liou, Daniel D (Thesis advisor) / Rotheram-Fuller, Erin (Committee member) / Turbow, David (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

Dynamic Bayesian networks (DBNs; Reye, 2004) are a promising tool for modeling student proficiency under rich measurement scenarios (Reichenberg, in press). These scenarios often present assessment conditions far more complex than what is seen with more traditional assessments and require assessment arguments and psychometric models capable of integrating those complexities. Unfortunately, DBNs remain understudied and their psychometric properties relatively unknown. If the apparent strengths of DBNs are to be leveraged, then the body of literature surrounding their properties and use needs to be expanded. To this end, the current work aimed to explore the properties of DBNs under a variety of realistic psychometric conditions. A two-phase Monte Carlo simulation study was conducted in order to evaluate parameter recovery for DBNs using maximum likelihood estimation with the Netica software package. Phase 1 included a limited number of conditions and was exploratory in nature, while Phase 2 included a larger and more targeted complement of conditions. Manipulated factors included sample size, measurement quality, test length, and the number of measurement occasions. Results suggested that measurement quality has the most prominent impact on estimation quality, with more distinct performance categories yielding better estimation. While increasing sample size tended to improve estimation, there was a limited number of conditions under which a greater sample size led to more estimation bias. An exploration of this phenomenon is included. From a practical perspective, parameter recovery appeared to be sufficient with samples as low as N = 400 as long as measurement quality was not poor and at least three items were present at each measurement occasion. Tests consisting of only a single item required exceptional measurement quality in order to adequately recover model parameters. The study was somewhat limited due to potentially software-specific issues as well as a non-comprehensive collection of experimental conditions. Further research should replicate and potentially expand the current work using other software packages and explore alternative estimation methods (e.g., Markov chain Monte Carlo).
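To make the parameter-recovery design concrete, here is a minimal Python sketch of how bias and RMSE are typically summarized over replications of a simulation condition grid. The grid values, the replication count, and the `estimate_parameter` stand-in are hypothetical; the actual study fit DBNs by maximum likelihood in Netica rather than using this toy estimator.

```python
import itertools
import numpy as np

# Hypothetical condition grid loosely mirroring the manipulated factors.
sample_sizes = [100, 400, 1600]
items_per_occasion = [1, 3, 5]
n_reps = 200
true_value = 0.7  # one conditional probability to recover

def estimate_parameter(n_effective: int, rng) -> float:
    """Stand-in for fitting a DBN and extracting one parameter estimate.

    Adds sampling noise that shrinks with the effective sample size; a real
    study would replace this with maximum likelihood estimation (e.g., Netica).
    """
    return true_value + rng.normal(scale=1.0 / np.sqrt(n_effective))

rng = np.random.default_rng(2018)
for n, k in itertools.product(sample_sizes, items_per_occasion):
    estimates = np.array([estimate_parameter(n * k, rng) for _ in range(n_reps)])
    bias = estimates.mean() - true_value
    rmse = np.sqrt(((estimates - true_value) ** 2).mean())
    print(f"N={n:5d}, items per occasion={k}: bias={bias:+.4f}, RMSE={rmse:.4f}")
```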
Contributors: Reichenberg, Raymond E (Author) / Levy, Roy (Thesis advisor) / Eggum-Wilkens, Natalie (Thesis advisor) / Iida, Masumi (Committee member) / DeLay, Dawn (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can cause bias in model parameter estimation for single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of mismatch in dimensionality between data and analysis models in multiple-group analyses at the population and sample levels. Datasets were generated using a bifactor model with different factor structures and were analyzed with bifactor and single-factor models to assess misspecification effects on assessments of MI and latent mean differences. As baseline models, the bifactor models fit the data well and had minimal bias in latent mean estimation. However, the low convergence rates when fitting bifactor models to data with complex structures and small sample sizes caused concern. On the other hand, the effects of fitting the misspecified single-factor models on the assessments of MI and latent means differed by the bifactor structure underlying the data. For data following one general factor and one group factor affecting a small set of indicators, the effects of ignoring the group factor in analysis models on the tests of MI and latent mean differences were mild. In contrast, for data following one general factor and several group factors, oversimplification of the analysis models can lead to inaccurate conclusions regarding MI assessment and latent mean estimation.
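A minimal sketch of the population model described here may help: indicators are generated from one general factor plus one of several orthogonal group factors, which is exactly the structure a misspecified single-factor analysis would ignore. The loadings, number of group factors, and sample size below are hypothetical, not the study's conditions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500                      # hypothetical sample size
n_groups, per_group = 3, 4   # three group factors, four indicators each
lam_g, lam_s = 0.6, 0.4      # general- and group-factor loadings

general = rng.normal(size=n)
group = rng.normal(size=(n, n_groups))

indicators = []
for g in range(n_groups):
    for _ in range(per_group):
        # Standardized bifactor indicator: general + group factor + uniqueness.
        unique = rng.normal(scale=np.sqrt(1 - lam_g**2 - lam_s**2), size=n)
        indicators.append(lam_g * general + lam_s * group[:, g] + unique)

data = np.column_stack(indicators)  # 12 indicators following a bifactor model
# A single-factor analysis of these data ignores the group-factor loadings.
print(np.round(np.corrcoef(data, rowvar=False)[:4, :4], 2))
```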
Contributors: Xu, Yuning (Author) / Green, Samuel (Thesis advisor) / Levy, Roy (Committee member) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

Education through field exploration is fundamental in geoscience. But not all students enjoy equal access to field-based learning because of time, cost, distance, ability, and safety constraints. At the same time, technological advances afford ever more immersive, rich, and student-centered virtual field experiences. Virtual field trips may be the only practical option for most students to explore pedagogically rich but inaccessible places. A mixed-methods research project was conducted with an introductory and an advanced geology class to explore the implications for learning outcomes of in-person and virtual field-based instruction at Grand Canyon National Park. The study incorporated the Great Unconformity in the Grand Canyon, a 1.2-billion-year break in the rock record; the Trail of Time, an interpretive walking timeline; and two immersive, interactive virtual field trips (iVFTs). The in-person field trip (ipFT) groups collectively explored the canyon and took an instructor-guided inquiry hike along the interpretive Trail of Time from rim level, while iVFT students individually explored the canyon and took a guided-inquiry virtual tour of Grand Canyon geology from river level. High-resolution 360° spherical images anchor the iVFTs and serve as a framework for programmed overlays that enable interactivity and allow the iVFT to provide feedback in response to student actions. Students in both modalities completed pre- and post-trip Positive and Negative Affect Schedules (PANAS). The iVFT students recorded pre- to post-trip increases in positive affect (PA) scores and decreases in negative affect (NA) scores, representing an affective state conducive to learning. Pre- to post-trip mean scores on concept sketches used to assess visualization and geological knowledge increased for both classes and modalities. However, the iVFT pre- to post-trip gains were three times greater than the ipFT gains, a statistically significant difference. Both iVFT and ipFT students scored 92-98% on guided-inquiry worksheets completed during the trips, signifying that both groups met the learning outcomes. Virtual field trips do not trump traditional in-person field work, but they can meet or exceed similar learning objectives and may replace an inaccessible or impractical in-person field trip.
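The pre- to post-trip comparisons reported here follow a standard paired design; the sketch below shows one conventional way such gains might be analyzed (a paired test within a modality and a comparison of gain scores between modalities). The scores are simulated for illustration and are not the study's data, and the study's actual analysis may have differed.

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post concept-sketch scores (simulated, not the study's data).
rng = np.random.default_rng(7)
pre_ivft = rng.normal(5.0, 1.5, 30)
post_ivft = pre_ivft + rng.normal(1.5, 1.0, 30)   # larger simulated gain
pre_ipft = rng.normal(5.0, 1.5, 30)
post_ipft = pre_ipft + rng.normal(0.5, 1.0, 30)   # smaller simulated gain

# Within-modality pre/post change, then a between-modality comparison of gains.
print(stats.ttest_rel(post_ivft, pre_ivft))
print(stats.ttest_ind(post_ivft - pre_ivft, post_ipft - pre_ipft))
```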
Contributors: Ruberto, Thomas (Author) / Semken, Steve (Thesis advisor) / Anbar, Ariel (Committee member) / Brownell, Sara (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

Much research has been conducted regarding the current state of public education within the United States. Very little of that research bodes well for the system’s current circumstances or for the direction our system is headed. The debate centers on two opposing ideologies. One holds that there needs to be more accountability via high-stakes testing and a continuation of the status quo that the country has long maintained, regardless of the effect it may be having on students’ well-being. The opposing view sees high-stakes testing as a contributing factor in a seemingly unproductive, chaotic, and even harmful conundrum of bias and hegemony whose effects correlate with harm to students’ well-being. Although this paper references the research of highly esteemed scholars, it asserts that the voices most often undervalued and ignored are precisely the voices that are most relevant. This paper’s purpose is to hear what the ‘experts’ in the field of education, the students themselves, have to say.
Contributors: Khaleesi, Casey (Author) / Swadener, Elizabeth (Thesis advisor) / Bertrand, Melanie (Committee member) / Broberg, Gregory (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

Accurate data analysis and interpretation of results may be influenced by many potential factors. The factors of interest in the current work are the chosen analysis model(s), the presence of missing data, and the type(s) of data collected. If analysis models are used which a) do not accurately capture the structure of relationships in the data, such as clustered/hierarchical data, b) do not allow or control for missing values present in the data, or c) do not accurately accommodate different data types, such as categorical data, then the assumptions associated with the model have not been met and the results of the analysis may be inaccurate. In the presence of clustered/nested data, hierarchical linear modeling or multilevel modeling (MLM; Raudenbush & Bryk, 2002) has the ability to predict outcomes for each level of analysis and across multiple levels (accounting for relationships between levels), providing a significant advantage over single-level analyses. When multilevel data contain missingness, multilevel multiple imputation (MLMI) techniques may be used to model both the missingness and the clustered nature of the data. With categorical multilevel data with missingness, categorical MLMI must be used. Two such routines for MLMI with continuous and categorical data were explored with missing at random (MAR) data: a formal Bayesian imputation and analysis routine in JAGS (R/JAGS) and a common MLM procedure of imputation via Bayesian estimation in BLImP with frequentist analysis of the multilevel model in Mplus (BLImP/Mplus). Manipulated variables included intraclass correlations, number of clusters, and the rate of missingness. Results showed that with continuous data, R/JAGS returned more accurate parameter estimates than BLImP/Mplus for almost all parameters of interest across levels of the manipulated variables. Both R/JAGS and BLImP/Mplus encountered convergence issues and returned inaccurate parameter estimates when imputing and analyzing dichotomous data. Follow-up studies showed that JAGS and BLImP returned similar imputed datasets but that the choice of analysis software for MLM impacted the recovery of accurate parameter estimates. Implications of these findings and recommendations for further research are discussed.
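For readers unfamiliar with the manipulated factors, the sketch below generates two-level (clustered) data with a target intraclass correlation and then imposes missing-at-random missingness on the outcome, since the missingness depends only on a fully observed covariate. It is illustrative only, written in Python rather than the R/JAGS and BLImP/Mplus pipelines the study evaluated, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)

def two_level_mar_data(n_clusters=50, cluster_size=20, icc=0.2, missing_rate=0.3):
    """Generate clustered data and impose MAR missingness on the outcome y.

    icc sets the between-cluster share of the residual variance of y; the
    missingness probability depends only on the fully observed covariate x,
    which is what makes the mechanism missing at random (MAR).
    """
    tau, sigma2 = icc, 1.0 - icc
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    u = rng.normal(0.0, np.sqrt(tau), n_clusters)[cluster]   # cluster effects
    x = rng.normal(size=cluster.size)
    y = 0.5 * x + u + rng.normal(0.0, np.sqrt(sigma2), cluster.size)

    # Cases with larger x are more likely to be missing; the average rate is
    # roughly missing_rate.
    logits = np.log(missing_rate / (1 - missing_rate)) + x
    miss = rng.uniform(size=cluster.size) < 1.0 / (1.0 + np.exp(-logits))
    return cluster, x, np.where(miss, np.nan, y)

cluster, x, y = two_level_mar_data()
print(f"share of y missing: {np.isnan(y).mean():.2f}")
```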
Contributors: Kunze, Katie L (Author) / Levy, Roy (Thesis advisor) / Enders, Craig K. (Committee member) / Thompson, Marilyn S (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

In this mixed-methods study, I sought to design and develop a test delivery method to reduce linguistic bias in English-based mathematics tests. Guided by translanguaging, a recent linguistic theory recognizing the complexity of multilingualism, I designed a computer-based test delivery method allowing test-takers to toggle between English and their self-identified dominant language. This three-part study asks and answers research questions from all phases of the novel test delivery design. In the first phase, I conducted cognitive interviews with 11 Mandarin Chinese-dominant and 11 Spanish-dominant undergraduate students as they took a well-regarded calculus conceptual exam, the Precalculus Concept Assessment (PCA). In the second phase, I designed and developed the linguistically adaptive test (LAT) version of the PCA using the Concerto test delivery platform. In the third phase, I conducted a within-subjects random-assignment study of the efficacy of the LAT. I also conducted in-depth interviews with a subset of the test-takers. Nine items on the PCA revealed linguistic issues during the cognitive interviews, demonstrating the need to address linguistic bias in the test items. Additionally, the newly developed LAT demonstrated evidence of reliability and validity. However, the large-scale efficacy study showed that the LAT did not appear to make a significant difference in scores for dominant speakers of Spanish or dominant speakers of Mandarin Chinese. This finding held true for overall test scores as well as at the item level, indicating that the LAT test delivery system does not appear to reduce linguistic bias in testing. Additionally, in-depth interviews revealed that many students felt that the linguistically adaptive test was either the same or essentially the same as the non-LAT version of the test. Some participants felt that the toggle button was not necessary if they could understand the mathematics item well enough. As one participant noted, “It's math, It's math. It doesn't matter if it's in English or in Spanish.” This dissertation concludes with a discussion of the implications for test developers and suggestions for future directions of study.
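The core design idea, toggling a single item between English and a test-taker's dominant language, can be sketched with a small data structure. This is a hypothetical illustration, not the Concerto implementation used in the study; the class names, the sample item, and the two-language setup are assumptions for demonstration.

```python
from dataclasses import dataclass

@dataclass
class AdaptiveItem:
    """One test item stored in every supported language at once."""
    stem: dict      # language code -> item text
    options: dict   # language code -> list of answer choices

@dataclass
class TestSession:
    """Tracks which language a test-taker is currently viewing."""
    dominant_language: str
    current: str = "en"

    def toggle(self) -> None:
        # Switch between English and the test-taker's dominant language.
        self.current = self.dominant_language if self.current == "en" else "en"

# Hypothetical item, not an actual PCA item.
item = AdaptiveItem(
    stem={"en": "If f(x) = 2x + 1, what is f(3)?",
          "es": "Si f(x) = 2x + 1, ¿cuánto vale f(3)?"},
    options={"en": ["5", "6", "7", "9"], "es": ["5", "6", "7", "9"]},
)
session = TestSession(dominant_language="es")
session.toggle()                   # test-taker presses the toggle button
print(item.stem[session.current])  # now renders the Spanish stem
```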
Contributors: Close, Kevin (Author) / Zheng, Yi (Thesis advisor) / Amrein-Beardsley, Audrey (Thesis advisor) / Anderson, Kate (Committee member) / Arizona State University (Publisher)
Created: 2021
Description

Evolution is a key feature of undergraduate biology education: the American Association for the Advancement of Science (AAAS) has identified evolution as one of the five core concepts of biology, and it is relevant to a wide array of biology-related careers. If biology instructors want students to use evolution to address scientific challenges post-graduation, students need to be able to apply evolutionary principles to real-life situations, and accept that the theory of evolution is the best scientific explanation for the unity and diversity of life on Earth. In order to help students progress on both fronts, biology education researchers need surveys that measure evolution acceptance and assessments that measure students’ ability to apply evolutionary concepts. This dissertation improves the measurement of student understanding and acceptance of evolution by (1) developing a novel Evolutionary Medicine Assessment that measures students’ ability to apply the core principles of Evolutionary Medicine to a variety of health-related scenarios, (2) reevaluating existing measures of student evolution acceptance by using student interviews to assess response process validity, and (3) correcting the validity issues identified on the most widely-used measure of evolution acceptance - the Measure of Acceptance of the Theory of Evolution (MATE) - by developing and validating a revised version of this survey: the MATE 2.0.
Contributors: Misheva, Anastasia Taya (Author) / Brownell, Sara (Thesis advisor) / Barnes, Elizabeth (Committee member) / Collins, James (Committee member) / Cooper, Katelyn (Committee member) / Sterner, Beckett (Committee member) / Arizona State University (Publisher)
Created: 2023
Description

The theoretical basis of this study is drawn from an ecological-transactional (Lynch & Cicchetti, 1998) systems approach to development, which focuses on contexts, and correspondingly overlays the gender affirmative model’s (GAM) transactional model of support (Keomeier & Ehrensaft, 2018) to reveal protection in the school ecology. Combining these two approaches provides unique insights into protective factors in the school ecology, distinct from developmental systems approaches driven by the minority stress model (Meyer, 2003), which are designed to highlight the multidimensional quality of risk (Eisenberg et al., 2019). The dissertation had two central aims: 1) to report on the development of the Gender Affirmative School Climate (GASC) scale, a self-report survey designed to capture high school climate specific to the domain of gender, and 2) to explore how gender affirmative school climate relates to student self-esteem and school belongingness. Unlike risk-factor approaches, these aims sought to identify protective factors within the developmental system ecology of the high school context. Across two pilot studies (N = 12; N = 758: trans = 413, non-trans = 344) and a primary study (N = 813: trans = 482, non-trans = 328), results for scale development provide evidence that the proposed GASC construct captures what was intended, that is, school climate specific to the domain of gender. However, measurement invariance procedures showed that not all items operated equivalently across trans and non-trans groups and confirmed that the proposed scale meets criteria for “weak measurement invariance.” High school students who reported a more positive school climate reported lower self-esteem scores. Only one protective moderator was consistent with hypotheses: greater feelings of similarity to a peer gender group (boys) emerged as a protective factor for transgender-identified high schoolers, attenuating the negative relationship between perceptions of school climate and self-esteem. Latent measurement models for each gender group demonstrated that the school belongingness construct is highly related to the proposed GASC construct. This demonstrated domain overlap with feelings of school belongingness signals that the proposed scale shows good convergent validity. The results provide insight into ways high schools can be proactive in promoting a healthier school climate for transgender students.
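The protective-moderator finding describes an interaction effect: peer-gender similarity buffering the relationship between school climate and self-esteem. The sketch below shows how such a moderation term is commonly specified in a regression; it uses simulated data and observed-variable regression in Python/statsmodels rather than the latent measurement models reported in the dissertation, and all variable names and values are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated (hypothetical) data: school climate, peer-gender similarity,
# and self-esteem; not the dissertation's sample.
rng = np.random.default_rng(3)
n = 400
climate = rng.normal(size=n)
similarity = rng.normal(size=n)
self_esteem = (-0.3 * climate + 0.2 * similarity
               + 0.25 * climate * similarity + rng.normal(size=n))
df = pd.DataFrame({"self_esteem": self_esteem, "climate": climate,
                   "similarity": similarity})

# A positive climate:similarity coefficient means similarity attenuates
# (moderates) the negative climate-to-self-esteem relationship.
model = smf.ols("self_esteem ~ climate * similarity", data=df).fit()
print(model.params[["climate", "similarity", "climate:similarity"]])
```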
Contributors: Scrofani, Stephan (Author) / Martin, Carol L. (Thesis advisor) / DeLay, Dawn (Thesis advisor) / Lindstrom-Johnson, Sarah (Committee member) / Low, Sabina (Committee member) / Arizona State University (Publisher)
Created: 2023
Description

Since the No Child Left Behind (NCLB) Act required classifications of students’ performance levels, test scores have been used to measure students’ achievement; in particular, test scores are used to determine whether students reach a proficiency level in the state assessment. Accordingly, school districts have started using benchmark assessments to complement the state assessment. Unlike state assessments administered at the end of the school year, benchmark assessments, administered multiple times during the school year, measure students’ learning progress toward reaching the proficiency level. Thus, the results of the benchmark assessments can help districts and schools prepare their students for the subsequent state assessments so that students can reach the proficiency level. If benchmark assessments can accurately predict students’ future performance on the state assessments, they can be more useful for facilitating classroom instruction to support student improvement. Thus, this study focuses on the predictive accuracy of a proficiency cut score in the benchmark assessment. Specifically, using an econometric research technique, Regression Discontinuity Design, this study assesses whether reaching a proficiency level in the benchmark assessment had a causal impact on increasing the probability of reaching a proficiency level in the state assessment. Finding no causal impact of the cut score, this study alternatively applies a Precision-Recall curve, a useful measure for evaluating the predictive performance of binary classification. Using this technique, the study calculates an optimal proficiency cut score for the benchmark assessment that maximizes accuracy in predicting the proficiency level in the state assessment. Based on the results, this study discusses issues regarding the conventional approaches to establishing cut scores in large-scale assessments and suggests some potential approaches to increase the predictive accuracy of the cut score in benchmark assessments.
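The precision-recall approach described here can be illustrated directly: treat each candidate benchmark cut score as a classification threshold for predicting state-test proficiency and pick the one with the best precision-recall balance. The sketch below uses simulated scores and scikit-learn; the data, the logistic relationship, and the F1 criterion are hypothetical stand-ins for the district data and the exact optimality criterion used in the study.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical data: benchmark scale scores and eventual state-test proficiency.
rng = np.random.default_rng(5)
benchmark = rng.normal(loc=350, scale=40, size=2000)
p_proficient = 1.0 / (1.0 + np.exp(-(benchmark - 360) / 15))
state_proficient = rng.uniform(size=2000) < p_proficient

# Treat each candidate benchmark cut score as a classification threshold and
# pick the one that maximizes F1, a balance of precision and recall.
precision, recall, cuts = precision_recall_curve(state_proficient, benchmark)
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_cut = cuts[np.argmax(f1)]
print(f"optimal benchmark cut score: {best_cut:.0f} (F1 = {f1.max():.3f})")
```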
Contributors: Terada, Takeshi (Author) / Chen, Ying-Chih (Thesis advisor) / Edwards, Michael (Thesis advisor) / Garcia, David (Committee member) / Arizona State University (Publisher)
Created: 2021