Matching Items (11)
Description

Possible selves researchers have uncovered many issues with the current possible selves measures. For instance, one of the best-known measures, Oyserman's (2004) open-ended possible selves questionnaire, has proven difficult to score reliably and involves laborious scoring procedures. Therefore, this study was initiated to develop a closed-ended measure, the Persistent Academic Possible Selves Scale for Adolescents (PAPSS), that meets these challenges. The PAPSS integrates possible selves theories (personal and social identities) and educational psychology (self-regulation in social cognitive theory). Four hundred ninety-five junior high and high school students participated in the validation study of the PAPSS. I conducted confirmatory factor analyses (CFA) to compare fit for a baseline model to the hypothesized models using Mplus version 7 (Muthén & Muthén, 2012). A weighted least squares mean- and variance-adjusted (WLSMV) estimation method was used to handle the multivariate nonnormality of the ordered categorical data. The final PAPSS has validity evidence based on its internal structure. The factor structure is composed of three goal-driven factors, one self-regulated factor that focuses on peers, and four self-regulated factors that emphasize the self. Oyserman's (2004) open-ended questionnaire was used to explore evidence of convergent validity. Many issues with Oyserman's (2003) instructions were found during the coding of academic plausibility. It was complicated to distinguish hidden academic possible selves and strategies from non-academic possible selves and strategies. Also, interpersonal strategies were overweighted in the scoring process compared to interpersonal academic possible selves. The study results uncovered that all of the academic goal-related factors in the PAPSS are significantly and positively related to academic plausibility.
However, the self-regulated factors in the PAPSS are not. The correlations between the self-regulated factors and academic plausibility do not provide evidence of convergent validity. Theoretical and methodological explanations for these results are discussed.
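As an editorial illustration of the convergent-validity check described above, the sketch below correlates a hypothetical goal-driven PAPSS subscale score with coded academic plausibility. The data, effect size, and variable names are invented for illustration, not taken from the study.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 495  # sample size reported in the abstract

# Hypothetical scores: a goal-driven PAPSS subscale and coded academic
# plausibility from the open-ended measure (both simulated here).
plausibility = rng.normal(size=n)
goal_subscale = 0.4 * plausibility + rng.normal(scale=0.9, size=n)

r, p = pearsonr(goal_subscale, plausibility)
print(f"r = {r:.2f}, p = {p:.4f}")  # a significant positive r supports convergent validity
```

A self-regulated subscale showing r near zero in the same analysis would, as in the abstract, fail to provide convergent evidence.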
ContributorsLee, Ji Eun (Author) / Husman, Jenefer (Thesis advisor) / Green, Samuel (Committee member) / Millsap, Roger (Committee member) / Brem, Sarah (Committee member) / Arizona State University (Publisher)
Created2013
Description

The measurement of competency in nursing is critical to ensure safe and effective care of patients. This study had two purposes. First, the psychometric characteristics of the Nursing Performance Profile (NPP), an instrument used to measure nursing competency, were evaluated using generalizability theory and a sample of 18 nurses in the Measuring Competency with Simulation (MCWS) Phase I dataset. The relative magnitudes of various error sources and their interactions were estimated in a generalizability study involving a fully crossed, three-facet random design with nurse participants as the object of measurement and scenarios, raters, and items as the three facets. A design corresponding to that of the MCWS Phase I data, involving three scenarios, three raters, and 41 items, showed that nurse participants contributed the greatest proportion of total variance (50.00%), followed in decreasing magnitude by rater (19.40%), the two-way participant x scenario interaction (12.93%), and the two-way participant x rater interaction (8.62%). The generalizability (G) coefficient was .65 and the dependability coefficient was .50. In decision (D) study designs minimizing the number of scenarios, the desired G coefficients of .70 and .80 were reached with three scenarios and five raters, and with five scenarios and nine raters, respectively. In designs minimizing the number of raters, G coefficients of .72 and .80 were reached with three raters and five scenarios, and with four raters and nine scenarios, respectively. A dependability coefficient of .71 was attained with six scenarios and nine raters, or with nine scenarios and seven raters. Achieving high reliability with fewer raters may be possible with enhanced rater training to decrease the variance components for rater main and interaction effects.
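The D-study projections above can be sketched under simplified assumptions. The variance components below are hypothetical stand-ins for a two-facet (scenario x rater) design rather than the study's full three-facet estimates, so the coefficients will not reproduce the reported values; the point is how the relative-error term shrinks as facets are added.

```python
# Hypothetical variance components (as proportions of total variance) for a
# fully crossed person x scenario x rater design; illustrative values only.
var = {
    "p": 0.50,    # nurse participants (object of measurement)
    "ps": 0.13,   # participant x scenario interaction
    "pr": 0.09,   # participant x rater interaction
    "psr": 0.28,  # three-way interaction plus residual
}

def g_coefficient(var, n_s, n_r):
    """Relative G coefficient for a D study with n_s scenarios and n_r raters:
    person variance over person variance plus averaged interaction error."""
    rel_error = var["ps"] / n_s + var["pr"] / n_r + var["psr"] / (n_s * n_r)
    return var["p"] / (var["p"] + rel_error)

for n_s, n_r in [(3, 3), (3, 5), (5, 9)]:
    print(f"{n_s} scenarios, {n_r} raters: G = {g_coefficient(var, n_s, n_r):.2f}")
```

The same function form extends to an item facet by adding the corresponding interaction components to the error term.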
The second part of this study involved the design and implementation of a validation process for evidence-based human patient simulation scenarios in assessment of nursing competency. A team of experts validated the new scenario using a modified Delphi technique, involving three rounds of iterative feedback and revisions. In tandem, the psychometric study of the NPP and the development of a validation process for human patient simulation scenarios both advance and encourage best practices for studying the validity of simulation-based assessments.
ContributorsO'Brien, Janet Elaine (Author) / Thompson, Marilyn (Thesis advisor) / Hagler, Debra (Thesis advisor) / Green, Samuel (Committee member) / Arizona State University (Publisher)
Created2014
Description
ABSTRACT

This study examines validity evidence for a state policy-directed teacher evaluation system implemented in Arizona during the 2012-2013 school year. The purpose was to evaluate the warrant for making high-stakes, consequential judgments of teacher competence based on value-added (VAM) estimates of instructional impact and observations of professional practice (PP). The research also explores educator influence (voice) in evaluation design and the role information brokers play in local decision making. Findings are situated in an evidentiary and policy context at both the LEA and state policy levels.

The study employs a single-phase, concurrent, mixed-methods research design triangulating multiple sources of qualitative and quantitative evidence onto a single (unified) validation construct: Teacher Instructional Quality. It focuses on assessing the characteristics of the metrics used to construct quantitative ratings of instructional competence and the alignment of stakeholder perspectives with facets implicit in the evaluation framework. Validity examinations include assembly of criterion, content, reliability, consequential, and construct-articulation evidence. Perceptual perspectives were obtained from teachers, principals, district leadership, and state policy decision makers. Data for this study came from a large suburban public school district in metropolitan Phoenix, Arizona.

Study findings suggest that the evaluation framework is insufficient for supporting high-stakes, consequential inferences of teacher instructional quality. This is based, in part, on the following: (1) weak associations between VAM and PP metrics; (2) unstable VAM measures across time and between tested content areas; (3) less than adequate scale reliabilities; (4) lack of coherence between theorized and empirical PP factor structures; (5) omission or underrepresentation of important instructional attributes and effects; (6) stakeholder concerns over rater consistency, bias, and the inability of test scores to adequately represent instructional competence; (7) negative sentiments regarding the system's ability to improve instructional competence or student learning; (8) concerns regarding unintended consequences, including increased stress, lower morale, harm to professional identity, and restricted learning opportunities; and (9) a general lack of empowerment and educator exclusion from the decision-making process. Study findings also highlight the value of information brokers in policy decision making and the importance of access to unbiased empirical information during the design and implementation phases of important change initiatives.
ContributorsSloat, Edward F. (Author) / Wetzel, Keith (Thesis advisor) / Amrein-Beardsley, Audrey (Thesis advisor) / Ewbank, Ann (Committee member) / Shough, Lori (Committee member) / Arizona State University (Publisher)
Created2015
Description

Lexical diversity (LD) has been used in a wide range of applications, producing a rich history in the field of speech-language pathology. However, identifying a robust measure to quantify LD has been challenging for clinicians and researchers. Recently, sophisticated techniques have been developed that purport to measure LD. Each is based on its own theoretical assumptions and employs different computational machinery. Therefore, it is not clear to what extent these techniques produce valid scores and how they relate to each other. Further, in the field of speech-language pathology, researchers and clinicians often use different methods to elicit various types of discourse, and it is an empirical question whether inferences drawn from analyzing one type of discourse relate and generalize to other types. The current study examined a corpus of four types of discourse (procedures, eventcasts, storytelling, recounts) from 442 adults. Using four techniques (D; Maas; the measure of textual lexical diversity, MTLD; and the moving-average type-token ratio, MATTR), LD scores were estimated for each type. Subsequently, the data were modeled using structural equation modeling to uncover their latent structure. Results indicated that two estimation techniques (MATTR and MTLD) generated scores that were stronger indicators of the LD of the language samples. For the other two techniques, results were consistent with the presence of method factors representing construct-irrelevant sources of variance. A hierarchical factor analytic model indicated that a common factor underlay all combinations of discourse types and estimation techniques and was interpreted as a general construct of LD. Two discourse types (storytelling and eventcasts) were significantly stronger indicators of the underlying trait. These findings supplement our understanding of the validity of scores generated by different estimation techniques.
Further, they enhance our knowledge of how productive vocabulary manifests across types of discourse that impose different cognitive and linguistic demands. They also offer clinicians and researchers a point of reference regarding techniques that measure the LD of a language sample and little else, and regarding the types of discourse that may be most informative for measuring an individual's LD.
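As an illustration of one of the stronger-performing techniques named above, a minimal moving-average type-token ratio (MATTR) can be computed in a few lines; the toy sentence and window size are invented for the example.

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio: the mean TTR over a sliding window,
    which reduces the text-length dependence of the raw type-token ratio."""
    if len(tokens) < window:
        window = len(tokens)  # fall back for short samples
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)

sample = "the dog chased the ball and the dog caught the ball".split()
print(round(mattr(sample, window=5), 3))  # → 0.857
```

Real implementations add tokenization and lemmatization choices, which themselves affect the resulting scores.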
ContributorsFergadiotis, Gerasimos (Author) / Wright, Heather H (Thesis advisor) / Katz, Richard (Committee member) / Green, Samuel (Committee member) / Arizona State University (Publisher)
Created2011
Description

This study investigated the internal factor structure of the English Language Development Assessment (ELDA) using confirmatory factor analysis. The ELDA is an English language proficiency test developed by a consortium of multiple states and is used to identify and reclassify English language learners in kindergarten through grade 12. Scores on item parcels based on the standards tested from the four domains of reading, writing, listening, and speaking were used for the analyses. Five factor models were tested: a single-factor model, a correlated two-factor model, a correlated four-factor model, a second-order factor model, and a bifactor model. The results indicate that the four-factor, second-order, and bifactor models fit the data well. The four-factor model hypothesized constructs for reading, writing, listening, and speaking. The second-order model hypothesized a second-order English language proficiency factor as well as the four lower-order factors of reading, writing, listening, and speaking. The bifactor model hypothesized a general English language proficiency factor as well as the four domain-specific factors. Chi-square difference tests indicated that the bifactor model best explains the factor structure of the ELDA. The results of this study are consistent with findings in the literature about the multifactorial nature of language but differ from the conclusions about factor structure reported in previous studies. The overall proficiency levels on the ELDA give more weight to the reading and writing sections of the test than to the speaking and listening sections. This study has implications for the rules used to determine proficiency levels and recommends conjunctive scoring, in which all constructs are weighted equally, contrary to current practice.
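The nested-model comparison above follows the logic of a chi-square difference test, sketched below with hypothetical fit values. Note that this naive difference is valid for maximum likelihood estimation; under the WLSMV estimator typically used for ordered categorical data, the difference is not chi-square distributed and an adjusted procedure (e.g., Mplus's DIFFTEST) is required instead.

```python
from scipy.stats import chi2

def chi_square_difference(chisq_restricted, df_restricted, chisq_general, df_general):
    """Chi-square difference test for nested models: the restricted model
    (e.g., second-order) is compared against the more general model
    (e.g., bifactor). Returns the difference, its df, and the p-value."""
    d_chisq = chisq_restricted - chisq_general
    d_df = df_restricted - df_general
    return d_chisq, d_df, chi2.sf(d_chisq, d_df)

# Hypothetical fit values for a second-order model vs. a bifactor model
d, df, p = chi_square_difference(312.4, 100, 268.9, 94)
print(f"Δχ² = {d:.1f}, Δdf = {df}, p = {p:.4f}")
```

A significant difference favors the more general model, which is the pattern the abstract reports for the bifactor solution.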
ContributorsKuriakose, Anju Susan (Author) / Macswan, Jeff (Thesis advisor) / Haladyna, Thomas (Thesis advisor) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created2011
Description

The current study employs item difficulty modeling procedures to evaluate the feasibility of potential generative item features for nonword repetition. Specifically, the extent to which the manipulated item features affect the theoretical mechanisms that underlie nonword repetition accuracy was estimated. Generative item features were based on the phonological loop component of Baddeley's model of working memory, which addresses phonological short-term memory (Baddeley, 2000, 2003; Baddeley & Hitch, 1974). Using researcher-developed software, nonwords were generated to adhere to the phonological constraints of Spanish. Thirty-six nonwords were chosen based on the item features identified by the proposed cognitive processing model. Using a planned missing data design, two hundred fifteen Spanish-English bilingual children were each administered 24 of the 36 generated nonwords. Multiple regression and explanatory item response modeling techniques (e.g., the linear logistic test model, LLTM; Fischer, 1973) were used to estimate the impact of item features on item difficulty. The final LLTM included three item radicals and two item incidentals. Results indicated that the LLTM-predicted item difficulties were highly correlated with the Rasch item difficulties (r = .89) and accounted for a substantial amount of the variance in item difficulty (R2 = .79). The findings are discussed in terms of validity evidence supporting the use of the phonological loop component of Baddeley's (2000) model as a cognitive processing model for nonword repetition items and the feasibility of using the proposed radical structure as an item blueprint for future generation of nonword repetition items.
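A least-squares analogue of the LLTM decomposition, in which each item's difficulty is a weighted sum of its coded features, can be sketched with simulated data. The feature codes, weights, and difficulties below are invented for illustration, not the study's radicals or estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: 36 nonwords coded 0/1 on three features (stand-ins
# for item radicals such as syllable count or wordlikeness).
n_items = 36
Q = rng.integers(0, 2, size=(n_items, 3)).astype(float)  # item x feature matrix
true_weights = np.array([0.9, -0.5, 0.4])
rasch_b = Q @ true_weights + rng.normal(scale=0.3, size=n_items)  # simulated difficulties

# Regress Rasch difficulties on item features: b_i ≈ c + sum_k q_ik * eta_k
X = np.column_stack([np.ones(n_items), Q])  # add an intercept column
coef, *_ = np.linalg.lstsq(X, rasch_b, rcond=None)
predicted = X @ coef
r = np.corrcoef(predicted, rasch_b)[0, 1]
print(f"r(predicted, Rasch b) = {r:.2f}, R^2 = {r**2:.2f}")
```

In the study proper, the LLTM is fit within an item response model rather than by ordinary least squares, but the feature-decomposition logic is the same.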
ContributorsMorgan, Gareth Philip (Author) / Gorin, Joanna (Thesis advisor) / Levy, Roy (Committee member) / Gray, Shelley (Committee member) / Arizona State University (Publisher)
Created2011
Description

National assessment data indicate that the large majority of students in America perform below expected proficiency levels in the area of writing. Given the importance of writing skills, this is a significant problem. Curriculum-based measurement, when used for progress monitoring and intervention planning, has been shown to lead to improved academic achievement. However, researchers have not yet been able to establish the validity of curriculum-based measures of writing (CBM-W). This study examined the structural validity of CBM-W using exploratory factor analysis. The participants were 253 third-grade, 154 seventh-grade, and 154 tenth-grade students. Each participant completed a 3-minute writing sample in response to a narrative prompt. The writing samples were scored on fifteen different CBM-W indices. Separate analyses were conducted at each grade level to examine differences in the CBM-W construct across grades. Due to extreme multicollinearity, principal components analysis rather than common factor analysis was used to examine the structure of writing as measured by the CBM-W indices. The overall structure of the CBM-W indices was found to remain stable across grade levels. In all cases a three-component solution was supported, with the components labeled production, accuracy, and sentence complexity. Limitations of the study and implications for progress monitoring with CBM-W are discussed, including the recommendation of a combination of variables that may provide more reliable and valid measurement of the writing construct.
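The principal components approach can be sketched on simulated index scores. The indices and correlation structure below are hypothetical stand-ins for CBM-W data, chosen to show how multicollinear indices collapse onto a dominant component.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 253  # third-grade sample size from the abstract

# Simulate three highly correlated "production" indices plus one weakly
# related "accuracy" index (all hypothetical stand-ins for CBM-W scores).
production = rng.normal(size=n)
X = np.column_stack([
    production + rng.normal(scale=0.3, size=n),  # e.g., total words written
    production + rng.normal(scale=0.3, size=n),  # e.g., words spelled correctly
    production + rng.normal(scale=0.3, size=n),  # e.g., correct word sequences
    rng.normal(size=n),                          # e.g., percent accuracy
])

# Principal components from the correlation matrix (as with standardized scores)
R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigh(R)[0][::-1]  # eigenvalues in descending order
print("proportion of variance explained:", np.round(eigvals / eigvals.sum(), 2))
```

The first component absorbs most of the shared variance among the correlated indices, mirroring why PCA was preferred over common factor analysis under extreme multicollinearity.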
ContributorsBrown, Alec Judd (Author) / Watkins, Marley (Thesis advisor) / Caterino, Linda (Thesis advisor) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created2012
Description

Risk assessment instruments play a significant role in correctional intervention and guide decisions about supervision and treatment. Although advances have been made in risk assessment over the past 50 years, limited attention has been given to risk assessment for domestic violence offenders. This study investigates the use of the Domestic Violence Screening Inventory (DVSI) and the Offender Screening Tool (OST) with a sample of 573 offenders convicted of domestic violence offenses and sentenced to supervised probation in Maricopa County, Arizona. The study has two purposes. The first is to assess the predictive validity of the existing assessment tools with a sample of domestic violence offenders, using a number of probation outcomes. The second is to identify the most significant predictors of probation outcomes. Predictive validity is assessed using cross-tabulations, bivariate correlations, and the receiver operating characteristic (ROC) curve. Logistic regression is used to identify the most significant predictors of probation outcomes. The DVSI and the OST were found to be predictive of probation outcomes and were most predictive of three outcomes: petition to revoke filed, petition to revoke filed for a violation of specialized domestic violence conditions, and unsuccessful probation status. Significant predictors include demographics, criminal history, current offense, victim characteristics, static factors, supervision variables, and dynamic variables. The most consistent predictors were supervision variables and dynamic risk factors. The supervision variables include being supervised on a specialized domestic violence caseload and changes in supervision (either an increase or a decrease) during the probation grant. The dynamic variables include employment and substance abuse. The overall findings provide support for the continued use of the DVSI and the OST and are consistent with the literature on evidence-based practices for correctional interventions.
However, the predictive validity of the assessments varied across sub-groups and the instruments were less predictive for females and offenders with non-intimate partner victims. In addition, study variables only explained a small portion of the variation in the probation outcomes. Additional research is needed, expanding beyond the psychology of criminal conduct, to continue to improve existing risk assessment tools and identify more salient predictors of probation outcomes for domestic violence offenders.
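One of the predictive-validity statistics used above, the area under the ROC curve, can be computed directly from its rank-based (Mann-Whitney) definition: the probability that a randomly chosen offender with the outcome received a higher risk score than one without it. The scores and outcomes below are invented for illustration.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via the Mann-Whitney identity: the fraction
    of positive/negative pairs in which the positive case outscores the
    negative case, counting ties as half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores and outcomes (1 = petition to revoke filed)
scores = [7, 6, 9, 2, 5, 3, 8, 1, 4, 6]
outcomes = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
print(round(auc(scores, outcomes), 2))  # → 0.94
```

An AUC of .5 indicates chance-level discrimination; the subgroup differences the abstract reports would appear as lower AUC values for females and non-intimate-partner cases.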
ContributorsFerguson, Jennifer (Author) / Hepburn, John R. (Thesis advisor) / Ashford, José B. (Committee member) / Johnson, John M. (Committee member) / Arizona State University (Publisher)
Created2011
Description

Responsible test use requires validation, the process of collecting evidence to support the inferences drawn from test scores. In high-stakes testing contexts, the need for validation is especially great; the far-reaching nature of high-stakes testing affects the educational, professional, and financial futures of stakeholders. The Standards for Educational and Psychological Testing (AERA et al., 2014) offers specific guidance on developing and implementing tests. Still, concerns exist over the extent to which developers and users of high-stakes tests are making valid inferences from test scores. This paper explores the current state of high-stakes educational testing and the validity issues surrounding it. Drawing on measurement theory literature, educational literature, and professional standards of test development and use, I assess the significance of these concerns and their potential implications for the stakeholders of high-stakes testing programs.
ContributorsKasten, Justin Daniel (Author) / Zheng, Yi (Thesis director) / Pivovarova, Margarita (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Department of Economics (Contributor) / Barrett, The Honors College (Contributor)
Created2016-05
Description

Personality testing in dogs has become a controversial topic in the dog community in the last few years. These assessments have been used by owners, shelters, working dog trainers, breeders, and researchers to identify patterns of behavior that may offer insight into a dog’s personality. Due to inconsistencies in terminology and validity testing, these personality tests have lost a notable amount of credibility. Focusing on questionnaire-based and behavior-based testing, this literature review aims to evaluate the significance of personality testing within the dog community. Each assessment is analyzed for its measurements and validity, as well as its potential drawbacks and benefits. Four prominent personality assessments are discussed in depth: the C-BARQ, DPQ, SAFER, and VIDOPET. I advocate for a mixed-assessment-model approach and highlight the benefits of extending personality testing into genetic research.

ContributorsBedeir, Amy Amira (Author) / Wynne, Clive (Thesis director) / Van Bourg, Joshua (Committee member) / Department of Psychology (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created2021-05