Matching Items (7)
Description
Possible selves researchers have uncovered many issues with current possible selves measures. For instance, one of the best-known measures, Oyserman's (2004) open-ended possible selves questionnaire, has proven difficult to score reliably and involves laborious scoring procedures. This study was therefore initiated to develop a closed-ended measure, the Persistent Academic Possible Selves Scale for Adolescents (PAPSS), that addresses these challenges. The PAPSS integrates possible selves theories (personal and social identities) with educational psychology (self-regulation in social cognitive theory). Four hundred ninety-five junior high and high school students participated in the validation study of the PAPSS. I conducted confirmatory factor analyses (CFA) to compare the fit of a baseline model to that of the hypothesized models using Mplus version 7 (Muthén & Muthén, 2012). A weighted least squares mean- and variance-adjusted (WLSMV) estimation method was used to handle multivariate nonnormality of the ordered categorical data. The final PAPSS has validity evidence based on internal structure. The factor structure comprises three goal-driven factors, one self-regulated factor that focuses on peers, and four self-regulated factors that emphasize the self. Oyserman's (2004) open-ended questionnaire was used to explore evidence of convergent validity. Many issues with Oyserman's (2003) instructions emerged during the coding of academic plausibility: it was difficult to distinguish hidden academic possible selves and strategies from non-academic ones, and interpersonally related strategies were weighted more heavily in the scoring process than interpersonally related academic possible selves. The results showed that all of the academic goal-related factors in the PAPSS are significantly and positively related to academic plausibility, whereas the self-regulated factors are not. The correlations between the self-regulated factors and academic plausibility do not provide evidence of convergent validity. Theoretical and methodological explanations for these results are discussed.
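The abstract does not spell out the ordinal CFA that WLSMV estimation presupposes; as a generic sketch (symbols illustrative, not taken from the PAPSS analysis), each ordered-categorical item j is linked to a continuous latent response variable through thresholds:

\[
y_{ij}^{*} = \lambda_j \xi_i + \varepsilon_{ij}, \qquad
y_{ij} = c \quad \text{iff} \quad \tau_{j,c} < y_{ij}^{*} \le \tau_{j,c+1},
\]

where \(\lambda_j\) is the loading of item \(j\) on the factor \(\xi\) and \(\tau_{j,c}\) are the item's thresholds. WLSMV fits this model to the polychoric correlations among the items and applies a mean- and variance-adjusted test statistic, which is why it is suited to nonnormal, ordered categorical data.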
Contributors: Lee, Ji Eun (Author) / Husman, Jenefer (Thesis advisor) / Green, Samuel (Committee member) / Millsap, Roger (Committee member) / Brem, Sarah (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
This simulation study compared the utility of various discrepancy measures within a posterior predictive model checking (PPMC) framework for detecting different types of data-model misfit in multidimensional Bayesian network (BN) models. The investigated conditions were motivated by an applied research program utilizing an operational complex performance assessment within a digital-simulation educational context grounded in theories of cognition and learning. BN models were manipulated along two factors: latent variable dependency structure and number of latent classes. Distributions of posterior predictive p-values (PPP-values) served as the primary outcome measure and were summarized in graphical presentations, by median values across replications, and by proportions of replications in which the PPP-values were extreme. An effect size measure for PPMC was introduced as a supplemental numerical summary to the PPP-value. Consistent with previous PPMC research, all investigated fit functions tended to perform conservatively, but the Standardized Generalized Dimensionality Discrepancy Measure (SGDDM), Yen's Q3, and the Hierarchy Consistency Index (HCI) only mildly so. Adequate power to detect at least some types of misfit was demonstrated by SGDDM, Q3, HCI, the Item Consistency Index (ICI), and, to a lesser extent, deviance, while proportion correct (PC), a chi-square-type item-fit measure, the Ranked Probability Score (RPS), and Good's Logarithmic Scale (GLS) were powerless across all investigated factors. Bivariate SGDDM and Q3 provided powerful and detailed feedback for all investigated types of misfit.
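The PPP-value itself is not defined in the abstract; as a sketch under standard PPMC practice, for a discrepancy measure \(D\) and \(S\) draws \(\theta^{(s)}\) from the posterior:

\[
\mathrm{PPP} = \Pr\!\big(D(y^{\mathrm{rep}}, \theta) \ge D(y, \theta) \mid y\big)
\approx \frac{1}{S} \sum_{s=1}^{S} \mathbb{I}\!\left[ D\!\big(y^{\mathrm{rep}(s)}, \theta^{(s)}\big) \ge D\!\big(y, \theta^{(s)}\big) \right],
\]

where \(y^{\mathrm{rep}(s)}\) is a data set simulated from the model at \(\theta^{(s)}\). PPP-values near 0 or 1 flag misfit, while values piling up near .5 correspond to the conservative behavior noted above.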
Contributors: Crawford, Aaron (Author) / Levy, Roy (Thesis advisor) / Green, Samuel (Committee member) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
The measurement of competency in nursing is critical to ensure safe and effective care of patients. This study had two purposes. First, the psychometric characteristics of the Nursing Performance Profile (NPP), an instrument used to measure nursing competency, were evaluated using generalizability theory and a sample of 18 nurses in the Measuring Competency with Simulation (MCWS) Phase I dataset. The relative magnitudes of various error sources and their interactions were estimated in a generalizability study involving a fully crossed, three-facet random design with nurse participants as the object of measurement and scenarios, raters, and items as the three facets. A design corresponding to that of the MCWS Phase I data (three scenarios, three raters, and 41 items) showed that nurse participants contributed the greatest proportion of total variance (50.00%), followed, in decreasing magnitude, by rater (19.40%), the two-way participant × scenario interaction (12.93%), and the two-way participant × rater interaction (8.62%). The generalizability (G) coefficient was .65 and the dependability coefficient was .50. In decision study designs minimizing the number of scenarios, the desired generalizability coefficients of .70 and .80 were reached with three scenarios and five raters and with five scenarios and nine raters, respectively. In designs minimizing the number of raters, G coefficients of .72 and .80 were reached with three raters and five scenarios and with four raters and nine scenarios, respectively. A dependability coefficient of .71 was attained with six scenarios and nine raters or with nine scenarios and seven raters. Achieving high reliability with designs involving fewer raters may be possible with enhanced rater training to decrease the variance components for rater main and interaction effects. The second part of this study involved the design and implementation of a validation process for evidence-based human patient simulation scenarios for assessing nursing competency. A team of experts validated the new scenario using a modified Delphi technique involving three rounds of iterative feedback and revisions. In tandem, the psychometric study of the NPP and the development of a validation process for human patient simulation scenarios both advance and encourage best practices for studying the validity of simulation-based assessments.
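The abstract reports G and dependability coefficients without their formulas; as a sketch of the standard generalizability-theory definitions (components shown generically, not the study's actual estimates):

\[
E\rho^2 = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_\delta^2}, \qquad
\Phi = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_\Delta^2},
\]

where \(\sigma_p^2\) is the participant (object-of-measurement) variance, the relative error \(\sigma_\delta^2\) sums the participant-by-facet interaction components each divided by its decision-study sample size (e.g., \(\sigma_{ps}^2/n_s + \sigma_{pr}^2/n_r + \cdots\)), and the absolute error \(\sigma_\Delta^2\) additionally includes the facet main-effect components. Increasing \(n_s\) or \(n_r\) shrinks these error terms, which is how the decision-study projections above are obtained.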
Contributors: O'Brien, Janet Elaine (Author) / Thompson, Marilyn (Thesis advisor) / Hagler, Debra (Thesis advisor) / Green, Samuel (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Lexical diversity (LD) has been used in a wide range of applications, producing a rich history in the field of speech-language pathology. However, identifying a robust measure to quantify LD has been challenging for clinicians and researchers. Recently, sophisticated techniques have been developed that purport to measure LD. Each is based on its own theoretical assumptions and employs different computational machinery. Therefore, it is not clear to what extent these techniques produce valid scores and how they relate to each other. Further, in the field of speech-language pathology, researchers and clinicians often use different methods to elicit various types of discourse, and it is an empirical question whether the inferences drawn from analyzing one type of discourse relate and generalize to other types. The current study examined a corpus of four types of discourse (procedures, eventcasts, storytelling, recounts) from 442 adults. Using four techniques (D; Maas; the Measure of Textual Lexical Diversity, MTLD; and the Moving-Average Type-Token Ratio, MATTR), LD scores were estimated for each type. Subsequently, the data were modeled using structural equation modeling to uncover their latent structure. Results indicated that two estimation techniques (MATTR and MTLD) generated scores that were stronger indicators of the LD of the language samples. For the other two techniques, results were consistent with the presence of method factors representing construct-irrelevant sources of variance. A hierarchical factor analytic model indicated that a common factor underlay all combinations of discourse types and estimation techniques and was interpreted as a general construct of LD. Two discourse types (storytelling and eventcasts) were significantly stronger indicators of the underlying trait. These findings add to our understanding of the validity of scores generated by different estimation techniques. Further, they enhance our knowledge of how productive vocabulary manifests itself across types of discourse that impose different cognitive and linguistic demands. They also offer clinicians and researchers a point of reference regarding techniques that measure the LD of a language sample and little else, and regarding the types of discourse that may be most informative for measuring the LD of individuals.
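The abstract names MATTR but does not define it; as a minimal sketch (function name and window size are illustrative, not from the study), MATTR averages the type-token ratio over all overlapping fixed-length windows, which removes the dependence of the plain type-token ratio on sample length:

```python
def mattr(tokens, window=50):
    """Moving-Average Type-Token Ratio.

    Averages the type-token ratio (unique tokens / tokens) over every
    overlapping window of `window` tokens, so scores are comparable
    across language samples of different lengths.
    """
    if not tokens:
        raise ValueError("empty token list")
    if len(tokens) < window:
        # Sample shorter than one window: fall back to plain TTR.
        return len(set(tokens)) / len(tokens)
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)


# Example: a short sample scored with a small window for illustration.
sample = "the cat sat on the mat and the dog sat on the rug".split()
print(round(mattr(sample, window=5), 3))
```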
Contributors: Fergadiotis, Gerasimos (Author) / Wright, Heather H (Thesis advisor) / Katz, Richard (Committee member) / Green, Samuel (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
This study presents a structural model of coping with dating violence. The model integrates abuse frequency and solution attribution to predict a college woman's choice of coping strategy. Three hundred twenty-four undergraduate women reported being targets of some physical abuse from a boyfriend and responded to questions regarding the abuse, their gender role beliefs, their solution attributions, and the coping behaviors they used. Although gender role beliefs and abuse severity were not significant predictors, solution attribution mediated the relation between abuse frequency and coping: abuse frequency had a positive effect on external solution attribution, which in turn had a positive effect on the use of active coping, social support, denial, and acceptance.
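The abstract summarizes a mediated structural model without giving its equations; in generic single-mediator form (symbols illustrative, with \(X\) = abuse frequency, \(M\) = external solution attribution, and \(Y\) = a coping outcome):

\[
M = aX + e_M, \qquad Y = c'X + bM + e_Y,
\]

where the indirect effect \(ab\) carries the mediation reported above. The study's actual model is larger, with several coping outcomes and additional predictors such as gender role beliefs.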
Contributors: Bapat, Mona (Author) / Tracey, Terence J.G. (Thesis advisor) / Bernstein, Bianca (Committee member) / Green, Samuel (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Researchers who conduct longitudinal studies are inherently interested in studying individual and population changes over time (e.g., mathematics achievement, subjective well-being). To answer such research questions, models of change (e.g., growth models) make the assumption of longitudinal measurement invariance. In many applied situations, key constructs are measured by a collection of ordered-categorical indicators (e.g., Likert scale items). To evaluate longitudinal measurement invariance with ordered-categorical indicators, a set of hierarchical models can be sequentially tested and compared. If the statistical tests of measurement invariance fail to be supported for one of the models, it is useful to have a method with which to gauge the practical significance of the differences in measurement model parameters over time. Drawing on studies of latent growth models and second-order latent growth models with continuous indicators (e.g., Kim & Willson, 2014a, 2014b; Leite, 2007; Wirth, 2008), this study examined the performance of a potential sensitivity analysis for gauging the practical significance of violations of longitudinal measurement invariance for ordered-categorical indicators using second-order latent growth models. The change in the estimates of the second-order growth parameters after an incorrect level of measurement invariance constraints was imposed at the first-order level served as an effect size for measurement non-invariance. The study investigated how sensitive the proposed analysis was to different locations of non-invariance (i.e., non-invariance in the factor loadings, the thresholds, and the unique factor variances) given a sufficient sample size. It also examined whether the sensitivity of the proposed analysis depended on several other factors, including the magnitude of non-invariance, the number of non-invariant indicators, the number of non-invariant occasions, and the number of response categories in the indicators.
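The hierarchy of invariance models is not enumerated in the abstract; as a sketch in conventional notation for item \(j\) at occasion \(t\) (symbols generic, not from the dissertation), the ordered-categorical measurement model

\[
y_{jt}^{*} = \lambda_{jt}\,\eta_t + \varepsilon_{jt}, \qquad
y_{jt} = c \quad \text{iff} \quad \tau_{jt,c} < y_{jt}^{*} \le \tau_{jt,c+1}
\]

is constrained in steps: configural invariance (same pattern only), then equal loadings (\(\lambda_{jt} = \lambda_j\)), then equal thresholds (\(\tau_{jt,c} = \tau_{j,c}\)), then equal unique factor variances (\(\mathrm{Var}(\varepsilon_{jt}) = \mathrm{Var}(\varepsilon_j)\)), matching the three locations of non-invariance manipulated in the study.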
Contributors: Liu, Yu, Ph.D. (Author) / West, Stephen G. (Thesis advisor) / Tein, Jenn-Yun (Thesis advisor) / Green, Samuel (Committee member) / Grimm, Kevin J. (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The use of exams for classification purposes has become prevalent across many fields, including professional assessment for employment screening and standards-based testing in educational settings. Classification exams assign individuals to performance groups based on the comparison of their observed test scores to a pre-selected criterion (e.g., masters vs. nonmasters in dichotomous classification scenarios). The successful use of exams for classification assumes at least minimal levels of accuracy in these classifications. Classification accuracy is an index that reflects the rate at which individuals are classified into the category that contains their true ability score. Traditional methods estimate classification accuracy by assuming that true scores follow a four-parameter beta-binomial distribution. Recent research suggests that item response theory (IRT) may be a preferable framework for estimating examinees' true scores and may return more accurate classifications based on those scores. Researchers hypothesized that test length, the location of the cut score, the distribution of items, and the distribution of examinee ability would affect the recovery of accurate estimates of classification accuracy. The current simulation study manipulated these factors to assess their potential influence. Observed classification as masters vs. nonmasters, true classification accuracy, estimated classification accuracy, bias, and RMSE were analyzed. In addition, analysis of variance tests were conducted to determine whether the levels of the four manipulated factors interacted. Results showed small values of estimated classification accuracy and increased bias in accuracy estimates with few items, mismatched distributions of item difficulty and examinee ability, and extreme cut scores. A significant four-way interaction among the manipulated variables was observed. In addition to interpretations of these findings and explanations of their potential causes, recommendations that inform practice and avenues for future research are provided.
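The abstract does not give a formula for IRT-based classification accuracy; as a sketch of one common normal-approximation approach (symbols generic, not from the study), each examinee's probability of falling on the correct side of the cut score \(\theta_c\) is accumulated:

\[
\widehat{\mathrm{CA}} = \frac{1}{N} \sum_{i=1}^{N} p_i, \qquad
p_i =
\begin{cases}
1 - \Phi\!\left(\dfrac{\theta_c - \hat\theta_i}{\mathrm{SE}(\hat\theta_i)}\right), & \text{examinee } i \text{ classified as a master}, \\[2ex]
\Phi\!\left(\dfrac{\theta_c - \hat\theta_i}{\mathrm{SE}(\hat\theta_i)}\right), & \text{otherwise},
\end{cases}
\]

where \(\hat\theta_i\) is the IRT ability estimate, \(\mathrm{SE}(\hat\theta_i)\) its standard error, and \(\Phi\) the standard normal CDF.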
Contributors: Kunze, Katie (Author) / Gorin, Joanna (Thesis advisor) / Levy, Roy (Thesis advisor) / Green, Samuel (Committee member) / Arizona State University (Publisher)
Created: 2013