Matching Items (14)

A Psychometric Analysis of an Operational ASU Exam

Description

This thesis explored the psychometric properties of an ASU midterm exam. The analyses examined the efficacy of the exam's questions using the item-analysis methods of difficulty and discrimination. The difficulty and discrimination scores, along with the correlations among questions, led to suggestions of questions that may need revision.
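
The two item-analysis statistics named above are standard and straightforward to compute from a scored response matrix. A minimal sketch, assuming hypothetical 0/1-scored data (the thesis's actual data and scoring are not shown): difficulty is the proportion of examinees answering an item correctly, and discrimination is estimated here as the corrected item-total correlation.

```python
import numpy as np

def item_analysis(scores: np.ndarray):
    """Classical item analysis for a 0/1-scored examinees-by-items matrix."""
    n_items = scores.shape[1]
    difficulty = scores.mean(axis=0)            # proportion correct per item
    total = scores.sum(axis=1)
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - scores[:, j]             # total score excluding item j
        discrimination[j] = np.corrcoef(scores[:, j], rest)[0, 1]
    return difficulty, discrimination

# Hypothetical responses: 5 examinees x 3 items
responses = np.array([[1, 0, 1],
                      [1, 1, 1],
                      [0, 0, 1],
                      [1, 0, 0],
                      [1, 1, 1]])
difficulty, discrimination = item_analysis(responses)
```

Items with extreme difficulty or with discrimination near zero (or negative) are the usual candidates for revision.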

Date Created
  • 2020-05

Issues of Validity in High-Stakes Testing

Description

Responsible test use requires validation: the process of collecting evidence to support the inferences drawn from test scores. In high-stakes testing contexts, the need for validation is especially great; the far-reaching nature of high-stakes testing affects the educational, professional, and financial futures of stakeholders. The Standards for Educational and Psychological Testing (AERA et al., 2014) offers specific guidance on developing and implementing tests. Still, concerns exist over the extent to which test developers and users of high-stakes tests are making valid inferences from test scores. This paper explores the current state of high-stakes educational testing and the validity issues surrounding it. Drawing on measurement theory literature, educational literature, and professional standards of test development and use, I assess the significance of these concerns and their potential implications for the stakeholders of high-stakes testing programs.

Date Created
  • 2016-05

The sensitivity of confirmatory factor analytic fit indices to violations of factorial invariance across latent classes: a simulation study

Description

Although the issue of factorial invariance has received increasing attention in the literature, the focus is typically on differences in factor structure across groups that are directly observed, such as those denoted by sex or ethnicity. While establishing factorial invariance across observed groups is a requisite step in making meaningful cross-group comparisons, failure to attend to possible sources of latent class heterogeneity in the form of class-based differences in factor structure has the potential to compromise conclusions with respect to observed groups and may result in misguided attempts at instrument development and theory refinement. The present studies examined the sensitivity of two widely used confirmatory factor analytic model fit indices, the chi-square test of model fit and RMSEA, to latent class differences in factor structure. Two primary questions were addressed. The first concerned the impact of latent class differences in factor loadings on model fit in a single sample reflecting a mixture of classes. The second concerned the impact of latent class differences in configural structure on tests of factorial invariance across observed groups. The results suggest that both indices are highly insensitive to class-based differences in factor loadings. Across sample size conditions, models with medium-sized (0.2) loading differences were rejected by the chi-square test of model fit at rates only slightly higher than the nominal .05 rejection rate expected under a true null hypothesis. While rejection rates increased somewhat as the magnitude of the loading difference increased, even the largest sample size with equal class representation and the most extreme violations of loading invariance yielded rejection rates of only approximately 60%. RMSEA was also insensitive to class-based differences in factor loadings, with mean values across conditions suggesting a degree of fit that would generally be regarded as exceptionally good in practice. In contrast, both indices were sensitive to class-based differences in configural structure in the context of a multiple group analysis in which each observed group was a mixture of classes. However, preliminary evidence suggests that this sensitivity may be contingent on the form of the cross-group model misspecification.
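
For reference (standard background, not taken from the abstract), the sample RMSEA used in such studies is typically computed from the same chi-square statistic, so the two indices tend to fail together when class mixing barely inflates chi-square beyond its degrees of freedom:

\[
\widehat{\mathrm{RMSEA}} \;=\; \sqrt{\max\!\left(\frac{\chi^{2} - df}{df\,(N-1)},\; 0\right)}
\]

(Some software divides by N rather than N-1.) Under the mixture conditions described above, the estimated noncentrality, chi-square minus df, stays near zero, which is consistent with the reported mean RMSEA values indicating apparently excellent fit.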

Date Created
  • 2011

Time metric in latent difference score models

Description

Time metric is an important consideration for all longitudinal models because it can influence the interpretation of estimates, parameter estimate accuracy, and model convergence in longitudinal models with latent variables. Currently, the literature on latent difference score (LDS) models does not discuss the importance of time metric. Furthermore, there is little research using simulations to investigate LDS models. This study examined the influence of time metric on model estimation, interpretation, parameter estimate accuracy, and convergence in LDS models using empirical simulations. Results indicated that for a time structure with a true time metric in which participants had different starting points and unequally spaced intervals, LDS models fit with a restructured and less informative time metric produced biased parameter estimates. However, models examined using the true time metric were less likely to converge than models using the restructured time metric, likely due to missing data. Where participants had different starting points but equally spaced intervals, LDS models fit with a restructured time metric produced biased estimates of intercept means, but all other parameter estimates were unbiased; models examined using the true time metric again converged less often than those using the restructured time metric, likewise due to missing data. The findings of this study support prior research on time metric in longitudinal models, and further research should examine these findings under alternative conditions. The importance of these findings for substantive researchers is discussed.
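
For background (a standard specification from the LDS literature, not reproduced from the thesis), a univariate dual-change latent difference score model can be written as

\[
y_{ti} = \eta_{ti} + \epsilon_{ti}, \qquad
\eta_{ti} = \eta_{t-1,i} + \Delta\eta_{ti}, \qquad
\Delta\eta_{ti} = \alpha\, s_{i} + \beta\, \eta_{t-1,i},
\]

where each latent change score depends on a constant-change factor s_i and the prior latent status. Because each difference is tied to a specific measurement occasion, restructuring the time metric changes the interval each change score spans, which is one way the biased estimates reported above can arise.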

Date Created
  • 2016

Determining appropriate sample sizes and their effects on key parameters in longitudinal three-level models

Description

Through a two-study simulation design with varying conditions (level 1 (L1) sample size fixed at 3, level 2 (L2) sample size ranging from 10 to 75, level 3 (L3) sample size ranging from 30 to 150, intraclass correlation (ICC) ranging from 0.10 to 0.50, and model complexity ranging from one to three predictors), this study provides general guidelines about adequate sample sizes at all three levels under varying ICC conditions for a viable three-level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). The data-generating parameters for the simulations were obtained from a large-scale longitudinal data set from North Carolina, provided by the National Center on Assessment and Accountability for Special Education (NCAASE). I discuss ranges of sample sizes that are adequate or inadequate with respect to convergence, absolute bias, relative bias, root mean squared error (RMSE), and coverage of individual parameter estimates. The detailed two-part simulation design across various sample sizes, levels of model complexity, and ICCs yields several options for adequate sample sizes under different conditions. The study emphasizes that adequate sample sizes at L1, L2, and L3 can be adjusted according to researchers' interests in particular parameter estimates and their acceptable ranges of absolute bias, relative bias, RMSE, and coverage. Under different model complexity and ICC conditions, the results help researchers identify the L1, L2, or L3 sample size, or a combination of them, as the source of variation in absolute bias, relative bias, RMSE, or coverage for a given parameter estimate, assisting them in selecting adequate sample sizes for a three-level HLM analysis. A limitation of the study was the use of only a single distribution for the dependent and explanatory variables; different distributions might result in different sample size recommendations.
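
The evaluation criteria named here (absolute bias, relative bias, RMSE, coverage) are conventional Monte Carlo summaries. A minimal sketch, assuming hypothetical arrays of per-replication estimates and standard errors and a known true value:

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.96):
    """Monte Carlo summaries for one parameter across converged replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    abs_bias = est.mean() - true_value
    rel_bias = abs_bias / true_value                 # assumes true_value != 0
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    lo, hi = est - z * se, est + z * se              # nominal 95% intervals
    coverage = np.mean((lo <= true_value) & (true_value <= hi))
    return {"abs_bias": abs_bias, "rel_bias": rel_bias,
            "rmse": rmse, "coverage": coverage}
```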

Date Created
  • 2016

Communicating with compassion: the exploratory factor analysis and primary validation process of the Compassionate Communication Scale

Description

The purpose of this dissertation was to develop a Compassionate Communication Scale (CCS) by conducting a series of studies. The first study used qualitative data to identify and develop initial scale items. A series of follow-up studies used exploratory factor analysis to investigate the underlying structure of the CCS. A three-factor structure emerged: compassionate conversation, such as listening, letting the distressed person disclose feelings, and making empathetic remarks; compassionate touch, such as holding someone's hand or patting someone's back; and compassionate messaging, such as posting an encouraging message on a social networking site or sending a sympathetic email. The next study tested convergent and divergent validity by determining how the three forms of compassionate communication associate with various traits. Compassionate conversation was positively related to compassion, empathetic concern, perspective taking, emotional intelligence, social expressivity, emotional expressivity, and benevolence, and negatively related to verbal aggressiveness and narcissism. Compassionate touch was positively correlated with compassion, empathetic concern, perspective taking, emotional intelligence, social expressivity, emotional expressivity, and benevolence, and uncorrelated with verbal aggressiveness and narcissism. Finally, compassionate messaging was positively correlated with social expressivity and emotional expressivity, and uncorrelated with verbal aggressiveness and narcissism. The next study focused on cross-validation and criterion-related validity. Cross-validation was provided by correlations showing that self-reports of a person's compassionate communication were positively related to a friend or romantic partner's report of that person's compassionate communication. The test for criterion-related validity examined whether compassionate communication predicts relational satisfaction. Regression analyses revealed that people were more relationally satisfied when they perceived themselves to use compassionate conversation, when they perceived their partner to use compassionate conversation, and when their partner reported using compassionate conversation. This finding did not extend to compassionate touch or compassionate messaging. In fact, in one regression analysis, people reported more relational satisfaction when they perceived that their partners used high levels of compassionate conversation and low levels of compassionate touch. Overall, the analyses suggest that of the three forms of compassionate communication, compassionate conversation is most strongly related to relational satisfaction. Taken together, this series of studies provides initial evidence for the validity of the CCS.
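
The abstract does not show the estimation details; as one hedged sketch of the exploratory step, an EFA can be run on the item responses and the loadings inspected for cross-loadings of the kind described. The data here are hypothetical, and scikit-learn's FactorAnalysis offers only orthogonal rotations ("varimax", "quartimax"); dedicated EFA packages would be needed for oblique rotations.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical respondents-by-items matrix of scale responses
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))

efa = FactorAnalysis(n_components=3, rotation="varimax")
efa.fit(X)
loadings = efa.components_.T              # items x factors
# Flag items loading above |0.40| on more than one factor (cross-loadings)
cross_loading = (np.abs(loadings) > 0.40).sum(axis=1) > 1
```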

Date Created
  • 2013

Developing a measure of cyberbullying perpetration and victimization

Description

This research addressed concerns regarding the measurement of cyberbullying and aimed to develop a reliable and valid measure of cyberbullying perpetration and victimization. Despite the growing body of literature on cyberbullying, several measurement concerns were identified and addressed in two pilot studies. These concerns included the most appropriate time frame for behavioral recall, use of the term "cyberbullying" in questionnaire instructions, whether to refer to power in instances of cyberbullying, and best practices for designing self-report measures to reflect how young adults understand and communicate about cyberbullying. Mixed methodology was employed in two pilot studies to address these concerns and to determine how best to design a measure to which participants could respond accurately and honestly. Pilot study one experimentally examined the effects of recall time frame and use of the term on honesty, accuracy, and social desirability. Pilot study two involved a qualitative examination of several measurement concerns through focus groups held with young adults. Results suggested that one academic year was the most appropriate time frame for behavioral recall, that the term "cyberbullying" should be avoided in questionnaire instructions, and that references to power should be included, along with other suggestions for improving the method in the main study to bolster participants' attention. These findings informed the development of a final measure in the main study which aimed to be both practical in its ability to capture prevalence and precise in its ability to measure frequency. The main study involved examining the psychometric properties, reliability, and validity of the final measure. Results of the main study indicated that the final measure exhibited qualities of an index and was assessed as such. Further, structural equation modeling techniques and test-retest procedures indicated the measure had good reliability. In addition, good predictive validity and satisfactory convergent validity were established for the final measure. Results derived from the measure concerning prevalence, frequency, and chronicity are presented within the scope of findings in the cyberbullying literature. Implications for practice and future directions for research with the measure developed here are discussed.

Date Created
  • 2012

Modeling motivation: examining the structural validity of the Sport Motivation Scale-6 among runners

Description

Two models of motivation are prevalent in the literature on sport and exercise participation (Deci & Ryan, 1991; Vallerand, 1997, 2000). Both models are grounded in self-determination theory (Deci & Ryan, 1985; Ryan & Deci, 2000) and consider the relationship between intrinsic, extrinsic, and amotivation in explaining behavior choice and outcomes. Both models articulate the relationship between need satisfaction (i.e., autonomy, competence, relatedness; Deci & Ryan, 1985, 2000; Ryan & Deci, 2000) and various cognitive, affective, and behavioral outcomes as a function of self-determined motivation. Despite these comprehensive models, inconsistencies remain between the theories and their practical applications. The purpose of my study was to examine alternative theoretical models of intrinsic, extrinsic, and amotivation using the Sport Motivation Scale-6 (SMS-6; Mallett et al., 2007) to more thoroughly study the structure of motivation and the practical utility of using such a scale to measure motivation among runners. Confirmatory factor analysis was used to evaluate eight alternative models. After finding unsatisfactory fit of these models, exploratory factor analysis was conducted post hoc to further examine the measurement structure of motivation. A three-factor structure of general motivation, external accolades, and isolation/solitude explained motivation best, although high cross-loadings of items suggest the structure of this construct still lacks clarity. Future directions to modify item content and re-examine structure as well as limitations of this study are discussed.
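
A hedged sketch of the kind of CFA model comparison described: the actual SMS-6 item assignments and the eight alternative specifications are not reproduced here, semopy is one Python SEM library (its Model, fit, and calc_stats interface is assumed), and the factor and item names below are placeholders.

```python
import numpy as np
import pandas as pd
import semopy

# Hypothetical responses: 400 respondents x 12 placeholder items
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(400, 12)),
                  columns=[f"item{i}" for i in range(1, 13)])

# One candidate measurement model (three correlated factors; illustrative only)
desc = """
intrinsic   =~ item1 + item2 + item3 + item4
extrinsic   =~ item5 + item6 + item7 + item8
amotivation =~ item9 + item10 + item11 + item12
"""
model = semopy.Model(desc)
model.fit(df)
print(semopy.calc_stats(model))   # chi-square, df, CFI, RMSEA, AIC, etc.
```

Fitting each alternative specification this way and comparing the resulting fit statistics mirrors the evaluation of competing motivation structures described above.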

Date Created
  • 2012

Evaluation of five effect size measures of measurement non-invariance for continuous outcomes

Description

To make meaningful comparisons on a construct of interest across groups or over time, measurement invariance needs to exist for at least a subset of the observed variables that define the construct. Often, chi-square difference tests are used to test for measurement invariance. However, these statistics are affected by sample size such that larger sample sizes are associated with a greater prevalence of significant tests. Thus, using other measures of non-invariance to aid in the decision process would be beneficial. For this dissertation project, I proposed four new effect size measures of measurement non-invariance and conducted a Monte Carlo simulation study to evaluate their properties and behavior, along with those of an already existing effect size measure of non-invariance. The effect size measures were evaluated based on bias, variability, and consistency. Additionally, the factors that affected the values of the effect size measures were analyzed. All studied effect sizes were consistent, but three were biased under certain conditions. Further work is needed to establish benchmarks for the unbiased effect sizes.
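
The abstract does not name the existing measure, but d_MACS (Nye & Drasgow, 2011), a widely cited effect size of non-invariance for continuous outcomes, illustrates the general form such measures take for a single item measured in two groups:

\[
d_{\mathrm{MACS}} \;=\; \frac{1}{SD_{\mathrm{pooled}}}
\sqrt{\int \bigl[(\nu_{1} - \nu_{2}) + (\lambda_{1} - \lambda_{2})\,\eta\bigr]^{2}\, f(\eta)\, d\eta}
\]

Here the nu_g and lambda_g are the item's intercept and loading in group g, and f(eta) is the density of the latent variable (the focal group's density in Nye and Drasgow's formulation); the integrated intercept and loading differences are scaled by the pooled observed standard deviation.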

Date Created
  • 2019

Psychometric and Machine Learning Approaches to Diagnostic Classification

Description

The goal of diagnostic assessment is to discriminate between groups. In many cases, a binary decision is made conditional on a cut score from a continuous scale. Psychometric methods can improve assessment by modeling a latent variable using item response theory (IRT), and IRT scores can subsequently be used to determine a cut score using receiver operating characteristic (ROC) curves. Psychometric methods provide reliable and interpretable scores, but the prediction of the diagnosis is not the primary product of the measurement process. In contrast, machine learning methods, such as regularization or binary recursive partitioning, can build a model from the assessment items to predict the probability of diagnosis. Machine learning predicts the diagnosis directly, but does not provide an inferential framework to explain why item responses are related to the diagnosis. It remains unclear whether psychometric and machine learning methods have comparable accuracy or if one method is preferable in some situations. In this study, Monte Carlo simulation methods were used to compare psychometric and machine learning methods on diagnostic classification accuracy. Results suggest that classification accuracy of psychometric models depends on the diagnostic-test correlation and prevalence of diagnosis. Also, machine learning methods that reduce prediction error have inflated specificity and very low sensitivity compared to the data-generating model, especially when prevalence is low. Finally, machine learning methods that use ROC curves to determine probability thresholds have comparable classification accuracy to the psychometric models as sample size, number of items, and number of item categories increase. Therefore, results suggest that machine learning models could provide a viable alternative for classification in diagnostic assessments. Strengths and limitations for each of the methods are discussed, and future directions are considered.
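
As a hedged sketch of the ROC step described (choosing a cut score on a continuous score to make a binary decision; the data and variable names are illustrative, and Youden's J is one common threshold rule):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical data: y is the true diagnosis (0/1); scores are continuous
# (e.g., IRT theta estimates) with some diagnostic-test correlation.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)
scores = y + rng.normal(scale=1.0, size=500)

fpr, tpr, thresholds = roc_curve(y, scores)
j = tpr - fpr                              # Youden's J = sensitivity + specificity - 1
cut = thresholds[np.argmax(j)]             # threshold maximizing J
predicted = (scores >= cut).astype(int)
sensitivity = np.mean(predicted[y == 1])   # true positive rate at the cut
specificity = np.mean(1 - predicted[y == 0])  # true negative rate at the cut
```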

Date Created
  • 2018