Search Content

Modern psychometric theory in clinical assessment

Description

Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. While IRT has become prevalent in the assessment of ability and achievement, it has not been widely embraced by clinical psychologists. This appears due, in part, to psychometrists'…

Item response theory (IRT) and related latent variable models represent modern psychometric theory, the successor to classical test theory in psychological assessment. While IRT has become prevalent in the assessment of ability and achievement, it has not been widely embraced by clinical psychologists. This appears due, in part, to psychometrists' use of unidimensional models despite evidence that psychiatric disorders are inherently multidimensional. The construct validity of unidimensional and multidimensional latent variable models was compared to evaluate the utility of modern psychometric theory in clinical assessment. Archival data consisting of 688 outpatients' presenting concerns, psychiatric diagnoses, and item level responses to the Brief Symptom Inventory (BSI) were extracted from files at a university mental health clinic. Confirmatory factor analyses revealed that models with oblique factors and/or item cross-loadings better represented the internal structure of the BSI in comparison to a strictly unidimensional model. The models were generally equivalent in their ability to account for variance in criterion-related validity variables; however, bifactor models demonstrated superior validity in differentiating between mood and anxiety disorder diagnoses. Multidimensional IRT analyses showed that the orthogonal bifactor model partitioned distinct, clinically relevant sources of item variance. Similar results were also achieved through multivariate prediction with an oblique simple structure model. Receiver operating characteristic curves confirmed improved sensitivity and specificity through multidimensional models of psychopathology. Clinical researchers are encouraged to consider these and other comprehensive models of psychological distress.

ContributorsThomas, Michael Lee (Author) / Lanyon, Richard (Thesis advisor) / Barrera, Manuel (Committee member) / Levy, Roy (Committee member) / Millsap, Roger (Committee member) / Arizona State University (Publisher)

Created2011

Assessing dimensionality in complex data structures: a performance comparison of DETECT and NOHARM procedures

Description

The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in compensatory and noncompensatory multidimensional item response models (MIRT) of assessment data using dimensionality assessment procedures based on conditional covariances (i.e., DETECT) and a factor analytical approach (i.e., NOHARM). The DETECT-based methods typically outperformed…

The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in compensatory and noncompensatory multidimensional item response models (MIRT) of assessment data using dimensionality assessment procedures based on conditional covariances (i.e., DETECT) and a factor analytical approach (i.e., NOHARM). The DETECT-based methods typically outperformed the NOHARM-based methods in both two- (2D) and three-dimensional (3D) compensatory MIRT conditions. The DETECT-based methods yielded high proportion correct, especially when correlations were .60 or smaller, data exhibited 30% or less complexity, and larger sample size. As the complexity increased and the sample size decreased, the performance typically diminished. As the complexity increased, it also became more difficult to label the resulting sets of items from DETECT in terms of the dimensions. DETECT was consistent in classification of simple items, but less consistent in classification of complex items. Out of the three NOHARM-based methods, χ2G/D and ALR generally outperformed RMSR. χ2G/D was more accurate when N = 500 and complexity levels were 30% or lower. As the number of items increased, ALR performance improved at correlation of .60 and 30% or less complexity. When the data followed a noncompensatory MIRT model, the NOHARM-based methods, specifically χ2G/D and ALR, were the most accurate of all five methods. The marginal proportions for labeling sets of items as dimension-like were typically low, suggesting that the methods generally failed to label two (three) sets of items as dimension-like in 2D (3D) noncompensatory situations. The DETECT-based methods were more consistent in classifying simple items across complexity levels, sample sizes, and correlations. However, as complexity and correlation levels increased the classification rates for all methods decreased. In most conditions, the DETECT-based methods classified complex items equally or more consistent than the NOHARM-based methods. In particular, as complexity, the number of items, and the true dimensionality increased, the DETECT-based methods were notably more consistent than any NOHARM-based method. Despite DETECT's consistency, when data follow a noncompensatory MIRT model, the NOHARM-based method should be preferred over the DETECT-based methods to assess dimensionality due to poor performance of DETECT in identifying the true dimensionality.

ContributorsSvetina, Dubravka (Author) / Levy, Roy (Thesis advisor) / Gorin, Joanna S. (Committee member) / Millsap, Roger (Committee member) / Arizona State University (Publisher)

Created2011

Teachers' preferred methods of gaining information about epilepsy

Description

Children with epilepsy represent a unique group of students who may require accommodations in school to be optimally successful. Therefore, it is important for teachers to understand the possible academic consequences epilepsy can have on a child. An important step in providing this information about epilepsy to teachers…

Children with epilepsy represent a unique group of students who may require accommodations in school to be optimally successful. Therefore, it is important for teachers to understand the possible academic consequences epilepsy can have on a child. An important step in providing this information about epilepsy to teachers is understanding where they would prefer to acquire this information. The current study examined differences between teachers of differing ages, school levels and special education teaching status in their preferences for gaining information from parents and the internet. Contrary to expectations, older teachers (those 56 years of age and older) were no less likely that younger teachers to prefer information from the internet. As predicted, elementary school teachers were more likely than high school teachers to prefer information from parents. However, interestingly middle school teachers were also more likely to prefer information from parents than high school teachers. Lastly, contrary to hypothesized results, special education teachers were no more likely to prefer information from parents than non-special education colleagues. Limitations of this study, implications for practice and directions for future research are discussed.

ContributorsGay, Catherine (Author) / Wodrich, David (Thesis advisor) / Levy, Roy (Committee member) / Hart, Juliet (Committee member) / Arizona State University (Publisher)

Created2011

The accuracy of accuracy estimates for single form dichotomous classification exams

Description

The use of exams for classification purposes has become prevalent across many fields including professional assessment for employment screening and standards based testing in educational settings. Classification exams assign individuals to performance groups based on the comparison of their observed test scores to a pre-selected criterion (e.g. masters vs. nonmasters…

The use of exams for classification purposes has become prevalent across many fields including professional assessment for employment screening and standards based testing in educational settings. Classification exams assign individuals to performance groups based on the comparison of their observed test scores to a pre-selected criterion (e.g. masters vs. nonmasters in dichotomous classification scenarios). The successful use of exams for classification purposes assumes at least minimal levels of accuracy of these classifications. Classification accuracy is an index that reflects the rate of correct classification of individuals into the same category which contains their true ability score. Traditional methods estimate classification accuracy via methods which assume that true scores follow a four-parameter beta-binomial distribution. Recent research suggests that Item Response Theory may be a preferable alternative framework for estimating examinees' true scores and may return more accurate classifications based on these scores. Researchers hypothesized that test length, the location of the cut score, the distribution of items, and the distribution of examinee ability would impact the recovery of accurate estimates of classification accuracy. The current simulation study manipulated these factors to assess their potential influence on classification accuracy. Observed classification as masters vs. nonmasters, true classification accuracy, estimated classification accuracy, BIAS, and RMSE were analyzed. In addition, Analysis of Variance tests were conducted to determine whether an interrelationship existed between levels of the four manipulated factors. Results showed small values of estimated classification accuracy and increased BIAS in accuracy estimates with few items, mismatched distributions of item difficulty and examinee ability, and extreme cut scores. A significant four-way interaction between manipulated variables was observed. In additional to interpretations of these findings and explanation of potential causes for the recovered values, recommendations that inform practice and avenues of future research are provided.

ContributorsKunze, Katie (Author) / Gorin, Joanna (Thesis advisor) / Levy, Roy (Thesis advisor) / Green, Samuel (Committee member) / Arizona State University (Publisher)

Created2013

A comparison of DIMTEST and generalized dimensionality discrepancy approaches to assessing dimensionality in item response theory

Description

Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools;…

Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools; see e.g., Froelich & Habing, 2007). It remains to be seen how such procedures perform in the context of small-scale assessments characterized by relatively small sample sizes and/or short tests. The fact that some procedures come with minimum allowable values for characteristics of the data, such as the number of items, may even render them unusable for some small-scale assessments. Other measures designed to assess dimensionality do not come with such limitations and, as such, may perform better under conditions that do not lend themselves to evaluation via statistics that rely on asymptotic theory. The current work aimed to evaluate the performance of one such metric, the standardized generalized dimensionality discrepancy measure (SGDDM; Levy & Svetina, 2011; Levy, Xu, Yel, & Svetina, 2012), under both large- and small-scale testing conditions. A Monte Carlo study was conducted to compare the performance of DIMTEST and the SGDDM statistic in terms of evaluating assumptions of unidimensionality in item response data under a variety of conditions, with an emphasis on the examination of these procedures in small-scale assessments. Similar to previous research, increases in either test length or sample size resulted in increased power. The DIMTEST procedure appeared to be a conservative test of the null hypothesis of unidimensionality. The SGDDM statistic exhibited rejection rates near the nominal rate of .05 under unidimensional conditions, though the reliability of these results may have been less than optimal due to high sampling variability resulting from a relatively limited number of replications. Power values were at or near 1.0 for many of the multidimensional conditions. It was only when the sample size was reduced to N = 100 that the two approaches diverged in performance. Results suggested that both procedures may be appropriate for sample sizes as low as N = 250 and tests as short as J = 12 (SGDDM) or J = 19 (DIMTEST). When used as a diagnostic tool, SGDDM may be appropriate with as few as N = 100 cases combined with J = 12 items. The study was somewhat limited in that it did not include any complex factorial designs, nor were the strength of item discrimination parameters or correlation between factors manipulated. It is recommended that further research be conducted with the inclusion of these factors, as well as an increase in the number of replications when using the SGDDM procedure.

ContributorsReichenberg, Ray E (Author) / Levy, Roy (Thesis advisor) / Thompson, Marilyn S. (Thesis advisor) / Green, Samuel B. (Committee member) / Arizona State University (Publisher)

Created2013

A Psychometric Analysis of an Operational ASU Exam

Description

This thesis explored the psychometric properties of an ASU midterm. These analyses were done to explore the efficacy of the questions on the exam using the methods of item analysis difficulty and discrimination. The discrimination and difficulty scores as well as the correlations of questions led to suggests of questions…

This thesis explored the psychometric properties of an ASU midterm. These analyses were done to explore the efficacy of the questions on the exam using the methods of item analysis difficulty and discrimination. The discrimination and difficulty scores as well as the correlations of questions led to suggests of questions that may need revision.

ContributorsLowell, Emily M (Author) / Levy, Roy (Thesis director) / O'Rourke, Holly (Committee member) / School of Life Sciences (Contributor) / Department of Psychology (Contributor) / Barrett, The Honors College (Contributor)

Created2020-05

Assessing measurement invariance and latent mean differences with bifactor multidimensional data in structural equation modeling

Description

Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can cause bias in model parameter estimation for single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of…

Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can cause bias in model parameter estimation for single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of mismatch in dimensionality between data and analysis models with multiple-group analyses at the population and sample levels. Datasets were generated using a bifactor model with different factor structures and were analyzed with bifactor and single-factor models to assess misspecification effects on assessments of MI and latent mean differences. As baseline models, the bifactor models fit data well and had minimal bias in latent mean estimation. However, the low convergence rates of fitting bifactor models to data with complex structures and small sample sizes caused concern. On the other hand, effects of fitting the misspecified single-factor models on the assessments of MI and latent means differed by the bifactor structures underlying data. For data following one general factor and one group factor affecting a small set of indicators, the effects of ignoring the group factor in analysis models on the tests of MI and latent mean differences were mild. In contrast, for data following one general factor and several group factors, oversimplifications of analysis models can lead to inaccurate conclusions regarding MI assessment and latent mean estimation.

ContributorsXu, Yuning (Author) / Green, Samuel (Thesis advisor) / Levy, Roy (Committee member) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)

Created2018

Robustness of the General Factor Mean Difference Estimation in Bifactor Ordinal Data

Description

A simulation study was conducted to explore the robustness of general factor mean difference estimation in bifactor ordered-categorical data. In the No Differential Item Functioning (DIF) conditions, the data generation conditions varied were sample size, the number of categories per item, effect size of the general factor mean difference, and…

A simulation study was conducted to explore the robustness of general factor mean difference estimation in bifactor ordered-categorical data. In the No Differential Item Functioning (DIF) conditions, the data generation conditions varied were sample size, the number of categories per item, effect size of the general factor mean difference, and the size of specific factor loadings; in data analysis, misspecification conditions were introduced in which the generated bifactor data were fit using a unidimensional model, and/or ordered-categorical data were treated as continuous data. In the DIF conditions, the data generation conditions varied were sample size, the number of categories per item, effect size of latent mean difference for the general factor, the type of item parameters that had DIF, and the magnitude of DIF; the data analysis conditions varied in whether or not setting equality constraints on the noninvariant item parameters.

Results showed that falsely fitting bifactor data using unidimensional models or failing to account for DIF in item parameters resulted in estimation bias in the general factor mean difference, while treating ordinal data as continuous had little influence on the estimation bias as long as there was no severe model misspecification. The extent of estimation bias produced by misspecification of bifactor datasets with unidimensional models was mainly determined by the degree of unidimensionality (i.e., size of specific factor loadings) and the general factor mean difference size. When the DIF was present, the estimation accuracy of the general factor mean difference was completely robust to ignoring noninvariance in specific factor loadings while it was very sensitive to failing to account for DIF in threshold parameters. With respect to ignoring the DIF in general factor loadings, the estimation bias of the general factor mean difference was substantial when the DIF was -0.15, and it can be negligible for smaller sizes of DIF. Despite the impact of model misspecification on estimation accuracy, the power to detect the general factor mean difference was mainly influenced by the sample size and effect size. Serious Type I error rate inflation only occurred when the DIF was present in threshold parameters.

ContributorsLiu, Yixing (Author) / Thompson, Marilyn (Thesis advisor) / Levy, Roy (Committee member) / O’Rourke, Holly (Committee member) / Arizona State University (Publisher)

Created2019

Three-level multiple imputation: a fully conditional specification approach

Description

Currently, there is a clear gap in the missing data literature for three-level models.

To date, the literature has only focused on the theoretical and algorithmic work

required to implement three-level imputation using the joint model (JM) method of

imputation, leaving relatively no work done on fully conditional specication (FCS)

method. Moreover, the literature…

Currently, there is a clear gap in the missing data literature for three-level models.

To date, the literature has only focused on the theoretical and algorithmic work

required to implement three-level imputation using the joint model (JM) method of

imputation, leaving relatively no work done on fully conditional specication (FCS)

method. Moreover, the literature lacks any methodological evaluation of three-level

imputation. Thus, this thesis serves two purposes: (1) to develop an algorithm in

order to implement FCS in the context of a three-level model and (2) to evaluate

both imputation methods. The simulation investigated a random intercept model

under both 20% and 40% missing data rates. The ndings of this thesis suggest

that the estimates for both JM and FCS were largely unbiased, gave good coverage,

and produced similar results. The sole exception for both methods was the slope for

the level-3 variable, which was modestly biased. The bias exhibited by the methods

could be due to the small number of clusters used. This nding suggests that future

research ought to investigate and establish clear recommendations for the number of

clusters required by these imputation methods. To conclude, this thesis serves as a

preliminary start in tackling a much larger issue and gap in the current missing data

literature.

ContributorsKeller, Brian Tinnell (Author) / Enders, Craig K. (Thesis advisor) / Grimm, Kevin J. (Committee member) / Levy, Roy (Committee member) / Arizona State University (Publisher)

Created2015

The impact of partial measurement invariance on between-group comparisons of latent means for a second-order factor

Description

A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both…

A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both levels for a second-order CFA model. The numbers and directions of differences in noninvariant loadings or intercepts were also manipulated, along with total sample size and effect size of the second-order factor mean difference. Data were analyzed using correct and incorrect specifications of noninvariant loadings and intercepts. Results summarized across the 5,000 replications in each condition included Type I error rates and powers for the chi-square difference test and the Wald test of the second-order factor mean difference, estimation bias and efficiency for this latent mean difference, and means of the standardized root mean square residual (SRMR) and the root mean square error of approximation (RMSEA).

When the model was correctly specified, no obvious estimation bias was observed; when the model was misspecified by constraining noninvariant loadings or intercepts to be equal, the latent mean difference was overestimated if the direction of the difference in loadings or intercepts of was consistent with the direction of the latent mean difference, and vice versa. Increasing the number of noninvariant loadings or intercepts resulted in larger estimation bias if these noninvariant loadings or intercepts were constrained to be equal. Power to detect the latent mean difference was influenced by estimation bias and the estimated variance of the difference in the second-order factor mean, in addition to sample size and effect size. Constraining more parameters to be equal between groups—even when unequal in the population—led to a decrease in the variance of the estimated latent mean difference, which increased power somewhat. Finally, RMSEA was very sensitive for detecting misspecification due to improper equality constraints in all conditions in the current scenario, including the nonzero latent mean difference, but SRMR did not increase as expected when noninvariant parameters were constrained.

ContributorsLiu, Yixing (Author) / Thompson, Marilyn (Thesis advisor) / Green, Samuel (Committee member) / Levy, Roy (Committee member) / Arizona State University (Publisher)

Created2016

Filtering by