Psychometric and Machine Learning Approaches to Diagnostic Classification

The goal of diagnostic assessment is to discriminate between groups. In many cases, a binary decision is made conditional on a cut score from a continuous scale. Psychometric methods can improve assessment by modeling a latent variable using item response theory (IRT), and IRT scores can subsequently be used to determine a cut score using receiver operating characteristic (ROC) curves. Psychometric methods provide reliable and interpretable scores, but the prediction of the diagnosis is not the primary product of the measurement process. In contrast, machine learning methods, such as regularization or binary recursive partitioning, can build a model from the assessment items to predict the probability of diagnosis. Machine learning predicts the diagnosis directly, but does not provide an inferential framework to explain why item responses are related to the diagnosis. It remains unclear whether psychometric and machine learning methods have comparable accuracy or if one method is preferable in some situations. In this study, Monte Carlo simulation methods were used to compare psychometric and machine learning methods on diagnostic classification accuracy. Results suggest that classification accuracy of psychometric models depends on the diagnostic-test correlation and prevalence of diagnosis. Also, machine learning methods that reduce prediction error have inflated specificity and very low sensitivity compared to the data-generating model, especially when prevalence is low. Finally, machine learning methods that use ROC curves to determine probability thresholds have comparable classification accuracy to the psychometric models as sample size, number of items, and number of item categories increase. Therefore, results suggest that machine learning models could provide a viable alternative for classification in diagnostic assessments. Strengths and limitations for each of the methods are discussed, and future directions are considered.
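As a rough illustration of the ROC-based cut score step described above, the sketch below picks the cut on a continuous score that maximizes Youden's J (sensitivity + specificity - 1). The choice of Youden's J, the simulated normal scores, the group sizes, and the function names `sens_spec` and `youden_cut` are all assumptions made for this example; the study's actual threshold criterion and data-generating model are not specified here.

```python
import random

def sens_spec(scores, labels, cut):
    """Sensitivity and specificity when score >= cut is classified as diagnosed."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)

def youden_cut(scores, labels):
    """Scan every observed score as a candidate cut; return (cut, J) with the largest J."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, cut)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical data: 10% prevalence, diagnosed group shifted up by one SD.
# These values are illustrative assumptions, not the study's simulation design.
random.seed(0)
labels = [1] * 50 + [0] * 450
scores = [random.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

cut, j = youden_cut(scores, labels)
sens, spec = sens_spec(scores, labels, cut)
print(f"cut={cut:.2f}  J={j:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

The same routine could be applied to predicted probabilities from a machine learning model in place of IRT scores, which is the sense in which a ROC-derived probability threshold replaces a default cutoff.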