Matching Items (4)
Description

Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools; see Froelich & Habing, 2007). It remains to be seen how such procedures perform in the context of small-scale assessments characterized by relatively small sample sizes and/or short tests. The fact that some procedures come with minimum allowable values for characteristics of the data, such as the number of items, may even render them unusable for some small-scale assessments. Other measures designed to assess dimensionality do not come with such limitations and, as such, may perform better under conditions that do not lend themselves to evaluation via statistics that rely on asymptotic theory. The current work aimed to evaluate the performance of one such metric, the standardized generalized dimensionality discrepancy measure (SGDDM; Levy & Svetina, 2011; Levy, Xu, Yel, & Svetina, 2012), under both large- and small-scale testing conditions.

A Monte Carlo study was conducted to compare the performance of DIMTEST and the SGDDM statistic in terms of evaluating assumptions of unidimensionality in item response data under a variety of conditions, with an emphasis on the examination of these procedures in small-scale assessments. Consistent with previous research, increases in either test length or sample size resulted in increased power. The DIMTEST procedure appeared to be a conservative test of the null hypothesis of unidimensionality. The SGDDM statistic exhibited rejection rates near the nominal rate of .05 under unidimensional conditions, though the reliability of these results may have been less than optimal due to high sampling variability resulting from a relatively limited number of replications. Power values were at or near 1.0 for many of the multidimensional conditions. It was only when the sample size was reduced to N = 100 that the two approaches diverged in performance. Results suggested that both procedures may be appropriate for sample sizes as low as N = 250 and tests as short as J = 12 (SGDDM) or J = 19 (DIMTEST). When used as a diagnostic tool, SGDDM may be appropriate with as few as N = 100 cases combined with J = 12 items.

The study was somewhat limited in that it did not include any complex factorial designs, nor were the strength of the item discrimination parameters or the correlation between factors manipulated. It is recommended that further research be conducted with the inclusion of these factors, as well as an increase in the number of replications when using the SGDDM procedure.
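For readers less familiar with this kind of simulation design, the sketch below illustrates the general shape of the data-generation and rejection-rate logic in a Monte Carlo study of dimensionality assessment. It is not the author's code: the multidimensional 2PL generator, the parameter ranges, and the reject_unidimensionality stub (the point where DIMTEST or an SGDDM-based posterior predictive check would be plugged in) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_2pl(n_persons, n_items, n_dims=1, rho=0.0, rng=rng):
    """Simulate dichotomous responses from a (multidimensional) 2PL model.

    For n_dims > 1, items follow a simple structure (each item loads on
    exactly one dimension) and the latent traits are correlated at rho.
    """
    # Latent traits: multivariate normal with unit variances.
    cov = np.full((n_dims, n_dims), rho)
    np.fill_diagonal(cov, 1.0)
    theta = rng.multivariate_normal(np.zeros(n_dims), cov, size=n_persons)

    # Illustrative item parameters (not the study's generating values).
    a = rng.uniform(0.8, 2.0, size=n_items)   # discriminations
    b = rng.normal(0.0, 1.0, size=n_items)    # difficulties

    # Assign items to dimensions in contiguous blocks (simple structure).
    item_dim = np.repeat(np.arange(n_dims), int(np.ceil(n_items / n_dims)))[:n_items]

    logits = a * (theta[:, item_dim] - b)      # shape: (n_persons, n_items)
    prob = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random((n_persons, n_items)) < prob).astype(int)

def reject_unidimensionality(responses, alpha=0.05):
    """Stand-in for the dimensionality test (DIMTEST or an SGDDM-based
    check); neither procedure is reproduced here."""
    raise NotImplementedError("plug in DIMTEST / SGDDM here")

def rejection_rate(n_reps, n_persons, n_items, n_dims, rho):
    """Empirical rejection rate for one cell of the simulation design."""
    rejections = 0
    for _ in range(n_reps):
        data = simulate_2pl(n_persons, n_items, n_dims=n_dims, rho=rho)
        rejections += bool(reject_unidimensionality(data))
    return rejections / n_reps
```

Under this framing, once an actual test is supplied in place of the stub, a unidimensional cell (n_dims=1) would yield an empirical Type I error rate and a multidimensional cell would yield power, e.g., rejection_rate(100, 250, 12, n_dims=2, rho=0.6).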
Contributors: Reichenberg, Ray E (Author) / Levy, Roy (Thesis advisor) / Thompson, Marilyn S. (Thesis advisor) / Green, Samuel B. (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Many schools have adopted programming designed to promote students' behavioral aptitude. A specific type of programming with this focus is School Wide Positive Behavior Supports (SWPBS), which combines positive behavior techniques with a system-wide problem-solving model. Aspects of this model are still being developed in the research community, including assessment techniques that aid the decision-making process. Tools for screening entire student populations are one example of such assessment interests. Although screening tools described as "empirically validated" and "cost effective" have been around since at least 1991, they have yet to become standard practice (Lane, Gresham, & O'Shaughnessy, 2002). The lack of widespread implementation to date raises questions regarding their ecological validity and actual cost-effectiveness, leaving the development of useful screening tools an ongoing project for many researchers. It may be beneficial for educators to expand the range of measurement to include tools that measure the symptoms at the root of the problematic behaviors. Lane, Gresham, and O'Shaughnessy (2002) note the possibility that factors from within a student, including those that are cognitive in nature, may influence not only his or her academic performance but also aspects of behavior. A line of logic follows wherein measurement of those factors may aid the early identification of students at risk for developing disorders with related symptoms.

The validity and practicality of various tools available for screening in SWPBS were investigated, including brief behavior rating scales completed by parents and teachers, as well as performance tasks borrowed from the field of neuropsychology. All instruments showed an ability to predict children's behavior, although not to equal extents. A discussion of the practicality and predictive utility of each instrument follows.
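As a concrete, entirely hypothetical illustration of what "predictive utility" can look like analytically, the snippet below scores a set of made-up screeners against a made-up referral outcome using the area under the ROC curve. The instrument names, sample size, and data are placeholders, not the study's measures or results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical screening data: one row per student, one column per instrument
# (e.g., a teacher rating scale, a parent rating scale, a performance task),
# plus a binary outcome flagging a later behavioral referral.
rng = np.random.default_rng(0)
n_students = 200
screeners = {
    "teacher_rating": rng.normal(50, 10, n_students),
    "parent_rating": rng.normal(50, 10, n_students),
    "performance_task": rng.normal(100, 15, n_students),
}
referred = rng.random(n_students) < 0.15  # placeholder outcome, not real data

# Summarize each instrument's predictive utility as the area under the ROC
# curve: .50 is chance-level discrimination, 1.0 is perfect discrimination.
for name, scores in screeners.items():
    print(f"{name}: AUC = {roc_auc_score(referred, scores):.2f}")
```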
Contributors: Hall, Morgan (Author) / Caterino, Linda (Thesis advisor) / Mathur, Sarup (Thesis advisor) / Husman, Jenefer (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

The existing recommended minima for sample size and test length for DIMTEST (750 examinees and 25 items) are tied to features of the procedure that are no longer in use. The current version of DIMTEST uses a bootstrapping procedure to remove bias from the test statistic and is packaged with a conditional covariance-based procedure called ATFIND for partitioning test items. Key factors such as sample size, test length, test structure, the correlation between dimensions, and the strength of dependence were manipulated in a Monte Carlo study to assess the effectiveness of the current version of DIMTEST with fewer examinees and items. In addition, the DETECT program was used to partition test items; a second aim of this study was to compare the structure of the test partitions obtained with ATFIND and DETECT in a number of ways.

With some exceptions, the performance of DIMTEST was quite conservative in unidimensional conditions. The performance of DIMTEST in multidimensional conditions depended on each of the manipulated factors and suggested that the sample size and test length minima can be lowered for some conditions. In terms of partitioning test items in unidimensional conditions, DETECT tended to produce longer assessment subtests than ATFIND, in turn yielding different test partitions. In multidimensional conditions, test partitions became more similar and more accurate with increased sample size, factorially simple data, greater strength of dependence, and a decreased correlation between dimensions. Recommendations for sample size and test length minima are provided, along with suggestions for future research.
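One simple, hypothetical way to quantify how similar two test partitions are (for example, the assessment-subtest assignments produced by ATFIND and DETECT) is an agreement index over the item labels, as sketched below. The item assignments shown are invented for illustration, and the adjusted Rand index is only one of several possible comparison criteria; the study's own comparison methods are not reproduced here.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical partitions of a 12-item test into an assessment subtest (1)
# and a partitioning subtest (0), as might be produced by ATFIND and DETECT.
atfind_partition = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
detect_partition = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Adjusted Rand index: 1.0 means identical groupings; values near 0 mean
# agreement no better than chance.
print(adjusted_rand_score(atfind_partition, detect_partition))
```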
Contributors: Fay, Derek (Author) / Levy, Roy (Thesis advisor) / Green, Samuel (Committee member) / Gorin, Joanna (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

To foster both external and internal accountability, universities seek more effective models for student learning outcomes assessment (SLOA). Meaningful and authentic measurement of program-level student learning outcomes requires engagement with an institution’s faculty members, especially to gather student performance assessment data using common scoring instruments, or rubrics, across a university’s many colleges and programs. Too often, however, institutions rely on faculty engagement for SLOA initiatives like this without providing necessary support, communication, and training. The resulting data may lack sufficient reliability and reflect deficiencies in an institution’s culture of assessment.

This mixed-methods action research study gauged how well one form of SLOA training – a rubric-norming workshop – could affect both inter-rater reliability among faculty scorers and faculty perceptions of SLOA, while exploring the nature of faculty collaboration toward a shared understanding of student learning outcomes. The study participants, ten part-time faculty members at the institution, each held primary careers in the health care industry apart from their secondary role teaching university courses. Accordingly, each contributed expertise and experience to the rubric-norming discussions, surveys of assessment-related perceptions, and individual scoring of student performance with a common rubric. Drawing on sociocultural learning principles and the specific lens of activity theory, the study arranged and analyzed influences on faculty SLOA within the heuristic framework of an activity system to discern the effects of collaboration and perceptions of SLOA on consistent rubric scoring by faculty participants.

Findings suggest that participation in the study was not associated with increased inter-rater reliability for faculty scorers when using the common rubric. Constraints within the assessment tools and unclear institutional leadership prevented more reliable use of common rubrics. Instead, faculty participants resorted to individual assessment approaches to meaningfully guide students toward classroom achievement and preparation for careers in the health care field. Despite this, faculty participants valued SLOA, collaborated readily with colleagues toward shared assessment goals, and worked hard to teach and assess students meaningfully.
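For readers who want a sense of how inter-rater reliability for rubric scores can be quantified, the sketch below computes pairwise quadratic-weighted kappa over a small set of invented scores. The scores, scale, and choice of index are illustrative assumptions only; they are not the study's data, and the study may have used a different reliability index.

```python
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical rubric scores: rows are student artifacts, columns are faculty
# raters, cells are rubric levels (e.g., 1 = beginning ... 4 = exemplary).
scores = np.array([
    [3, 3, 2],
    [4, 4, 4],
    [2, 3, 2],
    [1, 2, 2],
    [3, 4, 3],
    [2, 2, 1],
])

# Pairwise quadratic-weighted kappa, averaged over rater pairs, as one common
# summary of inter-rater agreement on an ordinal rubric scale.
kappas = [
    cohen_kappa_score(scores[:, i], scores[:, j], weights="quadratic")
    for i, j in combinations(range(scores.shape[1]), 2)
]
print(f"mean weighted kappa = {np.mean(kappas):.2f}")
```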
Contributors: Williams, Nicholas (Author) / Liou, Daniel D (Thesis advisor) / Rotheram-Fuller, Erin (Committee member) / Turbow, David (Committee member) / Arizona State University (Publisher)
Created: 2018