Matching Items (8)

Description
Random Forests is a statistical learning method that has been proposed for propensity score estimation in models that involve complex interactions among covariates, nonlinear relationships, or both. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of the data, optimal specification of (1) the decision rule for selecting the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) the method of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity score weighting by the odds. Compared to a logistic regression estimation model using the true propensity score model, Random Forests had the additional advantage of producing unbiased standard error estimates and correct statistical inference for the average treatment effect. The relationship between balance on the covariates' means and the bias of the average treatment effect estimate was examined both within and between simulation conditions. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias in the average treatment effect estimate for covariates that were imbalanced before adjustment. Between conditions, small covariate mean differences after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.
Contributors: Cham, Hei Ning (Author) / Tein, Jenn-Yun (Thesis advisor) / West, Stephen G. (Thesis advisor) / Enders, Craig K. (Committee member) / MacKinnon, David P. (Committee member) / Arizona State University (Publisher)
Created: 2013
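For a concrete feel for the weighting-by-the-odds estimator this abstract describes, the Python sketch below fits a Random Forest propensity model on simulated data and computes a weighted treatment effect. The data-generating values and the tuning parameters (max_features, min_samples_leaf) are illustrative assumptions, not the dissertation's simulation design.

```python
# Sketch: Random Forests propensity scores with weighting by the odds.
# All settings here are illustrative, not the study's design.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))                        # covariates
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1] * X[:, 2]    # interaction in true model
treat = rng.binomial(1, 1 / (1 + np.exp(-logit)))
y = 2.0 * treat + X[:, 0] + rng.normal(size=n)     # true treatment effect = 2

# max_features and min_samples_leaf stand in for the "model specifications"
# (split rule, number of covariates sampled) varied in the dissertation.
rf = RandomForestClassifier(n_estimators=500, max_features=2,
                            min_samples_leaf=25, random_state=0)
ps = rf.fit(X, treat).predict_proba(X)[:, 1].clip(0.01, 0.99)

# Weighting by the odds: treated units get weight 1, controls get ps/(1-ps),
# so the weighted control group resembles the treated group.
w_ctrl = ps[treat == 0] / (1 - ps[treat == 0])
att = y[treat == 1].mean() - np.average(y[treat == 0], weights=w_ctrl)
print(f"Estimated average treatment effect on the treated: {att:.2f}")
```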

Description
The measurement of competency in nursing is critical to ensure safe and effective care of patients. This study had two purposes. First, the psychometric characteristics of the Nursing Performance Profile (NPP), an instrument used to measure nursing competency, were evaluated using generalizability theory and a sample of 18 nurses in the Measuring Competency with Simulation (MCWS) Phase I dataset. The relative magnitudes of various error sources and their interactions were estimated in a generalizability study involving a fully crossed, three-facet random design, with nurse participants as the object of measurement and scenarios, raters, and items as the three facets. A design corresponding to that of the MCWS Phase I data (three scenarios, three raters, and 41 items) showed that nurse participants contributed the greatest proportion of total variance (50.00%), followed, in decreasing magnitude, by rater (19.40%), the two-way participant × scenario interaction (12.93%), and the two-way participant × rater interaction (8.62%). The generalizability (G) coefficient was .65 and the dependability coefficient was .50. In decision study designs minimizing the number of scenarios, the desired generalizability coefficients of .70 and .80 were reached with three scenarios and five raters, and with five scenarios and nine raters, respectively. In designs minimizing the number of raters, G coefficients of .72 and .80 were reached with three raters and five scenarios, and with four raters and nine scenarios, respectively. A dependability coefficient of .71 was attained with six scenarios and nine raters, or with nine scenarios and seven raters. Achieving high reliability with designs involving fewer raters may be possible with enhanced rater training that decreases the variance components for rater main and interaction effects.

The second part of this study involved the design and implementation of a validation process for evidence-based human patient simulation scenarios used in the assessment of nursing competency. A team of experts validated the new scenario using a modified Delphi technique involving three rounds of iterative feedback and revisions. In tandem, the psychometric study of the NPP and the development of a validation process for human patient simulation scenarios both advance and encourage best practices for studying the validity of simulation-based assessments.
Contributors: O'Brien, Janet Elaine (Author) / Thompson, Marilyn (Thesis advisor) / Hagler, Debra (Thesis advisor) / Green, Samuel (Committee member) / Arizona State University (Publisher)
Created: 2014
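The decision-study projections in this abstract follow directly from the generalizability-theory error formulas. The sketch below computes G and dependability (Phi) coefficients for a fully crossed person × scenario × rater design; the variance components are hypothetical placeholders, not the MCWS Phase I estimates.

```python
# Sketch: decision (D) study projections for a fully crossed person x
# scenario x rater design. Variance components below are hypothetical.

def g_and_phi(v, n_s, n_r):
    """Return the G (relative) and Phi (absolute) coefficients."""
    rel_err = v["ps"] / n_s + v["pr"] / n_r + v["psr_e"] / (n_s * n_r)
    abs_err = rel_err + v["s"] / n_s + v["r"] / n_r + v["sr"] / (n_s * n_r)
    return v["p"] / (v["p"] + rel_err), v["p"] / (v["p"] + abs_err)

v = {"p": 0.50, "s": 0.02, "r": 0.19, "ps": 0.13,
     "pr": 0.09, "sr": 0.01, "psr_e": 0.06}      # hypothetical proportions

for n_s, n_r in [(3, 3), (3, 5), (5, 9)]:
    g, phi = g_and_phi(v, n_s, n_r)
    print(f"{n_s} scenarios x {n_r} raters: G = {g:.2f}, Phi = {phi:.2f}")
```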

Description
This study investigated the internal factor structure of the English Language Development Assessment (ELDA) using confirmatory factor analysis. ELDA is an English language proficiency test developed by a consortium of multiple states and is used to identify and reclassify English language learners in kindergarten through grade 12. Scores on item parcels based on the standards tested in the four domains of reading, writing, listening, and speaking were used for the analyses. Five factor models were tested: a single-factor model, a correlated two-factor model, a correlated four-factor model, a second-order factor model, and a bifactor model. The results indicated that the four-factor, second-order, and bifactor models fit the data well. The four-factor model hypothesized constructs for reading, writing, listening, and speaking. The second-order model hypothesized a second-order English language proficiency factor as well as the four lower-order factors of reading, writing, listening, and speaking. The bifactor model hypothesized a general English language proficiency factor as well as the four domain-specific factors of reading, writing, listening, and speaking. Chi-square difference tests indicated that the bifactor model best explained the factor structure of the ELDA. The results of this study are consistent with findings in the literature about the multifactorial nature of language but differ from the conclusions about factor structure reported in previous studies. The overall proficiency levels on the ELDA give more weight to the reading and writing sections of the test than to the speaking and listening sections. This study has implications for the rules used to determine proficiency levels and recommends the use of conjunctive scoring, in which all constructs are weighted equally, contrary to current practice.
Contributors: Kuriakose, Anju Susan (Author) / MacSwan, Jeff (Thesis advisor) / Haladyna, Thomas (Thesis advisor) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2011
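As a rough illustration of how such competing structures are specified, the sketch below writes two of the models in lavaan-style syntax and fits them with the semopy package. The parcel names (r1 ... s3) and the input file are hypothetical stand-ins for the ELDA parcels.

```python
# Sketch: two competing ELDA factor structures in lavaan-style syntax,
# fit with semopy. Parcel names and the input file are hypothetical.
import pandas as pd
from semopy import Model, calc_stats

four_factor = """
Reading   =~ r1 + r2 + r3
Writing   =~ w1 + w2 + w3
Listening =~ l1 + l2 + l3
Speaking  =~ s1 + s2 + s3
"""

# Second-order model: a general proficiency factor above the four domains.
second_order = four_factor + "ELP =~ Reading + Writing + Listening + Speaking"

data = pd.read_csv("elda_parcels.csv")  # hypothetical parcel-score file
for name, desc in [("four-factor", four_factor), ("second-order", second_order)]:
    model = Model(desc)
    model.fit(data)
    print(name)
    print(calc_stats(model).T)          # chi-square, CFI, RMSEA, etc.
```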

Description
National assessment data indicate that a large majority of students in America perform below expected proficiency levels in the area of writing. Given the importance of writing skills, this is a significant problem. Curriculum-based measurement, when used for progress monitoring and intervention planning, has been shown to lead to improved academic achievement. However, researchers have not yet been able to establish the validity of curriculum-based measures of writing (CBM-W). This study examined the structural validity of CBM-W using exploratory factor analysis. The participants were 253 third-grade, 154 seventh-grade, and 154 tenth-grade students. Each participant completed a 3-minute writing sample in response to a narrative prompt. The writing samples were scored on fifteen different CBM-W indices. Separate analyses were conducted for each grade level to examine differences in the CBM-W construct across grades. Due to extreme multicollinearity, principal components analysis rather than common factor analysis was used to examine the structure of writing as measured by the CBM-W indices. The overall structure of the CBM-W indices remained stable across grade levels: in all cases a three-component solution was supported, with the components labeled production, accuracy, and sentence complexity. Limitations of the study and implications for progress monitoring with CBM-W are discussed, including the recommendation of a combination of variables that may provide more reliable and valid measurement of the writing construct.
Contributors: Brown, Alec Judd (Author) / Watkins, Marley (Thesis advisor) / Caterino, Linda (Thesis advisor) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2012
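The analysis this abstract describes (PCA on standardized scores because the indices are highly collinear) can be sketched in a few lines. The input file and the column contents are hypothetical; the three component labels follow the study's report.

```python
# Sketch: principal components analysis of standardized CBM-W indices.
# The input file and index names are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("cbm_w_indices.csv")     # hypothetical: 15 scored indices
Z = StandardScaler().fit_transform(df)    # PCA on the correlation matrix

pca = PCA(n_components=3).fit(Z)
print("Variance explained:", pca.explained_variance_ratio_.round(3))

# Unrotated loadings: eigenvectors scaled by the sqrt of their eigenvalues.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(pd.DataFrame(loadings, index=df.columns,
                   columns=["production", "accuracy", "complexity"]))
```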

Description
Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can bias model parameter estimates in single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of mismatch in dimensionality between data and analysis models in multiple-group analyses at the population and sample levels. Datasets were generated using bifactor models with different factor structures and were analyzed with bifactor and single-factor models to assess the effects of misspecification on assessments of MI and latent mean differences. As baseline models, the bifactor models fit the data well and showed minimal bias in latent mean estimation. However, the low convergence rates obtained when fitting bifactor models to data with complex structures and small sample sizes were a concern. The effects of fitting misspecified single-factor models on assessments of MI and latent means differed by the bifactor structure underlying the data. For data following one general factor and one group factor affecting a small set of indicators, the effects of ignoring the group factor in the analysis model on tests of MI and latent mean differences were mild. In contrast, for data following one general factor and several group factors, oversimplifying the analysis model can lead to inaccurate conclusions regarding MI assessment and latent mean estimation.
Contributors: Xu, Yuning (Author) / Green, Samuel (Thesis advisor) / Levy, Roy (Committee member) / Thompson, Marilyn (Committee member) / Arizona State University (Publisher)
Created: 2018
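The data-generation side of such a simulation is compact enough to sketch. The snippet below draws indicators from a bifactor population model with one general factor and one group factor on the first three indicators; the loadings and sample size are illustrative, not the study's conditions.

```python
# Sketch: generating indicators from a bifactor population model with one
# general factor and one group factor. Loadings are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 9
lam_g = np.full(p, 0.6)                  # general-factor loadings
lam_s = np.zeros(p)
lam_s[:3] = 0.4                          # group factor on indicators 1-3

G = rng.normal(size=n)
S = rng.normal(size=n)                   # orthogonal to G, as in bifactor models
resid_sd = np.sqrt(1 - lam_g**2 - lam_s**2)
Y = np.outer(G, lam_g) + np.outer(S, lam_s) + rng.normal(size=(n, p)) * resid_sd

# Fitting a single-factor model to Y ignores the group factor; that is the
# misspecification whose effects on MI tests the study examined.
```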

Description
Although models for describing longitudinal data have become increasingly sophisticated, criticism of even foundational growth curve models remains challenging. The challenge arises from the need to disentangle data-model misfit at multiple and interrelated levels of analysis. Using posterior predictive model checking (PPMC), a popular Bayesian framework for model criticism, this Monte Carlo simulation study investigated the performance of several discrepancy functions: two types of conditional concordance correlation (CCC) functions, two types of R² functions, two types of standardized generalized dimensionality discrepancy (SGDDM) functions, the likelihood ratio (LR), and the likelihood ratio difference test (LRT). Key outcomes included effect sizes of the design factors on the realized values of the discrepancy functions, distributions of posterior predictive p-values (PPP-values), and the proportion of extreme PPP-values.

In terms of realized values, the behavior of the CCC and R² functions was generally consistent with prior research. As diagnostics, however, these functions were extremely conservative even when some aspect of the data was unaccounted for. In contrast, the conditional SGDDM (SGDDMC), LR, and LRT were generally sensitive to the underspecifications investigated in this work on all outcomes considered. Although the proportions of extreme PPP-values for these functions tended to increase in null situations for non-normal data, this behavior may have reflected true misfit resulting from the specification of normal prior distributions. Importantly, the LR and, to a greater extent, the SGDDMC exhibited some potential for untangling the sources of data-model misfit. Owing to the connections of growth curve models to the more fundamental frameworks of multilevel modeling, structural equation models with a mean structure, and Bayesian hierarchical models, the results of the current work may have broader implications that warrant further research.
Contributors: Fay, Derek (Author) / Levy, Roy (Thesis advisor) / Thompson, Marilyn (Committee member) / Enders, Craig (Committee member) / Arizona State University (Publisher)
Created: 2015
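The PPP-value computation at the heart of PPMC is generic, whatever the discrepancy function. The sketch below shows the loop in schematic form; the sampler, model, and discrepancy function are assumed to be defined elsewhere and nothing here is specific to the dissertation's growth curve models.

```python
# Sketch: the generic PPMC loop behind a posterior predictive p-value.
# Sampler, model, and discrepancy are assumed to exist elsewhere.
import numpy as np

def ppp_value(discrepancy, y_obs, posterior_draws, simulate):
    """Proportion of posterior draws where the replicated-data discrepancy
    meets or exceeds the realized (observed-data) discrepancy."""
    exceed = []
    for theta in posterior_draws:      # draws from p(theta | y_obs)
        y_rep = simulate(theta)        # replicate data under the model
        exceed.append(discrepancy(y_rep, theta) >= discrepancy(y_obs, theta))
    return float(np.mean(exceed))

# PPP-values near 0 or 1 flag misfit. The LR discrepancy, for instance,
# would be discrepancy(y, theta) = -2 * log_likelihood(y, theta).
```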

Description
For this thesis, a Monte Carlo simulation was conducted to investigate the robustness of three latent interaction modeling approaches (constrained product indicator, generalized appended product indicator (GAPI), and latent moderated structural equations (LMS)) under high degrees of nonnormality of the exogenous indicators, conditions that had not been investigated in the previous literature. Results showed that the constrained product indicator and LMS approaches yielded biased estimates of the interaction effect when the exogenous indicators were highly nonnormal. When the violation of normality was not severe (symmetric with excess kurtosis < 1), the LMS approach with ML estimation yielded the most precise latent interaction effect estimates. The LMS approach with ML estimation also had the highest statistical power among the three approaches, given that the actual Type I error rates of the Wald and likelihood ratio tests of the interaction effect were acceptable. In highly nonnormal conditions, only the GAPI approach with ML estimation yielded unbiased latent interaction effect estimates, with acceptable actual Type I error rates for both the Wald and likelihood ratio tests of the interaction effect. No support for the use of the Satorra-Bentler or Yuan-Bentler ML corrections was found for any of the three approaches.
Contributors: Cham, Hei Ning (Author) / West, Stephen G. (Thesis advisor) / Aiken, Leona S. (Committee member) / Enders, Craig K. (Committee member) / Arizona State University (Publisher)
Created: 2010
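A shared ingredient of the product-indicator approaches compared here is the construction of product terms from the exogenous indicators. The sketch below forms mean-centered product indicators; the indicator names, the matched pairing scheme, and the input file are hypothetical simplifications.

```python
# Sketch: forming mean-centered product indicators for a latent interaction
# (the product-indicator family that includes GAPI). Names are hypothetical.
import pandas as pd

df = pd.read_csv("indicators.csv")       # hypothetical: x1-x3, z1-z3, y1-y3
for xv, zv in [("x1", "z1"), ("x2", "z2"), ("x3", "z3")]:
    xc = df[xv] - df[xv].mean()          # center before multiplying
    zc = df[zv] - df[zv].mean()
    df[xv + zv] = xc * zc                # indicator for the latent XZ factor

# The products then serve as indicators of the XZ factor in the SEM; LMS,
# by contrast, models the interaction directly without product indicators.
```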

Description
Social Emotional Learning (SEL) programs abound in schools worldwide, adopted in large part on the basis of limited and varied evidence that the social/SEL skills acquired in these programs contribute to academic achievement. However, large-scale studies of the most common SEL program in the United States (Second Step®) have yielded no evidence of academic benefits, despite revisions to the Second Step® measure (i.e., the DESSA-SSE) to include "skills for learning" (i.e., executive functioning skills). The dearth of academic effects could reflect programmatic or measurement flaws. The purpose of this paper is to explore the latter and unpack the core "inputs" of Second Step® to determine whether the social-emotional or executive functioning components may be differentially related to academic achievement. Such questions have important implications for evaluating program theory and logic, and for the SEL field more broadly. The current study addresses this broader aim by assessing the longitudinal, bidirectional relationships among Executive Functioning, Prosocial Skills (as a proxy for SEL skills), and academic achievement in Kindergarten and Grade 1 students (N = 3,029) from rural and urban schools (N = 61). Widely used curriculum-based measures of reading and math were administered directly to students to assess academic achievement, while teachers reported on students' Prosocial Skills using an established measure. A bifactor measure of executive functioning was derived from exploratory and confirmatory factor analyses of teacher-reported rating scale data. Results based on an autoregressive cross-lagged panel model with an accelerated longitudinal design lend some support for a longitudinal bidirectional relationship between the executive functioning components of shifting and emotional regulation (EF 2) and Prosocial Skills. Furthermore, while results support extant research that the executive functioning components of working memory, planning, and problem solving (EF 1) positively predict academic achievement, the executive functioning components of shifting and emotional regulation (EF 2) and Prosocial Skills are neither meaningful nor consistent predictors of academic achievement. Implications and limitations are discussed.
Contributors: Desfosses, Danielle (Author) / Low, Sabina (Thesis advisor) / Thompson, Marilyn (Committee member) / Grimm, Kevin (Committee member) / Swanson, Jodi (Committee member) / Arizona State University (Publisher)
Created: 2021
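The core of the cross-lagged analysis this abstract describes is a set of autoregressive and cross-lagged regressions among the three constructs. The sketch below writes a simplified two-wave version in lavaan-style syntax and fits it with semopy; the variable names and input file are hypothetical, and the study itself used an accelerated longitudinal design with latent EF factors.

```python
# Sketch: a simplified two-wave autoregressive cross-lagged panel among
# achievement, EF, and prosocial skills. Names are hypothetical.
import pandas as pd
from semopy import Model

desc = """
ach_t2 ~ ach_t1 + ef_t1 + pro_t1
ef_t2  ~ ef_t1 + ach_t1 + pro_t1
pro_t2 ~ pro_t1 + ach_t1 + ef_t1
"""

data = pd.read_csv("panel.csv")
model = Model(desc)
model.fit(data)
print(model.inspect())   # autoregressive and cross-lagged path estimates
```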