Matching Items (3)
Filtering by

Clear all filters

151957-Thumbnail Image.png
Description
Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of data, optimal specification of (1) decision rules to select the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) methods of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity scores weighting by the odds adjustment. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing unbiased estimated standard error and correct statistical inference of the average treatment effect. The relationship between the balance on the covariates' means and the bias of average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.
ContributorsCham, Hei Ning (Author) / Tein, Jenn-Yun (Thesis advisor) / Enders, Stephen G (Thesis advisor) / Enders, Craig K. (Committee member) / Mackinnon, David P (Committee member) / Arizona State University (Publisher)
Created2013
151501-Thumbnail Image.png
Description
Daily dairies and other intensive measurement methods are increasingly used to study the relationships between two time varying variables X and Y. These data are commonly analyzed using longitudinal multilevel or bivariate growth curve models that allow for random effects of intercept (and sometimes also slope) but which do not

Daily dairies and other intensive measurement methods are increasingly used to study the relationships between two time varying variables X and Y. These data are commonly analyzed using longitudinal multilevel or bivariate growth curve models that allow for random effects of intercept (and sometimes also slope) but which do not address the effects of weekly cycles in the data. Three Monte Carlo studies investigated the impact of omitting the weekly cycles in daily dairy data under the multilevel model framework. In cases where cycles existed in both the time-varying predictor series (X) and the time-varying outcome series (Y) but were ignored, the effects of the within- and between-person components of X on Y tended to be biased, as were their corresponding standard errors. The direction and magnitude of the bias depended on the phase difference between the cycles in the two series. In cases where cycles existed in only one series but were ignored, the standard errors of the regression coefficients for the within- and between-person components of X tended to be biased, and the direction and magnitude of bias depended on which series contained cyclical components.
ContributorsLiu, Yu (Author) / West, Stephen G. (Thesis advisor) / Enders, Craig K. (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created2013
150618-Thumbnail Image.png
Description
Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates for these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear relationship or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing) or heteroscedastic (increasing then decreasing) variance. Small to medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient. Based on the statistical performance of the four models, ordinal logistic regression seems to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is necessary to determine which statistical model best captures the unique properties of GCGF outcomes.
ContributorsCoxe, Stefany (Author) / Aiken, Leona S. (Thesis advisor) / West, Stephen G. (Thesis advisor) / Mackinnon, David P (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created2012