Mediation as a novel method for increasing statistical power

Description

Including a covariate can increase power to detect an effect between two variables. Although previous research has studied power in mediation models, the extent to which the inclusion of a mediator will increase the power to detect a relation between two variables has not been investigated. The first study identified situations where empirical and analytical power of two tests of significance for a single mediator model was greater than power of a bivariate significance test. Results from the first study indicated that including a mediator increased statistical power in small samples with large effects and in large samples with small effects. Next, a study was conducted to assess when power was greater for a significance test for a two mediator model as compared with power of a bivariate significance test. Results indicated that including two mediators increased power in small samples when both specific mediated effects were large and in large samples when both specific mediated effects were small. Implications of the results and directions for future research are then discussed.
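For orientation, the single-mediator model referred to above is commonly written as three regression equations. The notation below follows the usual convention in the mediation literature and is an assumption here; the abstract does not state its own notation.

```latex
\begin{align}
Y &= i_1 + c\,X + e_1 \\
M &= i_2 + a\,X + e_2 \\
Y &= i_3 + c'\,X + b\,M + e_3
\end{align}
```

Under this notation, the bivariate test examines c, while tests of mediation are typically based on the mediated effect ab; in ordinary least squares regression, ab = c - c'.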

Contributors: O'Rourke, Holly Patricia (Author) / Mackinnon, David P (Thesis advisor) / Enders, Craig K. (Committee member) / Millsap, Roger (Committee member) / Arizona State University (Publisher)

Created: 2013

A continuous latent factor model for non-ignorable missing data in longitudinal studies

Description

Many longitudinal studies, especially clinical trials, suffer from missing data. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption is an unrealistic simplification and is implausible in many cases. For example, consider an investigator examining the effect of treatment on depression. Subjects are scheduled to see doctors on a regular basis and are asked questions about their recent emotional state. Patients who are experiencing severe depression are more likely to miss an appointment, leaving the data missing for that visit. Data that are not missing at random may produce biased results if the missing-data mechanism is not taken into account. In other words, the missing-data mechanism is related to the unobserved responses. Data are said to be non-ignorably missing if the probabilities of missingness depend on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used in longitudinal data analysis because they do not require explicit specification of the missing-data mechanism: the data are stratified according to the missing-data patterns and a model is specified for each stratum. However, this usually results in under-identifiability, because many stratum-specific parameters must be estimated even though the eventual interest is usually in the marginal parameters. Pattern-mixture models also have the drawback that a large sample is usually required. In this thesis, two studies are presented. The first study is motivated by an open problem in pattern-mixture models. Simulation studies in this part show that the information in the missing-data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing-data patterns may be accounted for by a simple latent factor.
The simulation findings from the first study lead to a novel model, a continuous latent factor model (CLFM). The second study develops the CLFM, which is used to model the joint distribution of missing values and longitudinal outcomes. The proposed CLFM is feasible even for small-sample applications. The detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance and evaluation are studied through designed simulations and three applications. The simulation and application settings range from a correctly specified missing-data mechanism to a misspecified mechanism and include different sample sizes from longitudinal studies. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data give no indication of the missing-data mechanism and are used for a sensitivity analysis; and the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study has complete data and is used for a robustness analysis. The CLFM is shown to provide more precise estimators, specifically for intercept- and slope-related parameters, than Roy's latent class model and the classic linear mixed model. This advantage is more pronounced when the sample size is small, where Roy's model has difficulty with estimation convergence. The proposed CLFM is also robust when missing data are ignorable, as demonstrated through the study on Growth of Language and Early Literacy Skills in Preschoolers.

Contributors: Zhang, Jun (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St Louis, Robert D. (Committee member) / Arizona State University (Publisher)

Created: 2013

Alternative methods via random forest to identify interactions in a general framework and variable importance in the context of value-added models

Description

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAMs teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained using three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures could be used within but are not restricted to the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. 
Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions.

Contributors: Valdivia, Arturo (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Broatch, Jennifer (Committee member) / Arizona State University (Publisher)

Created: 2013

Propensity score estimation with random forests

Description

Random Forests is a statistical learning method that has been proposed for propensity score estimation models involving complex interactions, nonlinear relationships, or both among the covariates. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of the data, optimal specification of (1) the decision rules used to select the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) the method of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity score weighting-by-the-odds adjustment. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing unbiased estimated standard errors and correct statistical inference for the average treatment effect. The relationship between the balance on the covariates' means and the bias of the average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples, there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of the average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.
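The weighting-by-the-odds adjustment mentioned above can be illustrated with a small sketch. It assumes the common definition in which treated units receive weight 1 and comparison units receive weight e / (1 - e), where e is the estimated propensity score. The function names and the simple weighted mean difference are illustrative assumptions, not the dissertation's code, and the propensity scores are taken as given rather than estimated with Random Forests.

```python
def odds_weights(treated, propensity):
    """Weighting by the odds (assumed definition): 1 for treated units,
    e / (1 - e) for comparison units, where e is the propensity score."""
    weights = []
    for is_treated, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if is_treated else e / (1.0 - e))
    return weights


def weighted_mean_difference(outcome, treated, weights):
    """Weighted mean outcome of treated units minus that of comparison units."""
    def weighted_mean(flag):
        pairs = [(y, w) for y, t, w in zip(outcome, treated, weights) if bool(t) == flag]
        if not pairs:
            raise ValueError("each group needs at least one unit")
        return sum(y * w for y, w in pairs) / sum(w for _, w in pairs)

    return weighted_mean(True) - weighted_mean(False)
```

A call such as `weighted_mean_difference(y, treated, odds_weights(treated, e_hat))` would give the adjusted estimate, where `y`, `treated`, and `e_hat` are hypothetical input lists of equal length.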

Contributors: Cham, Hei Ning (Author) / Tein, Jenn-Yun (Thesis advisor) / West, Stephen G. (Thesis advisor) / Enders, Craig K. (Committee member) / Mackinnon, David P (Committee member) / Arizona State University (Publisher)

Created: 2013

Testing independence of parallel pseudorandom number streams: incorporating the data's multivariate nature

Description

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation motivated multivariate tests and vector time series based tests. These methods have been implemented in an extension to TestU01 built in C++ and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.
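The two basic schemes described above can be sketched in a few lines. This is a minimal illustration, not TestU01's API (TestU01 is a C library and none of its functions are used here). It assumes a Kolmogorov-Smirnov distance as one possible uniformity check on the per-stream P-values, and the function names are hypothetical.

```python
from itertools import chain


def ks_uniform_statistic(p_values):
    """Scheme 1 (second step): Kolmogorov-Smirnov distance between the
    empirical distribution of per-stream p-values and Uniform(0, 1)."""
    ordered = sorted(p_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Largest gap above and below the uniform CDF, checked at each order statistic.
    d_plus = max((i + 1) / n - p for i, p in enumerate(ordered))
    d_minus = max(p - i / n for i, p in enumerate(ordered))
    return max(d_plus, d_minus)


def concatenate_streams(streams):
    """Scheme 2 (first step): join all parallel streams into one long vector,
    to which a serial test would then be applied."""
    return list(chain.from_iterable(streams))
```

Neither function looks at the streams jointly, position by position, which is the gap the abstract describes: dependence between streams at the same index can survive both checks.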

Contributors: Ismay, Chester (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Kao, Ming-Hung (Committee member) / Lanchier, Nicolas (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2013

Daily diary data: effects of cycles on inferences

Description

Daily diaries and other intensive measurement methods are increasingly used to study the relationships between two time-varying variables X and Y. These data are commonly analyzed using longitudinal multilevel or bivariate growth curve models that allow for random effects of intercept (and sometimes also slope) but do not address the effects of weekly cycles in the data. Three Monte Carlo studies investigated the impact of omitting the weekly cycles in daily diary data under the multilevel model framework. In cases where cycles existed in both the time-varying predictor series (X) and the time-varying outcome series (Y) but were ignored, the effects of the within- and between-person components of X on Y tended to be biased, as were their corresponding standard errors. The direction and magnitude of the bias depended on the phase difference between the cycles in the two series. In cases where cycles existed in only one series but were ignored, the standard errors of the regression coefficients for the within- and between-person components of X tended to be biased, and the direction and magnitude of bias depended on which series contained cyclical components.
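The within- and between-person components of X mentioned above are usually obtained by person-mean centering. The sketch below shows that decomposition under that assumption; the function name and data layout are hypothetical, and the abstract does not say exactly how the components were constructed in the simulations.

```python
from collections import defaultdict


def split_within_between(person_ids, x):
    """Split a time-varying predictor into a between-person part (each
    person's mean) and a within-person part (deviation from that mean)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pid, value in zip(person_ids, x):
        totals[pid] += value
        counts[pid] += 1
    means = {pid: totals[pid] / counts[pid] for pid in totals}
    between = [means[pid] for pid in person_ids]
    within = [value - means[pid] for pid, value in zip(person_ids, x)]
    return between, within
```

A weekly cycle in X stays entirely in the within-person part when each person is observed over whole weeks, which is one way to see why ignoring cycles can distort the within-person estimates.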

Contributors: Liu, Yu (Author) / West, Stephen G. (Thesis advisor) / Enders, Craig K. (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2013

Connecting pain intensity to work goal and lifestyle goal progress: examining mediation and moderation using multi-level modeling

Description

The present study examined the association between pain intensity and goal progress in a community sample of 132 adults with chronic pain who participated in a 21-day diary study. Multilevel modeling was employed to investigate the effect of morning pain intensity on evening goal progress as mediated by pain's interference with afternoon goal pursuit. Moderation effects of pain acceptance and pain catastrophizing on the associations between pain and interference with both work and lifestyle goal pursuit were also tested. The results showed that the relationship between morning pain and pain's interference with work goal pursuit in the afternoon was significantly moderated by pain acceptance. In addition, the mediated effect differed across levels of pain acceptance: (1) there was a significant mediation effect when pain acceptance was at its mean and at one standard deviation below the mean, but (2) there was no mediation effect when pain acceptance was one standard deviation above the mean. It appears that high pain acceptance significantly attenuates the power of nociception to disrupt one's work goal pursuit. However, in the lifestyle goal model, none of the moderators was significant, nor was there a significant association between pain's interference with goal pursuit and goal progress. Only morning pain intensity significantly predicted afternoon interference with lifestyle goal pursuit. Further interpretation of the present findings and potential explanations for these inconsistencies are elaborated in the discussion. Limitations and clinical implications of the current study are considered, along with suggestions for future studies.

Contributors: Mun, Chung Jung (Author) / Karoly, Paul (Thesis advisor) / Okun, Morris A. (Committee member) / Enders, Craig K. (Committee member) / Arizona State University (Publisher)

Created: 2014

Father involvement in Mexican American families

Description

Research demonstrating the importance of the paternal role has been largely conducted using samples of Caucasian men, leaving a gap in what is known about fathering in minority cultures. Family systems theories highlight the dynamic interrelations between familial roles and relationships, and suggest that comprehensive studies of fathering require attention to the broad family and cultural context. During the early infancy period, mothers' and fathers' postpartum adjustment may represent a critical source of influence on father involvement. For the current study, Mexican American (MA) women (N = 125) and a subset of their romantic partners/biological fathers (N = 57) reported on their depressive symptoms and levels of father involvement (paternal engagement, accessibility, and responsibility) during the postpartum period. Descriptive analyses suggested that fathers are involved in meaningful levels of care during infancy. Greater paternal postpartum depression (PPD) was associated with lower levels of father involvement. Maternal PPD interacted with paternal gender role attitudes to predict father involvement. At higher levels of maternal PPD, involvement increased among fathers adhering to less segregated gender role attitudes and decreased among fathers who endorsed more segregated gender role attitudes. Within select models, differences in the relations were observed between mothers' and fathers' reports of paternal involvement. Results bring attention to the importance of examining contextual influences on early fathering in MA families and highlight the unique information that may be gathered from separate maternal and paternal reports of father involvement.

Contributors: Roubinov, Danielle S (Author) / Luecken, Linda J. (Thesis advisor) / Crnic, Keith A (Committee member) / Enders, Craig K. (Committee member) / Gonzales, Nancy A. (Committee member) / Arizona State University (Publisher)

Created: 2014

Robust experimental designs for fMRI with an uncertain design matrix

Description

Obtaining high-quality experimental designs that optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Previous studies have provided some guidance and powerful computational tools for obtaining good fMRI designs. However, these results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiment is considered in which the design matrix of the statistical model depends not only on the selected design but also on the experimental subject's probabilistic behavior during the experiment. The design matrix is thus uncertain at the design stage, making it difficult to select good designs. By taking this uncertainty into account, a very efficient approach for obtaining high-quality fMRI designs is developed in this study. The proposed approach is built upon an analytical result and an efficient computer algorithm. It is shown through case studies that the proposed approach can outperform an existing method in terms of both computing time and the quality of the obtained designs.

Contributors: Zhou, Lin (Author) / Kao, Ming-Hung (Thesis advisor) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Welfert, Bruno (Committee member) / Arizona State University (Publisher)

Created: 2014

Multilevel multiple imputation: an examination of competing methods

Description

Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers.

Contributors: Mistler, Stephen (Author) / Enders, Craig K. (Thesis advisor) / Aiken, Leona (Committee member) / Levy, Roy (Committee member) / West, Stephen G. (Committee member) / Arizona State University (Publisher)

Created: 2015