A continuous latent factor model for non-ignorable missing data in longitudinal studies

Description

Many longitudinal studies, especially clinical trials, suffer from missing data. Most estimation procedures assume that the missing values are ignorable, or missing at random (MAR). However, this assumption is an unrealistic simplification and is implausible in many cases. For example, suppose an investigator is examining the effect of a treatment on depression. Subjects see doctors on a regular schedule and are asked about their recent emotional state. Patients experiencing severe depression are more likely to miss an appointment, leaving the data for that visit missing. In this case the missing data mechanism is related to the unobserved responses, and data that are not missing at random may produce biased results if the mechanism is not taken into account. Data are said to be non-ignorably missing if the probability of missingness depends on quantities that may not be included in the model, such as the unobserved responses themselves. Classical pattern-mixture models for non-ignorable missing values are widely used in longitudinal data analysis because they do not require explicit specification of the missing data mechanism: the data are stratified by missing data pattern, and a model is specified for each stratum. However, this usually leads to under-identifiability, because many stratum-specific parameters must be estimated even though the eventual interest is usually in the marginal parameters. Pattern-mixture models also have the drawback of usually requiring a large sample. In this thesis, two studies are presented. The first study is motivated by an open problem in pattern-mixture models. Simulation studies in this part show that the information in the missing data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing data patterns may be accounted for by a simple latent factor.
The simulation findings of the first study lead to a novel model, the continuous latent factor model (CLFM). The second study develops the CLFM for modeling the joint distribution of missing values and longitudinal outcomes. The proposed CLFM is feasible even for applications with small sample sizes. The detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance is evaluated through designed simulations and three applications. The simulation and application settings range from correctly specified to misspecified missing data mechanisms and include longitudinal studies of different sample sizes. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data give no indication of the missing data mechanism and are used for a sensitivity analysis; and the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study has complete data and is used to conduct a robustness analysis. The CLFM is shown to provide more precise estimators, specifically of intercept- and slope-related parameters, than Roy's latent class model and the classic linear mixed model. This advantage is most pronounced with small samples, where Roy's model has difficulty achieving convergence in estimation. The proposed CLFM is also robust when missing data are ignorable, as demonstrated through the study on Growth of Language and Early Literacy Skills in Preschoolers.
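
The following is a minimal simulation sketch of the kind of non-ignorable missingness described above, assuming a single standard-normal latent factor z_i that both raises a subject's outcome level and raises the probability of missing each visit through a logistic model. All coefficients are hypothetical, and this is only a data-generating illustration, not the CLFM estimation procedure. It shows that one continuous factor can produce many distinct missing data patterns and that the complete-case mean at the last visit is biased.

```python
import numpy as np

def simulate_shared_factor(n=20000, visits=5, seed=0):
    """Simulate longitudinal outcomes whose missingness depends on a latent factor.

    Hypothetical setup (not the CLFM estimator itself): a continuous latent
    factor z_i shifts each subject's outcome level and also raises the
    probability that any given visit is missed.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(visits)                              # visit times 0..visits-1
    z = rng.normal(size=n)                             # continuous latent factor
    b = 2.0 * z + rng.normal(size=n)                   # random intercept tied to z
    y = 10.0 + 0.5 * t + b[:, None] + rng.normal(size=(n, visits))
    # Missingness: higher z (e.g. more severe depression) -> visits missed more often
    p_miss = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * z)))
    missing = rng.random((n, visits)) < p_miss[:, None]
    return y, missing

y, missing = simulate_shared_factor()
n_patterns = np.unique(missing, axis=0).shape[0]       # distinct missing data patterns
full_mean = y[:, -1].mean()                            # would need the unobserved values
observed_mean = y[~missing[:, -1], -1].mean()          # complete-case mean at last visit
bias = observed_mean - full_mean
print(f"{n_patterns} patterns; full-data mean {full_mean:.2f}, "
      f"observed-only mean {observed_mean:.2f}, bias {bias:.2f}")
```

Because subjects with high z both score higher and miss more visits, the observed-only mean underestimates the full-data mean; ignoring the shared factor is what makes the analysis biased.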

Date Created
  • 2013

Experimental designs for generalized linear models and functional magnetic resonance imaging

Description

In this era of fast computational machines and new optimization algorithms, there have been great advances in experimental design. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging (fMRI). The first part of our research tackles the challenging problem of constructing exact designs for GLMs that are robust against parameter, link and model uncertainties, by improving an existing algorithm and providing a new one based on continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is versatile enough to accommodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs using the D- and A-optimality criteria. The second part of our research provides an algorithm that is a faster alternative to a recently proposed genetic algorithm (GA) for constructing optimal designs for fMRI studies. Our algorithm is built upon a discrete version of PSO.
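
As a rough illustration of the continuous PSO idea only (not the proposed robust algorithm, which also uses spectral clustering and handles parameter, link and model uncertainty), the sketch below searches for a locally D-optimal two-point design with equal weights for a logistic regression, assuming parameters β = (0, 1) and a hypothetical design region [-5, 5]. For this textbook case the optimal support points are known to lie at about ±1.5434, which makes the result easy to check. The PSO constants (inertia 0.7298, acceleration 1.49618) are common defaults, not values from the thesis.

```python
import numpy as np

def log_det_info(x, beta=(0.0, 1.0)):
    """Log-determinant of the Fisher information of a logistic model with
    linear predictor beta0 + beta1*x and equal weight on each support point."""
    x = np.asarray(x, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
    f = np.column_stack([np.ones_like(x), x])               # regressors (1, x)
    info = (f * (p * (1 - p))[:, None]).T @ f / len(x)       # sum_i w_i v_i f_i f_i^T
    sign, logdet = np.linalg.slogdet(info)
    return logdet if sign > 0 else -np.inf                   # singular design -> worst

def pso(objective, dim, lower, upper, n_particles=30, iters=200,
        w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Basic continuous PSO that maximizes `objective` over a box."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                       # personal best positions
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[np.argmax(pbest_val)].copy()                   # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)               # stay in design region
        vals = np.array([objective(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmax(pbest_val)].copy()
    return g, pbest_val.max()

design, crit = pso(log_det_info, dim=2, lower=-5.0, upper=5.0)
print(np.sort(design), crit)
```

Robust designs of the kind the abstract describes would replace the single assumed β with a criterion averaged or maximin-ed over parameter, link and model uncertainty; the swarm update itself stays the same.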

Date Created
  • 2014

A test and confidence set for comparing the location of quadratic growth curves

Description

Quadratic growth curves, i.e., 2nd degree polynomials, are widely used in longitudinal studies. For a 2nd degree polynomial, the vertex represents the location of the curve in the XY plane. For a quadratic growth curve, we propose an approximate confidence region, as well as confidence intervals for the x- and y-coordinates of the vertex, using two methods: the gradient method and the delta method. Under some models, an indirect test on the location of the curve can be based on the intercept and slope parameters, but in other models a direct test on the vertex is required. We present a quadratic-form statistic for testing the null hypothesis that there is no shift in the location of the vertex in a linear mixed model. The statistic has an asymptotic chi-squared distribution. For 2nd degree polynomials from two independent samples, we present an approximate confidence region for the difference between the vertices of two quadratic growth curves using the modified gradient method and the delta method. Another chi-squared test statistic is derived for a direct test on the vertex and is compared to an F test statistic for the indirect test. Power functions are derived for both the indirect F test and the direct chi-squared test. We calculate the theoretical power and present a simulation study to investigate the power of the tests. We also present a simulation study to assess the influence of sample size, number of measurement occasions and the nature of the random effects. The test statistics are applied to the Tell Efficacy longitudinal study, in which sound identification scores and language protocol scores for children are modeled as quadratic growth curves for two independent groups, TELL and control curriculum. The interpretation of a shift in the location of the vertices is also presented.
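
For y = b0 + b1*x + b2*x², the vertex is x_v = -b1/(2*b2) and y_v = b0 - b1²/(4*b2). The sketch below shows a generic delta-method version of the vertex covariance and a Wald-type quadratic-form test for a shift between two independent groups. It assumes that fixed-effect estimates and their covariance matrices (e.g., from a fitted linear mixed model) are already available; the coefficient values used are hypothetical, and the statistic is an illustration of the idea rather than the exact statistic derived in the thesis. With 2 degrees of freedom the chi-squared survival function is exp(-W/2), so no statistics library is needed.

```python
import numpy as np

def vertex(beta):
    """Vertex (x_v, y_v) of y = b0 + b1*x + b2*x**2 (requires b2 != 0)."""
    b0, b1, b2 = beta
    return np.array([-b1 / (2 * b2), b0 - b1**2 / (4 * b2)])

def vertex_cov(beta, cov_beta):
    """Delta-method covariance of the vertex: J @ Sigma @ J.T."""
    b0, b1, b2 = beta
    jac = np.array([
        [0.0, -1 / (2 * b2), b1 / (2 * b2**2)],        # d x_v / d(b0, b1, b2)
        [1.0, -b1 / (2 * b2), b1**2 / (4 * b2**2)],    # d y_v / d(b0, b1, b2)
    ])
    return jac @ cov_beta @ jac.T

def vertex_shift_test(beta1, cov1, beta2, cov2):
    """Wald-type chi-squared test of H0: the two vertices coincide (2 df)."""
    d = vertex(beta1) - vertex(beta2)
    cov_d = vertex_cov(beta1, cov1) + vertex_cov(beta2, cov2)   # independent groups
    stat = float(d @ np.linalg.solve(cov_d, d))
    return stat, float(np.exp(-stat / 2))                       # chi2(2) survival

# Hypothetical estimates for two independent groups
beta_a, cov_a = np.array([1.0, -4.0, 2.0]), np.diag([0.04, 0.02, 0.01])
beta_b, cov_b = np.array([1.2, -4.4, 2.0]), np.diag([0.04, 0.02, 0.01])
se_a = np.sqrt(np.diag(vertex_cov(beta_a, cov_a)))
x_ci = (vertex(beta_a)[0] - 1.96 * se_a[0], vertex(beta_a)[0] + 1.96 * se_a[0])
stat, p = vertex_shift_test(beta_a, cov_a, beta_b, cov_b)
print(vertex(beta_a), x_ci, stat, p)
```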

Date Created
  • 2015

Functional traits affecting photosynthesis, growth, and mortality of trees inferred from a field study and simulation experiments

Description

Functional traits research has improved our understanding of how plants respond to their environments, identifying key trade-offs among traits. These studies primarily rely on correlative methods to infer trade-offs and often overlook traits that are difficult to measure (e.g., root traits, tissue senescence rates), limiting their predictive ability under novel conditions. I aimed to address these limitations and develop a better understanding of the trait space occupied by trees by integrating data and process models, spanning leaves to whole trees, via modern statistical and computational methods. My first research chapter (Chapter 2) simultaneously fits a photosynthesis model to measurements of fluorescence and photosynthetic response curves, improving estimates of mesophyll conductance (gm) and other photosynthetic traits. I assessed how gm varies across environmental gradients and relates to other photosynthetic traits for 4 woody species in Arizona. I found that gm was lower at high-aridity sites, varied little within a site, and is an important trait for obtaining accurate estimates of photosynthesis and related traits under dry conditions. Chapter 3 evaluates the importance of functional traits for whole-tree performance by fitting an individual-based model of tree growth and mortality to millions of measurements of tree heights and diameters to assess the theoretical trait space (TTS) of “healthy” North American trees. The TTS contained complicated, multivariate structure indicative of potential trade-offs leading to successful growth. In Chapter 4, I applied an environmental filter (light stress) to the TTS, leading to simulated stand-level mortality rates of up to 50%. Tree-level mortality was explained by 6 of the 32 traits explored, the most important being radiation-use efficiency.
The multidimensional space comprising these 6 traits differed in volume and location between trees that survived and trees that died, indicating that selective mortality alters the TTS.
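
To show why gm matters for estimating photosynthesis, the sketch below uses a standard Farquhar-type Rubisco-limited model with finite mesophyll conductance, in which CO2 is drawn down from the intercellular spaces (Ci) to the chloroplasts (Cc = Ci - A/gm). It is an assumption that the Chapter 2 model contains this component; its actual structure is not given in the abstract, and the parameter values (vcmax, km, gamma_star, rd) are illustrative only. Substituting Cc into A = vcmax*(Cc - gamma_star)/(Cc + km) - rd yields a quadratic in A whose smaller root is the physical one.

```python
import math

def rubisco_limited_a(ci, gm, vcmax=60.0, km=700.0, gamma_star=40.0, rd=1.0):
    """Net assimilation A (umol m-2 s-1) under Rubisco limitation with finite
    mesophyll conductance gm (mol m-2 s-1), so that Cc = Ci - A/gm.
    Default parameter values are illustrative, not estimates from the study."""
    # Cc = Ci - A/gm substituted into A = vcmax*(Cc - gamma_star)/(Cc + km) - rd
    # gives A**2 + b*A + c = 0 with:
    b = -(gm * (ci + km) + vcmax - rd)
    c = gm * (vcmax * (ci - gamma_star) - rd * (ci + km))
    disc = b * b - 4.0 * c
    # Smaller (physical) root, written in a cancellation-free form
    return 2.0 * c / (-b + math.sqrt(disc))

def rubisco_limited_a_infinite_gm(ci, vcmax=60.0, km=700.0, gamma_star=40.0, rd=1.0):
    """Same model assuming Cc = Ci (infinite mesophyll conductance)."""
    return vcmax * (ci - gamma_star) / (ci + km) - rd

for gm in (0.05, 0.2, 1.0):
    print(gm, round(rubisco_limited_a(250.0, gm), 2))
print("infinite gm", round(rubisco_limited_a_infinite_gm(250.0), 2))
```

Lower gm produces a larger Ci-to-Cc drawdown and therefore lower assimilation at the same Ci, so assuming infinite gm would misattribute that drawdown to other photosynthetic traits.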

Date Created
  • 2017