In the first part of this dissertation, a theoretical result was developed to facilitate the search for locally symmetric optimal designs for mixed responses models with one continuous covariate. The study was then extended to mixed responses models that include group effects. Two types of mixed responses models with group effects were investigated. The first type has no common parameters across subject groups, and the second type allows some common parameters (e.g., a common slope) across groups. In addition to complete class results, an efficient algorithm (PSO-FM) was proposed to search for the A- and D-optimal designs. Finally, the first-order mixed responses model was extended to a type of quadratic mixed responses model whose linear model contains a quadratic polynomial predictor.
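As a minimal illustration of the A- and D-criteria mentioned above, the sketch below evaluates both for an approximate design under a simple straight-line model. The model, the function names, and the two example designs are assumptions made for illustration; they are not the mixed responses models or the PSO-FM algorithm of the dissertation.

```python
def information_matrix(design):
    """Entries (m00, m01, m11) of the 2x2 information matrix for E[y] = b0 + b1*x.

    `design` is a list of (x, weight) support points whose weights sum to 1.
    """
    m00 = sum(w for _, w in design)
    m01 = sum(w * x for x, w in design)
    m11 = sum(w * x * x for x, w in design)
    return m00, m01, m11


def d_criterion(design):
    """Determinant of the information matrix (larger is better)."""
    m00, m01, m11 = information_matrix(design)
    return m00 * m11 - m01 ** 2


def a_criterion(design):
    """Trace of the inverse information matrix (smaller is better)."""
    m00, m01, m11 = information_matrix(design)
    det = m00 * m11 - m01 ** 2
    return (m00 + m11) / det  # closed form for a symmetric 2x2 matrix


# Two hypothetical designs on the interval [-1, 1].
two_point = [(-1.0, 0.5), (1.0, 0.5)]
three_point = [(-1.0, 1 / 3), (0.0, 1 / 3), (1.0, 1 / 3)]

for name, design in [("two-point", two_point), ("three-point", three_point)]:
    print(name, d_criterion(design), a_criterion(design))
```

Under this toy model the two-point design does better on both criteria, which is the kind of comparison a search algorithm would automate over many candidate designs.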
In the first paper, an alternative approach to the partitioned Generalized Method of Moments logistic regression model for longitudinal binary outcomes is presented. This method relies on Bayes estimators and is used when the partitioned Generalized Method of Moments model provides numerically unstable estimates of the regression coefficients. It is applied to model obesity status in the Add Health study and cognitive impairment diagnosis in the National Alzheimer’s Coordinating Center database.
The second paper develops a model for the joint modeling of two or more binary outcomes that together provide an overall measure of a subject’s trait over time. Modeling all outcomes simultaneously provides a complete picture of the overall measure of interest. This approach accounts for the correlation among and between the outcomes across time and for the changing effects of time-dependent covariates on the outcomes. The model is used to analyze four outcomes measuring overall quality of life in the Chinese Longitudinal Healthy Longevity Study.
The third paper presents an approach that allows for estimation of cross-sectional and lagged effects of the covariates on the outcome as well as the feedback of the response on future covariates. This is done in two parts. In part 1, the effects of time-dependent covariates on the outcomes are estimated; in part 2, the influence of the outcome on future values of the covariates is measured. The model parameters are obtained through a Generalized Method of Moments procedure that uses valid moment conditions between the outcome and the covariates. Child morbidity in the Philippines and obesity status in the Add Health data are analyzed.
Under sparsity, the structure of definitive screening designs (DSDs) can allow for the screening and optimization of a system in one step, but in non-sparse situations estimation of second-order models requires augmentation of the DSD. In this work, augmentation strategies for DSDs were considered under the assumption that the correct form of the model for the response of interest is quadratic. A series of augmented designs was constructed and explored, and power calculations, model-robustness criteria, model-discrimination criteria, and simulation study results were used to identify the number of augmented runs necessary for (1) effectively identifying active model effects, and (2) precisely predicting a response of interest. When the goal is identification of active effects, it is shown that supersaturated designs are sufficient; when the goal is prediction, it is shown that little is gained by augmenting beyond the design that is saturated for the full quadratic model. Surprisingly, augmentation strategies based on the I-optimality criterion do not lead to better predictions than strategies based on the D-optimality criterion.
Computational limitations can render standard statistical methods infeasible in the face of massive datasets, necessitating subsampling strategies. In the big data context, the primary objective is often prediction, but the correct form of the model for the response of interest is likely unknown. Here, two new methods of subdata selection were proposed. The first is based on clustering and the second on space-filling designs; both are free from model assumptions. The performance of the proposed methods was explored visually via low-dimensional simulated examples, via real data applications, and via large simulation studies. In all cases the proposed methods were compared to existing, widely used subdata selection methods. The conditions under which the proposed methods provide advantages over standard subdata selection strategies were identified.
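To make the idea of model-free, space-filling subdata selection concrete, the sketch below uses a greedy farthest-point (maximin) rule: each new row is the one farthest from everything already selected. This is one simple space-filling heuristic chosen for illustration; it is an assumption and not necessarily the method proposed in the work, and the function name and toy data are hypothetical.

```python
import math


def maximin_subdata(points, k, start=0):
    """Greedy farthest-point selection of k row indices from `points`.

    No response model is used: only distances between covariate rows matter.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


# Toy data: four corners of the unit square plus one near-duplicate of a corner.
demo_points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
picked = maximin_subdata(demo_points, k=4)
print(sorted(picked))
```

On the toy data the near-duplicate row is skipped in favor of the four corners, which is the spreading behavior a space-filling subsample is meant to have.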
We propose a novel, efficient approach for obtaining high-quality experimental designs for event-related functional magnetic resonance imaging (ER-fMRI), a popular brain mapping technique. Our proposed approach combines a greedy hill-climbing algorithm and a cyclic permutation method. When searching for optimal ER-fMRI designs, the proposed approach focuses only on a promising restricted class of designs with equal frequency of occurrence across stimulus types, which significantly reduces the computational time. We demonstrate that our proposed approach is very efficient compared with a recently proposed genetic algorithm approach. We also apply our approach to obtain designs that are robust against misspecification of error correlations.
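The restriction to equal-frequency designs can be sketched as a greedy hill climb whose only move is swapping two positions in the stimulus sequence, since a swap never changes how often each stimulus type occurs. The criterion below (balance of consecutive stimulus pairs) is a toy stand-in chosen for illustration, not the design criterion of the paper, and the cyclic permutation step is not shown; all names are hypothetical.

```python
from collections import Counter


def transition_imbalance(seq, n_types):
    """Toy criterion (assumption): squared deviation of consecutive-pair counts from uniform."""
    counts = Counter(zip(seq, seq[1:]))
    target = (len(seq) - 1) / (n_types * n_types)
    return sum((counts.get((a, b), 0) - target) ** 2
               for a in range(n_types) for b in range(n_types))


def hill_climb(seq, n_types, max_passes=50):
    """Greedy search over position swaps; a swap is kept only if it lowers the criterion."""
    seq = list(seq)
    best = transition_imbalance(seq, n_types)
    for _ in range(max_passes):
        improved = False
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                if seq[i] == seq[j]:
                    continue  # swapping equal labels changes nothing
                seq[i], seq[j] = seq[j], seq[i]
                score = transition_imbalance(seq, n_types)
                if score < best:
                    best, improved = score, True
                else:
                    seq[i], seq[j] = seq[j], seq[i]  # undo the swap
        if not improved:
            break  # local optimum under the swap neighborhood
    return seq, best


# Equal-frequency start: three stimulus types, six events each, badly ordered.
start = [0] * 6 + [1] * 6 + [2] * 6
design, score = hill_climb(start, n_types=3)
print(design, score)
```

Because every candidate already has equal stimulus frequencies, the search never spends evaluations on designs outside the restricted class.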
K-shuff is a new algorithm for comparing the similarity of gene sequence libraries, providing measures of the structural and compositional diversity as well as the significance of the differences between these measures. Inspired by Ripley’s K-function for spatial point pattern analysis, the Intra K-function or IKF measures the structural diversity, including both the richness and overall similarity of the sequences, within a library. The Cross K-function or CKF measures the compositional diversity between gene libraries, reflecting both the number of operational taxonomic units (OTUs) shared and the overall similarity in OTUs. A Monte Carlo testing procedure then enables statistical evaluation of both the structural and compositional diversity between gene libraries. For 16S rRNA gene libraries from complex bacterial communities such as those found in seawater, salt marsh sediments, and soils, K-shuff yields reproducible estimates of structural and compositional diversity with libraries of more than 50 sequences. Similarly, for pyrosequencing libraries generated from a glacial retreat chronosequence and Illumina® libraries generated from US homes, K-shuff required >300 and 100 sequences per sample, respectively. Power analyses demonstrated that K-shuff is sensitive to small differences in Sanger or Illumina® libraries. This extra sensitivity of K-shuff enabled examination of compositional differences at much deeper taxonomic levels, such as within abundant OTUs. This is especially useful when comparing communities that are compositionally very similar but functionally different. K-shuff will therefore prove beneficial for conventional microbiome analysis as well as specific hypothesis testing.
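The Monte Carlo testing step can be sketched as a label-shuffling test on a pairwise distance matrix: sequences are randomly reassigned to libraries, and the observed statistic is compared with its shuffled distribution. The statistic used here (mean between-library distance minus mean within-library distance) is a simple stand-in and not the IKF or CKF; the function names and toy data are assumptions for illustration.

```python
import random


def cross_minus_within(dist, labels):
    """Mean between-library distance minus mean within-library distance."""
    cross, within = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else cross).append(dist[i][j])
    return sum(cross) / len(cross) - sum(within) / len(within)


def shuffle_test(dist, labels, n_perm=999, seed=0):
    """Monte Carlo p-value from random reassignment of sequences to libraries."""
    rng = random.Random(seed)
    observed = cross_minus_within(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cross_minus_within(dist, shuffled) >= observed:
            hits += 1
    # Count the observed labeling itself so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two well-separated "libraries" of five sequences placed on a line.
positions = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4]
labels = ["A"] * 5 + ["B"] * 5
dist = [[abs(a - b) for b in positions] for a in positions]

observed, p_value = shuffle_test(dist, labels)
print(observed, p_value)
```

With real libraries, `dist` would hold pairwise sequence distances, and the number of shuffles controls the resolution of the p-value.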
Study Region: 43 rivers in Spain with measurement stations for air and water temperatures.
Study Focus: River water temperatures influence aquatic ecosystem dynamics. This work aims to develop transferable river temperature forecasting models that are not confined to sites with historical measurements of air and water temperatures. For that purpose, we estimate nonlinear mixed models (NLMM), which are based on site-specific time-series models and account for seasonality and S-shaped air-to-water temperature associations. A detailed evaluation of the short-term forecasting performance of both NLMM and site-specific models is undertaken. Measurements from 31 sites were used to estimate model parameters, whereas data from 12 additional sites were used solely for the evaluation of NLMM.
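An S-shaped air-to-water temperature association can be illustrated with a four-parameter logistic curve, a form often used in the stream temperature literature. The abstract does not state which functional form the models use, so the curve, the parameter names, and the example values below are assumptions; the seasonality, time-series, and random-effects components are not shown.

```python
import math


def water_temp(air_temp, mu, alpha, beta, gamma):
    """Logistic air-to-water curve.

    mu: lower bound of water temperature, alpha: upper bound,
    beta: air temperature at the inflection point, gamma: steepness.
    """
    return mu + (alpha - mu) / (1.0 + math.exp(gamma * (beta - air_temp)))


# Hypothetical parameter values, chosen only to show the shape of the curve.
params = dict(mu=0.0, alpha=25.0, beta=12.0, gamma=0.2)
curve = [water_temp(t, **params) for t in range(-10, 41, 5)]
print([round(v, 2) for v in curve])
```

The curve flattens at both ends, so very cold or very hot air temperatures change the predicted water temperature much less than mid-range ones do.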
New Hydrological Insights for the Region: Mixed models achieve levels of accuracy analogous to linear site-specific time-series regressions. Nonlinear site-specific models attain 1-day ahead forecasting accuracy close to 1 °C in terms of mean absolute error (MAE) and root mean square error (RMSE). Our results may facilitate adaptive management of freshwater resources in Spain in accordance with European water policy directives.
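For reference, the two accuracy measures named above can be computed as in the following sketch. The observed and forecast values are made-up numbers for illustration only and are not data from the study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error; penalizes large errors more heavily than MAE."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


# Hypothetical water temperatures in degrees Celsius.
observed = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 12.0, 13.0, 18.0]

print(mae(observed, forecast), rmse(observed, forecast))
```

RMSE is never smaller than MAE on the same errors, so reporting both gives a sense of how much occasional large misses contribute.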