Matching Items (11)
Description
Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design could greatly facilitate the work of collecting and analyzing data. In fact, many theoretical results are obtained on a case-by-case basis, while in other situations, researchers also rely heavily on computational tools for design selection.

Three topics are investigated in this dissertation, each focusing on one type of GLM. Topic I considers GLMs with factorial effects and one continuous covariate. Factors can interact with one another, and there is no restriction on the possible values of the continuous covariate. The locally D-optimal design structures for such models are identified and results for obtaining smaller optimal designs using orthogonal arrays (OAs) are presented. Topic II considers GLMs with multiple covariates under the assumptions that all but one covariate are bounded within specified intervals and interaction effects among those bounded covariates may also exist. An explicit formula for D-optimal designs is derived and OA-based smaller D-optimal designs for models with one or two two-factor interactions are also constructed. Topic III considers multiple-covariate logistic models. All covariates are nonnegative and there is no interaction among them. Two types of D-optimal design structures are identified and their global D-optimality is proved using the celebrated equivalence theorem.
Contributors: Wang, Zhongsheng (Author) / Stufken, John (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extravariation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using generalized method of moments, negating the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchical structured data, the adjusted generalized quasi-likelihood model shows improved performance for random effect estimates. In addition, submodels to address deviation in skewness and kurtosis are provided to jointly model the mean, variance, skewness, and kurtosis. The additional models identify covariates influencing the third and fourth moments. A cutoff to trim the data is provided which improves parameter estimation and model fit. For each topic, findings are demonstrated through comprehensive simulation studies and numerical examples. Examples evaluated include data on children’s morbidity in the Philippines, adolescent health from the National Longitudinal Study of Adolescent to Adult Health, as well as proteomic assays for breast cancer screening.
Contributors: Irimata, Katherine E (Author) / Wilson, Jeffrey R (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Functional magnetic resonance imaging (fMRI) is one of the popular tools for studying human brain functions. High-quality experimental designs are crucial to the success of fMRI experiments as they allow the collection of informative data for making precise and valid inference with minimum cost. The primary goal of this study is to identify the best sequence of mental stimuli (i.e., the fMRI design) with respect to some statistically meaningful optimality criteria. This work focuses on two related topics in this research field. The first topic is on finding optimal designs for fMRI when the design matrix is uncertain. This challenging design issue occurs in many modern fMRI experiments, in which the design matrix of the statistical model depends on both the selected design and the experimental subject's uncertain behavior during the experiment. As a result, the design matrix cannot be fully determined at the design stage, which makes it difficult to select a good design. For the commonly used linear model with autoregressive errors, this study proposes a very efficient approach for obtaining high-quality fMRI designs for such experiments. The proposed approach is built upon an analytical result and an efficient computer algorithm. It is shown through case studies that the proposed approach can outperform the existing method in terms of computing time and the quality of the obtained designs. The second topic of the research is to find optimal designs for fMRI when a wavelet-based technique is considered in the fMRI data analysis. An efficient computer algorithm to search for optimal fMRI designs for such cases is developed. This algorithm is inspired by simulated annealing and a recently proposed algorithm by Saleh et al. (2017). As demonstrated in the case studies, the proposed approach makes it possible to efficiently obtain high-quality designs for fMRI studies, and is practically useful.
Contributors: Zhou, Lin (Author) / Kao, Ming-Hung (Thesis advisor) / Welfert, Bruno (Thesis advisor) / Jackiewicz, Zdzislaw (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Taylor, Jesse Earl (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
This study concerns optimal designs for experiments where responses consist of both binary and continuous variables. Many experiments in engineering, medical studies, and other fields have such mixed responses. Although in recent decades several statistical methods have been developed for jointly modeling both types of response variables, an effective way to design such experiments remains unclear. To address this void, some useful results are developed to guide the selection of optimal experimental designs in such studies. The results are mainly built upon a powerful tool called the complete class approach and a nonlinear optimization algorithm. The complete class approach was originally developed for a univariate response, but it is extended to the case of bivariate responses of mixed variable types. Consequently, the number of candidate designs is significantly reduced. An optimization algorithm is then applied to efficiently search the small class of candidate designs for the D- and A-optimal designs. Furthermore, the optimality of the obtained designs is verified by the general equivalence theorem. In the first part of the study, the focus is on a simple, first-order model. The study is then expanded to a model with a quadratic polynomial predictor. The obtained designs can help to render a precise statistical inference in practice or serve as a benchmark for evaluating the quality of other designs.
Contributors: Kim, Soohyun (Author) / Kao, Ming-Hung (Thesis advisor) / Dueck, Amylou (Committee member) / Pan, Rong (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data are from a table formed by cross-classification of a large number of variables, the common statistics may have low power and inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit by using a new version of the GFfit statistic based on orthogonal components of Pearson chi-square as a diagnostic to examine the fit on two-way subtables. However, due to variables with a large number of categories and small sample size, even the GFfit statistic may have low power and inaccurate Type I error level due to sparseness in the two-way subtable. In this dissertation, the theoretical power and empirical power of the GFfit statistic are studied. A method based on subsets of orthogonal components for the GFfit statistic on the subtables is developed to improve the performance of the GFfit statistic. Simulation results for power and Type I error rate for several different cases along with comparisons to other diagnostics are presented.
Contributors: Zhu, Junfei (Author) / Reiser, Mark R. (Thesis advisor) / Stufken, John (Committee member) / Zheng, Yi (Committee member) / St Louis, Robert (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Longitudinal data involving multiple subjects are quite common in the medical and social sciences. I consider generalized linear mixed models (GLMMs) applied to such longitudinal data, and the optimal design searching problem under such models. In this case, based on optimal design theory, the optimality criteria depend on the estimated parameters, which leads to local optimality. Moreover, the information matrix under a GLMM does not have a closed-form expression. My dissertation includes three topics related to this design problem. The first part is searching for locally optimal designs under GLMMs with longitudinal data. I apply the penalized quasi-likelihood (PQL) method to approximate the information matrix and compare several approximations to show the superiority of PQL over other approximations. Under different local parameters and design restrictions, locally D- and A-optimal designs are constructed based on the approximation. An interesting finding is that locally optimal designs sometimes apply different designs to different subjects. Finally, the robustness of these locally optimal designs is discussed. In the second part, an unknown observational covariate is added to the previous model. With an unknown observational variable in the experiment, expected optimality criteria are considered. Under different assumptions of the unknown variable and parameter settings, locally optimal designs are constructed and discussed. In the last part, Bayesian optimal designs are considered under logistic mixed models. Considering different priors of the local parameters, Bayesian optimal designs are generated. Bayesian design under such a model is usually expensive in time. The running time in this dissertation is optimized to an acceptable amount with accurate results. I also discuss the robustness of these Bayesian optimal designs, which is the motivation of applying such an approach.
Contributors: Shi, Yao (Author) / Stufken, John (Thesis advisor) / Kao, Ming-Hung (Thesis advisor) / Lan, Shiwei (Committee member) / Pan, Rong (Committee member) / Reiser, Mark (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Functional brain imaging experiments are widely conducted in many fields for studying the underlying brain activity in response to mental stimuli. For such experiments, it is crucial to select a good sequence of mental stimuli that allows researchers to collect informative data for making precise and valid statistical inferences at minimum cost. In contrast to most existing studies, the aim of this study is to obtain optimal designs for brain mapping technology with an ultra-high temporal resolution with respect to some common statistical optimality criteria. The first topic of this work is on finding optimal designs when the primary interest is in estimating the Hemodynamic Response Function (HRF), a function of time describing the effect of a mental stimulus on the brain. A major challenge here is that the design matrix of the statistical model is greatly enlarged. As a result, it is very difficult, if not infeasible, to compute and compare the statistical efficiencies of competing designs. To tackle this issue, an efficient approach built on subsampling the design matrix and on an efficient computer algorithm is proposed. It is demonstrated through the analytical and simulation results that the proposed approach can outperform the existing methods in terms of computing time and the quality of the obtained designs. The second topic of this work is to find optimal designs when another set of popularly used basis functions is considered for modeling the HRF, e.g., to detect brain activations. Although the statistical model for analyzing the data remains linear, the parametric functions of interest under this setting are often nonlinear. The quality of the design will then depend on the true value of some unknown parameters. To address this issue, the maximin approach is considered to identify designs that maximize the relative efficiencies over the parameter space. As shown in the case studies, these maximin designs yield high performance for detecting brain activation compared to the traditional designs that are widely used in practice.
Contributors: Alghamdi, Reem (Author) / Kao, Ming-Hung (Thesis advisor) / Fricks, John (Committee member) / Pan, Rong (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
One of the premier technologies for studying human brain functions is event-related functional magnetic resonance imaging (fMRI). The main design issue for such experiments is to find the optimal sequence for mental stimuli. This optimal design sequence allows for collecting informative data to make precise statistical inferences about the inner workings of the brain. Unfortunately, this is not an easy task, especially when the error correlation of the response is unknown at the design stage. In the literature, the maximin approach was proposed to tackle this problem. However, this is an expensive and time-consuming method, especially when the correlated noise follows high-order autoregressive models. The main focus of this dissertation is to develop an efficient approach to reduce the amount of the computational resources needed to obtain A-optimal designs for event-related fMRI experiments. One proposed idea is to combine the Kriging approximation method, which is widely used in spatial statistics and computer experiments, with a knowledge-based genetic algorithm. Case studies show that the new search method achieves design efficiencies similar to those attained by the traditional method, while giving a significant reduction in computing time. Another useful strategy is also proposed to find such designs by considering only the boundary points of the parameter space of the correlation parameters. The usefulness of this strategy is also demonstrated via case studies. The first part of this dissertation focuses on finding optimal event-related designs for fMRI with simple trials when each stimulus consists of only one component (e.g., a picture). The study is then extended to the case of compound trials when stimuli of multiple components (e.g., a cue followed by a picture) are considered.
Contributors: Alrumayh, Amani (Author) / Kao, Ming-Hung (Thesis advisor) / Stufken, John (Committee member) / Reiser, Mark R. (Committee member) / Pan, Rong (Committee member) / Cheng, Dan (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Bivariate responses that comprise mixtures of binary and continuous variables are common in medical, engineering, and other scientific fields. There exist many works concerning the analysis of such mixed data. However, research on optimal designs for this type of experiment is still scarce. The joint mixed responses model that is considered here involves a mixture of ordinary linear models for the continuous response and a generalized linear model for the binary response. Using the complete class approach, tighter upper bounds on the number of support points required for finding locally optimal designs are derived for the mixed responses models studied in this work.

In the first part of this dissertation, a theoretical result was developed to facilitate the search of locally symmetric optimal designs for mixed responses models with one continuous covariate. Then, the study was extended to mixed responses models that include group effects. Two types of mixed responses models with group effects were investigated. The first type includes models having no common parameters across subject groups, and the second type of models allows some common parameters (e.g., a common slope) across groups. In addition to complete class results, an efficient algorithm (PSO-FM) was proposed to search for the A- and D-optimal designs. Finally, the first-order mixed responses model is extended to a type of quadratic mixed responses model with a quadratic polynomial predictor placed in its linear model.
Contributors: Khogeer, Hazar Abdulrahman (Author) / Kao, Ming-Hung (Thesis advisor) / Stufken, John (Committee member) / Reiser, Mark R. (Committee member) / Zheng, Yi (Committee member) / Cheng, Dan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
In this dissertation two research questions in the field of applied experimental design were explored. First, methods for augmenting the three-level screening designs called Definitive Screening Designs (DSDs) were investigated. Second, schemes for strategic subdata selection for nonparametric predictive modeling with big data were developed.

Under sparsity, the structure of DSDs can allow for the screening and optimization of a system in one step, but in non-sparse situations estimation of second-order models requires augmentation of the DSD. In this work, augmentation strategies for DSDs were considered, given the assumption that the correct form of the model for the response of interest is quadratic. Series of augmented designs were constructed and explored, and power calculations, model-robustness criteria, model-discrimination criteria, and simulation study results were used to identify the number of augmented runs necessary for (1) effectively identifying active model effects, and (2) precisely predicting a response of interest. When the goal is identification of active effects, it is shown that supersaturated designs are sufficient; when the goal is prediction, it is shown that little is gained by augmenting beyond the design that is saturated for the full quadratic model. Surprisingly, augmentation strategies based on the I-optimality criterion do not lead to better predictions than strategies based on the D-optimality criterion.

Computational limitations can render standard statistical methods infeasible in the face of massive datasets, necessitating subsampling strategies. In the big data context, the primary objective is often prediction but the correct form of the model for the response of interest is likely unknown. Here, two new methods of subdata selection were proposed. The first is based on clustering, the second is based on space-filling designs, and both are free from model assumptions. The performance of the proposed methods was explored visually via low-dimensional simulated examples; via real data applications; and via large simulation studies. In all cases the proposed methods were compared to existing, widely used subdata selection methods. The conditions under which the proposed methods provide advantages over standard subdata selection strategies were identified.
Contributors: Nachtsheim, Abigael (Author) / Stufken, John (Thesis advisor) / Fricks, John (Committee member) / Kao, Ming-Hung (Committee member) / Montgomery, Douglas C. (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created: 2020