Matching Items (7)

Bootstrapped information-theoretic model selection with error control (BITSEC)

Description

Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which has made it increasingly popular in the natural and social sciences. Despite their common usage, model selection methods are not driven by a notion of statistical confidence, so their results entail an unknown degree of uncertainty. This paper introduces a general framework which extends notions of Type-I and Type-II error to model selection. A theoretical method for controlling Type-I error using Difference of Goodness of Fit (DGOF) distributions is given, along with a bootstrap approach that approximates the procedure. Results are presented for simulated experiments using normal distributions, random walk models, nested linear regression, and non-nested regression including nonlinear models. Tests are performed using an R package developed by the author, which will be made publicly available upon journal publication of the research results.
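
As a concrete illustration of the bootstrap procedure, the sketch below compares a normal reference model against a Laplace alternative and selects the alternative only when the observed AIC difference exceeds a bootstrapped critical value. This is a minimal sketch, not the author's R package; the models, sample size, and 5% level are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def delta_aic(x):
    """AIC(normal reference) - AIC(Laplace alternative); both have 2 params."""
    mu, sigma = stats.norm.fit(x)
    loc, scale = stats.laplace.fit(x)
    aic_norm = 4 - 2 * stats.norm.logpdf(x, mu, sigma).sum()
    aic_lap = 4 - 2 * stats.laplace.logpdf(x, loc, scale).sum()
    return aic_norm - aic_lap            # > 0 favors the alternative

rng = np.random.default_rng(0)
x = rng.normal(size=200)                 # data truly from the reference model
observed = delta_aic(x)

# Parametric bootstrap of the DGOF statistic under the fitted reference model.
mu, sigma = stats.norm.fit(x)
boot = np.array([delta_aic(rng.normal(mu, sigma, size=x.size))
                 for _ in range(2000)])
critical = np.quantile(boot, 0.95)       # controls Type-I error at roughly 5%

print("select alternative" if observed > critical else "retain reference")
```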

Date Created
2018

Three essays on shrinkage estimation and model selection of linear and nonlinear time series models

Description

The primary objective in time series analysis is forecasting. Raw data often exhibit nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After the data are transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. ARMA models can be estimated by regressing current values on previous realizations and proxy innovations. The classic paradigm fails when dynamics are nonlinear; in this case, parametric, regime-switching specifications model changes in level, ARMA dynamics, and volatility using a finite number of latent states. If the states can be identified using past endogenous or exogenous information, a threshold autoregressive (TAR) or logistic smooth transition autoregressive (LSTAR) model may simplify complex nonlinear associations to conditional weakly stationary processes. For ARMA, TAR, and STAR models, order parameters quantify the extent to which past information is associated with the future. Unfortunately, even if model orders are known a priori, the possibility of over-fitting can lead to sub-optimal forecasting performance. By intentionally overestimating these orders, a linear representation of the full model is exploited and Bayesian regularization can be used to achieve sparsity. Global-local shrinkage priors for AR, MA, and exogenous coefficients are adopted to pull posterior means toward 0 without over-shrinking relevant effects. This dissertation introduces, evaluates, and compares Bayesian techniques that automatically perform model selection and coefficient estimation of ARMA, TAR, and STAR models. Multiple Monte Carlo experiments illustrate the accuracy of these methods in finding the "true" data generating process. Practical applications demonstrate their efficacy in forecasting.
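
The overfit-then-shrink idea can be illustrated with a purely frequentist stand-in: fit an autoregression of deliberately inflated order and let an l1 penalty zero out irrelevant lags. The lasso below only mimics the sparsity that the dissertation's global-local shrinkage priors induce; it is a sketch, not the Bayesian procedure itself.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, phi = 500, np.array([0.6, -0.3])        # true AR(2) coefficients
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi[0] * y[t - 1] + phi[1] * y[t - 2] + rng.normal(scale=0.5)

p_max = 12                                 # intentionally overestimated order
X = np.column_stack([y[p_max - j - 1:n - j - 1] for j in range(p_max)])
fit = LassoCV(cv=5).fit(X, y[p_max:])

for lag, coef in enumerate(fit.coef_, start=1):
    if abs(coef) > 1e-3:
        print(f"lag {lag}: {coef:+.3f}")   # ideally only lags 1 and 2 survive
```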

Date Created
2018

Experimental design issues in functional brain imaging with high temporal resolution

Description

Functional brain imaging experiments are widely conducted in many fields for studying the underlying brain activity in response to mental stimuli. For such experiments, it is crucial to select a good sequence of mental stimuli that allows researchers to collect informative data for making precise and valid statistical inferences at minimum cost. In contrast to most existing studies, the aim of this study is to obtain optimal designs for brain mapping technology with ultra-high temporal resolution with respect to some common statistical optimality criteria. The first topic of this work is finding optimal designs when the primary interest is in estimating the Hemodynamic Response Function (HRF), a function of time describing the effect of a mental stimulus on the brain. A major challenge here is that the design matrix of the statistical model is greatly enlarged. As a result, it is very difficult, if not infeasible, to compute and compare the statistical efficiencies of competing designs. To tackle this issue, an efficient approach built on subsampling the design matrix is proposed, along with an efficient computer algorithm. Analytical and simulation results demonstrate that the proposed approach can outperform existing methods in terms of computing time and the quality of the obtained designs. The second topic of this work is finding optimal designs when another set of popularly used basis functions is considered for modeling the HRF, e.g., to detect brain activations. Although the statistical model for analyzing the data remains linear, the parametric functions of interest under this setting are often nonlinear. The quality of the design will then depend on the true values of some unknown parameters. To address this issue, the maximin approach is considered to identify designs that maximize the relative efficiencies over the parameter space. As shown in the case studies, these maximin designs yield high performance for detecting brain activation compared to the traditional designs that are widely used in practice.
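
To see why the design matrix becomes so large, note that estimating a discretized HRF turns each stimulus sequence into a lagged (Toeplitz-style) model matrix with one column per time bin of the HRF, so finer temporal resolution means more columns. The sketch below is an assumed illustration, not the proposed algorithm: it builds that matrix and scores random candidate sequences by a D-optimality value.

```python
import numpy as np

def hrf_design(stimulus, h_len):
    """Column k of X is the stimulus sequence lagged by k time bins."""
    n = stimulus.size
    X = np.zeros((n, h_len))
    for k in range(h_len):
        X[k:, k] = stimulus[:n - k]
    return X

def log_d_value(stimulus, h_len):
    X = hrf_design(stimulus, h_len)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

rng = np.random.default_rng(2)
n, h_len = 240, 16            # finer temporal resolution => more HRF columns
candidates = [rng.integers(0, 2, size=n) for _ in range(100)]
best = max(candidates, key=lambda seq: log_d_value(seq, h_len))
print("best of 100 random sequences, log det(X'X) =", log_d_value(best, h_len))
```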

Date Created
2019

Simultaneous Material Microstructure Classification and Discovery using Acoustic Emission Signals

Description

Acoustic emission (AE) signals have been widely employed for tracking material properties and structural characteristics. In this study, the aim is to analyze the AE signals gathered during a scanning probe lithography process to classify the known microstructure types and discover unknown surface microstructures/anomalies. To achieve this, a Hidden Markov Model is developed to account for the temporal dependency of the high-resolution AE data. Furthermore, the posterior classification probability and the negative likelihood score are computed for microstructure classification and discovery. Subsequently, a diagnostic procedure is presented to identify the dominant AE frequencies used to track the microstructural characteristics. In addition, machine learning methods such as KNN, Naive Bayes, and Logistic Regression classifiers are applied. Finally, the proposed approach is applied to identify the surface microstructures of additively manufactured Ti-6Al-4V; it not only achieves a high classification accuracy (more than 90%) but also correctly identifies the microstructural anomalies that may warrant further investigation to discover new material phases/properties.
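
One plausible way to organize the classify-and-discover logic is to fit one HMM per known microstructure class and let the negative per-sample log-likelihood flag signals that no class explains well. The sketch below uses hmmlearn as a stand-in; the feature construction, number of states, and threshold are assumptions, not the authors' implementation.

```python
from hmmlearn.hmm import GaussianHMM

def train_class_models(labeled_signals, n_states=3):
    """Fit one HMM per known microstructure class; signals are (T, d) arrays."""
    models = {}
    for label, signal in labeled_signals.items():
        models[label] = GaussianHMM(n_components=n_states, n_iter=100).fit(signal)
    return models

def classify_or_discover(models, signal, nll_threshold):
    # Negative per-sample log-likelihood of the signal under each class model.
    nll = {lbl: -m.score(signal) / len(signal) for lbl, m in models.items()}
    label, best = min(nll.items(), key=lambda kv: kv[1])
    if best > nll_threshold:            # no known class explains the signal
        return "unknown microstructure (candidate anomaly)", best
    return label, best
```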

Date Created
2020

Contributions to Optimal Experimental Design and Strategic Subdata Selection for Big Data

Description

In this dissertation two research questions in the field of applied experimental design were explored. First, methods for augmenting the three-level screening designs called Definitive Screening Designs (DSDs) were investigated. Second, schemes for strategic subdata selection for nonparametric predictive modeling with big data were developed.

Under sparsity, the structure of DSDs can allow for the screening and optimization of a system in one step, but in non-sparse situations estimation of second-order models requires augmentation of the DSD. In this work, augmentation strategies for DSDs were considered, given the assumption that the correct form of the model for the response of interest is quadratic. Series of augmented designs were constructed and explored, and power calculations, model-robustness criteria, model-discrimination criteria, and simulation study results were used to identify the number of augmented runs necessary for (1) effectively identifying active model effects, and (2) precisely predicting a response of interest. When the goal is identification of active effects, it is shown that supersaturated designs are sufficient; when the goal is prediction, it is shown that little is gained by augmenting beyond the design that is saturated for the full quadratic model. Surprisingly, augmentation strategies based on the I-optimality criterion do not lead to better predictions than strategies based on the D-optimality criterion.
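
For concreteness, the sketch below implements a generic greedy D-optimal augmentation over a three-level candidate set for the full quadratic model. It is an assumed illustration, not the dissertation's construction; the small ridge term merely keeps the criterion defined while the augmented design is still supersaturated.

```python
import itertools
import numpy as np

def quadratic_model_matrix(D):
    """Intercept, main effects, two-factor interactions, pure quadratics."""
    n, k = D.shape
    cols = [np.ones(n)] + [D[:, i] for i in range(k)]
    cols += [D[:, i] * D[:, j] for i in range(k) for j in range(i + 1, k)]
    cols += [D[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def augment_d_optimal(D0, n_add):
    """Greedily append the candidate run that most increases log det(F'F)."""
    k = D0.shape[1]
    candidates = np.array(list(itertools.product((-1, 0, 1), repeat=k)), float)
    D = D0.astype(float)
    p = quadratic_model_matrix(D).shape[1]
    for _ in range(n_add):
        def score(run):
            F = quadratic_model_matrix(np.vstack([D, run]))
            # Ridge keeps log det finite while the design is supersaturated.
            return np.linalg.slogdet(F.T @ F + 1e-6 * np.eye(p))[1]
        D = np.vstack([D, max(candidates, key=score)])
    return D
```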

Computational limitations can render standard statistical methods infeasible in the face of massive datasets, necessitating subsampling strategies. In the big data context, the primary objective is often prediction, but the correct form of the model for the response of interest is likely unknown. Here, two new methods of subdata selection were proposed. The first is based on clustering, the second is based on space-filling designs, and both are free from model assumptions. The performance of the proposed methods was explored visually via low-dimensional simulated examples, via real data applications, and via large simulation studies. In all cases the proposed methods were compared to existing, widely used subdata selection methods, and the conditions under which the proposed methods provide advantages over standard subdata selection strategies were identified.
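
A minimal sketch of the clustering-based idea, as one might implement it with scikit-learn (an assumed implementation in the spirit described above, not the dissertation's code): cluster the full dataset and keep the observation nearest each cluster center, giving model-free subdata that covers the predictor space.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

def subdata_by_clustering(X, n_sub, random_state=0):
    """Model-free subdata: the observation nearest each of n_sub centroids."""
    km = KMeans(n_clusters=n_sub, n_init=10, random_state=random_state).fit(X)
    idx = pairwise_distances_argmin(km.cluster_centers_, X)
    return np.unique(idx)

rng = np.random.default_rng(3)
X = rng.normal(size=(20_000, 5))      # stand-in for the "big" dataset
keep = subdata_by_clustering(X, n_sub=200)
print(len(keep), "rows selected for nonparametric modeling")
```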

Date Created
2020

Statistical Inference of Dynamics in Neurons via Stochastic EM

Description

Inside cells, axonal and dendritic transport by motor proteins is responsible for supplying cargo, such as vesicles and organelles, to support neuronal function. Motor proteins achieve transport through a cycle of chemical and mechanical processes. Particle tracking experiments are used to study this intracellular cargo transport by recording multi-dimensional, discrete cargo position trajectories over time. However, due to experimental limitations, much of the mechanochemical process cannot be directly observed, making mathematical modeling and statistical inference essential tools for identifying the underlying mechanisms. The cargo movement during transport is modeled using a switching stochastic differential equation framework that involves classification into one of three proposed hidden regimes, each characterized by a different level of velocity and stochasticity. The equations are presented as a state-space model with Markovian properties. Through a stochastic expectation-maximization algorithm, statistical inference can be made based on the observed trajectory. Regime predictions and particle location predictions are calculated through an auxiliary particle filter and particle smoother. Based on these predictions, parameters are estimated through maximum likelihood. Diagnostics are proposed that can assess model performance and can therefore also serve as model selection criteria. Model selection is used to find the most accurate regime models and the optimal number of regimes for a given motor-cargo system. A method for incorporating a second positional dimension is also introduced. These methods are tested on both simulated data and different types of experimental data.
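
A plain bootstrap particle filter conveys the flavor of the regime inference, although the dissertation uses an auxiliary particle filter and smoother inside a stochastic EM loop. The Euler-type discretization, Gaussian observation noise, and all parameter names below are assumptions for the sketch.

```python
import numpy as np

def particle_filter(obs, v, s, P, dt, n_particles=1000, obs_sd=0.05, seed=0):
    """Bootstrap filter for x_t = x_{t-1} + v[r]*dt + s[r]*sqrt(dt)*noise."""
    rng = np.random.default_rng(seed)
    K = len(v)                                   # number of hidden regimes
    x = np.full(n_particles, obs[0])
    r = rng.integers(0, K, n_particles)
    regime_prob = np.zeros((len(obs), K))
    for t, y in enumerate(obs):
        if t > 0:
            # Propagate each regime through the Markov chain, then positions.
            r = np.array([rng.choice(K, p=P[ri]) for ri in r])
            x = x + v[r] * dt + s[r] * np.sqrt(dt) * rng.normal(size=n_particles)
        w = np.exp(-0.5 * ((y - x) / obs_sd) ** 2) + 1e-300
        w /= w.sum()
        regime_prob[t] = np.bincount(r, weights=w, minlength=K)
        keep = rng.choice(n_particles, size=n_particles, p=w)  # resample
        x, r = x[keep], r[keep]
    return regime_prob                           # P(regime_t | data) over time
```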

Date Created
2021

Spatial Regression and Gaussian Process BART

Description

Spatial regression is one of the central topics in spatial statistics. Depending on the goal, interpretation or prediction, spatial regression models can be classified into two categories: linear mixed regression models and nonlinear regression models. This dissertation explored these models and their real-world applications, and new methods and models were proposed to overcome challenges encountered in practice. The dissertation has three major parts.

In the first part, nonlinear regression models were embedded into a multistage workflow to predict the spatial abundance of reef fish species in the Gulf of Mexico. There were two challenges: zero-inflated data and out-of-sample prediction. The methods and models in the workflow could effectively handle the zero-inflated sampling data without strong assumptions, and three strategies were proposed to solve the out-of-sample prediction problem. The results and discussions showed that the nonlinear predictions had the advantages of high accuracy and low bias and performed well across multiple resolutions.
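
One standard way to handle zero-inflated abundance data without strong distributional assumptions is a hurdle-style decomposition, sketched below; the workflow's actual component models are not specified in the abstract, so the random forests here are placeholders.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def fit_hurdle(X, y):
    """Stage 1: probability of presence. Stage 2: abundance where present."""
    occurrence = RandomForestClassifier(n_estimators=300).fit(X, y > 0)
    positive = y > 0
    abundance = RandomForestRegressor(n_estimators=300).fit(X[positive], y[positive])
    return occurrence, abundance

def predict_abundance(occurrence, abundance, X_new):
    p = occurrence.predict_proba(X_new)[:, 1]   # P(species present)
    return p * abundance.predict(X_new)         # E[y] = P(y > 0) * E[y | y > 0]
```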

In the second part, a two-stage spatial regression model was proposed for analyzing soil carbon stock (SOC) data. The first stage used a spatial linear mixed model to capture the linear and stationary effects; the second stage used a generalized additive model to explain the nonlinear and nonstationary effects. The results illustrated that the two-stage model had good interpretability for understanding the effects of covariates while maintaining prediction accuracy competitive with popular machine learning models such as random forest, XGBoost, and support vector machines.
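
The two-stage structure can be sketched as follows, with ordinary least squares standing in for the spatial linear mixed model and an additive spline smoother standing in for the GAM; both substitutions are assumptions made to keep the sketch self-contained.

```python
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

def fit_two_stage(X_cov, coords, y):
    stage1 = LinearRegression().fit(X_cov, y)      # linear, stationary effects
    resid = y - stage1.predict(X_cov)
    # Additive spline smoother over the spatial coordinates (GAM stand-in).
    stage2 = make_pipeline(SplineTransformer(degree=3, n_knots=8),
                           Ridge(alpha=1.0)).fit(coords, resid)
    return stage1, stage2

def predict_two_stage(stage1, stage2, X_cov, coords):
    return stage1.predict(X_cov) + stage2.predict(coords)
```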

A new nonlinear regression model, Gaussian process BART (Bayesian additive regression tree), was proposed in the third part. Combining the advantages of both BART and Gaussian processes, the model can capture the nonlinear effects of both observed and latent covariates. To develop the model, the traditional BART was first generalized to accommodate correlated errors. Then, the failure of likelihood-based Markov chain Monte Carlo (MCMC) in parameter estimation was discussed. Based on the idea of analysis of variation, two remedies, back comparing and tuning range, were proposed to tackle this failure. Finally, the effectiveness of the new model was examined through experiments on both simulated and real data.
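
An alternating point-estimate analogue hints at how the two components divide the work: a tree ensemble fits the observed covariates while a Gaussian process absorbs the spatially correlated remainder. Gradient boosting stands in for BART here, and the backfitting loop replaces the model's actual MCMC treatment, so this is only a loose illustrative analogue, not GP-BART itself.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_gp_plus_trees(X, coords, y, n_sweeps=5):
    """Alternately fit trees on covariates and a GP on the spatial residual."""
    trees = GradientBoostingRegressor()
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
    f_gp = np.zeros(len(y))
    for _ in range(n_sweeps):
        trees.fit(X, y - f_gp)          # nonlinear effects of observed covariates
        f_trees = trees.predict(X)
        gp.fit(coords, y - f_trees)     # correlated remainder over space
        f_gp = gp.predict(coords)
    return trees, gp
```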

Date Created
2020