Matching Items (4)

157026-Thumbnail Image.png

Bootstrapped information-theoretic model selection with error control (BITSEC)

Description

Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which

Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which has made it increasingly popular in the natural and social sciences. De- spite their common usage, model selection methods are not driven by a notion of statistical confidence, so their results entail an unknown de- gree of uncertainty. This paper introduces a general framework which extends notions of Type-I and Type-II error to model selection. A theo- retical method for controlling Type-I error using Difference of Goodness of Fit (DGOF) distributions is given, along with a bootstrap approach that approximates the procedure. Results are presented for simulated experiments using normal distributions, random walk models, nested linear regression, and nonnested regression including nonlinear mod- els. Tests are performed using an R package developed by the author which will be made publicly available on journal publication of research results.

Contributors

Agent

Created

Date Created
  • 2018

157649-Thumbnail Image.png

Optimal sampling for linear function approximation and high-order finite difference methods over complex regions

Description

I focus on algorithms that generate good sampling points for function approximation. In 1D, it is well known that polynomial interpolation using equispaced points is unstable. On the other hand,

I focus on algorithms that generate good sampling points for function approximation. In 1D, it is well known that polynomial interpolation using equispaced points is unstable. On the other hand, using Chebyshev nodes provides both stable and highly accurate points for polynomial interpolation. In higher dimensional complex regions, optimal sampling points are not known explicitly. This work presents robust algorithms that find good sampling points in complex regions for polynomial interpolation, least-squares, and radial basis function (RBF) methods. The quality of these nodes is measured using the Lebesgue constant. I will also consider optimal sampling for constrained optimization, used to solve PDEs, where boundary conditions must be imposed. Furthermore, I extend the scope of the problem to include finding near-optimal sampling points for high-order finite difference methods. These high-order finite difference methods can be implemented using either piecewise polynomials or RBFs.

Contributors

Agent

Created

Date Created
  • 2019

154905-Thumbnail Image.png

Determining appropriate sample sizes and their effects on key parameters in longitudinal three-level models

Description

Through a two study simulation design with different design conditions (sample size at level 1 (L1) was set to 3, level 2 (L2) sample size ranged from 10 to 75,

Through a two study simulation design with different design conditions (sample size at level 1 (L1) was set to 3, level 2 (L2) sample size ranged from 10 to 75, level 3 (L3) sample size ranged from 30 to 150, intraclass correlation (ICC) ranging from 0.10 to 0.50, model complexity ranging from one predictor to three predictors), this study intends to provide general guidelines about adequate sample sizes at three levels under varying ICC conditions for a viable three level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). In this study, the data generating parameters for the were obtained using a large-scale longitudinal data set from North Carolina, provided by the National Center on Assessment and Accountability for Special Education (NCAASE). I discuss ranges of sample sizes that are inadequate or adequate for convergence, absolute bias, relative bias, root mean squared error (RMSE), and coverage of individual parameter estimates. The current study, with the help of a detailed two-part simulation design for various sample sizes, model complexity and ICCs, provides various options of adequate sample sizes under different conditions. This study emphasizes that adequate sample sizes at either L1, L2, and L3 can be adjusted according to different interests in parameter estimates, different ranges of acceptable absolute bias, relative bias, root mean squared error, and coverage. Under different model complexity and varying ICC conditions, this study aims to help researchers identify L1, L2, and L3 sample size or both as the source of variation in absolute bias, relative bias, RMSE, or coverage proportions for a certain parameter estimate. This assists researchers in making better decisions for selecting adequate sample sizes in a three-level HLM analysis. A limitation of the study was the use of only a single distribution for the dependent and explanatory variables, different types of distributions and their effects might result in different sample size recommendations.

Contributors

Agent

Created

Date Created
  • 2016

Robust margin based classifiers for small sample data

Description

In many classication problems data samples cannot be collected easily, example in drug trials, biological experiments and study on cancer patients. In many situations the data set size is small

In many classication problems data samples cannot be collected easily, example in drug trials, biological experiments and study on cancer patients. In many situations the data set size is small and there are many outliers. When classifying such data, example cancer vs normal patients the consequences of mis-classication are probably more important than any other data type, because the data point could be a cancer patient or the classication decision could help determine what gene might be over expressed and perhaps a cause of cancer. These mis-classications are typically higher in the presence of outlier data points. The aim of this thesis is to develop a maximum margin classier that is suited to address the lack of robustness of discriminant based classiers (like the Support Vector Machine (SVM)) to noise and outliers. The underlying notion is to adopt and develop a natural loss function that is more robust to outliers and more representative of the true loss function of the data. It is demonstrated experimentally that SVM's are indeed susceptible to outliers and that the new classier developed, here coined as Robust-SVM (RSVM), is superior to all studied classier on the synthetic datasets. It is superior to the SVM in both the synthetic and experimental data from biomedical studies and is competent to a classier derived on similar lines when real life data examples are considered.

Contributors

Agent

Created

Date Created
  • 2011