Matching Items (17)

Sun Devil Fitness Complex (SDFC) Tempe User Satisfaction Survey

Description

The purpose of this study was to assess usage of, and satisfaction with, a large university recreation fitness center. Data from 471 respondents were collected during Spring 2018. Although users were satisfied overall, we obtained useful information to guide center administration toward improved usage rates and experiences for users of the center.

Date Created
  • 2018-05


Saturated Locally Optimal Designs Under Differentiable Optimality Criteria

Description

We develop general theory for finding locally optimal designs in a class of single-covariate models under any differentiable optimality criterion. Yang and Stufken [Ann. Statist. 40 (2012) 1665–1681] and Dette and Schorning [Ann. Statist. 41 (2013) 1260–1267] gave complete class results for optimal designs under such models. Their results imply that saturated optimal designs exist; however, how to find such designs has not been addressed. We develop tools to find saturated optimal designs and prove their uniqueness under mild conditions.
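
For concreteness, here is a minimal numerical sketch of the kind of object involved: in a two-parameter logistic model the locally D-optimal design is saturated (two support points with equal weights), and its support can be found by directly maximizing the log-determinant of the information matrix. The parameter guesses and design interval below are illustrative assumptions, and this generic numerical search is not the paper's method.

```python
# A minimal sketch (not the paper's method): a saturated locally D-optimal
# design for the two-parameter logistic model E[y|x] = 1/(1 + exp(-(b0+b1*x))).
# With two parameters, a saturated design has two support points, weight 1/2 each.
import numpy as np
from scipy.optimize import minimize

b0, b1 = 0.0, 1.0          # assumed local parameter guesses
lo, hi = -5.0, 5.0         # assumed design interval

def info_matrix(x):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    w = p * (1.0 - p)                      # GLM weight at x
    f = np.array([1.0, x])                 # regression functions (1, x)
    return w * np.outer(f, f)

def neg_log_det(points):
    # equal weights 1/2 on the two support points (saturated design)
    M = 0.5 * info_matrix(points[0]) + 0.5 * info_matrix(points[1])
    sign, logdet = np.linalg.slogdet(M)
    return np.inf if sign <= 0 else -logdet

res = minimize(neg_log_det, x0=[-1.0, 1.0],
               bounds=[(lo, hi), (lo, hi)], method="L-BFGS-B")
print("support points:", np.sort(res.x))   # roughly +/-1.5434 for b0=0, b1=1
```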

Date Created
  • 2015-02-01


Analysis of Santa Monica Water Usage Data for Water Conservation

Description

Historically, per capita water demand has tended to increase proportionately with population growth. However, the last two decades have exhibited a different trend: per capita water usage is declining despite a growing economy and population. Consequently, city planners and water suppliers have been struggling to understand this new trend and whether it will continue over the coming years. This uncertainty leads to inefficient water management practices as well as flawed water storage design, both of which have adverse impacts on the economy and environment.

Water usage data provided by the city of Santa Monica were analyzed using a combination of hydro-climatic and demographic variables to dissect these trends and the variation in usage. The data proved tremendously difficult to work with; several values were missing or erroneously reported, and additional variables had to be brought in from external sources to help explain the variation. Once the data were processed, several statistical models, including regression and clustering models, were built to identify potential correlations and understand consumer behavior. The regression models highlighted temperature and precipitation as significant stimuli of water usage, while the cluster models emphasized high-volume consumers and their respective demographic traits. However, overall model accuracy and fit were poor due to the inadequate quality of data collection and management: the imprecise measurement process for recording water usage, along with varying levels of granularity across the different variables, prevented the models from revealing meaningful associations.

Moving forward, smart meter technology should be considered, as it accurately captures real-time water usage and transmits the information to data hubs, which can then apply predictive analytics to provide updated trends. Such a system would allow cities across the nation to stay abreast of future water usage developments and conserve time, resources, and the environment.
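
As a concrete illustration of the two modeling techniques mentioned above, the hedged sketch below fits a regression of usage on climate variables and clusters consumers by usage and demographics. The file and column names (santa_monica_usage.csv, usage, temp_F, precip_in, household_size) are hypothetical stand-ins, not the actual Santa Monica schema.

```python
# Hypothetical sketch of the regression and clustering analyses described above.
import pandas as pd
import statsmodels.api as sm
from sklearn.cluster import KMeans

df = pd.read_csv("santa_monica_usage.csv")    # assumed file name

# Regression: water usage on hydro-climatic drivers
X = sm.add_constant(df[["temp_F", "precip_in"]])
ols = sm.OLS(df["usage"], X, missing="drop").fit()
print(ols.summary())

# Clustering: group consumers by usage level and a demographic trait
feats = df[["usage", "household_size"]].dropna()
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(feats)
print(feats.assign(cluster=km.labels_).groupby("cluster").mean())
```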

Date Created
  • 2019-05


Locally Optimal Experimental Designs for Mixed Responses Models

Description

Bivariate responses that comprise mixtures of binary and continuous variables are common in medical, engineering, and other scientific fields. Many works concern the analysis of such mixed data; however, research on optimal designs for this type of experiment is still scarce. The joint mixed responses model considered here combines an ordinary linear model for the continuous response with a generalized linear model for the binary response. Using the complete class approach, tighter upper bounds on the number of support points required for finding locally optimal designs are derived for the mixed responses models studied in this work.

In the first part of this dissertation, a theoretical result was developed to facilitate the search for locally symmetric optimal designs for mixed responses models with one continuous covariate. The study was then extended to mixed responses models that include group effects. Two types of mixed responses models with group effects were investigated: the first type includes models having no common parameters across subject groups, and the second type allows some common parameters (e.g., a common slope) across groups. In addition to complete class results, an efficient algorithm (PSO-FM) was proposed to search for the A- and D-optimal designs. Finally, the first-order mixed responses model is extended to a quadratic mixed responses model, with a quadratic polynomial predictor placed in its linear-model component.
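
A minimal simulation sketch of the mixed-responses structure described above may help: a continuous response generated from an ordinary linear model and a binary response from a logistic GLM, sharing a single covariate. All parameter values are illustrative assumptions.

```python
# Simulating one assumed instance of the mixed responses model: continuous
# response from a linear model, binary response from a logistic GLM,
# both driven by the same covariate x.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)

# continuous response: ordinary linear model with Gaussian noise
beta = np.array([1.0, 2.0])
y_cont = beta[0] + beta[1] * x + rng.normal(0, 0.5, n)

# binary response: logistic generalized linear model
gamma = np.array([0.5, -1.0])
p = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * x)))
y_bin = rng.binomial(1, p)

print(y_cont[:5], y_bin[:5])
```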

Date Created
  • 2020


Essays on the identification and modeling of variance

Description

In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extra variation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using the generalized method of moments, eliminating the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchically structured data, the adjusted generalized quasi-likelihood model shows improved performance for random effect estimates. In addition, submodels addressing deviations in skewness and kurtosis are provided to jointly model the mean, variance, skewness, and kurtosis; these additional models identify covariates influencing the third and fourth moments. A cutoff for trimming the data is also provided, which improves parameter estimation and model fit. For each topic, findings are demonstrated through comprehensive simulation studies and numerical examples. The examples include data on children's morbidity in the Philippines, adolescent health data from the National Longitudinal Study of Adolescent to Adult Health, and proteomic assays for breast cancer screening.
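
To illustrate the distribution-free moment idea, the sketch below estimates the two parameters of a power-form mean-variance relation Var(y) = phi * mu^theta from squared residuals by solving two moment conditions. The power form and the known means are illustrative assumptions standing in for the proposed general relation.

```python
# A hedged method-of-moments sketch: estimate (phi, theta) in
# Var(y) = phi * mu**theta without any distributional assumption.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
mu = rng.uniform(1, 10, 500)                       # assumed known means
y = mu + rng.normal(0, np.sqrt(2.0 * mu ** 1.5))   # true phi=2, theta=1.5

def moments(params):
    phi, theta = params
    # moment condition: E[(y - mu)^2 - phi * mu^theta] = 0,
    # weighted by (1, mu) so that both parameters are identified
    u = (y - mu) ** 2 - phi * mu ** theta
    return np.array([u.mean(), (u * mu).mean()])

est = least_squares(moments, x0=[1.0, 1.0]).x
print("phi, theta estimates:", est)
```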

Date Created
  • 2018


Maximin designs for event-related fMRI with uncertain error correlation

Description

One of the premier technologies for studying human brain function is event-related functional magnetic resonance imaging (fMRI). The main design issue for such experiments is to find the optimal sequence of mental stimuli. This optimal design sequence allows for collecting informative data to make precise statistical inferences about the inner workings of the brain. Unfortunately, this is not an easy task, especially when the error correlation of the response is unknown at the design stage. In the literature, the maximin approach was proposed to tackle this problem. However, it is an expensive and time-consuming method, especially when the correlated noise follows a high-order autoregressive model. The main focus of this dissertation is to develop an efficient approach that reduces the computational resources needed to obtain A-optimal designs for event-related fMRI experiments. One proposed idea is to combine the Kriging approximation method, which is widely used in spatial statistics and computer experiments, with a knowledge-based genetic algorithm. Through case studies, it is demonstrated that the new search method achieves design efficiencies similar to those attained by the traditional method, but with a significant reduction in computing time. Another useful strategy is also proposed for finding such designs by considering only the boundary points of the parameter space of the correlation parameters; its usefulness is likewise demonstrated via case studies. The first part of this dissertation focuses on finding optimal event-related designs for fMRI with simple trials, in which each stimulus consists of only one component (e.g., a picture). The study is then extended to compound trials, in which stimuli of multiple components (e.g., a cue followed by a picture) are considered.
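
The computational bottleneck comes from repeatedly evaluating the design criterion across candidate correlation values. As a simplified, hedged sketch, the snippet below computes the A-criterion trace{(X' V^{-1} X)^{-1}} for a candidate model matrix under AR(1) noise and scores it maximin-style over assumed boundary values of the correlation. The model matrix is a random stand-in rather than one built from an actual stimulus sequence.

```python
# A-criterion under AR(1) noise, evaluated at boundary correlation values
# (a simplified stand-in for the maximin evaluation described above).
import numpy as np

def a_criterion(X, rho):
    n = X.shape[0]
    idx = np.arange(n)
    V = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1): V[i,j] = rho^|i-j|
    M = X.T @ np.linalg.inv(V) @ X
    return np.trace(np.linalg.inv(M))                # smaller is better

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(100, 3)).astype(float)  # random stand-in model matrix

# score by the worst (largest) A-value over the assumed boundary rhos
worst = max(a_criterion(X, rho) for rho in (0.0, 0.5))
print("worst-case A-value over boundary correlations:", worst)
```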

Date Created
  • 2019


Optimal Sampling Designs for Functional Data Analysis

Description

Functional regression models are widely considered in practice. To precisely understand an underlying functional mechanism, a good sampling schedule for collecting informative functional data is necessary, especially when data collection is limited. However, little research has been conducted so far on optimal sampling schedule design for functional regression models. To address this design issue, efficient approaches are proposed for generating the best sampling plan in the functional regression setting. First, three optimal experimental designs are considered under a function-on-function linear model: the schedule that maximizes the relative efficiency for recovering the predictor function, the schedule that maximizes the relative efficiency for predicting the response function, and the schedule that maximizes a mixture of the relative efficiencies of both. The obtained sampling plan allows a precise recovery of the predictor function and a precise prediction of the response function. The proposed approach can also be reduced to identify the optimal sampling plan for a scalar-on-function linear regression model. In addition, the optimality criterion for predicting a scalar response from a functional predictor is derived when a quadratic relationship between the two variables is present, and proofs of important properties of the derived criterion are provided. To find such designs, a comparatively fast algorithm that can generate nearly optimal designs is proposed. Because the optimality criterion includes quantities that must be estimated from prior knowledge (e.g., a pilot study), the effectiveness of the suggested optimal design depends heavily on the quality of the estimates. In many situations, however, the estimates are unreliable; thus, a bootstrap aggregating (bagging) approach is employed to enhance the quality of the estimates and to find sampling schedules that are stable to their misspecification. Through case studies, it is demonstrated that the proposed designs outperform other designs in terms of accurately predicting the response and recovering the predictor, and that the bagging-enhanced design is more robust to misspecification of the estimated quantities.
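
As one simple proxy for the schedule-selection problem, the sketch below greedily picks sampling times on a grid to maximize the information (log-determinant) of a Gaussian-process covariance submatrix. This maximum-entropy heuristic and the squared-exponential covariance are assumptions for illustration, not the dissertation's actual relative-efficiency criteria.

```python
# Greedy max-entropy selection of k sampling times on a grid (an assumed
# proxy criterion, not the criterion derived in the dissertation).
import numpy as np

grid = np.linspace(0, 1, 50)
# assumed squared-exponential covariance for the predictor process
K = np.exp(-((grid[:, None] - grid[None, :]) ** 2) / (2 * 0.1 ** 2))

def greedy_schedule(K, k):
    chosen = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for j in range(K.shape[0]):
            if j in chosen:
                continue
            sub = K[np.ix_(chosen + [j], chosen + [j])]
            # jitter keeps the log-determinant numerically stable
            val = np.linalg.slogdet(sub + 1e-8 * np.eye(len(chosen) + 1))[1]
            if val > best_val:
                best, best_val = j, val
        chosen.append(best)
    return sorted(chosen)

print("sampling times:", grid[greedy_schedule(K, 5)])
```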

Date Created
  • 2020


An information-based optimal subdata selection algorithm for big data linear regression and a suitable variable selection algorithm

Description

This article proposes a new information-based optimal subdata selection (IBOSS) algorithm, the Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains the desirable asymptotic properties of IBOSS and yields a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm; however, it exploits vectorized calculation, avoiding for-loops, and is approximately six times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA is studied in three respects: nonorthogonality, the inclusion of interaction terms, and variable misspecification. A new, accurate variable selection algorithm is also proposed to support the implementation of IBOSS algorithms when a large number of variables are present with only a few important variables sparsely among them. By aggregating random-subsample results, this variable selection algorithm is much more accurate than the LASSO method applied to the full data. Since its time complexity depends only on the number of variables, it is also very computationally efficient when the number of variables is fixed (and not massively large) as n increases. More importantly, because it uses subsamples, it sidesteps the problem that the full data cannot be stored in memory when a data set is too large.
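
For context, here is a hedged sketch of the baseline D-optimal IBOSS rule that SSDA is compared against: for each covariate, keep the rows with the most extreme values. SSDA itself (rotation-invariant selection via squared scaled distances) is not reproduced here, and this simplified version ignores the bookkeeping of rows that are selected under more than one covariate.

```python
# Simplified D-optimal IBOSS baseline: per covariate, keep the k/(2p) rows
# with the smallest and largest values (duplicate picks are merged).
import numpy as np

def iboss_d(X, k):
    n, p = X.shape
    r = k // (2 * p)
    keep = set()
    for j in range(p):
        order = np.argsort(X[:, j])
        keep.update(order[:r])      # smallest values of covariate j
        keep.update(order[-r:])     # largest values of covariate j
    return np.array(sorted(keep))

rng = np.random.default_rng(3)
X = rng.normal(size=(100_000, 5))
idx = iboss_d(X, 1000)
print(len(idx), "rows selected")
```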

Date Created
  • 2017


Robust experimental designs for fMRI with an uncertain design matrix

Description

Obtaining high-quality experimental designs to optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Some previous studies have provided guidance and powerful computational tools for obtaining good fMRI designs. However, these results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiment is considered in which the design matrix of the statistical model depends not only on the selected design, but also on the experimental subject's probabilistic behavior during the experiment. The design matrix is thus uncertain at the design stage, making it difficult to select good designs. By taking this uncertainty into account, a very efficient approach for obtaining high-quality fMRI designs is developed in this study. The proposed approach is built upon an analytical result and an efficient computer algorithm. It is shown through case studies that the proposed approach can outperform an existing method in terms of both computing time and the quality of the obtained designs.
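
A hedged Monte Carlo sketch of the design-stage uncertainty described above: if each planned stimulus elicits the modeled response only with some probability, a candidate design can be scored by its expected A-value over simulated subject behaviors. The response-probability mechanism and the 0/1 design columns below are made-up stand-ins, not this study's model or its analytical approach.

```python
# Scoring a candidate design by the expected A-value over simulated
# probabilistic subject behavior (an assumed illustrative mechanism).
import numpy as np

rng = np.random.default_rng(4)

def expected_a_value(design_cols, p_respond=0.8, reps=200):
    vals = []
    for _ in range(reps):
        # each planned stimulus elicits a response with probability p_respond;
        # missed stimuli zero out the corresponding rows of the model matrix
        mask = rng.random(design_cols.shape[0]) < p_respond
        X = design_cols * mask[:, None]
        M = X.T @ X
        if np.linalg.matrix_rank(M) < M.shape[0]:
            continue                      # skip singular realizations
        vals.append(np.trace(np.linalg.inv(M)))
    return np.mean(vals)

design = rng.integers(0, 2, size=(60, 2)).astype(float)
print("expected A-value:", expected_a_value(design))
```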

Date Created
  • 2014


A power study of GFfit statistics as components of Pearson chi-square

Description

The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data come from a table formed by the cross-classification of a large number of variables, these common statistics may have low power and an inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. A new version of the GFfit statistic, based on orthogonal components of the Pearson chi-square, is proposed as a diagnostic for examining fit on two-way subtables. However, with variables that have a large number of categories and a small sample size, even the GFfit statistic may have low power and an inaccurate Type I error level due to sparseness in the two-way subtable. In this dissertation, the theoretical and empirical power of the GFfit statistic are studied. A method based on subsets of orthogonal components of the GFfit statistic on the subtables is developed to improve its performance. Simulation results for power and Type I error rate in several different cases, along with comparisons to other diagnostics, are presented.
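
For reference, the snippet below computes the ordinary Pearson chi-square statistic on a single two-way subtable, the quantity whose orthogonal components the GFfit diagnostic works with; the counts are hypothetical and the decomposition itself is not shown.

```python
# Pearson chi-square on a hypothetical 2x2 subtable of counts.
import numpy as np
from scipy.stats import chi2

table = np.array([[30, 10],
                  [20, 40]])
n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
X2 = ((table - expected) ** 2 / expected).sum()
df = (table.shape[0] - 1) * (table.shape[1] - 1)
print("X2 =", X2, "p =", chi2.sf(X2, df))
```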

Date Created
  • 2017