Analysis of Santa Monica Water Usage Data for Water Conservation

Description

Historically, per capita water demand has tended to increase proportionately with population growth. However, the last two decades have exhibited a different trend: per capita water usage is declining despite a growing economy and population. Consequently, city planners and water suppliers have been struggling to understand this new trend and whether it will continue over the coming years. This uncertainty leads to inefficient water management practices as well as flawed water storage design, both of which have adverse impacts on the economy and environment. Water usage data, provided by the city of Santa Monica, was analyzed using a combination of hydro-climatic and demographic variables to dissect these trends and the variation in usage. The data proved to be tremendously difficult to work with; several values were missing or erroneously reported, and additional variables had to be brought in from external sources to help explain the variation. Upon completion of the data processing, several statistical models, including regression and clustering models, were built to identify potential correlations and understand the consumers’ behavior. The regression models highlighted temperature and precipitation as significant drivers of water usage, while the cluster models emphasized high-volume consumers and their respective demographic traits. However, the overall accuracy and fit of the models were very poor due to the inadequate quality of data collection and management. The imprecise measurement process for recording water usage, along with varying levels of granularity across the different variables, prevented the models from revealing meaningful associations. Moving forward, smart meter technology needs to be considered, as it accurately captures real-time water usage and transmits the information to data hubs, which then implement predictive analytics to provide updated trends. This efficient system will allow cities across the nation to stay abreast of future water usage developments and conserve time, resources, and the environment.

Contributors: Pendyala, Kiran Vinaysai (Author) / Garcia, Margaret (Thesis director) / Stufken, John (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Optimal Designs under Logistic Mixed Models

Description

Longitudinal data involving multiple subjects is quite popular in medical and social science areas. I consider generalized linear mixed models (GLMMs) applied to such longitudinal data, and the optimal design searching problem under such models. In this case, based on optimal design theory, the optimality criteria depend on the estimated parameters, which leads to local optimality. Moreover, the information matrix under a GLMM doesn't have a closed-form expression. My dissertation includes three topics related to this design problem. The first part is searching for locally optimal designs under GLMMs with longitudinal data. I apply penalized quasi-likelihood (PQL) method to approximate the information matrix and compare several approximations to show the superiority of PQL over other approximations. Under different local parameters and design restrictions, locally D- and A- optimal designs are constructed based on the approximation. An interesting finding is that locally optimal designs sometimes apply different designs to different subjects. Finally, the robustness of these locally optimal designs is discussed. In the second part, an unknown observational covariate is added to the previous model. With an unknown observational variable in the experiment, expected optimality criteria are considered. Under different assumptions of the unknown variable and parameter settings, locally optimal designs are constructed and discussed. In the last part, Bayesian optimal designs are considered under logistic mixed models. Considering different priors of the local parameters, Bayesian optimal designs are generated. Bayesian design under such a model is usually expensive in time. The running time in this dissertation is optimized to an acceptable amount with accurate results. I also discuss the robustness of these Bayesian optimal designs, which is the motivation of applying such an approach.

Contributors: Shi, Yao (Author) / Stufken, John (Thesis advisor) / Kao, Ming-Hung (Thesis advisor) / Lan, Shiwei (Committee member) / Pan, Rong (Committee member) / Reiser, Mark (Committee member) / Arizona State University (Publisher)

Created: 2022

Threshold regression estimation via lasso, elastic-net, and lad-lasso: a simulation study with applications to urban traffic data

Description

Threshold regression is used to model regime switching dynamics where the effects of the explanatory variables in predicting the response variable depend on whether a certain threshold has been crossed. When regime-switching dynamics are present, new estimation problems arise related to estimating the value of the threshold. Conventional methods utilize an iterative search procedure, seeking to minimize the sum of squares criterion. However, when unnecessary variables are included in the model or certain variables drop out of the model depending on the regime, this method may have high variability. This paper proposes Lasso-type methods as an alternative to ordinary least squares. By incorporating an L_{1} penalty term, Lasso methods perform variable selection, thus potentially reducing some of the variance in estimating the threshold parameter. This paper discusses the results of a study in which two different underlying model structures were simulated. The first is a regression model with correlated predictors, whereas the second is a self-exciting threshold autoregressive model. Finally the proposed Lasso-type methods are compared to conventional methods in an application to urban traffic data.
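
As a rough illustration of the conventional approach mentioned in the abstract (a search over candidate thresholds that minimizes the sum of squares), the following is a minimal sketch and not the thesis's own code. It assumes a single predictor and two regimes; the names `fit_line` and `estimate_threshold` and the `min_obs` parameter are hypothetical. The Lasso-type methods proposed in the thesis would replace the ordinary least squares fit inside each regime with a penalized fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def estimate_threshold(x, y, q, min_obs=3):
    """Grid search: try each observed value of the threshold variable q,
    fit a separate line in each regime, and keep the candidate with the
    smallest total sum of squared errors. Returns (tau, sse) or None."""
    best = None
    for tau in sorted(set(q)):
        low = [(xi, yi) for xi, yi, qi in zip(x, y, q) if qi <= tau]
        high = [(xi, yi) for xi, yi, qi in zip(x, y, q) if qi > tau]
        if len(low) < min_obs or len(high) < min_obs:
            continue  # skip splits that leave a regime too small to fit
        sse = fit_line(*zip(*low))[2] + fit_line(*zip(*high))[2]
        if best is None or sse < best[1]:
            best = (tau, sse)
    return best


# Noise-free toy data (made up for illustration): the slope changes
# once the threshold variable exceeds 9.
x = list(range(20))
q = x
y = [1 + 2 * xi if xi <= 9 else 50 - xi for xi in x]
tau_hat, sse_hat = estimate_threshold(x, y, q)
```

With noisy data and many candidate predictors, the variability of `tau_hat` is the quantity the thesis aims to reduce through variable selection.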

Contributors: Van Schaijik, Maria (Author) / Kamarianakis, Yiannis (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Arizona State University (Publisher)

Created: 2015

Robust experimental designs for fMRI with an uncertain design matrix

Description

Obtaining high-quality experimental designs to optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Some previous studies have provided some guidance and powerful computational tools for obtaining good fMRI designs. However, these results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiment is considered, in which the design matrix of the statistical model depends not only on the selected design, but also on the experimental subject's probabilistic behavior during the experiment. The design matrix is thus uncertain at the design stage, making it difficult to select good designs. By taking this uncertainty into account, a very efficient approach for obtaining high-quality fMRI designs is developed in this study. The proposed approach is built upon an analytical result and an efficient computer algorithm. It is shown through case studies that the proposed approach can outperform an existing method in terms of computing time and the quality of the obtained designs.

Contributors: Zhou, Lin (Author) / Kao, Ming-Hung (Thesis advisor) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Welfert, Bruno (Committee member) / Arizona State University (Publisher)

Created: 2014

Locally D-optimal designs for generalized linear models

Description

Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design could greatly facilitate the work of collecting and analyzing data. In fact, many theoretical results are obtained on a case-by-case basis, while in other situations, researchers also rely heavily on computational tools for design selection.

Three topics are investigated in this dissertation with each one focusing on one type of GLMs. Topic I considers GLMs with factorial effects and one continuous covariate. Factors can have interactions among each other and there is no restriction on the possible values of the continuous covariate. The locally D-optimal design structures for such models are identified and results for obtaining smaller optimal designs using orthogonal arrays (OAs) are presented. Topic II considers GLMs with multiple covariates under the assumptions that all but one covariate are bounded within specified intervals and interaction effects among those bounded covariates may also exist. An explicit formula for D-optimal designs is derived and OA-based smaller D-optimal designs for models with one or two two-factor interactions are also constructed. Topic III considers multiple-covariate logistic models. All covariates are nonnegative and there is no interaction among them. Two types of D-optimal design structures are identified and their global D-optimality is proved using the celebrated equivalence theorem.
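
To make the notion of local D-optimality concrete, here is a minimal sketch for the simplest textbook case, a logistic model with an intercept and one covariate. This is far simpler than the models treated in the dissertation and is not taken from it. The function names and the parameter values `beta0 = 0`, `beta1 = 1` are assumptions chosen for illustration. A design is a list of `(x, weight)` pairs, and the D-criterion is the determinant of the Fisher information matrix evaluated at the assumed parameter values, which is why the resulting optimum is only "local".

```python
import math


def logistic_information(design, beta0, beta1):
    """Entries (m00, m01, m11) of the 2x2 Fisher information matrix for
    logit P(y = 1) = beta0 + beta1 * x under a design of (x, weight) pairs."""
    m00 = m01 = m11 = 0.0
    for x, w in design:
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        v = w * p * (1.0 - p)  # weighted Bernoulli variance at this point
        m00 += v
        m01 += v * x
        m11 += v * x * x
    return m00, m01, m11


def d_criterion(design, beta0, beta1):
    """Determinant of the information matrix (larger is better)."""
    m00, m01, m11 = logistic_information(design, beta0, beta1)
    return m00 * m11 - m01 * m01


def symmetric_design(c):
    """Two equally weighted support points at -c and +c."""
    return [(-c, 0.5), (c, 0.5)]


# Compare three symmetric two-point designs at the assumed parameters.
values = {c: d_criterion(symmetric_design(c), 0.0, 1.0) for c in (1.0, 1.5434, 2.0)}
```

For this simple model, the determinant is largest when the linear predictor at the two support points is roughly ±1.543, so the middle design scores highest among the three. Changing `beta0` or `beta1` moves the optimal points, which illustrates the dependence on local parameter values.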

Contributors: Wang, Zhongsheng (Author) / Stufken, John (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Zheng, Yi (Committee member) / Arizona State University (Publisher)

Created: 2018

Optimum Experimental Design Issues in Functional Neuroimaging Studies

Description

Functional magnetic resonance imaging (fMRI) is one of the popular tools to study human brain functions. High-quality experimental designs are crucial to the success of fMRI experiments as they allow the collection of informative data for making precise and valid inference with minimum cost. The primary goal of this study is to identify the best sequence of mental stimuli (i.e., the fMRI design) with respect to some statistically meaningful optimality criteria. This work focuses on two related topics in this research field. The first topic is finding optimal designs for fMRI when the design matrix is uncertain. This challenging design issue occurs in many modern fMRI experiments, in which the design matrix of the statistical model depends on both the selected design and the experimental subject's uncertain behavior during the experiment. As a result, the design matrix cannot be fully determined at the design stage, which makes it difficult to select a good design. For the commonly used linear model with autoregressive errors, this study proposes a very efficient approach for obtaining high-quality fMRI designs for such experiments. The proposed approach is built upon an analytical result and an efficient computer algorithm. It is shown through case studies that the proposed approach can outperform the existing method in terms of computing time and the quality of the obtained designs. The second topic of the research is to find optimal designs for fMRI when a wavelet-based technique is considered in the fMRI data analysis. An efficient computer algorithm to search for optimal fMRI designs for such cases is developed. This algorithm is inspired by simulated annealing and a recently proposed algorithm by Saleh et al. (2017). As demonstrated in the case studies, the proposed approach makes it possible to efficiently obtain high-quality designs for fMRI studies, and is practically useful.

Contributors: Zhou, Lin (Author) / Kao, Ming-Hung (Thesis advisor) / Welfert, Bruno (Thesis advisor) / Jackiewicz, Zdzislaw (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Taylor, Jesse Earl (Committee member) / Arizona State University (Publisher)

Created: 2017

fMRI design under autoregressive model with one type of stimulus

Description

Functional magnetic resonance imaging (fMRI) is used to study brain activity due to stimuli presented to subjects in a scanner. It is important to conduct statistical inference on the time series fMRI data obtained. It is also important to select optimal designs for practical experiments. Design selection under autoregressive models has not been thoroughly discussed before. This paper derives general information matrices for orthogonal designs under an autoregressive model with an arbitrary number of correlation coefficients. We further provide the minimum trace of orthogonal circulant designs under the AR(1) model, which is used as a criterion to compare practical designs such as M-sequence designs and circulant (almost) orthogonal array designs. We also explore optimal designs under the AR(2) model. In practice, there can be more than one type of stimulus, but in this paper we consider only the simplest situation with one type of stimulus.

Contributors: Chen, Chuntao (Author) / Stufken, John (Thesis advisor) / Reiser, Mark R. (Committee member) / Kamarianakis, Ioannis (Committee member) / Arizona State University (Publisher)

Created: 2017

Essays on the identification and modeling of variance

Description

In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extravariation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using generalized method of moments, negating the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchical structured data, the adjusted generalized quasi-likelihood model shows improved performance for random effect estimates. In addition, submodels to address deviation in skewness and kurtosis are provided to jointly model the mean, variance, skewness, and kurtosis. The additional models identify covariates influencing the third and fourth moments. A cutoff to trim the data is provided which improves parameter estimation and model fit. For each topic, findings are demonstrated through comprehensive simulation studies and numerical examples. Examples evaluated include data on children’s morbidity in the Philippines, adolescent health from the National Longitudinal Study of Adolescent to Adult Health, as well as proteomic assays for breast cancer screening.

Contributors: Irimata, Katherine E (Author) / Wilson, Jeffrey R (Thesis advisor) / Kamarianakis, Ioannis (Committee member) / Kao, Ming-Hung (Committee member) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Arizona State University (Publisher)

Created: 2018

A power study of GFfit statistics as components of Pearson chi-square

Description

The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data are from a table formed by cross-classification of a large number of variables, the common statistics may have low power and inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit by using a new version of GFfit statistic based on orthogonal components of Pearson chi-square as a diagnostic to examine the fit on two-way subtables. However, due to variables with a large number of categories and small sample size, even the GFfit statistic may have low power and inaccurate Type I error level due to sparseness in the two-way subtable. In this dissertation, the theoretical power and empirical power of the GFfit statistic are studied. A method based on subsets of orthogonal components for the GFfit statistic on the subtables is developed to improve the performance of the GFfit statistic. Simulation results for power and type I error rate for several different cases along with comparisons to other diagnostics are presented.
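
For readers unfamiliar with the two overall statistics named at the start of the abstract, the sketch below computes the Pearson chi-square and likelihood ratio statistics for multinomial counts against hypothesized cell probabilities. It is a generic illustration, not the GFfit statistic or any code from the dissertation; the function names and the toy counts are assumptions.

```python
import math


def pearson_chi_square(observed, probs):
    """Pearson X^2 = sum over cells of (O - E)^2 / E, with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def likelihood_ratio(observed, probs):
    """Likelihood ratio G^2 = 2 * sum over cells of O * ln(O / E).
    Cells with O = 0 contribute nothing to the sum."""
    n = sum(observed)
    return 2.0 * sum(
        o * math.log(o / (n * p)) for o, p in zip(observed, probs) if o > 0
    )


# Toy example: 60 observations over three equiprobable cells.
counts = [10, 20, 30]
cell_probs = [1 / 3, 1 / 3, 1 / 3]
x2 = pearson_chi_square(counts, cell_probs)
g2 = likelihood_ratio(counts, cell_probs)
```

When many cells have small expected counts, as in the sparse cross-classified tables the abstract describes, the chi-square approximation to the distribution of both statistics becomes unreliable, which motivates examining fit on lower-order subtables instead.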

Contributors: Zhu, Junfei (Author) / Reiser, Mark R. (Thesis advisor) / Stufken, John (Committee member) / Zheng, Yi (Committee member) / St Louis, Robert (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)

Created: 2017

Optimal Sampling Designs for Functional Data Analysis

Description

Functional regression models are widely considered in practice. To precisely understand an underlying functional mechanism, a good sampling schedule for collecting informative functional data is necessary, especially when data collection is limited. However, scarce research has been conducted on the optimal sampling schedule design for the functional regression model so far. To address this design issue, efficient approaches are proposed for generating the best sampling plan in the functional regression setting. First, three optimal experimental designs are considered under a function-on-function linear model: the schedule that maximizes the relative efficiency for recovering the predictor function, the schedule that maximizes the relative efficiency for predicting the response function, and the schedule that maximizes the mixture of the relative efficiencies of both the predictor and response functions. The obtained sampling plan allows a precise recovery of the predictor function and a precise prediction of the response function. The proposed approach can also be reduced to identify the optimal sampling plan for the problem with a scalar-on-function linear regression model. In addition, the optimality criterion on predicting a scalar response using a functional predictor is derived when the quadratic relationship between these two variables is present, and proofs of important properties of the derived optimality criterion are also provided. To find such designs, an algorithm that is comparably fast, and can generate nearly optimal designs is proposed. As the optimality criterion includes quantities that must be estimated from prior knowledge (e.g., a pilot study), the effectiveness of the suggested optimal design highly depends on the quality of the estimates. 
However, in many situations, the estimates are unreliable; thus, a bootstrap aggregating (bagging) approach is employed for enhancing the quality of estimates and for finding sampling schedules stable to the misspecification of estimates. Through case studies, it is demonstrated that the proposed designs outperform other designs in terms of accurately predicting the response and recovering the predictor. It is also proposed that bagging-enhanced design generates a more robust sampling design under the misspecification of estimated quantities.

Contributors: Rha, Hyungmin (Author) / Kao, Ming-Hung (Thesis advisor) / Pan, Rong (Thesis advisor) / Stufken, John (Committee member) / Reiser, Mark R. (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created: 2020
