Matching Items (10)

Regression Analysis on Colony Collapse Disorder in the United States

Description

In the last decade, the population of honey bees across the globe has declined sharply, leaving scientists and beekeepers to wonder why. Among all nations, the United States has seen some of the greatest declines in the last 10-plus years. Without a definite explanation, the term Colony Collapse Disorder (CCD) was coined to describe the sudden and sharp decline of the honey bee colonies that beekeepers were experiencing. Colony collapses have been rising above expected averages over the years, and winter losses are even more severe than what is normally acceptable. Possible explanations point towards meteorological variables, diseases, and even pesticide usage. Although the cause of CCD is unknown, thousands of beekeepers have reported their losses, as well as the numbers of infected colonies and colonies under certain stressors, in recent years. Using the data reported to the United States Department of Agriculture (USDA), as well as weather data collected by the National Oceanic and Atmospheric Administration (NOAA) and the National Centers for Environmental Information (NCEI), regression analysis was used to investigate relationships between stressors in honey bee colonies, meteorological variables, and colony collapses during the winter months. The regression analysis focused on the winter season, or quarter 4 of the year, which includes the months of October, November, and December. In the model, the response variable was the percentage of colonies lost in quarter 4. Through the model, it was concluded that certain weather thresholds and the percentage increase of colonies under certain stressors were related to colony loss.

Date Created
2018-05

The Value Added of the ASU Spirit Squad to Sun Devil Athletics

Description

Several studies on cheerleading as a sport can be found in the literature; however, no research has been done on the value it adds to the experience at a university, to an athletic department, or to a particular sport. There is a feeling that collegiate and professional cheerleaders are not given appropriate recognition or credit for the amount of work they do. Their contribution is sometimes questioned, as it depends on the school and the sports teams, and the benefits are believed to vary by university or professional team. This research investigated how collegiate cheerleaders and dancers add value to the university sport experience. We interviewed key personnel at the university and conference level and polled spectators at sporting events such as basketball and football. We found that the university administration and athletic personnel see the ASU Spirit Squad as adding value, but spectators had a different perspective. The university acknowledges the added value of the Spirit Squad and its necessity. Spectators attend ASU sporting events to support the university and for the entertainment. They enjoy watching the ASU Spirit Squad perform but would continue to attend ASU sporting events even if cheerleaders and dancers were not there.

Date Created
2017-05

Comparison of Regression and Tree Analyses for Predicting Alcohol-Related Problems

Description

Problems related to alcohol consumption not only cause extra economic expenses but also come at a cost to the health of both drinkers and non-drinkers, due to the harm directly and indirectly caused by alcohol consumption. Investigating predictors of and reasons for alcohol-related problems is important, as alcohol-related problems could be prevented by quitting or limiting consumption of alcohol. We were interested in predicting alcohol-related problems using multiple linear regression and regression trees, and then comparing the regressions to the tree. Impaired control, anxiety sensitivity, mother permissiveness, father permissiveness, gender, and age were included as predictors. The data comprised participants (n=835) sampled from students at Arizona State University. A multiple linear regression without interactions, a multiple linear regression with two-way interactions and squares, and a regression tree were fitted and compared. The regressions and the tree had similar results. Multiple interactions of variables predicted alcohol-related problems. Overall, the tree was easier to interpret than the regressions; however, the regressions provided specific predicted alcohol-related problems scores, whereas the tree formed large groups and had one predicted alcohol-related problems score for each group. Nevertheless, the tree still predicted alcohol-related problems nearly as well as, if not better than, the regressions.

Date Created
2016-12

Three essays on comparative simulation in three-level hierarchical data structure

Description

Though the likelihood is a useful tool for obtaining estimates of regression parameters, it is not readily available in the fit of hierarchical binary data models. The correlated observations negate the opportunity to have a joint likelihood when fitting hierarchical logistic regression models. Through conditional likelihood, inferences for the regression and covariance parameters as well as the intraclass correlation coefficients are usually obtained. In those cases, I have resorted to the Laplace approximation and a large sample theory approach for point and interval estimates, such as Wald-type confidence intervals and profile likelihood confidence intervals. These methods rely on distributional assumptions and large sample theory. However, when dealing with small hierarchical datasets, they often result in severe bias or non-convergence. I present a generalized quasi-likelihood approach and a generalized method of moments approach; neither relies on distributional assumptions, only on moments of the response. As an alternative to the typical large sample theory approach, I present bootstrapped hierarchical logistic regression models, which provide more accurate interval estimates for small binary hierarchical data. These models substitute computation for the traditional Wald-type and profile likelihood confidence intervals. I use a latent variable approach with a new split bootstrap method for estimating intraclass correlation coefficients when analyzing binary data obtained from a three-level hierarchical structure. It is especially useful with small sample sizes and is easily extended to multilevel structures. Comparisons are made to existing approaches through both theoretical justification and simulation studies.
Further, I demonstrate my findings through an analysis of three numerical examples: one based on cancer remission data, one related to China’s antibiotic abuse study, and a third related to teacher effectiveness in schools from a state in the southwestern US.
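
As a rough illustration of how bootstrapping can replace large-sample interval formulas for clustered binary data, the sketch below computes a percentile interval by resampling whole top-level clusters with replacement. It is a generic cluster bootstrap, not the split bootstrap proposed in the dissertation; the function names, the toy data, and the choice of statistic (an overall proportion) are all assumptions made for this example.

```python
import random
from statistics import mean


def cluster_bootstrap_ci(clusters, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval that resamples whole clusters.

    Resampling clusters (rather than individual observations) keeps the
    within-cluster correlation intact in every bootstrap replicate.
    """
    rng = random.Random(seed)
    k = len(clusters)
    estimates = []
    for _ in range(n_boot):
        # Draw k clusters with replacement from the observed clusters.
        resampled = [clusters[rng.randrange(k)] for _ in range(k)]
        estimates.append(stat(resampled))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def overall_proportion(clusters):
    """Hypothetical statistic: proportion of successes across all clusters."""
    return mean(y for cluster in clusters for y in cluster)


# Toy binary outcomes grouped by top-level cluster (illustrative data only).
clusters = [[1, 1, 0, 1], [0, 0, 0], [1, 0, 1, 1, 1], [0, 1, 0], [1, 1, 1, 0], [0, 0, 1]]
lo, hi = cluster_bootstrap_ci(clusters, overall_proportion)
print(f"95% percentile interval: ({lo:.3f}, {hi:.3f})")
```

For three-level data, one simple option is to resample only the highest-level units, as above; resampling within lower levels as well is a separate design choice that this sketch does not address.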

Date Created
2017

Locally D-optimal designs for generalized linear models

Description

Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design could greatly facilitate the work of collecting and analyzing data. In fact, many theoretical results are obtained on a case-by-case basis, while in other situations, researchers also rely heavily on computational tools for design selection.

Three topics are investigated in this dissertation, each focusing on one type of GLM. Topic I considers GLMs with factorial effects and one continuous covariate. Factors can interact with each other, and there is no restriction on the possible values of the continuous covariate. The locally D-optimal design structures for such models are identified, and results for obtaining smaller optimal designs using orthogonal arrays (OAs) are presented. Topic II considers GLMs with multiple covariates under the assumptions that all but one covariate are bounded within specified intervals and that interaction effects among those bounded covariates may also exist. An explicit formula for D-optimal designs is derived, and OA-based smaller D-optimal designs for models with one or two two-factor interactions are also constructed. Topic III considers multiple-covariate logistic models. All covariates are nonnegative and there is no interaction among them. Two types of D-optimal design structures are identified, and their global D-optimality is proved using the celebrated equivalence theorem.
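
For readers unfamiliar with the terminology, the standard definitions behind "locally D-optimal" and the equivalence theorem can be sketched as follows. The notation is assumed for illustration and is not taken from the dissertation: a design \(\xi\) puts weights \(\omega_i\) on points \(x_i\) in a design region \(\mathcal{X}\), \(f(x)\) is the \(p\)-dimensional regression vector, \(w(\cdot)\) is the GLM weight function, and \(\beta_0\) is the assumed parameter value.

```latex
\[
M(\xi,\beta_0) \;=\; \sum_{i} \omega_i \, w\!\left(f(x_i)^{\top}\beta_0\right) f(x_i) f(x_i)^{\top},
\qquad
\xi^{*} \;=\; \arg\max_{\xi}\, \det M(\xi,\beta_0).
\]
\[
\xi^{*} \text{ is locally D-optimal}
\;\Longleftrightarrow\;
w\!\left(f(x)^{\top}\beta_0\right) f(x)^{\top} M(\xi^{*},\beta_0)^{-1} f(x) \;\le\; p
\quad \text{for all } x \in \mathcal{X},
\]
```

with equality at the support points of \(\xi^{*}\). The design is called "locally" optimal because the information matrix of a GLM depends on the unknown \(\beta\), so optimality holds at the assumed value \(\beta_0\).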

Date Created
2018

An information based optimal subdata selection algorithm for big data linear regression and a suitable variable selection algorithm

Description

This article proposes a new information-based optimal subdata selection (IBOSS) algorithm, the Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains the nice asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm. However, it exploits the advantages of vectorized calculation, avoiding for loops, and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA is studied from three aspects: nonorthogonality, inclusion of interaction terms, and variable misspecification. A new, accurate variable selection algorithm is proposed to help the implementation of IBOSS algorithms when a large number of variables are present with sparse important variables among them. By aggregating random subsample results, this variable selection algorithm is much more accurate than the LASSO method using full data. Since the time complexity is associated with the number of variables only, it is also very computationally efficient if the number of variables is fixed as n increases and is not massively large. More importantly, by using subsamples it solves the problem that the full data cannot be stored in memory when a data set is too large.
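
The invariance property mentioned above can be written out in one line. The notation is assumed for illustration and is not taken from the article: \(X\) is an \(n \times p\) matrix of covariates (the intercept is omitted for simplicity) and \(Q\) is a \(p \times p\) orthogonal matrix.

```latex
\[
Z = XQ,\quad Q^{\top}Q = I_p
\;\;\Longrightarrow\;\;
\det\!\left(Z^{\top}Z\right)
= \det\!\left(Q^{\top}X^{\top}XQ\right)
= \det(Q)^{2}\,\det\!\left(X^{\top}X\right)
= \det\!\left(X^{\top}X\right),
\]
```

since \(\det(Q) = \pm 1\). The same identity holds when \(X\) is restricted to any subset of rows, so the determinant of a subdata information matrix does not change when the covariates are rotated.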

Date Created
2017

A power study of GFfit statistics as components of Pearson chi-square

Description

The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data are from a table formed by cross-classification of a large number of variables, the common statistics may have low power and an inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit by using a new version of the GFfit statistic, based on orthogonal components of the Pearson chi-square, as a diagnostic to examine the fit on two-way subtables. However, with variables that have a large number of categories and a small sample size, even the GFfit statistic may have low power and an inaccurate Type I error level due to sparseness in the two-way subtable. In this dissertation, the theoretical power and empirical power of the GFfit statistic are studied. A method based on subsets of orthogonal components for the GFfit statistic on the subtables is developed to improve the performance of the GFfit statistic. Simulation results for power and Type I error rate for several different cases, along with comparisons to other diagnostics, are presented.
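
For reference, the two common statistics named at the start of this abstract can be computed as in the minimal sketch below. It covers only the standard Pearson and likelihood ratio statistics for observed and model-expected cell counts, not the GFfit statistic or its orthogonal components; the function names and the toy counts are assumptions made for this example.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_g2(observed, expected):
    """Likelihood ratio statistic: 2 * sum over cells of O * ln(O / E).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Toy multinomial counts and expected counts under a hypothetical model.
observed = [10, 20, 30]
expected = [20.0, 20.0, 20.0]
x2 = pearson_chi_square(observed, expected)
g2 = likelihood_ratio_g2(observed, expected)
print(x2, g2)
```

Both statistics are compared against a chi-square reference distribution; that approximation is what becomes unreliable when many cells have small expected counts, which is the sparseness problem the abstract describes.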

Date Created
2017

Spatial Mortality Modeling in Actuarial Science

Description

Modeling human survivorship is a core area of research within the actuarial community. With life insurance policies and annuity products as dominant financial instruments which depend on future mortality rates, there is a risk that observed human mortality experiences will differ from those projected when the products are sold. From an insurer’s portfolio perspective, to curb this risk, it is imperative that models of human survivorship are constantly being updated and equipped to accurately gauge and forecast mortality rates. At present, the majority of actuarial research in mortality modeling involves factor-based approaches which operate at a global scale, paying little attention to the determinants and interpretable risk factors of mortality, specifically from a spatial perspective. With an abundance of research being performed in the field of spatial statistics and greater accessibility to localized mortality data, there is a clear opportunity to extend the existing body of mortality literature towards the spatial domain. It is the objective of this dissertation to introduce these new statistical approaches to equip the field of actuarial science to include geographic space in the mortality modeling context.

First, this dissertation evaluates the underlying spatial patterns of mortality across the United States, and introduces a spatial filtering methodology to generate latent spatial patterns which capture the essence of these mortality rates in space. Second, local modeling techniques are illustrated, and a multiscale geographically weighted regression (MGWR) model is generated to describe the variation of mortality rates across space in an interpretable manner which allows for the investigation of the presence of spatial variability in the determinants of mortality. Third, techniques for updating traditional mortality models are introduced, culminating in the development of a model which addresses the relationship between space, economic growth, and mortality. It is through these applications that this dissertation demonstrates the utility of updating actuarial mortality models from a spatial perspective.

Date Created
2020

Essays on the Modeling of Binary Longitudinal Data with Time-dependent Covariates

Description

Longitudinal studies contain correlated data due to the repeated measurements on the same subject. The changing values of the time-dependent covariates and their association with the outcomes present another source of correlation. Most methods used to analyze longitudinal data average the effects of time-dependent covariates on outcomes over time and provide a single regression coefficient per time-dependent covariate. This denies researchers the opportunity to follow the changing impact of time-dependent covariates on the outcomes. This dissertation addresses this issue through the use of partitioned regression coefficients in three different papers.

In the first paper, an alternative approach to the partitioned Generalized Method of Moments logistic regression model for longitudinal binary outcomes is presented. This method relies on Bayes estimators and is used when the partitioned Generalized Method of Moments model provides numerically unstable estimates of the regression coefficients. It is used to model obesity status in the Add Health study and cognitive impairment diagnosis in the National Alzheimer’s Coordinating Center database.

The second paper develops a model that allows the joint modeling of two or more binary outcomes that provide an overall measure of a subject’s trait over time. The simultaneous modeling of all outcomes provides a complete picture of the overall measure of interest. This approach accounts for the correlation among and between the outcomes across time and the changing effects of time-dependent covariates on the outcomes. The model is used to analyze four outcomes measuring overall quality of life in the Chinese Longitudinal Healthy Longevity Study.

The third paper presents an approach that allows for estimation of cross-sectional and lagged effects of the covariates on the outcome, as well as the feedback of the response on future covariates. This is done in two parts: in part 1, the effects of time-dependent covariates on the outcomes are estimated; then, in part 2, the influences of the outcome on future values of the covariates are measured. These model parameters are obtained through a Generalized Method of Moments procedure that uses valid moment conditions between the outcome and the covariates. Child morbidity in the Philippines and obesity status in the Add Health data are analyzed.

Date Created
2020

Locally Optimal Experimental Designs for Mixed Responses Models

Description

Bivariate responses that comprise mixtures of binary and continuous variables are common in medical, engineering, and other scientific fields. There exist many works concerning the analysis of such mixed data. However, research on optimal designs for this type of experiment is still scarce. The joint mixed responses model that is considered here involves a mixture of ordinary linear models for the continuous response and a generalized linear model for the binary response. Using the complete class approach, tighter upper bounds on the number of support points required for finding locally optimal designs are derived for the mixed responses models studied in this work.

In the first part of this dissertation, a theoretical result was developed to facilitate the search for locally symmetric optimal designs for mixed responses models with one continuous covariate. Then, the study was extended to mixed responses models that include group effects. Two types of mixed responses models with group effects were investigated. The first type includes models having no common parameters across subject groups, and the second type allows some common parameters (e.g., a common slope) across groups. In addition to complete class results, an efficient algorithm (PSO-FM) was proposed to search for the A- and D-optimal designs. Finally, the first-order mixed responses model was extended to a type of quadratic mixed responses model with a quadratic polynomial predictor placed in its linear model.

Date Created
2020