Matching Items (16)

Comparison of Regression and Tree Analyses for Predicting Alcohol-Related Problems

Description

Problems related to alcohol consumption not only impose extra economic costs but also harm the health of drinkers and non-drinkers alike, through the damage alcohol causes both directly and indirectly. Investigating predictors of alcohol-related problems is important, since such problems could be prevented by quitting or limiting alcohol consumption. We were interested in predicting alcohol-related problems using multiple linear regression and regression trees, and then comparing the regressions to the tree. Impaired control, anxiety sensitivity, mother permissiveness, father permissiveness, gender, and age were included as predictors. The data comprised participants (n=835) sampled from students at Arizona State University. A multiple linear regression without interactions, a multiple linear regression with two-way interactions and squared terms, and a regression tree were fit and compared. The regressions and the tree produced similar results, with multiple interactions of variables predicting alcohol-related problems. Overall, the tree was easier to interpret than the regressions; however, the regressions provided specific predicted alcohol-related problems scores, whereas the tree formed large groups with one predicted score per group. Nevertheless, the tree still predicted alcohol-related problems nearly as well as, if not better than, the regressions.
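
The comparison described above can be sketched in a few lines. The following illustrative Python snippet (not the study's code; the data and column roles are synthetic stand-ins) fits a main-effects linear model, a linear model with two-way interactions and squared terms, and a shallow regression tree, then compares their test error:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 835
X = rng.normal(size=(n, 4))          # stand-ins for impaired control, anxiety sensitivity, etc.
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(size=n)   # synthetic outcome with an interaction

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

main = LinearRegression().fit(X_tr, y_tr)              # main effects only

poly = PolynomialFeatures(degree=2, include_bias=False)  # adds two-way interactions and squares
inter = LinearRegression().fit(poly.fit_transform(X_tr), y_tr)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)  # kept shallow for interpretability

for name, pred in [("main effects", main.predict(X_te)),
                   ("interactions + squares", inter.predict(poly.transform(X_te))),
                   ("regression tree", tree.predict(X_te))]:
    print(name, mean_squared_error(y_te, pred))
```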

Date Created
  • 2016-12

Regression Analysis on Colony Collapse Disorder in the United States

Description

In the last decade, the population of honey bees across the globe has declined sharply, leaving scientists and beekeepers to wonder why. Among all nations, the United States has seen some of the greatest declines over the past ten-plus years. Without a definite explanation, the term Colony Collapse Disorder (CCD) was coined to describe the sudden and sharp decline of honey bee colonies that beekeepers were experiencing. Colony collapses have been rising above expected averages over the years, and losses during the winter season are even more severe than what is normally considered acceptable. Possible explanations point toward meteorological variables, diseases, and even pesticide usage. Although the cause of CCD remains unknown, thousands of beekeepers have reported their losses in recent years, including counts of infected colonies and of colonies under certain stressors. Using data reported to the United States Department of Agriculture (USDA), together with weather data collected by the National Oceanic and Atmospheric Administration (NOAA) and the National Centers for Environmental Information (NCEI), regression analysis was used to find relationships between stressors in honey bee colonies, meteorological variables, and colony collapses during the winter months. The regression analysis focused on the winter season, quarter 4 of the year, which includes the months of October, November, and December. In the model, the response variable was the percentage of colonies lost in quarter 4. The model indicated that certain weather thresholds and the percentage increase of colonies under certain stressors were related to colony loss.
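
As an illustration of the kind of model described, the sketch below fits an ordinary least squares regression of quarter-4 percentage colony loss on hypothetical weather and stressor covariates; the variable names and synthetic data are placeholders, not the USDA/NOAA/NCEI data used in the thesis:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200  # e.g., state-year records for quarter 4
df = pd.DataFrame({
    "mean_temp_c": rng.normal(5, 4, n),       # Oct-Dec mean temperature
    "precip_mm": rng.gamma(4, 20, n),         # Oct-Dec precipitation
    "pct_varroa": rng.uniform(0, 40, n),      # % colonies under mite stress
    "pct_pesticide": rng.uniform(0, 15, n),   # % colonies under pesticide stress
})
# synthetic response: colder winters and more stressors -> higher losses
df["pct_lost_q4"] = (12 - 0.4 * df.mean_temp_c + 0.3 * df.pct_varroa
                     + rng.normal(0, 3, n))

model = smf.ols("pct_lost_q4 ~ mean_temp_c + precip_mm + pct_varroa + pct_pesticide",
                data=df).fit()
print(model.summary())
```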

Date Created
  • 2018-05

Issues of Validity in High-Stakes Testing

Description

Responsible test use requires validation: the process of collecting evidence to support the inferences drawn from test scores. In high-stakes testing contexts, the need for validation is especially great; the far-reaching nature of high-stakes testing affects the educational, professional, and financial futures of stakeholders. The Standards for Educational and Psychological Testing (AERA et al., 2014) offers specific guidance on developing and implementing tests. Still, concerns exist over the extent to which developers and users of high-stakes tests are making valid inferences from test scores. This paper explores the current state of high-stakes educational testing and the validity issues surrounding it. Drawing on the measurement theory literature, the educational literature, and professional standards of test development and use, I assess the significance of these concerns and their potential implications for the stakeholders of high-stakes testing programs.

Date Created
  • 2016-05

On-the-Fly Assembled Multistage Adaptive Testing

Description

Recently, multistage testing (MST) has been adopted by several important large-scale testing programs and has become popular among practitioners and researchers. Stemming from the decades-long history of computerized adaptive testing (CAT), the rapidly growing MST alleviates several major problems of earlier CAT applications. Nevertheless, MST is only one among many possible solutions to these problems. This article presents a new adaptive testing design, “on-the-fly assembled multistage adaptive testing” (OMST), which combines the benefits of CAT and MST and offsets their limitations. Moreover, OMST provides some unique advantages over both CAT and MST. A simulation study was conducted to compare OMST with MST and CAT, and the results demonstrated the promising features of OMST. Finally, the “Discussion” section offers suggestions on possible future adaptive testing designs based on the OMST framework, which could provide great flexibility for adaptive tests in the digital future and open an avenue for all types of hybrid designs tailored to the needs of specific tests.
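
The core idea of OMST can be illustrated with a minimal sketch: after each stage, re-estimate ability and assemble the next stage on the fly from the item bank by maximizing information at the current estimate. The snippet below assumes a simple 2PL item bank and a crude grid-search MLE; it illustrates the general principle, not the article's exact assembly algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
N_ITEMS, STAGES, STAGE_LEN = 300, 3, 8
a = rng.lognormal(0, 0.3, N_ITEMS)     # 2PL discriminations
b = rng.normal(0, 1, N_ITEMS)          # 2PL difficulties
theta_true, theta_hat = 1.0, 0.0
used = np.zeros(N_ITEMS, bool)
admin, responses = [], []

def p(theta, a, b):                    # 2PL response probability
    return 1 / (1 + np.exp(-a * (theta - b)))

for stage in range(STAGES):
    # assemble the next stage on the fly: highest Fisher information at theta_hat
    info = a**2 * p(theta_hat, a, b) * (1 - p(theta_hat, a, b))
    info[used] = -np.inf
    stage_items = np.argsort(info)[-STAGE_LEN:]
    used[stage_items] = True
    admin.extend(stage_items)
    responses.extend(rng.random(STAGE_LEN) < p(theta_true, a[stage_items], b[stage_items]))
    # crude grid-search MLE of ability after the stage
    grid = np.linspace(-4, 4, 161)
    ll = [sum(r * np.log(p(t, a[i], b[i])) + (1 - r) * np.log(1 - p(t, a[i], b[i]))
              for i, r in zip(admin, responses)) for t in grid]
    theta_hat = grid[int(np.argmax(ll))]

print("final theta estimate:", theta_hat)
```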

Date Created
  • 2015-03-01

Locally Optimal Experimental Designs for Mixed Responses Models

Description

Bivariate responses that comprise mixtures of binary and continuous variables are common in medical, engineering, and other scientific fields. Many works concern the analysis of such mixed data, but research on optimal designs for this type of experiment is still scarce. The joint mixed responses model considered here combines an ordinary linear model for the continuous response with a generalized linear model for the binary response. Using the complete class approach, tighter upper bounds on the number of support points required for finding locally optimal designs are derived for the mixed responses models studied in this work.

In the first part of this dissertation, a theoretical result was developed to facilitate the search for locally optimal symmetric designs for mixed responses models with one continuous covariate. The study was then extended to mixed responses models that include group effects. Two types of such models were investigated: the first has no common parameters across subject groups, while the second allows some common parameters (e.g., a common slope) across groups. In addition to complete class results, an efficient algorithm (PSO-FM) was proposed to search for the A- and D-optimal designs. Finally, the first-order mixed responses model is extended to a quadratic mixed responses model, with a quadratic polynomial predictor placed in its linear-model component.
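
For intuition, the sketch below shows the kind of criterion such a search optimizes: a crude grid search for a two-point design maximizing the D-criterion (log-determinant of the information matrix) under a simple logistic model with assumed local parameter values. It is illustrative only, not the PSO-FM algorithm proposed here:

```python
import numpy as np
from itertools import combinations

b0, b1 = 0.0, 1.0                     # assumed local parameter values

def info_matrix(x):
    """Information contribution of a single design point x (logistic model)."""
    eta = b0 + b1 * x
    w = np.exp(eta) / (1 + np.exp(eta))**2    # logistic variance weight
    f = np.array([1.0, x])
    return w * np.outer(f, f)

# equal-weight two-point designs on a grid; pick the pair maximizing log det M
grid = np.linspace(-4, 4, 81)
best = max(combinations(grid, 2),
           key=lambda pts: np.linalg.slogdet(0.5 * info_matrix(pts[0])
                                             + 0.5 * info_matrix(pts[1]))[1])
print("approximate locally D-optimal two-point design:", best)
```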

Date Created
  • 2020

An information based optimal subdata selection algorithm for big data linear regression and a suitable variable selection algorithm

Description

This article proposes a new information-based optimal subdata selection (IBOSS) algorithm, the Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains the nice asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm; however, it exploits vectorized calculation, avoiding for-loops, and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA is studied from three aspects: nonorthogonality, the inclusion of interaction terms, and variable misspecification. A new, accurate variable selection algorithm is proposed to aid the implementation of IBOSS algorithms when a large number of variables are present but only a sparse subset of them is important. By aggregating results from random subsamples, this variable selection algorithm is much more accurate than the LASSO method applied to the full data. Since its time complexity depends only on the number of variables, it is also very computationally efficient when the number of variables is fixed (and not massively large) as n increases. More importantly, by using subsamples it avoids the problem of the full data not fitting in memory when a data set is too large.
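
For context, the sketch below implements the baseline D-optimal IBOSS selection that SSDA is compared against: for each covariate, keep the rows with the most extreme values. It is a simplified illustration (overlap between columns is handled crudely), not the proposed SSDA, which scores rows differently:

```python
import numpy as np

def iboss_d(X, k):
    """Select roughly k rows of X by taking extremes of each column."""
    n, p = X.shape
    r = k // (2 * p)                    # rows per tail per covariate
    keep = np.zeros(n, dtype=bool)
    for j in range(p):
        order = np.argsort(X[:, j])
        keep[order[:r]] = True          # r smallest values of column j
        keep[order[-r:]] = True         # r largest values of column j
    return np.flatnonzero(keep)

rng = np.random.default_rng(3)
X = rng.normal(size=(100_000, 5))       # synthetic full data
idx = iboss_d(X, k=1000)
print(len(idx), "rows selected as subdata")
```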

Date Created
  • 2017

Three essays on comparative simulation in three-level hierarchical data structure

Description

Though the likelihood is a useful tool for obtaining estimates of regression parameters, it is not readily available when fitting hierarchical binary data models: the correlated observations rule out a joint likelihood for hierarchical logistic regression models. Inferences for the regression and covariance parameters, as well as the intraclass correlation coefficients, are usually obtained through the conditional likelihood. In those cases, I have resorted to the Laplace approximation and a large-sample theory approach for point and interval estimates such as Wald-type confidence intervals and profile likelihood confidence intervals. These methods rely on distributional assumptions and large-sample theory; however, when dealing with small hierarchical datasets they often result in severe bias or non-convergence. I present a generalized quasi-likelihood approach and a generalized method of moments approach; neither relies on distributional assumptions, only on moments of the response. As an alternative to the typical large-sample theory approach, I present bootstrapping of hierarchical logistic regression models, which provides more accurate interval estimates for small binary hierarchical data, substituting computation for the traditional Wald-type and profile likelihood confidence intervals. I use a latent variable approach with a new split bootstrap method for estimating intraclass correlation coefficients when analyzing binary data from a three-level hierarchical structure; it is especially useful with small sample sizes and extends easily to more levels. Comparisons are made to existing approaches through both theoretical justification and simulation studies. Further, I demonstrate my findings through three numerical examples: one based on cancer-in-remission data, one related to a Chinese antibiotic abuse study, and a third on teacher effectiveness in schools in a southwestern US state.
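
As an illustration of the bootstrap alternative, the sketch below runs a cluster (nonparametric) bootstrap for a two-level logistic regression: clusters are resampled with replacement, a marginal logistic model is refit, and a percentile interval is formed for the slope. The data are synthetic, and the marginal GLM is a stand-in for the hierarchical models studied here, not the dissertation's exact procedure:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n_clusters, m = 30, 10                       # small hierarchical data
u = rng.normal(0, 1, n_clusters)             # cluster random effects
x = rng.normal(size=(n_clusters, m))
eta = -0.5 + 1.0 * x + u[:, None]
y = (rng.random((n_clusters, m)) < 1 / (1 + np.exp(-eta))).astype(float)

def fit_slope(cl_idx):
    """Refit a marginal logistic regression on a resampled set of clusters."""
    xs, ys = x[cl_idx].ravel(), y[cl_idx].ravel()
    X = sm.add_constant(xs)
    return sm.GLM(ys, X, family=sm.families.Binomial()).fit().params[1]

boot = [fit_slope(rng.integers(0, n_clusters, n_clusters)) for _ in range(500)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% cluster-bootstrap percentile CI for the slope: ({lo:.2f}, {hi:.2f})")
```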

Date Created
  • 2017

Essays on the Modeling of Binary Longitudinal Data with Time-dependent Covariates

Description

Longitudinal studies contain correlated data due to repeated measurements on the same subject. The changing values of time-dependent covariates, and their association with the outcomes, present another source of correlation. Most methods used to analyze longitudinal data average the effects of time-dependent covariates on outcomes over time and provide a single regression coefficient per time-dependent covariate. This denies researchers the opportunity to follow the changing impact of time-dependent covariates on the outcomes. This dissertation addresses this issue through the use of partitioned regression coefficients in three papers.

In the first paper, an alternative approach to the partitioned Generalized Method of Moments logistic regression model for longitudinal binary outcomes is presented. This method relies on Bayes estimators and is useful when the partitioned Generalized Method of Moments model provides numerically unstable estimates of the regression coefficients. It is used to model obesity status in the Add Health study and cognitive impairment diagnosis in the National Alzheimer’s Coordinating Center database.

The second paper develops a model that allows the joint modeling of two or more binary outcomes that together provide an overall measure of a subject’s trait over time. Modeling all outcomes simultaneously gives a complete picture of the overall measure of interest, accounting for the correlation among and between the outcomes across time and for the changing effects of time-dependent covariates on the outcomes. The model is used to analyze four outcomes measuring overall quality of life in the Chinese Longitudinal Healthy Longevity Study.

The third paper presents an approach that allows estimation of cross-sectional and lagged effects of the covariates on the outcome, as well as of the feedback of the response on future covariates. This is done in two parts: in part 1, the effects of time-dependent covariates on the outcomes are estimated; then, in part 2, the outcome’s influence on future values of the covariates is measured. The model parameters are obtained through a Generalized Method of Moments procedure that uses valid moment conditions between the outcome and the covariates. Child morbidity in the Philippines and obesity status in the Add Health data are analyzed.
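
The partitioned-coefficient idea common to these papers can be sketched simply: rather than one averaged coefficient per time-dependent covariate, fit a separate coefficient for the cross-sectional (lag-0) effect and for each lagged effect. The snippet below uses an ordinary logistic fit on synthetic data as a stand-in for the Generalized Method of Moments estimators developed in the dissertation:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n, T = 500, 4
x = rng.normal(size=(n, T))                  # time-dependent covariate
eta = -0.2 + 0.8 * x                         # true model: lag-0 effect only
y = (rng.random((n, T)) < 1 / (1 + np.exp(-eta))).astype(int)

# long format with the covariate partitioned into lag 0, lag 1, lag 2
rows = []
for i in range(n):
    for t in range(2, T):
        rows.append({"y": y[i, t], "x0": x[i, t],
                     "x1": x[i, t - 1], "x2": x[i, t - 2]})
df = pd.DataFrame(rows)

# one coefficient per lag partition instead of a single averaged effect
fit = smf.logit("y ~ x0 + x1 + x2", data=df).fit(disp=0)
print(fit.params)
```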

Date Created
  • 2020

Spatial Mortality Modeling in Actuarial Science

Description

Modeling human survivorship is a core area of research within the actuarial community. With life insurance policies and annuity products as dominant financial instruments which depend on future mortality rates, there is a risk that observed human mortality experience will differ from what was projected when they were sold. From an insurer’s portfolio perspective, to curb this risk it is imperative that models of human survivorship are constantly updated and equipped to accurately gauge and forecast mortality rates. At present, the majority of actuarial research in mortality modeling involves factor-based approaches which operate at a global scale, placing little attention on the determinants and interpretable risk factors of mortality, specifically from a spatial perspective. With an abundance of research being performed in the field of spatial statistics and greater accessibility to localized mortality data, there is a clear opportunity to extend the existing body of mortality literature toward the spatial domain. It is the objective of this dissertation to introduce these new statistical approaches and equip the field of actuarial science to include geographic space in the mortality modeling context.

First, this dissertation evaluates the underlying spatial patterns of mortality across the United States and introduces a spatial filtering methodology to generate latent spatial patterns which capture the essence of these mortality rates in space. Second, local modeling techniques are illustrated, and a multiscale geographically weighted regression (MGWR) model is generated to describe the variation of mortality rates across space in an interpretable manner which allows for the investigation of spatial variability in the determinants of mortality. Third, techniques for updating traditional mortality models are introduced, culminating in the development of a model which addresses the relationship between space, economic growth, and mortality. It is through these applications that this dissertation demonstrates the utility of updating actuarial mortality models from a spatial perspective.
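
As a sketch of the second application, the snippet below fits a multiscale geographically weighted regression on synthetic data, assuming the open-source Python mgwr package (PySAL); the coordinates, covariates, and mortality index here are placeholders for the dissertation's localized mortality data:

```python
import numpy as np
from mgwr.gwr import MGWR
from mgwr.sel_bw import Sel_BW

rng = np.random.default_rng(6)
n = 200
coords = rng.uniform(0, 100, size=(n, 2))    # e.g., county centroids
X = rng.normal(size=(n, 2))                  # standardized covariates
# synthetic mortality index with a spatially varying coefficient on X[:, 0]
beta0 = 0.01 * coords[:, 0]
y = (beta0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, n)).reshape(-1, 1)
y = (y - y.mean()) / y.std()                 # MGWR expects standardized data

selector = Sel_BW(coords, y, X, multi=True)  # one bandwidth per covariate
bws = selector.search()
results = MGWR(coords, y, X, selector).fit()
print("covariate-specific bandwidths:", bws)
print("local coefficient ranges:", results.params.min(0), results.params.max(0))
```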

Date Created
  • 2020

Ethnic Differences in Health and Cardiovascular Risk Factors of Asians in Arizona

Description

This research is an anthology of papers intended to describe the health status, healthcare experiences, healthcare preventive practices, healthcare barriers, and cardiovascular disease (CVD) risk factors of Asian Americans (AA) residing in Arizona (AZ). Asian Americans are a vulnerable population, and there is a paucity of data on interventions to reduce their CVD risk factors. An extensive literature review found no available disaggregated health data on AA in AZ. The Neuman Systems Model guided this study. Chapter 1 elucidates the importance of conducting the research and provides an overview of the literature, theory, and methodology of the study. Chapters 2 and 3 describe the results of a cross-sectional descriptive secondary analysis using the 2013, 2015, and 2017 Behavioral Risk Factor Surveillance System (BRFSS) datasets. The outcomes provide a disaggregated epidemiological picture of AA: there were variations in their social determinants of health, healthcare barriers, healthcare preventive practices, CVD risk factors, and healthcare experiences based on perceived racism, and the analysis highlighted modifiable and non-modifiable predictors of hypertension (HTN) and diabetes. Chapter 4 is an integrative review of interventions tailored for Filipino Americans to reduce CVD risk. Chapter 5 summarizes the research findings. The results may provide practicing nurses, researchers, and clinicians the evidence to plan, prioritize, and implement comprehensive, theoretically guided, culturally tailored, community-led primary and secondary prevention programs to improve health outcomes. The data may also serve as a tool for stakeholders and policy makers to advocate for public health policies that bring the population health of AA and communities of color in AZ in line with that of their non-Hispanic White counterparts.

Date Created
  • 2020