A continuous latent factor model for non-ignorable missing data in longitudinal studies

Description

Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible for many cases. For example, an investigator is examining the effect of treatment on depression. Subjects are scheduled with doctors on a regular basis and asked questions about recent emotional situations. Patients who are experiencing severe depression are more likely to miss an appointment and leave the data missing for that particular visit. Data that are not missing at random may produce bias in results if the missing mechanism is not taken into account. In other words, the missing mechanism is related to the unobserved responses. Data are said to be non-ignorable missing if the probabilities of missingness depend on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used for longitudinal data analysis because they do not require explicit specification of the missing mechanism, with the data stratified according to a variety of missing patterns and a model specified for each stratum. However, this usually results in under-identifiability, because of the need to estimate many stratum-specific parameters even though the eventual interest is usually in the marginal parameters. Pattern-mixture models also have the drawback that a large sample is usually required. In this thesis, two studies are presented. The first study is motivated by an open problem from pattern-mixture models. Simulation studies from this part show that information in the missing data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing data patterns may be accounted for by a simple latent factor.
Simulation findings obtained in the first study lead to a novel model, a continuous latent factor model (CLFM). The second study develops the CLFM, which is used to model the joint distribution of missing values and longitudinal outcomes. The proposed CLFM is feasible even for small sample size applications. The detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance and evaluation are studied through designed simulations and three applications. Simulation and application settings range from a correctly specified missing data mechanism to a misspecified one and include different sample sizes from longitudinal studies. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data give no indication of the missing data mechanism and are used for a sensitivity analysis; the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study, however, has complete data and is used to conduct a robustness analysis. The CLFM is shown to provide more precise estimators, specifically for intercept- and slope-related parameters, compared with Roy's latent class model and the classic linear mixed model. This advantage is more pronounced when the sample size is small, where Roy's model experiences challenges with estimation convergence. The proposed CLFM is also robust when missing data are ignorable, as demonstrated through the study on Growth of Language and Early Literacy Skills in Preschoolers.
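
To make the non-ignorable mechanism in the depression example concrete, here is a minimal simulation sketch. It is not the CLFM from the thesis: it assumes a hypothetical normally distributed depression score and a logistic rule under which higher scores are more likely to be missing. The function names and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate_mnar(n=20000, mean=50.0, sd=10.0, slope=0.15, seed=0):
    """Simulate scores whose chance of being missing rises with the score itself.

    All parameter values are illustrative assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(mean, sd) for _ in range(n)]
    observed = []
    for y in scores:
        # Logistic missingness rule: P(missing | y) depends on the value y that
        # would go unobserved, which is what makes the mechanism non-ignorable.
        p_missing = 1.0 / (1.0 + math.exp(-slope * (y - mean)))
        if rng.random() >= p_missing:
            observed.append(y)
    return scores, observed


def mean_of(values):
    return sum(values) / len(values)


scores, observed = simulate_mnar()
true_mean = mean_of(scores)        # mean over everyone, including missed visits
naive_mean = mean_of(observed)     # complete-case mean that ignores the mechanism
print(f"true mean: {true_mean:.2f}, complete-case mean: {naive_mean:.2f}")
```

Under these assumptions the complete-case mean falls below the true mean, because the most severe scores are the ones most often missing.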

Contributors: Zhang, Jun (Author) / Reiser, Mark R. (Thesis advisor) / Barber, Jarrett (Thesis advisor) / Kao, Ming-Hung (Committee member) / Wilson, Jeffrey (Committee member) / St Louis, Robert D. (Committee member) / Arizona State University (Publisher)

Created: 2013

Alternative methods via random forest to identify interactions in a general framework and variable importance in the context of value-added models

Description

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAMs teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect to the unknown underlying model. In that regard, this study proposes alternative ways to rank teacher effects that are not dependent on a given model by introducing two variable importance measures (VIMs), the node-proportion and the covariate-proportion. These VIMs are novel because they take into account the final configuration of the terminal nodes in the constitutive trees in a random forest. In a simulation study, under a variety of conditions, true rankings of teacher effects are compared with estimated rankings obtained using three sources: the newly proposed VIMs, existing VIMs, and EBLUPs from the assumed linear model specification. The newly proposed VIMs outperform all others in various scenarios where the model was misspecified. The second study develops two novel interaction measures. These measures could be used within but are not restricted to the VAM framework. The distribution-based measure is constructed to identify interactions in a general setting where a model specification is not assumed in advance. In turn, the mean-based measure is built to estimate interactions when the model specification is assumed to be linear. 
Both measures are unique in their construction; they take into account not only the outcome values, but also the internal structure of the trees in a random forest. In a separate simulation study, under a variety of conditions, the proposed measures are found to identify and estimate second-order interactions.

Contributors: Valdivia, Arturo (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Broatch, Jennifer (Committee member) / Arizona State University (Publisher)

Created: 2013

Testing independence of parallel pseudorandom number streams: incorporating the data's multivariate nature

Description

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of stream dependence can be missed by each approach because neither one fully addresses the multivariate nature of the accumulated data when generators are run in parallel. This dissertation identifies these potential faults in the parallel testing methodologies of TestU01 and investigates two different methods to better detect inter-stream dependencies: correlation motivated multivariate tests and vector time series based tests. These methods have been implemented in an extension to TestU01 built in C++ and the unique aspects of this extension are discussed. A variety of different generation scenarios are then examined using the TestU01 suite in concert with the extension. This enhanced software package is found to better detect certain forms of inter-stream dependencies than the original TestU01 suites of tests.
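
The first testing scheme described above (apply a serial test to each stream, then test the resulting P-values for uniformity) can be sketched roughly as follows. This is a Python stand-in, not TestU01's API or the dissertation's C++ extension. It assumes a simple mean test with a normal approximation as the serial test, a Kolmogorov-Smirnov check for uniformity, and Python's `random.Random` with different seeds as hypothetical "parallel" streams.

```python
import math
import random


def mean_test_p_value(stream):
    """Two-sided p-value for H0: the stream mean is 0.5 (normal approximation)."""
    n = len(stream)
    z = (sum(stream) / n - 0.5) / math.sqrt(1.0 / (12.0 * n))
    return math.erfc(abs(z) / math.sqrt(2.0))


def ks_uniform(p_values):
    """Kolmogorov-Smirnov statistic and asymptotic p-value against Uniform(0, 1)."""
    xs = sorted(p_values)
    m = len(xs)
    d = max(max((i + 1) / m - x, x - i / m) for i, x in enumerate(xs))
    lam = math.sqrt(m) * d
    # Asymptotic Kolmogorov tail series, clamped to [0, 1] for numerical safety.
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, 101))
    return d, min(1.0, max(0.0, tail))


def first_scheme(num_streams=200, length=1000, base_seed=12345):
    """Serial test per stream, then a uniformity test on the collected p-values."""
    p_values = []
    for s in range(num_streams):
        rng = random.Random(base_seed + s)  # hypothetical stand-in for one processor
        stream = [rng.random() for _ in range(length)]
        p_values.append(mean_test_p_value(stream))
    return ks_uniform(p_values)


d_stat, p_uniform = first_scheme()
print(f"KS statistic: {d_stat:.4f}, p-value for uniformity: {p_uniform:.4f}")
```

Each stream is summarized by a single number before the streams are compared, so the relationship between streams at matching positions never enters the test. That is the multivariate structure the abstract says both schemes fail to address fully.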

Contributors: Ismay, Chester (Author) / Eubank, Randall (Thesis advisor) / Young, Dennis (Committee member) / Kao, Ming-Hung (Committee member) / Lanchier, Nicolas (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2013

Public perceptions of climate change: risk, trust, and policy

Description

Global climate change (GCC) is among the most important issues of the 21st century. Adaptation to and mitigation of climate change are some of the salient local and regional challenges scientists, decision makers, and the general public face today and will face in the near future. However, designed adaptation and mitigation strategies do not guarantee success in coping with global climate change. Despite the robust and convincing body of research and science on anthropogenic global climate change, there is still a significant gap between the recommendations provided by the scientific community and the actual actions by the public and policy makers. In order to design, implement, and generate sufficient public support for policies and planning interventions at the national and international level, it is necessary to have a good understanding of the public's perceptions regarding GCC. Based on survey research in nine countries, the purpose of this study is two-fold: first, to understand the nature of public perceptions of global climate change in different countries; and second, to identify perception factors which have a significant impact on the public's willingness to support GCC policies or commit to behavioral changes to reduce GHG emissions. Factors such as trust in GCC information, which need to be considered in future climate change communication efforts, are also dealt with in this dissertation. This study has identified several aspects that need to be considered in future communication programs. GCC is characterized by high uncertainties, unfamiliar risks, and other characteristics of hazards which make personal connections, responsibility, and engagement difficult. Communication efforts need to acknowledge these obstacles, build up trust, and motivate the public to be more engaged in reducing GCC by emphasizing the multiple benefits of many policies beyond just reducing GCC.
Levels of skepticism among the public towards the reality of GCC, as well as towards the trustworthiness and sufficiency of the scientific findings, vary by country. Thus, communicators need to be aware of their audience in order to decide how educational their program needs to be.

Contributors: Hagen, Bjoern (Author) / Pijawka, David (Thesis advisor) / Brazel, Anthony (Committee member) / Chhetri, Netra (Committee member) / Guhathakurta, Subhrajit (Committee member) / Arizona State University (Publisher)

Created: 2013

Daily diary data: effects of cycles on inferences

Description

Daily diaries and other intensive measurement methods are increasingly used to study the relationships between two time-varying variables X and Y. These data are commonly analyzed using longitudinal multilevel or bivariate growth curve models that allow for random effects of intercept (and sometimes also slope) but which do not address the effects of weekly cycles in the data. Three Monte Carlo studies investigated the impact of omitting the weekly cycles in daily diary data under the multilevel model framework. In cases where cycles existed in both the time-varying predictor series (X) and the time-varying outcome series (Y) but were ignored, the effects of the within- and between-person components of X on Y tended to be biased, as were their corresponding standard errors. The direction and magnitude of the bias depended on the phase difference between the cycles in the two series. In cases where cycles existed in only one series but were ignored, the standard errors of the regression coefficients for the within- and between-person components of X tended to be biased, and the direction and magnitude of bias depended on which series contained cyclical components.
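
The within- and between-person components of X are commonly obtained by person-mean centering. The abstract does not state the exact decomposition used, so the sketch below works under that assumption. The function name and the diary values are hypothetical.

```python
def decompose(x_by_person):
    """Split each person's daily X into between- and within-person components."""
    result = {}
    for person, xs in x_by_person.items():
        between = sum(xs) / len(xs)          # person mean: between-person component
        within = [x - between for x in xs]   # daily deviations: within-person component
        result[person] = (between, within)
    return result


# Hypothetical three-day diaries for two people.
diary = {"p1": [3.0, 4.0, 5.0], "p2": [1.0, 1.0, 4.0]}
parts = decompose(diary)
for person, (between, within) in parts.items():
    print(person, between, within)
```

An unmodeled weekly cycle in X stays in the daily deviations rather than in the person mean, so it ends up in the within-person component.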

Contributors: Liu, Yu (Author) / West, Stephen G. (Thesis advisor) / Enders, Craig K. (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)

Created: 2013

Natural desert and human controlled landscapes: remote sensing of LULC response to drought

Description

Droughts are a common phenomenon of the arid climate of the southwestern USA. Despite water limitations, the region has been substantially transformed by agriculture and urbanization. The water requirements to support these human activities, along with the projected increase in drought intensity and frequency, challenge long-term sustainability and water security; it is therefore crucial to spatially and temporally characterize land use/land cover response to drought and to quantify water consumption. This dissertation evaluates changes in 'undisturbed' desert vegetation in response to water availability to characterize climate-driven variability. A new model coupling phenology and spectral unmixing was applied to Landsat time series (1987-2010) in order to derive fractional cover (FC) maps of annuals, perennials, and evergreen vegetation. Results show that annuals FC is controlled by short-term water availability and antecedent soil moisture. Perennials FC follows wet-dry multi-year regime shifts, while evergreen FC is completely decoupled from short-term changes in water availability. Trend analysis suggests that different processes operate at the local scale. Regionally, evergreen cover increased while perennials and annuals cover decreased. Subsequently, urban land cover was compared with its surrounding desert. A distinct signal of rain use efficiency and aridity index was documented from remote sensing and a soil-water-balance model. It was estimated that a total of 295 mm of water input is needed to sustain current greenness. Finally, an energy balance model was developed to spatio-temporally estimate evapotranspiration (ET) as a proxy for water consumption, and to evaluate land use/land cover types in response to drought. Agricultural fields show an average ET of 9.3 mm/day with no significant difference between drought and wet conditions, implying a similar level of water usage regardless of climatic conditions.
Xeric neighborhoods show significant variability between dry and wet conditions, while mesic neighborhoods retain high ET of 400-500 mm during drought due to irrigation. Considering the potentially limited water availability, land use/land cover changes due to population increases, and the threat of a warming and drying climate, maintaining large water-consuming, irrigated landscapes challenges sustainable practices of water conservation and the need to provide amenities of this desert area for enhancing quality of life.

Contributors: Kaplan, Shai (Author) / Myint, Soe Win (Thesis advisor) / Brazel, Anthony J. (Committee member) / Georgescu, Matei (Committee member) / Arizona State University (Publisher)

Created: 2014

Problems of transportation planning during winter storms in Portland, Oregon, and Seattle, Washington: a comparative study

Description

Winter storms decrease the safety of roadways as they bring ice and snow to the roads and increase accidents, delays, and travel time. Not only are personal vehicles affected, but public transportation, commercial transportation, and emergency vehicles are affected as well. Portland, Oregon, and Seattle, Washington, both suffer from mild, but sometimes extreme, storms that affect the entire city. A closer look at the number of crashes reported by the City of Portland and the City of Seattle shows an increase in the percentage of crashes with reported road conditions of snow and ice. Both cities appear to have nearly the same reported crash percentages. Recommendations for combating the issue of increased accidents and the disruption of the city itself include looking into communication between the climate research institution and city planners, which could help with planning for better mitigation during storms; a street or gas tax, although an impact study is important to make sure no part of the population is put at risk; and engineering innovations such as Solar Roadways that could benefit all cities.

Contributors: Hoots, Danielle (Author) / Crewe, Katherine (Thesis advisor) / Golub, Aaron (Committee member) / Brazel, Anthony (Committee member) / Arizona State University (Publisher)

Created: 2015

Robust experimental designs for fMRI with an uncertain design matrix

Description

Obtaining high-quality experimental designs to optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Previous studies have provided some guidance and powerful computational tools for obtaining good fMRI designs. However, these results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiment is considered, in which the design matrix of the statistical model depends not only on the selected design, but also on the experimental subject's probabilistic behavior during the experiment. The design matrix is thus uncertain at the design stage, making it difficult to select good designs. By taking this uncertainty into account, a very efficient approach for obtaining high-quality fMRI designs is developed in this study. The proposed approach is built upon an analytical result and an efficient computer algorithm. It is shown through case studies that the proposed approach can outperform an existing method in terms of computing time and the quality of the obtained designs.

Contributors: Zhou, Lin (Author) / Kao, Ming-Hung (Thesis advisor) / Reiser, Mark R. (Committee member) / Stufken, John (Committee member) / Welfert, Bruno (Committee member) / Arizona State University (Publisher)

Created: 2014

Experimental designs for generalized linear models and functional magnetic resonance imaging

Description

In this era of fast computational machines and new optimization algorithms, there have been great advances in experimental designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging (fMRI). The first part of our research tackles the challenging problem of constructing exact designs for GLMs that are robust against parameter, link, and model uncertainties by improving an existing algorithm and providing a new one, based on a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accommodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs using the D- and A-optimality criteria. The second part of our research provides an algorithm that is a faster alternative to a recently proposed genetic algorithm (GA) for constructing optimal designs for fMRI studies. Our algorithm is built upon a discrete version of the PSO.
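
For readers unfamiliar with PSO, the following is a minimal sketch of a generic continuous particle swarm optimizer. It is not the algorithm developed in the dissertation: it omits spectral clustering and any design criterion, and it minimizes a toy sphere function chosen purely for illustration. The function name, parameter names, and default values are assumptions.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic continuous PSO; all defaults are illustrative assumptions."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # swarm-wide best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own best,
                # and pull toward the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_f)
```

In a design setting, the objective would be replaced by a criterion computed from the candidate design, for example the negative log-determinant of an information matrix for D-optimality.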

Contributors: Temkit, M'Hamed (Author) / Kao, Jason (Thesis advisor) / Reiser, Mark R. (Committee member) / Barber, Jarrett (Committee member) / Montgomery, Douglas C. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created: 2014

Understanding open spaces in an arid city

Description

This doctoral dissertation research aims to develop a comprehensive definition of urban open spaces and to determine the extent of environmental, social, and economic impacts of open spaces on cities and the people living there. The approach I take to define urban open space is to apply fuzzy set theory to conceptualize the physical characteristics of open spaces. In addition, a 'W-Green index' is developed to quantify the scope of greenness in urban open spaces. Finally, I characterize the environmental impact of open spaces' greenness on the surface temperature, explore the social benefits through observing recreation and relaxation, and identify the relationship between housing price and open space by creating a hedonic model on nearby housing to quantify the economic impact. Fuzzy open space mapping helps to investigate the landscape characteristics of existing, recognized open spaces as well as other areas that can serve as open spaces. Research findings indicate that two fuzzy open space values are effective in capturing the variability across different land-use types and between arid and humid cities. The W-Green index quantifies the greenness of various types of open spaces. Most parks in Tempe, Arizona are grass-dominant with a higher W-Green index, while natural landscapes are shrub-dominant with a lower index. The W-Green index has the advantage of explaining vegetation composition and structural characteristics in open spaces. The outputs of comprehensive analyses show that the different qualities and types of open spaces, including size, greenness, equipment (facility), and surrounding areas, have different patterns in the reduction of surface temperature and the number of physical activities. The variation in housing prices with distance to a park was, however, not clear in this research. This dissertation project provides better insight into how to describe, plan, and prioritize the functions and types of urban open spaces needed for sustainable living.
This project builds a comprehensive framework for analyzing urban open spaces in an arid city. This dissertation helps expand the view of the urban environment and can play a key role in establishing strategy and informing decision-making.

Contributors: Kim, Won Kyung (Author) / Wentz, Elizabeth (Thesis advisor) / Myint, Soe W (Thesis advisor) / Brazel, Anthony (Committee member) / Guhathakurta, Subhrajit (Committee member) / Arizona State University (Publisher)

Created: 2011