Search Content

Propensity score estimation with random forests

Description

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The…

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions, nonlinear relationships, or both of the covariates. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of data, optimal specification of (1) decision rules to select the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) methods of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after propensity scores weighting by the odds adjustment. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing unbiased estimated standard error and correct statistical inference of the average treatment effect. The relationship between the balance on the covariates' means and the bias of average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.

ContributorsCham, Hei Ning (Author) / Tein, Jenn-Yun (Thesis advisor) / Enders, Stephen G (Thesis advisor) / Enders, Craig K. (Committee member) / Mackinnon, David P (Committee member) / Arizona State University (Publisher)

Created2013

Spatio-temporal data mining to detect changes and clusters in trajectories

Description

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic…

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic monitoring and management, etc. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real time information of the traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm, for parameter estimation in HMM, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and used to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are frequent, has the time interval distribution in the pattern changed? Two different approaches are considered for change detection, frequency-based approach and distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. Another problem to be addressed while clustering is that trajectories can be of different lengths and also have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories.

ContributorsKondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created2012

Experimental designs for generalized linear models and functional magnetic resonance imaging

Description

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact…

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing

exact designs for GLMs, that are robust against parameter, link and model

uncertainties by improving an existing algorithm and providing a new one, based on using a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accomodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs, using the D and A optimality criterion. The second part of our research is on providing an algorithm

that is a faster alternative to a recently proposed genetic algorithm (GA) to construct optimal designs for fMRI studies. Our algorithm is built upon a discrete version of the PSO.

ContributorsTemkit, M'Hamed (Author) / Kao, Jason (Thesis advisor) / Reiser, Mark R. (Committee member) / Barber, Jarrett (Committee member) / Montgomery, Douglas C. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2014

The application of Bayesian networks in system reliability

Description

In this paper, a literature review is presented on the application of Bayesian networks applied in system reliability analysis. It is shown that Bayesian networks have become a popular modeling framework for system reliability analysis due to the benefits that Bayesian networks have the capability and flexibility to model complex…

In this paper, a literature review is presented on the application of Bayesian networks applied in system reliability analysis. It is shown that Bayesian networks have become a popular modeling framework for system reliability analysis due to the benefits that Bayesian networks have the capability and flexibility to model complex systems, update the probability according to evidences and give a straightforward and compact graphical representation. Research on approaches for Bayesian network learning and inference are summarized. Two groups of models with multistate nodes were developed for scenarios from constant to continuous time to apply and contrast Bayesian networks with classical fault tree method. The expanded model discretized the continuous variables and provided failure related probability distribution over time.

ContributorsZhou, Duan (Author) / Pan, Rong (Thesis advisor) / McCarville, Daniel R. (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created2014

Product design optimization under epistemic uncertainty

Description

This dissertation is to address product design optimization including reliability-based design optimization (RBDO) and robust design with epistemic uncertainty. It is divided into four major components as outlined below. Firstly, a comprehensive study of uncertainties is performed, in which sources of uncertainty are listed, categorized and the impacts are discussed.…

This dissertation is to address product design optimization including reliability-based design optimization (RBDO) and robust design with epistemic uncertainty. It is divided into four major components as outlined below. Firstly, a comprehensive study of uncertainties is performed, in which sources of uncertainty are listed, categorized and the impacts are discussed. Epistemic uncertainty is of interest, which is due to lack of knowledge and can be reduced by taking more observations. In particular, the strategies to address epistemic uncertainties due to implicit constraint function are discussed. Secondly, a sequential sampling strategy to improve RBDO under implicit constraint function is developed. In modern engineering design, an RBDO task is often performed by a computer simulation program, which can be treated as a black box, as its analytical function is implicit. An efficient sampling strategy on learning the probabilistic constraint function under the design optimization framework is presented. The method is a sequential experimentation around the approximate most probable point (MPP) at each step of optimization process. It is compared with the methods of MPP-based sampling, lifted surrogate function, and non-sequential random sampling. Thirdly, a particle splitting-based reliability analysis approach is developed in design optimization. In reliability analysis, traditional simulation methods such as Monte Carlo simulation may provide accurate results, but are often accompanied with high computational cost. To increase the efficiency, particle splitting is integrated into RBDO. It is an improvement of subset simulation with multiple particles to enhance the diversity and stability of simulation samples. This method is further extended to address problems with multiple probabilistic constraints and compared with the MPP-based methods. Finally, a reliability-based robust design optimization (RBRDO) framework is provided to integrate the consideration of design reliability and design robustness simultaneously. The quality loss objective in robust design, considered together with the production cost in RBDO, are used formulate a multi-objective optimization problem. With the epistemic uncertainty from implicit performance function, the sequential sampling strategy is extended to RBRDO, and a combined metamodel is proposed to tackle both controllable variables and uncontrollable variables. The solution is a Pareto frontier, compared with a single optimal solution in RBDO.

ContributorsZhuang, Xiaotian (Author) / Pan, Rong (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Zhang, Muhong (Committee member) / Du, Xiaoping (Committee member) / Arizona State University (Publisher)

Created2012

Reliability based design optimization of systems with dynamic failure probabilities of components

Description

This research is to address the design optimization of systems for a specified reliability level, considering the dynamic nature of component failure rates. In case of designing a mechanical system (especially a load-sharing system), the failure of one component will lead to increase in probability of failure of remaining components.…

This research is to address the design optimization of systems for a specified reliability level, considering the dynamic nature of component failure rates. In case of designing a mechanical system (especially a load-sharing system), the failure of one component will lead to increase in probability of failure of remaining components. Many engineering systems like aircrafts, automobiles, and construction bridges will experience this phenomenon.

In order to design these systems, the Reliability-Based Design Optimization framework using Sequential Optimization and Reliability Assessment (SORA) method is developed. The dynamic nature of component failure probability is considered in the system reliability model. The Stress-Strength Interference (SSI) theory is used to build the limit state functions of components and the First Order Reliability Method (FORM) lies at the heart of reliability assessment. Also, in situations where the user needs to determine the optimum number of components and reduce component redundancy, this method can be used to optimally allocate the required number of components to carry the system load. The main advantage of this method is that the computational efficiency is high and also any optimization and reliability assessment technique can be incorporated. Different cases of numerical examples are provided to validate the methodology.

ContributorsBala Subramaniyan, Arun (Author) / Pan, Rong (Thesis advisor) / Askin, Ronald (Committee member) / Ju, Feng (Committee member) / Arizona State University (Publisher)

Created2016

A Bayesian network approach to early reliability assessment of complex systems

Description

Bayesian networks are powerful tools in system reliability assessment due to their flexibility in modeling the reliability structure of complex systems. This dissertation develops Bayesian network models for system reliability analysis through the use of Bayesian inference techniques.

Bayesian networks generalize fault trees by allowing components and subsystems to be related…

Bayesian networks are powerful tools in system reliability assessment due to their flexibility in modeling the reliability structure of complex systems. This dissertation develops Bayesian network models for system reliability analysis through the use of Bayesian inference techniques.

Bayesian networks generalize fault trees by allowing components and subsystems to be related by conditional probabilities instead of deterministic relationships; thus, they provide analytical advantages to the situation when the failure structure is not well understood, especially during the product design stage. In order to tackle this problem, one needs to utilize auxiliary information such as the reliability information from similar products and domain expertise. For this purpose, a Bayesian network approach is proposed to incorporate data from functional analysis and parent products. The functions with low reliability and their impact on other functions in the network are identified, so that design changes can be suggested for system reliability improvement.

A complex system does not necessarily have all components being monitored at the same time, causing another challenge in the reliability assessment problem. Sometimes there are a limited number of sensors deployed in the system to monitor the states of some components or subsystems, but not all of them. Data simultaneously collected from multiple sensors on the same system are analyzed using a Bayesian network approach, and the conditional probabilities of the network are estimated by combining failure information and expert opinions at both system and component levels. Several data scenarios with discrete, continuous and hybrid data (both discrete and continuous data) are analyzed. Posterior distributions of the reliability parameters of the system and components are assessed using simultaneous data.

Finally, a Bayesian framework is proposed to incorporate different sources of prior information and reconcile these different sources, including expert opinions and component information, in order to form a prior distribution for the system. Incorporating expert opinion in the form of pseudo-observations substantially simplifies statistical modeling, as opposed to the pooling techniques and supra Bayesian methods used for combining prior distributions in the literature.

The methods proposed are demonstrated with several case studies.

ContributorsYontay, Petek (Author) / Pan, Rong (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Shunk, Dan L. (Committee member) / Du, Xiaoping (Committee member) / Arizona State University (Publisher)

Created2016

Maximin designs for event-related fMRI with uncertain error correlation

Description

One of the premier technologies for studying human brain functions is the event-related functional magnetic resonance imaging (fMRI). The main design issue for such experiments is to find the optimal sequence for mental stimuli. This optimal design sequence allows for collecting informative data to make precise statistical inferences about the…

One of the premier technologies for studying human brain functions is the event-related functional magnetic resonance imaging (fMRI). The main design issue for such experiments is to find the optimal sequence for mental stimuli. This optimal design sequence allows for collecting informative data to make precise statistical inferences about the inner workings of the brain. Unfortunately, this is not an easy task, especially when the error correlation of the response is unknown at the design stage. In the literature, the maximin approach was proposed to tackle this problem. However, this is an expensive and time-consuming method, especially when the correlated noise follows high-order autoregressive models. The main focus of this dissertation is to develop an efficient approach to reduce the amount of the computational resources needed to obtain A-optimal designs for event-related fMRI experiments. One proposed idea is to combine the Kriging approximation method, which is widely used in spatial statistics and computer experiments with a knowledge-based genetic algorithm. Through case studies, a demonstration is made to show that the new search method achieves similar design efficiencies as those attained by the traditional method, but the new method gives a significant reduction in computing time. Another useful strategy is also proposed to find such designs by considering only the boundary points of the parameter space of the correlation parameters. The usefulness of this strategy is also demonstrated via case studies. The first part of this dissertation focuses on finding optimal event-related designs for fMRI with simple trials when each stimulus consists of only one component (e.g., a picture). The study is then extended to the case of compound trials when stimuli of multiple components (e.g., a cue followed by a picture) are considered.

ContributorsAlrumayh, Amani (Author) / Kao, Ming-Hung (Thesis advisor) / Stufken, John (Committee member) / Reiser, Mark R. (Committee member) / Pan, Rong (Committee member) / Cheng, Dan (Committee member) / Arizona State University (Publisher)

Created2019

Filtering by