Propensity score estimation with random forests

Description

Random Forests is a statistical learning method which has been proposed for propensity score estimation models that involve complex interactions among the covariates, nonlinear relationships, or both. In this dissertation I conducted a simulation study to examine the effects of three Random Forests model specifications in propensity score analysis. The results suggested that, depending on the nature of the data, optimal specification of (1) decision rules to select the covariate and its split value in a Classification Tree, (2) the number of covariates randomly sampled for selection, and (3) methods of estimating Random Forests propensity scores could potentially produce an unbiased average treatment effect estimate after adjustment by propensity score weighting by the odds. Compared to the logistic regression estimation model using the true propensity score model, Random Forests had an additional advantage in producing an unbiased estimated standard error and correct statistical inference of the average treatment effect. The relationship between the balance on the covariates' means and the bias of the average treatment effect estimate was examined both within and between conditions of the simulation. Within conditions, across repeated samples there was no noticeable correlation between the covariates' mean differences and the magnitude of bias of the average treatment effect estimate for the covariates that were imbalanced before adjustment. Between conditions, small mean differences of covariates after propensity score adjustment were not sensitive enough to identify the optimal Random Forests model specification for propensity score analysis.
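As a rough illustration of the weighting-by-the-odds adjustment mentioned above, the sketch below assigns weight 1 to treated units and weight e / (1 - e) to control units, where e is an estimated propensity score, and then takes the difference in weighted outcome means. The function names and the toy numbers are hypothetical, the propensity scores are made up rather than produced by a Random Forests model, and whether this matches the dissertation's exact estimator is an assumption.

```python
def odds_weights(treated, propensity):
    """Return weights: 1 for treated units, e / (1 - e) for control units."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 if t == 1 else e / (1.0 - e))
    return weights


def weighted_effect(outcome, treated, weights):
    """Difference between the weighted outcome means of treated and control units."""
    def weighted_mean(group):
        num = sum(w * y for y, t, w in zip(outcome, treated, weights) if t == group)
        den = sum(w for t, w in zip(treated, weights) if t == group)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


# Hypothetical toy data (not from the dissertation).
treated = [1, 1, 0, 0]
propensity = [0.8, 0.6, 0.5, 0.2]
outcome = [10.0, 8.0, 6.0, 4.0]

weights = odds_weights(treated, propensity)
effect = weighted_effect(outcome, treated, weights)
```

In practice the propensity scores would come from a fitted model (for example, the predicted treatment probabilities of a random forest classifier), and scores of exactly 0 or 1 would need handling before the odds can be formed.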

Contributors: Cham, Hei Ning (Author) / Tein, Jenn-Yun (Thesis advisor) / Enders, Stephen G (Thesis advisor) / Enders, Craig K. (Committee member) / Mackinnon, David P (Committee member) / Arizona State University (Publisher)

Created: 2013

Simulation-based Bayesian optimal accelerated life test design and model discrimination

Description

Accelerated life testing (ALT) is the process of subjecting a product to stress conditions (temperature, voltage, pressure, etc.) in excess of its normal operating levels to accelerate failures. Product failure typically results from multiple stresses acting on it simultaneously. Multi-stress factor ALTs are challenging because the number of experiments grows with the stress factor-level combinations that result from the increased number of factors. Chapter 2 provides an approach for designing ALT plans with multiple stresses utilizing Latin hypercube designs that reduces the simulation cost without loss of statistical efficiency. A comparison to full grid and large-sample approximation methods illustrates the approach's computational cost gain and its flexibility in determining optimal stress settings with fewer assumptions and more intuitive unit allocations.

Implicit in the design criteria of current ALT designs is the assumption that the form of the acceleration model is correct. This is an unrealistic assumption in many real-world problems. Chapter 3 provides an approach for ALT optimum design for model discrimination. We utilize the Hellinger distance measure between predictive distributions. The optimal ALT plan at three stress levels was determined and its performance was compared to the good compromise plan, the best traditional plan, and the well-known 4:2:1 compromise test plan. In the case of linear versus quadratic ALT models, the proposed method increased the test plan's ability to distinguish among competing models and provided better guidance as to which model is appropriate for the experiment.
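For readers unfamiliar with the Hellinger distance, here is a minimal sketch for two discrete probability distributions over the same support. The function name and example distributions are hypothetical; the dissertation applies the measure to predictive distributions from competing ALT models, which this toy example does not attempt to reproduce.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * total)


# Hypothetical examples: identical distributions and disjoint distributions.
same = hellinger([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])
```

A distance of 0 means the two distributions coincide and a distance of 1 means they have no overlap, so a design that yields a larger distance between the models' predictive distributions makes the models easier to tell apart.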

Chapter 4 extends the approach of Chapter 3 to ALT sequential model discrimination. An initial experiment is conducted to provide maximum possible information with respect to model discrimination. The follow-on experiment is planned by leveraging the most current information to allow for Bayesian model comparison through posterior model probability ratios. Results showed that the performance of the plan is adversely affected by the amount of censoring in the data. In the case of linear versus quadratic model forms at three levels of constant stress, sequential testing can improve the model recovery rate by approximately 8% when the data are complete, but no apparent advantage in adopting sequential testing was found for right-censored data when censoring exceeds a certain amount.

Contributors: Nasir, Ehab (Author) / Pan, Rong (Thesis advisor) / Runger, George C. (Committee member) / Gel, Esma (Committee member) / Kao, Ming-Hung (Committee member) / Montgomery, Douglas C. (Committee member) / Arizona State University (Publisher)

Created: 2014

Analysis of no-confounding designs using the dantzig selector

Description

No-confounding designs (NC) in 16 runs for 6, 7, and 8 factors are non-regular fractional factorial designs that have been suggested as attractive alternatives to the regular minimum aberration resolution IV designs because they do not completely confound any two-factor interactions with each other. These designs allow for potential estimation of main effects and a few two-factor interactions without the need for follow-up experimentation. The analysis of non-regular designs is an area of ongoing research, because standard variable selection techniques such as stepwise regression may not always be the best approach. The current work investigates the use of the Dantzig selector for analyzing no-confounding designs. Through a series of examples it shows that this technique is very effective for identifying the set of active factors in no-confounding designs when there are three or four active main effects and up to two active two-factor interactions.

To evaluate the performance of the Dantzig selector, a simulation study was conducted and the results, based on the percentage of type II errors, are analyzed. An alternative to the 6-factor NC design, called the Alternate No-confounding design in six factors, is also introduced in this study. The performance of this Alternate NC design is then evaluated using the Dantzig selector as the analysis method. Lastly, a section is dedicated to comparing the performance of the NC-6 and Alternate NC-6 designs.

Contributors: Krishnamoorthy, Archana (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Thesis advisor) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created: 2014

Optimal design of experiments for functional responses

Description

Functional or dynamic responses are prevalent in experiments in the fields of engineering, medicine, and the sciences, but proposals for optimal designs are still sparse for this type of response. Experiments with dynamic responses result in multiple responses taken over a spectrum variable, so the design matrix for a dynamic response has a more complicated structure. In the literature, the optimal design problem for some functional responses has been solved using genetic algorithm (GA) and approximate design methods. The goal of this dissertation is to develop fast computer algorithms for calculating exact D-optimal designs.

First, we demonstrated how the traditional exchange methods could be improved to generate a computationally efficient algorithm for finding G-optimal designs. The proposed two-stage algorithm, which is called the cCEA, uses a clustering-based approach to restrict the set of possible candidates for PEA, and then improves the G-efficiency using CEA.

The second major contribution of this dissertation is the development of fast algorithms for constructing D-optimal designs that determine the optimal sequence of stimuli in fMRI studies. The update formula for the determinant of the information matrix was improved by exploiting the sparseness of the information matrix, leading to faster computation times. The proposed algorithm outperforms the genetic algorithm with respect to computational efficiency and D-efficiency.

The third contribution is a study of optimal experimental designs for more general functional response models. First, the B-spline system is proposed as the non-parametric smoother of the response function, and an algorithm is developed to determine D-optimal sampling points of a spectrum variable. Second, we proposed a two-step algorithm for finding the optimal design for both sampling points and experimental settings. In the first step, the matrix of experimental settings is held fixed while the algorithm optimizes the determinant of the information matrix for a mixed effects model to find the optimal sampling times. In the second step, the optimal sampling times obtained from the first step are held fixed while the algorithm iterates on the information matrix to find the optimal experimental settings. The designs constructed by this approach yield superior performance over other designs found in the literature.

Contributors: Saleh, Moein (Author) / Pan, Rong (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Runger, George C. (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)

Created: 2015

Efficient formulations for next-generation choice-based network revenue management for airline implementation

Description

Revenue management is at the core of airline operations today; proprietary algorithms and heuristics are used to determine prices and availability of tickets on an almost-continuous basis. While initial developments in revenue management were motivated by industry practice, later developments that overcome fundamental omissions from earlier models show significant improvement despite their focus on relatively esoteric aspects of the problem, but have limited potential for practical use due to computational requirements. This dissertation attempts to address various modeling and computational issues, introducing realistic choice-based demand revenue management models. In particular, this work introduces two optimization formulations alongside a choice-based demand modeling framework, improving on the methods that the choice-based revenue management literature has created to date by providing sensible models for airline implementation.

The first model offers an alternative formulation to the traditional choice-based revenue management problem presented in the literature, and provides substantial gains in expected revenue while limiting the problem’s computational complexity. Making assumptions on passenger demand, the Choice-based Mixed Integer Program (CMIP) provides a significantly more compact formulation when compared to other choice-based revenue management models, and consistently outperforms previous models.

Despite the prevalence of choice-based revenue management models in the literature, the assumptions made on purchasing behavior prevent researchers from creating models that properly reflect passenger sensitivities to various ticket attributes, such as price, number of stops, and flexibility options. This dissertation introduces a general framework for airline choice-based demand modeling that takes into account various ticket attributes in addition to price, providing a framework for revenue management models to relate airline companies’ product design strategies to the practice of revenue management through decisions on ticket availability and price.

Finally, this dissertation introduces a mixed integer non-linear programming formulation for airline revenue management that accommodates the possibility of simultaneously setting prices and availabilities on a network. Traditional revenue management models focus primarily on availability, forcing secondary models to optimize prices. The Price-dynamic Choice-based Mixed Integer Program (PCMIP) eliminates this two-step process, aligning passenger purchase behavior with revenue management policies, and is shown to outperform previously developed models, providing a new frontier of research in airline revenue management.

Contributors: Clough, Michael C (Author) / Gel, Esma (Thesis advisor) / Jacobs, Timothy (Thesis advisor) / Askin, Ronald (Committee member) / Montgomery, Douglas C. (Committee member) / Arizona State University (Publisher)

Created: 2016

Measurement systems analysis studies: a look at the partition of variation (POV) method

Description

The Partition of Variance (POV) method is a simple way to identify large sources of variation in manufacturing systems. This method identifies the variance by estimating the variance of the means (between variance) and the mean of the variances (within variance). The project shows that the method correctly identifies the variance source when compared to the ANOVA method. Although the variance estimators deteriorate when varying degrees of non-normality are introduced through simulation, the POV method is shown to be a more stable measure of variance in the aggregate. The POV method also provides non-negative, stable estimates for interaction when compared to the ANOVA method. The POV method is shown to be more stable, particularly in low sample size situations. Based on these findings, it is suggested that the POV is not a replacement for more complex analysis methods, but rather a supplement to them. POV is ideal for preliminary analysis due to the ease of implementation, the simplicity of interpretation, and the lack of dependency on statistical analysis packages or statistical knowledge.
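The two quantities named above, the variance of the group means and the mean of the group variances, can be computed with nothing beyond Python's standard library. The sketch below is only an illustration of those two quantities on made-up measurements; the function name is hypothetical, and the full POV procedure described in the project may involve further steps that are not shown here.

```python
from statistics import mean, variance


def between_within(groups):
    """Return (between, within) for a list of measurement groups.

    between: sample variance of the group means
    within:  mean of the per-group sample variances
    """
    group_means = [mean(g) for g in groups]
    group_variances = [variance(g) for g in groups]
    return variance(group_means), mean(group_variances)


# Hypothetical measurements, e.g. three parts measured three times each.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
between, within = between_within(groups)
```

Here the between component is much larger than the within component, which would point to part-to-part differences rather than repeat-measurement noise as the dominant source of variation in this toy data.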

Contributors: Little, David John (Author) / Borror, Connie (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Broatch, Jennifer (Committee member) / Arizona State University (Publisher)

Created: 2015
