Matching Items (67)
Description
Yield is a key process performance characteristic in the capital-intensive semiconductor fabrication process. In an industry where machines cost millions of dollars and cycle times span months, predicting and optimizing yield are critical to process improvement, customer satisfaction, and financial success. Semiconductor yield modeling is essential to identifying processing issues, improving quality, and meeting customer demand in the industry. However, the complicated fabrication process, the massive amount of data collected, and the number of models available make yield modeling a complex and challenging task. This work presents modeling strategies to forecast yield using generalized linear models (GLMs) based on defect metrology data. The research is divided into three main parts. First, the data integration and aggregation necessary for model building are described, and GLMs are constructed for yield forecasting. This technique yields results at both the die and the wafer levels, outperforms existing models found in the literature based on prediction errors, and identifies significant factors that can drive process improvement. This method also allows the nested structure of the process to be considered in the model, improving predictive capabilities and violating fewer assumptions. To account for the random sampling typically used in fabrication, the work is extended by using generalized linear mixed models (GLMMs) and a larger dataset to show the differences between batch-specific and population-averaged models in this application and how they compare to GLMs. These results show some additional improvements in forecasting abilities under certain conditions and highlight the differences between the significant effects identified in the GLM and GLMM models. The effects of link functions and sample size are also examined at the die and wafer levels. The third part of this research describes a methodology for integrating classification and regression trees (CART) with GLMs. This technique uses the terminal nodes identified in the classification tree to add predictors to a GLM. This method enables the model to consider important interaction terms in a simpler way than with the GLM alone, and provides valuable insight into the fabrication process through the combination of the tree structure and the statistical analysis of the GLM.
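
As an illustrative aside (not drawn from the dissertation itself), the sketch below shows how a die-level yield model of this general kind can be fit as a binomial GLM with a logit link using the statsmodels library; the defect-count predictors, simulated data, and coefficient values are assumptions for demonstration only.

```python
# Minimal sketch: die-level yield as a binomial GLM on defect metrology counts.
# All column names, data, and coefficients are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_dies = 500
df = pd.DataFrame({
    "particle_defects": rng.poisson(2.0, n_dies),  # counts from one inspection layer
    "scratch_defects": rng.poisson(0.5, n_dies),   # counts from another inspection layer
})
# Simulated pass/fail outcome: more defects -> lower probability of a good die.
logit = 1.5 - 0.8 * df["particle_defects"] - 1.2 * df["scratch_defects"]
df["good_die"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = sm.add_constant(df[["particle_defects", "scratch_defects"]])
fit = sm.GLM(df["good_die"], X, family=sm.families.Binomial()).fit()
print(fit.summary())            # coefficients flag which defect types matter
print(fit.predict(X).mean())    # average predicted probability of a good die
```

Wafer-level forecasts could then be formed by aggregating the predicted die-level probabilities within each wafer.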
Contributors: Krueger, Dana Cheree (Author) / Montgomery, Douglas C. (Thesis advisor) / Fowler, John (Committee member) / Pan, Rong (Committee member) / Pfund, Michele (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
There has been much research involving simultaneous monitoring of several correlated quality characteristics that relies on the assumptions of multivariate normality and independence. In real-world applications, these assumptions are not always met, particularly when small counts are of interest. In general, the use of the normal approximation to the Poisson distribution seems to be justified when the Poisson means are large enough. A new two-sided Multivariate Poisson Exponentially Weighted Moving Average (MPEWMA) control chart is proposed, and the control limits are directly derived from the multivariate Poisson distribution. The MPEWMA and the conventional Multivariate Exponentially Weighted Moving Average (MEWMA) charts are evaluated by using the multivariate Poisson framework. The MPEWMA chart outperforms the MEWMA with the normal-theory limits in terms of the in-control average run lengths. An extension study of the two-sided MPEWMA to a one-sided version is performed; this is useful for detecting an increase in the count means. The results of the comparison with the one-sided MEWMA chart are quite similar to the two-sided case. The implementation of the MPEWMA scheme for multiple count data is illustrated, with step-by-step guidelines and several examples. In addition, the method is compared to other model-based control charts that are used to monitor the residual values, such as the regression adjustment. The MPEWMA scheme shows better performance in detecting the mean shift in count data when positive correlation exists among all variables.
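
To make the charting mechanics concrete, here is a minimal sketch of the conventional MEWMA recursion applied to simulated correlated count data; the smoothing constant, the common-shock count model, and the control limit h are assumptions, and the sketch does not reproduce the multivariate-Poisson-based limits that the MPEWMA derives.

```python
# Minimal sketch of the conventional MEWMA recursion applied to correlated count data.
# The placeholder limit h is NOT the multivariate-Poisson-derived limit of the MPEWMA.
import numpy as np

def mewma_statistics(X, mu, sigma, lam=0.2):
    """Return the T^2-type MEWMA statistic for each observation vector in X."""
    n, p = X.shape
    z = np.zeros(p)
    cov_z = (lam / (2.0 - lam)) * sigma        # asymptotic covariance of the EWMA vector
    cov_z_inv = np.linalg.inv(cov_z)
    stats = np.empty(n)
    for i in range(n):
        z = lam * (X[i] - mu) + (1.0 - lam) * z
        stats[i] = z @ cov_z_inv @ z
    return stats

rng = np.random.default_rng(1)
mu = np.array([4.0, 6.0])
# Correlated Poisson counts via a shared common-shock term (one simple construction).
common = rng.poisson(1.0, 200)
X = np.column_stack([rng.poisson(3.0, 200) + common, rng.poisson(5.0, 200) + common])
sigma = np.cov(X, rowvar=False)
t2 = mewma_statistics(X, mu, sigma)
h = 11.0                                       # placeholder control limit
print(np.where(t2 > h)[0])                     # indices that would signal
```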
Contributors: Laungrungrong, Busaba (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Thesis advisor) / Fowler, John (Committee member) / Young, Dennis (Committee member) / Arizona State University (Publisher)
Created: 2010
Description
The emergence of new technologies as well as a fresh look at analyzing existing processes have given rise to a new type of response characteristic, known as a profile. Profiles are useful when a quality variable is functionally dependent on one or more explanatory, or independent, variables. So, instead of observing a single measurement on each unit or product, a set of values is obtained over a range which, when plotted, takes the shape of a curve. Traditional multivariate monitoring schemes are inadequate for monitoring profiles due to high dimensionality and poor use of the information stored in functional form, which leads to very large variance-covariance matrices. Profile monitoring has become an important area of study in statistical process control and is being actively addressed by researchers across the globe. This research explores the area in three parts. A comparative analysis is conducted of two linear profile-monitoring techniques based on the probability of false alarm and the average run length (ARL) under shifts in the model parameters. The two techniques studied are a control chart based on the classical calibration statistic and a control chart based on the parameters of a linear model. The research demonstrates that a profile characterized by a parametric model is a more efficient monitoring scheme than one based on monitoring only the individual features of the profile. A likelihood-ratio-based changepoint control chart is proposed for detecting a sustained step shift in low-order polynomial profiles. The test statistic is plotted on a Shewhart-like chart with control limits derived from asymptotic distribution theory. The statistic is factored to reflect the variation due to the parameters in order to aid in interpreting an out-of-control signal. The research also looks at the robust parameter design study of profiles, also referred to as signal-response systems. Such experiments are often necessary for understanding and reducing the common cause variation in systems. A split-plot approach is proposed to analyze the profiles. It is demonstrated that explicit modeling of variance components using a generalized linear mixed models approach yields more precise point estimates and tighter confidence intervals.
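
As a hedged illustration of the core idea, characterizing each profile by a fitted parametric model and monitoring the parameter estimates rather than the raw curve, the sketch below fits a line to each simulated profile and applies a simple 3-sigma check to the estimates; the data, shift size, and limits are assumptions, and the dissertation's likelihood-ratio changepoint chart is not reproduced here.

```python
# Minimal sketch: characterize each observed profile by a fitted line and monitor
# the parameter estimates rather than the raw measurements. Data are simulated.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 20)                     # fixed explanatory variable levels
true_intercept, true_slope = 2.0, 0.5

params = []
for j in range(50):                            # 50 profiles, slope shift after profile 30
    slope = true_slope + (0.15 if j >= 30 else 0.0)
    y = true_intercept + slope * x + rng.normal(0, 0.2, x.size)
    A = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    params.append(beta_hat)

params = np.array(params)                      # column 0: intercepts, column 1: slopes
center = params[:30].mean(axis=0)
sd = params[:30].std(axis=0, ddof=1)
signals = np.abs(params - center) > 3 * sd     # simple 3-sigma check per parameter
print(np.argwhere(signals[:, 1]).ravel())      # profiles whose slope estimate signals
```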
Contributors: Gupta, Shilpa (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie M. (Thesis advisor) / Fowler, John (Committee member) / Prewitt, Kathy (Committee member) / Kulahci, Murat (Committee member) / Arizona State University (Publisher)
Created: 2010
Description
In mixture-process variable experiments, it is common that the number of runs is greater than in mixture-only or process-variable experiments. These experiments have to estimate the parameters from the mixture components, process variables, and interactions of both variables. In some of these experiments there are variables that are hard to change or cannot be controlled under normal operating conditions. These situations often prohibit a complete randomization for the experimental runs due to practical and economic considerations. Furthermore, the process variables can be categorized into two types: variables that are controllable and directly affect the response, and variables that are uncontrollable and primarily affect the variability of the response. These uncontrollable variables are called noise factors and are assumed controllable in a laboratory environment for the purpose of conducting experiments. The model containing both noise variables and control factors can be used to determine settings for the control factors that make the response "robust" to the variability transmitted from the noise factors. These types of experiments can be analyzed with a model for the mean response and a model for the slope of the response within a split-plot structure. When considering the experimental designs, low prediction variances for the mean and slope models are desirable. Methods for mixture-process variable designs with noise variables under restricted randomization are demonstrated, and some mixture-process variable designs that are robust to the coefficients of interaction with noise variables are evaluated using fraction of design space plots with respect to their prediction variance properties. Finally, the G-optimal design that minimizes the maximum prediction variance over the entire design region is created using a genetic algorithm.
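
For intuition about the design evaluation step, the sketch below computes the maximum scaled prediction variance over a sampled design region for a candidate design, the G-criterion a genetic algorithm could use as its fitness; the two-factor quadratic model is an assumption standing in for the actual mixture-process variable model.

```python
# Minimal sketch: evaluate the G-criterion (maximum scaled prediction variance over the
# design region) for a candidate design. A simple two-factor quadratic model stands in
# for the mixture-process variable model; the design and region are hypothetical.
import numpy as np

def model_terms(points):
    """Expand design points into model terms: intercept, x1, x2, x1*x2, x1^2, x2^2."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def max_scaled_prediction_variance(design, region_points):
    X = model_terms(design)
    XtX_inv = np.linalg.inv(X.T @ X)
    F = model_terms(region_points)
    spv = design.shape[0] * np.einsum("ij,jk,ik->i", F, XtX_inv, F)  # N * x'(X'X)^-1 x
    return spv.max()

rng = np.random.default_rng(3)
design = rng.uniform(-1, 1, size=(12, 2))            # a candidate 12-run design
grid = rng.uniform(-1, 1, size=(5000, 2))            # points covering the design region
print(max_scaled_prediction_variance(design, grid))  # smaller is better (closer to G-optimal)
```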
Contributors: Cho, Tae Yeon (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie M. (Thesis advisor) / Shunk, Dan L. (Committee member) / Gel, Esma S (Committee member) / Kulahci, Murat (Committee member) / Arizona State University (Publisher)
Created: 2010
Description
Optimization of surgical operations is a challenging managerial problem for surgical suite directors. This dissertation presents modeling and solution techniques for operating room (OR) planning and scheduling problems. First, several sequencing and patient appointment time setting heuristics are proposed for scheduling an Outpatient Procedure Center. A discrete-event simulation model is used to evaluate how scheduling heuristics perform with respect to the competing criteria of expected patient waiting time and expected surgical suite overtime for a single day compared to current practice. Next, a bi-criteria Genetic Algorithm is used to determine whether better solutions can be obtained for this single day scheduling problem. The efficacy of the bi-criteria Genetic Algorithm, when surgeries are allowed to be moved to other days, is investigated. Numerical experiments based on real data from a large health care provider are presented. The analysis provides insight into the best scheduling heuristics, and the tradeoff between patient and health care provider based criteria. Second, a multi-stage stochastic mixed integer programming formulation for the allocation of surgeries to ORs over a finite planning horizon is studied. The demand for surgery and surgical duration are random variables. The objective is to minimize two competing criteria: expected surgery cancellations and OR overtime. A decomposition method, Progressive Hedging, is implemented to find near-optimal surgery plans. Finally, properties of the model are discussed and methods are proposed to improve the performance of the algorithm based on the special structure of the model. It is found that simple rules can improve schedules used in practice. Sequencing surgeries from the longest to shortest mean duration causes high expected overtime and should be avoided, while sequencing from the shortest to longest mean duration performed quite well in our experiments. Expending greater computational effort with more sophisticated optimization methods does not lead to substantial improvements. However, controlling the daily procedure mix may achieve substantial improvements in performance. A novel stochastic programming model for a dynamic surgery planning problem is proposed in the dissertation. The efficacy of the Progressive Hedging algorithm is investigated. It is found that there is a significant correlation between the performance of the algorithm and the type and number of scenario bundles in a problem instance. The computational time spent to solve scenario subproblems is among the most significant factors that impact the performance of the algorithm. The quality of the solutions can be improved by detecting and preventing cyclical behaviors.
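
As an illustrative aside, the sketch below compares shortest-to-longest and longest-to-shortest sequencing in a single OR by simulating expected patient waiting time and overtime; the lognormal durations, appointment rule, and session length are assumptions, not the dissertation's data or simulation model.

```python
# Minimal sketch: compare two sequencing heuristics for one OR day by simulation.
# Durations, appointment rule, and session length are hypothetical.
import numpy as np

def simulate(mean_durations, order, session_length=480.0, reps=2000, seed=4):
    rng = np.random.default_rng(seed)
    means = np.asarray(mean_durations, dtype=float)[order]
    wait_total, overtime_total = 0.0, 0.0
    for _ in range(reps):
        durations = rng.lognormal(np.log(means), 0.4)   # right-skewed surgical times
        # Appointments set at cumulative scheduled means; patients wait if the OR runs late.
        appointments = np.concatenate(([0.0], np.cumsum(means)[:-1]))
        clock = 0.0
        for appt, dur in zip(appointments, durations):
            start = max(clock, appt)
            wait_total += start - appt
            clock = start + dur
        overtime_total += max(0.0, clock - session_length)
    return wait_total / reps, overtime_total / reps     # (expected wait, expected overtime)

means = [30, 45, 60, 90, 120, 150]
short_first = np.argsort(means)
long_first = short_first[::-1]
print("short-to-long:", simulate(means, short_first))
print("long-to-short:", simulate(means, long_first))
```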
Contributors: Gul, Serhat (Author) / Fowler, John W. (Thesis advisor) / Denton, Brian T. (Thesis advisor) / Wu, Teresa (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created: 2010
Description
Public health surveillance is a special case of the general problem where counts (or rates) of events are monitored for changes. Modern data complements event counts with many additional measurements (such as geographic and demographic attributes) that comprise high-dimensional covariates. This leads to an important challenge: detecting a change that occurs only within an initially unspecified region defined by these covariates. Current methods are typically limited to spatial and/or temporal covariate information and often fail to use all the information available in modern data that can be paramount in unveiling these subtle changes. Additional complexities associated with modern health data that are often not accounted for by traditional methods include: covariates of mixed type, missing values, and high-order interactions among covariates. This work proposes a transform of public health surveillance to supervised learning, so that an appropriate learner can inherently address all the complexities described previously. At the same time, quantitative measures from the learner can be used to define signal criteria to detect changes in rates of events. A Feature Selection (FS) method is used to identify covariates that contribute to a model and to generate a signal. A measure of statistical significance is included to control false alarms. An alternative Percentile method identifies the specific cases that lead to changes using class probability estimates from tree-based ensembles. This second method is intended to be less computationally intensive and significantly simpler to implement. Finally, a third method labeled Rule-Based Feature Value Selection (RBFVS) is proposed for identifying the specific regions in high-dimensional space where the changes are occurring. Results on simulated examples are used to compare the FS method and the Percentile method. Note that this work emphasizes the application of the proposed methods to public health surveillance. Nonetheless, these methods can easily be extended to a variety of applications where counts (or rates) of events are monitored for changes. Such problems commonly occur in domains such as manufacturing, economics, environmental systems, and engineering, as well as in public health.
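
A minimal sketch of the general transform, labeling cases by time window, training a tree ensemble, and reading off covariate importances and class probabilities, is given below; the simulated covariates, the injected subgroup change, and the use of scikit-learn's random forest are assumptions standing in for the dissertation's specific FS and Percentile methods.

```python
# Minimal sketch: surveillance cast as supervised learning. Label cases by time window,
# train a tree ensemble, and inspect out-of-bag accuracy, feature importances, and
# class probabilities. The covariates and the injected change are simulated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)

def draw_cases(n, extra_in_subgroup=0):
    age = rng.integers(0, 90, n)
    region = rng.integers(0, 5, n)
    X = np.column_stack([age, region])
    if extra_in_subgroup:                       # excess elderly cases in region 3
        sub = np.column_stack([rng.integers(66, 90, extra_in_subgroup),
                               np.full(extra_in_subgroup, 3)])
        X = np.vstack([X, sub])
    return X

X0 = draw_cases(2000)                           # baseline window
X1 = draw_cases(2000, extra_in_subgroup=300)    # current window with a localized increase
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(len(X0)), np.ones(len(X1))])

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print("out-of-bag accuracy:", clf.oob_score_)   # materially above the majority rate -> signal
print("importances (age, region):", clf.feature_importances_)
print(clf.predict_proba(X1)[:5])                # per-case class probabilities (Percentile idea)
```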
Contributors: Davila, Saylisse (Author) / Runger, George C. (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Young, Dennis (Committee member) / Gel, Esma (Committee member) / Arizona State University (Publisher)
Created: 2010
Description
Improving the quality of Origin-Destination (OD) demand estimates increases the effectiveness of design, evaluation and implementation of traffic planning and management systems. The associated bilevel Sensor Location Flow-Estimation problem considers two important research questions: (1) how to compute the best estimates of the flows of interest by using anticipated data from given candidate sensor locations; and (2) how to decide on the optimum subset of links where sensors should be located. In this dissertation, a decision framework is developed to optimally locate sensors and obtain high-quality OD volume estimates in vehicular traffic networks. The framework includes a traffic assignment model to load the OD traffic volumes on routes in a known choice set, a sensor location model to decide on which subset of links to locate counting sensors for observing traffic volumes, and an estimation model to obtain the best estimates of OD or route flow volumes. The dissertation first addresses the deterministic route flow estimation problem given a priori knowledge of route flows and their uncertainties. Two procedures are developed to locate "perfect" and "noisy" sensors, respectively. Next, it addresses a stochastic route flow estimation problem. A hierarchical linear Bayesian model is developed, where the real route flows are assumed to be generated from a Multivariate Normal distribution with two parameters: "mean" and "variance-covariance matrix". The prior knowledge for the "mean" parameter is described by a probability distribution. When the "variance-covariance matrix" parameter is assumed known, a Bayesian A-optimal design is developed. When the "variance-covariance matrix" parameter is unknown, a Markov chain Monte Carlo approach is used to estimate the a posteriori quantities. In all the sensor location models, the objective is to maximize the reduction in the variances of the distribution of the OD volume estimates. The developed models are compared with other models available in the literature, and the comparison shows that they perform better.
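
To illustrate the known-covariance case, the sketch below computes the closed-form posterior covariance of route flows under a Gaussian prior with linear noisy link counts, and selects sensors greedily to minimize the posterior trace (an A-optimal-style criterion); the incidence matrix, prior, and noise level are hypothetical, and the greedy rule is a stand-in for the dissertation's design procedure.

```python
# Minimal sketch: linear-Gaussian posterior for route flows and a greedy, trace-based
# (A-optimal-style) choice of counting sensors. All matrices and variances are hypothetical.
import numpy as np

rng = np.random.default_rng(6)
n_routes, n_links = 6, 10
A_full = rng.integers(0, 2, size=(n_links, n_routes)).astype(float)  # link-route incidence
prior_cov = np.diag(rng.uniform(50, 200, n_routes))                  # prior on route flows
noise_var = 25.0                                                     # count-sensor noise

def posterior_cov(selected_links):
    """Posterior covariance of route flows given counts on the selected links."""
    A = A_full[selected_links]
    precision = np.linalg.inv(prior_cov) + (A.T @ A) / noise_var
    return np.linalg.inv(precision)

budget, chosen = 4, []
for _ in range(budget):
    remaining = [l for l in range(n_links) if l not in chosen]
    best = min(remaining, key=lambda l: np.trace(posterior_cov(chosen + [l])))
    chosen.append(best)

print("selected links:", chosen)
print("total posterior variance:", np.trace(posterior_cov(chosen)))
```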
Contributors: Wang, Ning (Author) / Mirchandani, Pitu (Thesis advisor) / Murray, Alan (Committee member) / Pendyala, Ram (Committee member) / Runger, George C. (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Modern computer systems are complex engineered systems involving a large collection of individual parts, each with many parameters, or factors, affecting system performance. One way to understand these complex systems and their performance is through experimentation. However, most modern computer systems involve such a large number of factors that thorough experimentation on all of them is impossible. An initial screening step is thus necessary to determine which factors are relevant to the system's performance and which factors can be eliminated from experimentation.

Factors may impact system performance in different ways. A factor at a specific level may significantly affect performance as a main effect, or in combination with other main effects as an interaction. For screening, it is necessary both to identify the presence of these effects and to locate the factors responsible for them. A locating array is a relatively new experimental design that causes every main effect and interaction to occur and distinguishes all sets of d main effects and interactions from each other in the tests where they occur. This design is therefore helpful in screening complex systems.

The process of screening using locating arrays involves multiple steps. First, a locating array is constructed for all possibly significant factors. Next, the system is executed for all tests indicated by the locating array and a response is observed. Finally, the response is analyzed to identify the significant system factors for future experimentation. However, simply constructing a reasonably sized locating array for a large system is no easy task, and analyzing the response of the tests presents additional difficulties due to the large number of possible predictors and the inherent imbalance in the experimental design itself. Further complications can arise from noise in the system or errors in testing.

This thesis has three contributions. First, it provides an algorithm to construct locating arrays using the Lovász Local Lemma with Moser-Tardos resampling. Second, it gives an algorithm to analyze the system response efficiently. Finally, it studies the robustness of the analysis to the heavy-hitters assumption underlying the approach as well as to varying amounts of system noise.
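
As a hedged illustration of the resampling idea, the sketch below applies Moser-Tardos-style resampling to a simplified target, a binary array in which every pair of columns covers all four value combinations; the actual locating-array construction, its coverage and distinguishability conditions, and the Lovász Local Lemma analysis are substantially more involved.

```python
# Minimal sketch of Moser-Tardos-style resampling on a simplified target: a binary
# array in which every pair of columns covers all four value combinations. Parameter
# choices are assumptions; the real locating-array construction is more involved.
import random
from itertools import combinations, product

def uncovered_events(array, k):
    """Return (column pair, missing value combination) events not yet covered."""
    events = []
    for c1, c2 in combinations(range(k), 2):
        seen = {(row[c1], row[c2]) for row in array}
        for combo in product([0, 1], repeat=2):
            if combo not in seen:
                events.append(((c1, c2), combo))
    return events

def resampling_construction(k=8, rows=20, seed=7):
    rng = random.Random(seed)
    array = [[rng.randint(0, 1) for _ in range(k)] for _ in range(rows)]
    bad = uncovered_events(array, k)
    while bad:
        (c1, c2), _ = bad[0]                   # pick a violated event
        for row in array:                      # resample only the columns it depends on
            row[c1] = rng.randint(0, 1)
            row[c2] = rng.randint(0, 1)
        bad = uncovered_events(array, k)
    return array

arr = resampling_construction()
print(len(arr), "rows x", len(arr[0]), "columns, every column pair fully covered")
```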
Contributors: Seidel, Stephen (Author) / Syrotiuk, Violet R. (Thesis advisor) / Colbourn, Charles J (Committee member) / Montgomery, Douglas C. (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Technology advancements in diagnostic imaging, smart sensing, and health information systems have resulted in a data-rich environment in health care, which offers a great opportunity for Precision Medicine. The objective of my research is to develop data fusion and system informatics approaches for quality and performance improvement of health care. In my dissertation, I focus on three emerging problems in health care and develop novel statistical models and machine learning algorithms to tackle these problems from diagnosis to care to system-level decision-making.

The first topic is diagnosis/subtyping of migraine to customize effective treatment to different subtypes of patients. Existing clinical definitions of subtypes use somewhat arbitrary boundaries primarily based on patient self-reported symptoms, which are subjective and error-prone. My research develops a novel Multimodality Factor Mixture Model that discovers subtypes of migraine from multimodality MRI imaging data, which provide complementary, accurate measurements of the disease. Patients in the different subtypes show significantly different clinical characteristics of the disease. Treatment tailored and optimized for patients of the same subtype paves the way toward Precision Medicine.

The second topic focuses on coordinated patient care. Care coordination between nurses and with other health care team members is important for providing high-quality and efficient care to patients. The recently developed Nurse Care Coordination Instrument (NCCI) is the first of its kind that enables large-scale quantitative data to be collected. My research develops a novel Multi-response Multi-level Model (M3) that enables transfer learning in NCCI data fusion. M3 identifies key factors that contribute to improving care coordination, and facilitates the design and optimization of nurses’ training, workload assignment, and practice environment, which leads to improved patient outcomes.

The last topic is system-level decision-making for early detection of Alzheimer's disease (AD) at the stage of Mild Cognitive Impairment (MCI), by predicting each MCI patient's risk of converting to AD using imaging and proteomic biomarkers. My research proposes a systems engineering approach that integrates multiple perspectives, including prediction accuracy, biomarker cost/availability, patient heterogeneity, and diagnostic efficiency, and allows for a system-wide optimized decision regarding the biomarker testing process for prediction of MCI conversion.
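
As a loose conceptual analogue only, the sketch below discovers latent patient subtypes with a generic Gaussian mixture on simulated imaging-derived features; the feature names and data are invented, and this is far simpler than the dissertation's Multimodality Factor Mixture Model.

```python
# Minimal sketch: unsupervised subtype discovery with a generic Gaussian mixture on
# simulated imaging-derived features. A conceptual stand-in only, not the dissertation's
# Multimodality Factor Mixture Model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
# Hypothetical features (e.g., a cortical thickness summary and a connectivity summary)
# for two latent patient subtypes.
subtype_a = rng.normal([2.8, 0.45], [0.15, 0.05], size=(60, 2))
subtype_b = rng.normal([2.4, 0.60], [0.15, 0.05], size=(40, 2))
X = np.vstack([subtype_a, subtype_b])

gm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = gm.predict(X)                 # discovered subtype assignment per patient
print(np.bincount(labels))             # subtype sizes
print(gm.means_)                       # subtype-level feature profiles
```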
Contributors: Si, Bing (Author) / Li, Jing (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Schwedt, Todd (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
This dissertation studies three classes of combinatorial arrays with practical applications in testing, measurement, and security. Covering arrays are widely studied in software and hardware testing to indicate the presence of faulty interactions. Locating arrays extend covering arrays to achieve identification of the interactions causing a fault by requiring additional conditions on how interactions are covered in rows. This dissertation introduces a new class, anonymizing arrays, to guarantee a degree of anonymity by bounding the probability that a particular row is identified by the interaction presented. Similarities among these arrays lead to common algorithmic techniques for their construction, which this dissertation explores. Differences arising from their application domains lead to the unique features of each class, requiring the techniques to be tailored to the specifics of each problem.

One contribution of this work is a conditional expectation algorithm to build covering arrays via an intermediate combinatorial object. Conditional expectation efficiently finds intermediate-sized arrays that are particularly useful as ingredients for additional recursive algorithms. A cut-and-paste method creates large arrays from small ingredients. Performing transformations on the copies makes further improvements by reducing redundancy in the composed arrays and leads to fewer rows.
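
A minimal sketch of the conditional-expectation idea on a small binary, pairwise case is shown below: each new row is built cell by cell, keeping the value that maximizes the expected number of still-uncovered pairs the completed row would cover if the remaining cells were filled at random; the dissertation's intermediate combinatorial object and recursive cut-and-paste ingredients are not reproduced.

```python
# Minimal sketch of a conditional-expectation construction for a binary pairwise
# covering array. The small parameters and greedy column order are assumptions.
from itertools import combinations, product

def all_pairs(k):
    return {((c1, v1), (c2, v2))
            for c1, c2 in combinations(range(k), 2)
            for v1, v2 in product([0, 1], repeat=2)}

def expected_coverage(row, uncovered):
    """Expected number of uncovered pairs this partial row covers if unfixed cells are random."""
    total = 0.0
    for pair in uncovered:
        p = 1.0
        for c, v in pair:
            if row[c] is None:
                p *= 0.5                       # unfixed cell matches the needed value w.p. 1/2
            elif row[c] != v:
                p = 0.0
                break
        total += p
    return total

def build_covering_array(k=6):
    uncovered, rows = all_pairs(k), []
    while uncovered:
        row = [None] * k
        for c in range(k):                     # fix cells left to right by conditional expectation
            row[c] = max([0, 1],
                         key=lambda v: expected_coverage(row[:c] + [v] + row[c + 1:], uncovered))
        rows.append(row)
        uncovered -= {pair for pair in uncovered if all(row[c] == v for c, v in pair)}
    return rows

ca = build_covering_array()
print(len(ca), "rows for a pairwise covering array on 6 binary columns")
```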

This work contains the first algorithm for constructing locating arrays for general values of d and t. A randomized computational search framework verifies whether a candidate array is (d̄, t)-locating by partitioning the search space, and performs random resampling if a candidate fails. Algorithmic parameters determine which columns to resample and when to add additional rows to the candidate array. Additionally, analysis is conducted on the performance of the algorithmic parameters to provide guidance on how to tune parameters to prioritize speed, accuracy, or a combination of both.

This work proposes anonymizing arrays as a class related to covering arrays with a higher coverage requirement and constraints. The algorithms for covering and locating arrays are tailored to anonymizing array construction. An additional property, homogeneity, is introduced to meet the needs of attribute-based authorization. Two metrics, local and global homogeneity, are designed to compare anonymizing arrays with the same parameters. Finally, a post-optimization approach reduces the homogeneity of an anonymizing array.
Contributors: Lanus, Erin (Author) / Colbourn, Charles J (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Montgomery, Douglas C. (Committee member) / Syrotiuk, Violet R. (Committee member) / Arizona State University (Publisher)
Created: 2019