Matching Items (493)
Description
A good production schedule in a semiconductor back-end facility is critical for the on-time delivery of customer orders. Compared to the front-end process, which is dominated by re-entrant product flows, the back-end process is linear and therefore more amenable to scheduling. However, production scheduling of the back-end process is still very difficult due to the wide product mix, large number of parallel machines, product-family-related setups, machine-product qualification, and weekly demand consisting of thousands of lots. In this research, a novel mixed-integer linear programming (MILP) model is proposed for the batch production scheduling of a semiconductor back-end facility. In the MILP formulation, the manufacturing process is modeled as a flexible flow line with bottleneck stages, unrelated parallel machines, product-family-related sequence-independent setups, and product-machine qualification considerations. However, this MILP formulation is difficult to solve for real-size problem instances. In a semiconductor back-end facility, production scheduling usually needs to be done every day while considering an updated demand forecast over a medium-term planning horizon. Due to the limitation on the solvable size of the MILP model, a deterministic scheduling system (DSS), consisting of an optimizer and a scheduler, is proposed to provide sub-optimal solutions in a short time for real-size problem instances. The optimizer generates a tentative production plan; the scheduler then sequences each lot on each individual machine according to the tentative production plan and scheduling rules. Customized factory rules and additional resource constraints are included in the DSS, such as the preventive maintenance schedule, setup crew availability, and carrier limitations. Small problem instances are randomly generated to compare the performance of the MILP model and the deterministic scheduling system. Experimental design is then applied to understand the behavior of the DSS and to identify its best configuration under different demand scenarios. Product-machine qualification decisions have a long-term and significant impact on production scheduling, and a robust product-machine qualification matrix is critical for meeting demand when demand quantity or mix varies. In the second part of this research, a stochastic mixed-integer programming model is proposed to balance the tradeoff between current machine qualification costs and future backorder costs under uncertain demand. The L-shaped method and acceleration techniques are proposed to solve the stochastic model, and computational results compare the performance of the different solution methods.
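To make the flavor of such a formulation concrete, below is a minimal sketch of a single-stage batch-scheduling MILP with unrelated parallel machines, product-machine qualification, and family-based sequence-independent setups, written in Python with the PuLP modeling library. The instance data, variable names, and makespan objective are illustrative assumptions, not the dissertation's actual model, which covers multiple bottleneck stages and much larger instances.

```python
# Minimal sketch (not the dissertation's full formulation): assign lots to
# unrelated parallel machines at one bottleneck stage, with product-machine
# qualification and family-based sequence-independent setups. Uses PuLP.
import pulp

lots = {"L1": "famA", "L2": "famA", "L3": "famB", "L4": "famB"}   # lot -> product family
machines = ["M1", "M2"]
proc = {("L1","M1"): 3, ("L1","M2"): 4, ("L2","M1"): 2, ("L2","M2"): 3,
        ("L3","M1"): 5, ("L3","M2"): 3, ("L4","M1"): 4, ("L4","M2"): 2}
qual = {("L1","M1"): 1, ("L1","M2"): 1, ("L2","M1"): 1, ("L2","M2"): 0,
        ("L3","M1"): 0, ("L3","M2"): 1, ("L4","M1"): 1, ("L4","M2"): 1}
setup = 1  # sequence-independent setup, incurred once per family on a machine
families = sorted(set(lots.values()))

m = pulp.LpProblem("batch_scheduling", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (lots, machines), cat="Binary")      # lot-to-machine assignment
y = pulp.LpVariable.dicts("y", (families, machines), cat="Binary")  # family setup on machine
cmax = pulp.LpVariable("makespan", lowBound=0)

m += cmax  # minimize makespan
for l in lots:
    m += pulp.lpSum(x[l][k] for k in machines) == 1       # each lot scheduled exactly once
    for k in machines:
        m += x[l][k] <= qual[(l, k)]                      # product-machine qualification
        m += x[l][k] <= y[lots[l]][k]                     # setup needed before any lot of the family
for k in machines:
    m += (pulp.lpSum(proc[(l, k)] * x[l][k] for l in lots)
          + setup * pulp.lpSum(y[f][k] for f in families)) <= cmax  # machine workload bound

m.solve(pulp.PULP_CBC_CMD(msg=False))
for l in lots:
    print(l, "->", [k for k in machines if x[l][k].value() > 0.5])
```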
Contributors: Fu, Mengying (Author) / Askin, Ronald G. (Thesis advisor) / Zhang, Muhong (Thesis advisor) / Fowler, John W (Committee member) / Pan, Rong (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
A synbody is a newly developed protein-binding peptide that can be produced rapidly by chemical methods. The advantages of the synbody production process make it a potential human-proteome binding reagent. Most synbodies are designed to bind to specific proteins, and the peptides incorporated in a synbody are discovered with peptide microarray technology. Nevertheless, the targets of unknown synbodies can also be discovered by searching through a protein mixture. The first part of this thesis focuses on this target-searching process, which was performed with immunoprecipitation assays and mass spectrometry analysis. Proteins are pulled down from cell lysate by a given synbody, and these proteins are then identified using mass spectrometry. After excluding non-specific binding, the interaction between a synbody and its real target(s) can be verified with affinity measurements. As a specific example, the binding between the 1-4-KCap synbody and actin was discovered. This result demonstrated the feasibility of the mass-spectrometry-based method and suggested that a high-throughput synbody discovery platform for the human proteome could be developed. Beyond synbody development, peptide microarray technology can also be used for immunosignatures. The composition of all the antibody types in one's blood reflects an individual's health condition, and a method called immunosignaturing has been developed for early disease diagnosis based on this principle. CIM10K microarray slides serve as the platform for blood antibody detection in immunosignaturing. During the analysis of an immunosignature, the data from these slides need to be validated using landing-light peptides. The second part of this thesis focuses on this data validation. A biotinylated peptide was used as a landing light on the new CIM10K slides. Data collected over several rounds of tests indicated that the variation among landing lights was significantly reduced by the newly prepared biotinylated peptide compared with the old peptide mixture. Several suggestions for further landing-light improvement are proposed based on these results.
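As an aside on the landing-light comparison, spot-to-spot variation of this kind is commonly summarized with the coefficient of variation. The sketch below illustrates that calculation on invented intensity values; the actual CIM10K intensities and analysis pipeline are not reproduced here.

```python
# Toy illustration of the landing-light comparison: quantify spot-to-spot
# variation with the coefficient of variation (CV). Intensities are invented;
# the real CIM10K slide data are not shown here.
import numpy as np

old_mixture = np.array([1200, 950, 1600, 800, 1400, 1100])   # hypothetical spot intensities
biotinylated = np.array([1020, 1080, 990, 1050, 1010, 1040])

def cv(x):
    """Coefficient of variation: sample std / mean."""
    return x.std(ddof=1) / x.mean()

print(f"old mixture CV:  {cv(old_mixture):.2%}")
print(f"biotinylated CV: {cv(biotinylated):.2%}")
```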
Contributors: Sun, Minyao (Author) / Johnston, Stephen Albert (Thesis advisor) / Diehnelt, Chris Wayne (Committee member) / Stafford, Phillip (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered; RCSS prunes the rule conditions to a subset via feature selection, and the subset can then be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linearity or Gaussianity assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods on the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time ordering of the data to extract features and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted; these features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve this bias problem: one uses an out-of-bag sampling method, called OOBForest, and the other, based on the new concept of a partial permutation test, is called pForest. Experimental results show that the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
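A minimal sketch of the interval-feature idea behind a time series forest is shown below: compute mean, standard deviation, and slope over random intervals of each series and feed them to a standard random forest. The synthetic data, interval counts, and parameter choices are illustrative assumptions rather than the dissertation's exact algorithm.

```python
# Sketch of interval feature extraction in the spirit of a time series forest:
# mean, std, and slope over random intervals, fed to a random forest.
# Data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def interval_features(X, intervals):
    """For each (start, end) interval, compute mean, std, and slope per series."""
    feats = []
    for s, e in intervals:
        seg = X[:, s:e]
        t = np.arange(e - s)
        slope = np.polyfit(t, seg.T, 1)[0]      # per-series linear trend
        feats += [seg.mean(axis=1), seg.std(axis=1), slope]
    return np.column_stack(feats)

# Synthetic two-class data: class 1 has a bump in the middle of the series.
n, length = 200, 50
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, length))
X[y == 1, 20:30] += 1.0

intervals = [(s, s + w) for s, w in zip(rng.integers(0, 40, 10), rng.integers(5, 10, 10))]
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(interval_features(X, intervals), y)
print("training accuracy:", clf.score(interval_features(X, intervals), y))
```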
Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Hydropower is a clean, renewable energy source that has received great attention in the power industry. Hydropower has been the leading source of renewable energy, providing more than 86% of all electricity generated by renewable sources worldwide. The life span of a hydropower plant is generally considered to be 30 to 50 years, and plants over 30 years old usually conduct a feasibility study of rehabilitation for their entire facilities, including infrastructure. By age 35, the forced outage rate increases by 10 percentage points compared to the previous year, and much longer outages occur in power plants older than 20 years. Consequently, the forced outage rate increases exponentially due to these longer outages. Although such long forced outages are not frequent, their impact is immense. If the right timing for rehabilitation is missed, an abrupt long-term outage could occur, followed by unnecessary repairs and inefficiencies; on the other hand, replacing equipment too early wastes revenue. The hydropower plants of Korea Water Resources Corporation (hereafter K-water) are used for this study: twenty-four K-water generators comprise the population for quantifying the reliability of each piece of equipment. A facility in a hydropower plant is a repairable system, because most failures can be fixed without replacing the entire facility. The fault data of each power plant are collected, and only forced-outage faults are used as raw data for the reliability analyses. The mean cumulative repair function (MCF) of each facility is determined from the failure data tables using Nelson's graphical method. The power law model, a popular model for repairable systems, is also fitted to represent equipment and system availability. The criterion-based analysis of HydroAmp is used to provide a more accurate reliability assessment of each power plant. Two case studies are presented to enhance the understanding of the availability of each power plant and to present economic evaluations for modernization. Finally, equipment in a hydropower plant is categorized into two groups based on reliability to determine modernization timing, and suitable replacement periods are obtained by simulation.
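Nelson's nonparametric MCF estimate itself is a simple cumulative sum: at each failure age, add the number of failures at that age divided by the number of systems still under observation. A minimal sketch with hypothetical failure ages (not the K-water fault records) follows.

```python
# Sketch of Nelson's nonparametric mean cumulative function (MCF) for a fleet
# of repairable systems. Ages below are hypothetical, not K-water data.

# Each system: (censoring age, list of failure ages); ages in years.
systems = [
    (30, [8, 15, 22]),
    (25, [5, 19]),
    (35, [12, 20, 28, 33]),
]

events = sorted({t for _, fails in systems for t in fails})
mcf, cum = [], 0.0
for t in events:
    at_risk = sum(1 for end, _ in systems if end >= t)    # systems still observed at age t
    d = sum(fails.count(t) for _, fails in systems)       # failures occurring at age t
    cum += d / at_risk
    mcf.append((t, cum))

for t, m in mcf:
    print(f"age {t:>2} yr: MCF = {m:.3f}")
```

A power law model can then be fitted to this MCF to extrapolate repair intensity, which is the basis for comparing modernization timings.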
Contributors: Kwon, Ogeuk (Author) / Holbert, Keith E. (Thesis advisor) / Heydt, Gerald T (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies can be detected and the pattern correlated with disease states. The aim of my dissertation is to analyze the factors affecting the binding patterns using monoclonal antibodies and to determine how much information can be extracted from the sequences. Specifically, I examined the effects of antibody concentration, competition, peptide density, and antibody valence. Peptide binding could be detected at the low concentrations relevant to immunosignaturing, and a monoclonal's signature could be detected even in the presence of a 100-fold excess of naive IgG. I also found that peptide density was important, but this effect was not due to bivalent binding. Next, I examined in more detail how a polyreactive antibody binds to the random-sequence peptides compared to protein-sequence-derived peptides, and found that it bound to many peptides from both sets, but with low apparent affinity. An in-depth look at peptide physicochemical properties and sequence complexity revealed some correlations with binding, but they were generally small and varied greatly between antibodies. However, on a larger but less diverse peptide library, I found that sequence complexity was important for antibody binding. The redundancy in that library enabled the identification of specific sub-sequences recognized by an antibody. The current immunosignaturing platform has little repetition of sub-sequences, so I evaluated several methods to infer antibody epitopes. I found two methods with modest prediction accuracy, and I developed a software application called GuiTope to facilitate the epitope prediction analysis. None of the methods had sufficient accuracy to identify an unknown antigen from a database. In conclusion, the characteristics of the immunosignaturing platform observed through monoclonal antibody experiments demonstrate its promise as a new diagnostic technology. A major limitation, however, is the difficulty of connecting a signature back to the original antigen, though larger peptide libraries could facilitate these predictions.
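The sub-sequence identification described above can be illustrated with a toy enrichment count: look for k-mers that recur among the highest-binding peptides but not among the rest. The sketch below uses invented peptide sequences and signals; it is not GuiTope itself, only the underlying idea.

```python
# Toy sketch of sub-sequence (k-mer) enrichment among high-binding peptides.
# Sequences and signal values are invented for illustration.
from collections import Counter

peptides = {"GKWAYLSR": 9.1, "TKWAYPQD": 8.7, "NLHRVTEC": 1.2,
            "AKWAYGGD": 8.9, "PQRSTVML": 0.9, "DEFGHIKL": 1.1}
k = 3
top = [p for p, s in peptides.items() if s > 5.0]     # high-signal peptides
rest = [p for p, s in peptides.items() if s <= 5.0]

def kmers(seqs, k):
    """Count all length-k subsequences across a set of peptides."""
    return Counter(s[i:i+k] for s in seqs for i in range(len(s) - k + 1))

top_counts, rest_counts = kmers(top, k), kmers(rest, k)
enriched = {m: c for m, c in top_counts.items() if c > 1 and rest_counts[m] == 0}
print("candidate epitope k-mers:", enriched)   # here: {'KWA': 3, 'WAY': 3}
```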
Contributors: Halperin, Rebecca (Author) / Johnston, Stephen A. (Thesis advisor) / Bordner, Andrew (Committee member) / Taylor, Thomas (Committee member) / Stafford, Phillip (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
African Swine Fever (ASF), endemic in many African countries, is now spreading to other continents. Though ASF can cause serious economic losses in affected countries, no vaccine exists to provide immunity to animals. Disease control relies largely on rapid diagnosis and the implementation of movement restrictions and strict eradication programs, so a scalable, accurate, and low-cost diagnostic for ASF would be of great help. CIM's 10K random-peptide microarray is a new high-throughput platform that allows systematic investigation of immune responses associated with disease and shows promise as a diagnostic tool. In this study, this new technology was applied to characterize the immune responses of ASF virus (ASFV) infections and immunizations. Six serum sets from ASFV-antigen-immunized pigs, six sera from infected pigs, and 20 serum samples from unexposed pigs were tested and analyzed statistically. The results show that both ASFV-antigen-immunized pigs and ASFV-infected pigs can be distinguished from unexposed pigs. Since immune responses to other viral infections also appear distinguishable on this platform, it holds potential for developing a new ASF diagnostic. The ability of this platform to identify specific ASFV antibody epitopes was also explored. A subtle motif was found to be shared among the set of peptides displaying the highest reactivity for an antigen-specific antibody; however, this motif does not match any antibody epitope predicted by a linear antibody epitope prediction tool.
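To illustrate the kind of classification this study performs, the sketch below simulates a random-peptide array in which a subset of peptides reacts in exposed sera, then cross-validates a simple linear classifier. All data, dimensions, and effect sizes are invented; the actual ASFV analysis and statistics are not reproduced here.

```python
# Toy sketch of separating exposed from unexposed sera on a random-peptide
# array: simulated intensity matrices and a simple linear classifier.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_peptides = 500
unexposed = rng.normal(0, 1, size=(20, n_peptides))
exposed = rng.normal(0, 1, size=(12, n_peptides))
exposed[:, :25] += 1.5          # assume a subset of peptides reacts with ASFV antibodies

X = np.vstack([unexposed, exposed])
y = np.array([0] * 20 + [1] * 12)
scores = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=4)
print("cross-validated accuracy:", scores.mean().round(2))
```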
Contributors: Xiao, Liang (Author) / Sykes, Kathryn (Thesis advisor) / Zhao, Zhan-Gong (Committee member) / Stafford, Phillip (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Product reliability is now a top concern of manufacturers, and customers prefer products that perform well over long periods. Because most products last for years or even decades, accelerated life testing (ALT) is used to estimate product lifetime. Much research has been done in the ALT area, and optimal design for ALT is a major topic. This dissertation consists of three main studies. First, a methodology for finding optimal ALT designs with right censoring and interval censoring is developed; it employs the proportional hazards (PH) model and the generalized linear model (GLM) to simplify the computational process. A sensitivity study is also given to show the effects of the parameters on the designs. Second, an extended version of the I-optimal design for ALT is discussed, and a dual-objective design criterion is defined and illustrated with several examples; several graphical tools are also developed to evaluate candidate designs. Finally, when more than one model is available, different model-checking designs are discussed.
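As a flavor of ALT design evaluation, the sketch below compares candidate designs by a D-optimality criterion under a deliberately simple assumption: exponential lifetimes with a log-linear stress-rate model and Type-I censoring, for which the per-unit Fisher information about the regression coefficients at stress x is P(failure by the censoring time) times the outer product of (1, x). The planning values and designs are illustrative only; the dissertation's PH/GLM formulation is more general.

```python
# Sketch of comparing ALT designs under a simplifying assumption: exponential
# lifetimes, rate = exp(b0 + b1 * x), Type-I censoring at tau. Per-unit Fisher
# information about (b0, b1) at stress x is P(fail by tau) * [[1, x], [x, x^2]].
# Planning values and candidate designs are illustrative.
import numpy as np

b0, b1, tau = -6.0, 4.0, 100.0          # assumed planning values

def info(design):
    """Total Fisher information for {stress: n_units} under the assumed model."""
    M = np.zeros((2, 2))
    for x, n in design.items():
        p_fail = 1 - np.exp(-np.exp(b0 + b1 * x) * tau)
        M += n * p_fail * np.array([[1, x], [x, x * x]])
    return M

designs = {
    "two-level 50/50": {0.5: 50, 1.0: 50},
    "two-level 70/30": {0.5: 70, 1.0: 30},
    "three-level":     {0.5: 40, 0.75: 20, 1.0: 40},
}
for name, d in designs.items():
    print(f"{name:>16}: det(I) = {np.linalg.det(info(d)):.1f}")  # larger is better (D-optimal)
```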
Contributors: Yang, Tao (Author) / Pan, Rong (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Borror, Connie (Committee member) / Rigdon, Steve (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
With the increase in computing power and the availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within it, so knowledge discovery by machine learning techniques is necessary if we want to better understand information in data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also study variable selection for matched data sets and propose a solution when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss: a proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy, and aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets, where variables are predictive for only a subset of the classes; an Asymmetric Random Forest (ARF) is proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. A Matched Random Forest (MRF) is proposed to find variables that can distinguish case from control without the restrictions that exist in linear models; MRF detects such variables even in the presence of interactions and qualitative variables.
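The abstract does not give the aSVM formulation, but the effect of an asymmetric loss can be illustrated with an off-the-shelf SVM by weighting one class's errors more heavily, which trades recall for precision on the other class. The sketch below shows that idea, not the dissertation's exact method.

```python
# Sketch of asymmetric loss via class weighting in a standard SVM: penalizing
# class-0 errors more heavily makes class-1 predictions more conservative,
# raising precision on class 1. Illustrates the idea behind aSVM, not aSVM itself.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for w in (1, 5, 20):                  # heavier penalty on misclassifying class 0
    clf = SVC(class_weight={0: w, 1: 1}).fit(Xtr, ytr)
    prec = precision_score(yte, clf.predict(Xte))
    print(f"class-0 weight {w:>2}: precision on class 1 = {prec:.2f}")
```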
Contributors: Koh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
With the rapid development of mobile sensing technologies such as GPS, RFID, and smartphone sensors, capturing position data in the form of trajectories has become easy. Moving-object trajectory analysis is a growing area of interest owing to its applications in domains such as marketing, security, and traffic monitoring and management. To better understand movement behaviors from raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories over time. If the taxis moving in a city are viewed as sensors that provide real-time information about the city's traffic, a change in these trajectories over time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm for HMM parameter estimation, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and to detect changes in trajectory data over time. Data from vehicles are used to test the change detection method. Secondly, sequential pattern mining is used to develop a model that detects changes in the frequent patterns occurring in trajectory data. The aim is to answer two questions: are the frequent patterns still frequent in the new data, and if so, has the time interval distribution in the pattern changed? Two approaches to change detection are considered, a frequency-based approach and a distribution-based approach, and the methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. One challenge with clustering semantic trajectories is that both numeric and categorical attributes are present; another is that trajectories can be of different lengths and can have missing values. A tree-based ensemble is used to address these problems, and the approach is extended to outlier detection in semantic trajectories.
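The likelihood idea behind the HMM-based change detection can be sketched with standard Baum-Welch training (the m-BaumWelch variant is not publicly specified): fit an HMM to a reference window of positions, then compare the per-sample log-likelihood of new windows. The sketch below assumes the third-party hmmlearn package and uses simulated positions.

```python
# Toy sketch of the likelihood idea: fit an HMM to reference trajectory data
# with standard Baum-Welch (not the dissertation's m-BaumWelch), then flag
# new windows whose per-sample log-likelihood drops. Positions are simulated.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, size=(500, 2))               # reference (x, y) positions
model = GaussianHMM(n_components=3, random_state=0).fit(ref)

same = rng.normal(0, 1, size=(200, 2))              # same movement regime
changed = rng.normal(2, 1, size=(200, 2))           # e.g. road network changed
for name, win in [("unchanged window", same), ("changed window", changed)]:
    print(f"{name}: avg log-likelihood = {model.score(win) / len(win):.2f}")
```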
Contributors: Kondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Identifying important variation patterns is a key step toward identifying the root causes of process variability, and it gives rise to a number of challenges. First, the variation patterns might be non-linear in the measured variables, while the existing research literature has focused on linear relationships. Second, it is important to remove noise from the dataset in order to visualize the true nature of the underlying patterns. Third, in addition to visualizing the pattern (preimage), it is essential to understand the relevant features that define the process variation pattern. This dissertation considers these challenges. A base kernel principal component analysis (KPCA) algorithm transforms the measurements to a high-dimensional feature space, where non-linear patterns in the original measurements can be handled through linear methods. However, the principal component subspace in feature space might not be well estimated (especially from noisy training data). An ensemble procedure is constructed in which the final preimage is estimated as the average over bagged samples drawn from the original dataset, attenuating noise in the kernel subspace estimation; this improves the robustness of any base KPCA algorithm. In a second method, successive iterations of denoising a convex combination of the training data and the corresponding denoised preimage are used to produce a more accurate estimate of the actual denoised preimage for noisy training data. The number of primary eigenvectors chosen in each iteration is decreased at a constant rate, and an efficient stopping criterion reduces the number of iterations. A feature selection procedure for KPCA is constructed to find the set of relevant features from noisy training data: data points are projected onto sparse random vectors, pairs of such projections are matched, and the differences in variation patterns within pairs are used to identify the relevant features. This approach provides robustness to irrelevant features by calculating the final variation pattern from an ensemble of feature subsets. Experiments are conducted on several simulated as well as real-life data sets, and the proposed methods show significant improvement over competitive methods.
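A minimal sketch of the bagging idea for KPCA denoising follows, using scikit-learn's KernelPCA with fit_inverse_transform=True as the base denoiser: estimate the preimage from several bootstrap resamples of the noisy data and average the results. The kernel, bandwidth, and toy circle data are assumptions for illustration, not the dissertation's settings.

```python
# Sketch of bagged KPCA denoising: average preimages estimated from bootstrap
# resamples of noisy training data. Toy data: a noisy unit circle.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)
X = np.column_stack([np.cos(t), np.sin(t)]) + rng.normal(0, 0.15, (300, 2))

def denoise(train, points, n_components=2):
    """Base KPCA denoiser: project onto leading kernel PCs, map back to input space."""
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=2.0,
                     fit_inverse_transform=True).fit(train)
    return kpca.inverse_transform(kpca.transform(points))

# Bagged preimage: average the denoised output over bootstrap samples.
preimages = [denoise(X[rng.integers(0, len(X), len(X))], X) for _ in range(10)]
X_denoised = np.mean(preimages, axis=0)

print("mean distance from unit circle, before:",
      np.abs(np.linalg.norm(X, axis=1) - 1).mean().round(3))
print("mean distance from unit circle, after: ",
      np.abs(np.linalg.norm(X_denoised, axis=1) - 1).mean().round(3))
```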
Contributors: Sahu, Anshuman (Author) / Runger, George C. (Thesis advisor) / Wu, Teresa (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created: 2013