Matching Items (42)
Filtering by

Clear all filters

149754-Thumbnail Image.png
Description
A good production schedule in a semiconductor back-end facility is critical for the on time delivery of customer orders. Compared to the front-end process that is dominated by re-entrant product flows, the back-end process is linear and therefore more suitable for scheduling. However, the production scheduling of the back-end process

A good production schedule in a semiconductor back-end facility is critical for the on time delivery of customer orders. Compared to the front-end process that is dominated by re-entrant product flows, the back-end process is linear and therefore more suitable for scheduling. However, the production scheduling of the back-end process is still very difficult due to the wide product mix, large number of parallel machines, product family related setups, machine-product qualification, and weekly demand consisting of thousands of lots. In this research, a novel mixed-integer-linear-programming (MILP) model is proposed for the batch production scheduling of a semiconductor back-end facility. In the MILP formulation, the manufacturing process is modeled as a flexible flow line with bottleneck stages, unrelated parallel machines, product family related sequence-independent setups, and product-machine qualification considerations. However, this MILP formulation is difficult to solve for real size problem instances. In a semiconductor back-end facility, production scheduling usually needs to be done every day while considering updated demand forecast for a medium term planning horizon. Due to the limitation on the solvable size of the MILP model, a deterministic scheduling system (DSS), consisting of an optimizer and a scheduler, is proposed to provide sub-optimal solutions in a short time for real size problem instances. The optimizer generates a tentative production plan. Then the scheduler sequences each lot on each individual machine according to the tentative production plan and scheduling rules. Customized factory rules and additional resource constraints are included in the DSS, such as preventive maintenance schedule, setup crew availability, and carrier limitations. Small problem instances are randomly generated to compare the performances of the MILP model and the deterministic scheduling system. Then experimental design is applied to understand the behavior of the DSS and identify the best configuration of the DSS under different demand scenarios. Product-machine qualification decisions have long-term and significant impact on production scheduling. A robust product-machine qualification matrix is critical for meeting demand when demand quantity or mix varies. In the second part of this research, a stochastic mixed integer programming model is proposed to balance the tradeoff between current machine qualification costs and future backorder costs with uncertain demand. The L-shaped method and acceleration techniques are proposed to solve the stochastic model. Computational results are provided to compare the performance of different solution methods.
ContributorsFu, Mengying (Author) / Askin, Ronald G. (Thesis advisor) / Zhang, Muhong (Thesis advisor) / Fowler, John W (Committee member) / Pan, Rong (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created2011
149928-Thumbnail Image.png
Description
The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards

The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance, transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are some highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins for Desulfovibrio vulgaris and Shewanella oneidensis for further biological exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at the Arizona State University is pursuing single-cell studies by developing novel technologies. This research arranged and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves. These curves were characterized with good empirical fit using nonlinear models with simple structures which allowed extraction of a large number of features. Application of a supervised classification model to these features and the integration of experimental factors allowed for identification of subtle patterns among different cell types visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach which showcased promising results towards less complex approximations. Also, the benefits of external information were explored to improve the image representation.
ContributorsTorres Garcia, Wandaliz (Author) / Meldrum, Deirdre R. (Thesis advisor) / Runger, George C. (Thesis advisor) / Gel, Esma S. (Committee member) / Li, Jing (Committee member) / Zhang, Weiwen (Committee member) / Arizona State University (Publisher)
Created2011
149723-Thumbnail Image.png
Description
This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset then can be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of time series, and interpretable features can be extracted. These features can be further reduced, and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
ContributorsDeng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created2011
149658-Thumbnail Image.png
Description
Hydropower generation is one of the clean renewable energies which has received great attention in the power industry. Hydropower has been the leading source of renewable energy. It provides more than 86% of all electricity generated by renewable sources worldwide. Generally, the life span of a hydropower plant is considered

Hydropower generation is one of the clean renewable energies which has received great attention in the power industry. Hydropower has been the leading source of renewable energy. It provides more than 86% of all electricity generated by renewable sources worldwide. Generally, the life span of a hydropower plant is considered as 30 to 50 years. Power plants over 30 years old usually conduct a feasibility study of rehabilitation on their entire facilities including infrastructure. By age 35, the forced outage rate increases by 10 percentage points compared to the previous year. Much longer outages occur in power plants older than 20 years. Consequently, the forced outage rate increases exponentially due to these longer outages. Although these long forced outages are not frequent, their impact is immense. If reasonable timing of rehabilitation is missed, an abrupt long-term outage could occur and additional unnecessary repairs and inefficiencies would follow. On the contrary, too early replacement might cause the waste of revenue. The hydropower plants of Korea Water Resources Corporation (hereafter K-water) are utilized for this study. Twenty-four K-water generators comprise the population for quantifying the reliability of each equipment. A facility in a hydropower plant is a repairable system because most failures can be fixed without replacing the entire facility. The fault data of each power plant are collected, within which only forced outage faults are considered as raw data for reliability analyses. The mean cumulative repair functions (MCF) of each facility are determined with the failure data tables, using Nelson's graph method. The power law model, a popular model for a repairable system, can also be obtained to represent representative equipment and system availability. The criterion-based analysis of HydroAmp is used to provide more accurate reliability of each power plant. Two case studies are presented to enhance the understanding of the availability of each power plant and represent economic evaluations for modernization. Also, equipment in a hydropower plant is categorized into two groups based on their reliability for determining modernization timing and their suitable replacement periods are obtained using simulation.
ContributorsKwon, Ogeuk (Author) / Holbert, Keith E. (Thesis advisor) / Heydt, Gerald T (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created2011
152223-Thumbnail Image.png
Description
Nowadays product reliability becomes the top concern of the manufacturers and customers always prefer the products with good performances under long period. In order to estimate the lifetime of the product, accelerated life testing (ALT) is introduced because most of the products can last years even decades. Much research has

Nowadays product reliability becomes the top concern of the manufacturers and customers always prefer the products with good performances under long period. In order to estimate the lifetime of the product, accelerated life testing (ALT) is introduced because most of the products can last years even decades. Much research has been done in the ALT area and optimal design for ALT is a major topic. This dissertation consists of three main studies. First, a methodology of finding optimal design for ALT with right censoring and interval censoring have been developed and it employs the proportional hazard (PH) model and generalized linear model (GLM) to simplify the computational process. A sensitivity study is also given to show the effects brought by parameters to the designs. Second, an extended version of I-optimal design for ALT is discussed and then a dual-objective design criterion is defined and showed with several examples. Also in order to evaluate different candidate designs, several graphical tools are developed. Finally, when there are more than one models available, different model checking designs are discussed.
ContributorsYang, Tao (Author) / Pan, Rong (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Borror, Connie (Committee member) / Rigdon, Steve (Committee member) / Arizona State University (Publisher)
Created2013
151341-Thumbnail Image.png
Description
With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic

With the rapid development of mobile sensing technologies like GPS, RFID, sensors in smartphones, etc., capturing position data in the form of trajectories has become easy. Moving object trajectory analysis is a growing area of interest these days owing to its applications in various domains such as marketing, security, traffic monitoring and management, etc. To better understand movement behaviors from the raw mobility data, this doctoral work provides analytic models for analyzing trajectory data. As a first contribution, a model is developed to detect changes in trajectories with time. If the taxis moving in a city are viewed as sensors that provide real time information of the traffic in the city, a change in these trajectories with time can reveal that the road network has changed. To detect changes, trajectories are modeled with a Hidden Markov Model (HMM). A modified training algorithm, for parameter estimation in HMM, called m-BaumWelch, is used to develop likelihood estimates under assumed changes and used to detect changes in trajectory data with time. Data from vehicles are used to test the method for change detection. Secondly, sequential pattern mining is used to develop a model to detect changes in frequent patterns occurring in trajectory data. The aim is to answer two questions: Are the frequent patterns still frequent in the new data? If they are frequent, has the time interval distribution in the pattern changed? Two different approaches are considered for change detection, frequency-based approach and distribution-based approach. The methods are illustrated with vehicle trajectory data. Finally, a model is developed for clustering and outlier detection in semantic trajectories. A challenge with clustering semantic trajectories is that both numeric and categorical attributes are present. Another problem to be addressed while clustering is that trajectories can be of different lengths and also have missing values. A tree-based ensemble is used to address these problems. The approach is extended to outlier detection in semantic trajectories.
ContributorsKondaveeti, Anirudh (Author) / Runger, George C. (Thesis advisor) / Mirchandani, Pitu (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created2012
152398-Thumbnail Image.png
Description
Identifying important variation patterns is a key step to identifying root causes of process variability. This gives rise to a number of challenges. First, the variation patterns might be non-linear in the measured variables, while the existing research literature has focused on linear relationships. Second, it is important to remove

Identifying important variation patterns is a key step to identifying root causes of process variability. This gives rise to a number of challenges. First, the variation patterns might be non-linear in the measured variables, while the existing research literature has focused on linear relationships. Second, it is important to remove noise from the dataset in order to visualize the true nature of the underlying patterns. Third, in addition to visualizing the pattern (preimage), it is also essential to understand the relevant features that define the process variation pattern. This dissertation considers these variation challenges. A base kernel principal component analysis (KPCA) algorithm transforms the measurements to a high-dimensional feature space where non-linear patterns in the original measurement can be handled through linear methods. However, the principal component subspace in feature space might not be well estimated (especially from noisy training data). An ensemble procedure is constructed where the final preimage is estimated as the average from bagged samples drawn from the original dataset to attenuate noise in kernel subspace estimation. This improves the robustness of any base KPCA algorithm. In a second method, successive iterations of denoising a convex combination of the training data and the corresponding denoised preimage are used to produce a more accurate estimate of the actual denoised preimage for noisy training data. The number of primary eigenvectors chosen in each iteration is also decreased at a constant rate. An efficient stopping rule criterion is used to reduce the number of iterations. A feature selection procedure for KPCA is constructed to find the set of relevant features from noisy training data. Data points are projected onto sparse random vectors. Pairs of such projections are then matched, and the differences in variation patterns within pairs are used to identify the relevant features. This approach provides robustness to irrelevant features by calculating the final variation pattern from an ensemble of feature subsets. Experiments are conducted using several simulated as well as real-life data sets. The proposed methods show significant improvement over the competitive methods.
ContributorsSahu, Anshuman (Author) / Runger, George C. (Thesis advisor) / Wu, Teresa (Committee member) / Pan, Rong (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)
Created2013
150659-Thumbnail Image.png
Description
This dissertation is to address product design optimization including reliability-based design optimization (RBDO) and robust design with epistemic uncertainty. It is divided into four major components as outlined below. Firstly, a comprehensive study of uncertainties is performed, in which sources of uncertainty are listed, categorized and the impacts are discussed.

This dissertation is to address product design optimization including reliability-based design optimization (RBDO) and robust design with epistemic uncertainty. It is divided into four major components as outlined below. Firstly, a comprehensive study of uncertainties is performed, in which sources of uncertainty are listed, categorized and the impacts are discussed. Epistemic uncertainty is of interest, which is due to lack of knowledge and can be reduced by taking more observations. In particular, the strategies to address epistemic uncertainties due to implicit constraint function are discussed. Secondly, a sequential sampling strategy to improve RBDO under implicit constraint function is developed. In modern engineering design, an RBDO task is often performed by a computer simulation program, which can be treated as a black box, as its analytical function is implicit. An efficient sampling strategy on learning the probabilistic constraint function under the design optimization framework is presented. The method is a sequential experimentation around the approximate most probable point (MPP) at each step of optimization process. It is compared with the methods of MPP-based sampling, lifted surrogate function, and non-sequential random sampling. Thirdly, a particle splitting-based reliability analysis approach is developed in design optimization. In reliability analysis, traditional simulation methods such as Monte Carlo simulation may provide accurate results, but are often accompanied with high computational cost. To increase the efficiency, particle splitting is integrated into RBDO. It is an improvement of subset simulation with multiple particles to enhance the diversity and stability of simulation samples. This method is further extended to address problems with multiple probabilistic constraints and compared with the MPP-based methods. Finally, a reliability-based robust design optimization (RBRDO) framework is provided to integrate the consideration of design reliability and design robustness simultaneously. The quality loss objective in robust design, considered together with the production cost in RBDO, are used formulate a multi-objective optimization problem. With the epistemic uncertainty from implicit performance function, the sequential sampling strategy is extended to RBRDO, and a combined metamodel is proposed to tackle both controllable variables and uncontrollable variables. The solution is a Pareto frontier, compared with a single optimal solution in RBDO.
ContributorsZhuang, Xiaotian (Author) / Pan, Rong (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Zhang, Muhong (Committee member) / Du, Xiaoping (Committee member) / Arizona State University (Publisher)
Created2012
150497-Thumbnail Image.png
Description
In recent years, service oriented computing (SOC) has become a widely accepted paradigm for the development of distributed applications such as web services, grid computing and cloud computing systems. In service-based systems (SBS), multiple service requests with specific performance requirements make services compete for system resources. IT service providers need

In recent years, service oriented computing (SOC) has become a widely accepted paradigm for the development of distributed applications such as web services, grid computing and cloud computing systems. In service-based systems (SBS), multiple service requests with specific performance requirements make services compete for system resources. IT service providers need to allocate resources to services so the performance requirements of customers can be satisfied. Workload and performance models are required for efficient resource management and service performance assurance in SBS. This dissertation develops two methods to understand and model the cause-effect relations of service-related activities with resources workload and service performance. Part one presents an empirical method that requires the collection of system dynamics data and the application of statistical analyses. The results show that the method is capable to: 1) uncover the impacts of services on resource workload and service performance, 2) identify interaction effects of multiple services running concurrently, 3) gain insights about resource and performance tradeoffs of services, and 4) build service workload and performance models. In part two, the empirical method is used to investigate the impacts of services, security mechanisms and cyber attacks on resources workload and service performance. The information obtained is used to: 1) uncover interaction effects of services, security mechanisms and cyber attacks, 2) identify tradeoffs within limits of system resources, and 3) develop general/specific strategies for system survivability. Finally, part three presents a framework based on the usage profiles of services competing for resources and the resource-sharing schemes. The framework is used to: 1) uncover the impacts of service parameters (e.g. arrival distribution, execution time distribution, priority, workload intensity, scheduling algorithm) on workload and performance, and 2) build service workload and performance models at individual resources. The estimates obtained from service workload and performance models at individual resources can be aggregated to obtain overall estimates of services through multiple system resources. The workload and performance models of services obtained through both methods can be used for the efficient resource management and service performance assurance in SBS.
ContributorsMartinez Aranda, Billibaldo (Author) / Ye, Nong (Thesis advisor) / Wu, Tong (Committee member) / Sarjoughian, Hessam S. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created2012
151226-Thumbnail Image.png
Description
Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provide a high-dimensional data vector that challenges the learning of

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provide a high-dimensional data vector that challenges the learning of the relevant patterns This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes and interaction between features. The proposed representations are useful for classification and interpretation of the TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach. An interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This is distinctly different from common similarity-based time warping. This allows for additional features (such as pattern location) to be easily integrated into the models. The learners have the capability to account for the temporal information through the recursive partitioning method. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles, to diagnose and interpret time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS and both features within TS and interactions between TS can be important to models. Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, nominal and missing values are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competitive methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection.
ContributorsBaydogan, Mustafa Gokce (Author) / Runger, George C. (Thesis advisor) / Atkinson, Robert (Committee member) / Gel, Esma (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created2012