Matching Items (160)
Description
This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset can then be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted. These features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show that the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
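As an illustration of the bias issue described above, the following is a minimal sketch (Python with scikit-learn; the data, feature names, and parameter values are hypothetical and not taken from the dissertation) contrasting impurity-based importance with permutation importance computed on held-out data, which is one common mitigation of the multi-valued-predictor bias; the OOBForest and pForest measures are the dissertation's own, different remedies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
x_many = rng.integers(0, 100, n)        # many distinct values, no real signal
x_bin = rng.integers(0, 2, n)           # binary, informative predictor
y = np.where(rng.random(n) < 0.85, x_bin, 1 - x_bin)   # y mostly follows x_bin
X = np.column_stack([x_many, x_bin])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Impurity-based importance can credit the uninformative many-valued predictor.
print("impurity importance [x_many, x_bin]:", rf.feature_importances_.round(3))

# Permutation importance on held-out data is far less prone to that bias.
perm = permutation_importance(rf, X_te, y_te, n_repeats=30, random_state=0)
print("permutation importance [x_many, x_bin]:", perm.importances_mean.round(3))
```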
Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L. (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Supply chains are increasingly complex as companies branch out into newer products and markets. In many cases, multiple products with moderate differences in performance and price compete for the same unit of demand. Simultaneous occurrences of multiple scenarios (competitive, disruptive, regulatory, economic, etc.), coupled with business decisions (pricing, product introduction, etc.), can drastically change demand structures within a short period of time. Furthermore, product obsolescence and cannibalization are real concerns due to short product life cycles. Analytical tools that can handle this complexity are important to quantify the impact of business scenarios and decisions on supply chain performance. Traditional analysis methods struggle in this environment, where large, complex datasets with hundreds of features are becoming the norm in supply chains. We present an empirical analysis framework termed Scenario Trees that provides a novel representation for impulse and delayed scenario events and a direction for modeling multivariate constrained responses. Among potential learners, supervised learners and feature extraction strategies based on tree-based ensembles are employed to extract the most impactful scenarios and predict their outcome on metrics at different product hierarchies. These models are able to provide accurate predictions in modeling environments characterized by incomplete datasets due to product substitution, missing values, outliers, redundant features, mixed variables and nonlinear interaction effects. Graphical model summaries are generated to aid model understanding. Models in complex environments benefit from feature selection methods that extract non-redundant feature subsets from the data. Additional model simplification can be achieved by extracting specific levels/values that contribute to variable importance. We propose and evaluate new analytical methods to address this problem of feature value selection and study their comparative performance using simulated datasets. We show that supply chain surveillance can be structured as a feature value selection problem. For situations such as new product introduction, a bottom-up approach to scenario analysis is designed using an agent-based simulation and data mining framework. This simulation engine envelops utility theory, discrete choice models and diffusion theory and acts as a test bed for enacting different business scenarios. We demonstrate the use of machine learning algorithms to analyze scenarios and generate graphical summaries to aid decision making.
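A minimal sketch of the general idea of encoding impulse versus delayed scenario events as features for a tree-based ensemble (Python with scikit-learn and pandas; the event encoding, decay rate, and demand data below are hypothetical illustrations, not the dissertation's Scenario Trees representation itself):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
weeks = 200
demand = 100 + rng.normal(0, 5, weeks)

# impulse event: a one-week promotion spike; delayed event: a competitor launch
# whose effect decays over the following weeks
impulse = np.zeros(weeks)
impulse[60] = 1.0
delayed = np.zeros(weeks)
launch = 120
delayed[launch:] = np.exp(-np.arange(weeks - launch) / 8.0)
demand = demand + 30 * impulse - 15 * delayed

X = pd.DataFrame({"impulse_promo": impulse,
                  "delayed_launch": delayed,
                  "week_of_year": np.arange(weeks) % 52})
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, demand)
print(dict(zip(X.columns, model.feature_importances_.round(3))))
```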
Contributors: Shinde, Amit (Author) / Runger, George C. (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Villalobos, Rene (Committee member) / Janakiram, Mani (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Structural health management (SHM) is emerging as a vital methodology to help engineers improve the safety and maintainability of critical structures. SHM systems are designed to reliably monitor and test the health and performance of structures in aerospace, civil, and mechanical engineering applications. SHM combines multidisciplinary technologies including sensing, signal processing, pattern recognition, data mining, high-fidelity probabilistic progressive damage models, physics-based damage models, and regression analysis. Due to the wide application of carbon fiber reinforced composites and their multiscale failure mechanisms, it is necessary to emphasize SHM research on composite structures. This research develops a comprehensive framework for the damage detection, localization, quantification, and prediction of the remaining useful life of complex composite structures. To interrogate a composite structure, guided wave propagation is applied to thin structures such as beams and plates. Piezoelectric transducers are selected because of their versatility, which allows them to be used as both sensors and actuators. Feature extraction from guided wave signals is critical to demonstrate the presence of damage and estimate the damage locations. Advanced signal processing techniques are employed to extract robust features and information. To provide a better estimate of the damage for accurate life estimation, probabilistic regression analysis is used to obtain a prediction model for the prognosis of complex structures subject to fatigue loading. Special effort has been applied to extending SHM techniques to aerospace and spacecraft structures, such as UAV composite wings and deployable composite boom structures. Necessary modifications of the developed SHM techniques were conducted to meet the unique requirements of these aerospace structures. The developed SHM algorithms are able to accurately detect and quantify impact damage as well as matrix cracking.
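For flavor, a minimal sketch of simple damage-sensitive features often extracted from guided wave signals (signal energy, envelope peak, and a correlation-based arrival shift), using Python with NumPy/SciPy on a synthetic tone burst; the sampling rate, signal model, and features here are illustrative assumptions, and the signal processing in this work is more advanced:

```python
import numpy as np
from scipy.signal import hilbert

fs = 1_000_000                      # assumed sampling rate (Hz)
t = np.arange(0, 2e-3, 1 / fs)
# synthetic 100 kHz tone burst as a stand-in for a baseline guided wave signal
baseline = np.sin(2 * np.pi * 100e3 * t) * np.exp(-((t - 0.5e-3) ** 2) / (2 * (50e-6) ** 2))
# "damaged" response: an attenuated, delayed echo superposed on the baseline
damaged = baseline + 0.3 * np.roll(baseline, 200)

def wave_features(sig, ref):
    env = np.abs(hilbert(sig))                    # Hilbert envelope
    energy = float(np.sum(sig ** 2))              # signal energy
    peak = float(env.max())                       # peak envelope amplitude
    xcorr = np.correlate(sig, ref, mode="full")   # arrival shift vs. reference
    lag = int(xcorr.argmax()) - (len(ref) - 1)
    return {"energy": energy, "peak": peak, "shift_us": 1e6 * lag / fs}

print("baseline:", wave_features(baseline, baseline))
print("damaged: ", wave_features(damaged, baseline))
```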
Contributors: Liu, Yingtao (Author) / Chattopadhyay, Aditi (Thesis advisor) / Rajadas, John (Committee member) / Dai, Lenore (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Jiang, Hanqing (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
A good production schedule in a semiconductor back-end facility is critical for the on-time delivery of customer orders. Compared to the front-end process, which is dominated by re-entrant product flows, the back-end process is linear and therefore more suitable for scheduling. However, production scheduling of the back-end process is still very difficult due to the wide product mix, large number of parallel machines, product-family-related setups, machine-product qualification, and weekly demand consisting of thousands of lots. In this research, a novel mixed-integer linear programming (MILP) model is proposed for the batch production scheduling of a semiconductor back-end facility. In the MILP formulation, the manufacturing process is modeled as a flexible flow line with bottleneck stages, unrelated parallel machines, product-family-related sequence-independent setups, and product-machine qualification considerations. However, this MILP formulation is difficult to solve for real-size problem instances. In a semiconductor back-end facility, production scheduling usually needs to be done every day while considering an updated demand forecast for a medium-term planning horizon. Due to the limitation on the solvable size of the MILP model, a deterministic scheduling system (DSS), consisting of an optimizer and a scheduler, is proposed to provide sub-optimal solutions in a short time for real-size problem instances. The optimizer generates a tentative production plan. The scheduler then sequences each lot on each individual machine according to the tentative production plan and scheduling rules. Customized factory rules and additional resource constraints are included in the DSS, such as the preventive maintenance schedule, setup crew availability, and carrier limitations. Small problem instances are randomly generated to compare the performance of the MILP model and the deterministic scheduling system. Experimental design is then applied to understand the behavior of the DSS and identify its best configuration under different demand scenarios. Product-machine qualification decisions have a long-term and significant impact on production scheduling. A robust product-machine qualification matrix is critical for meeting demand when the demand quantity or mix varies. In the second part of this research, a stochastic mixed-integer programming model is proposed to balance the tradeoff between current machine qualification costs and future backorder costs under uncertain demand. The L-shaped method and acceleration techniques are proposed to solve the stochastic model. Computational results are provided to compare the performance of different solution methods.
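A minimal sketch (Python with PuLP; the job data, setup times, and qualification pairs are hypothetical) of the kinds of building blocks such a formulation uses: unrelated parallel machines, sequence-independent family setups, and product-machine qualification. For brevity it minimizes makespan rather than the weighted tardiness objective of the full model, and it omits the flow-line timing constraints:

```python
import pulp

jobs = ["j1", "j2", "j3", "j4"]
machines = ["m1", "m2"]
families = ["A", "B"]
family = {"j1": "A", "j2": "A", "j3": "B", "j4": "B"}
proc = {("j1", "m1"): 3, ("j1", "m2"): 4, ("j2", "m1"): 2, ("j2", "m2"): 3,
        ("j3", "m1"): 5, ("j3", "m2"): 4, ("j4", "m1"): 6, ("j4", "m2"): 5}
setup = {"A": 2, "B": 3}                      # sequence-independent family setups
qualified = {("j1", "m1"), ("j1", "m2"), ("j2", "m1"),
             ("j3", "m1"), ("j3", "m2"), ("j4", "m2")}   # product-machine qualification

prob = pulp.LpProblem("parallel_machines_family_setups", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", list(qualified), cat="Binary")     # job-machine assignment
y = pulp.LpVariable.dicts("y", [(f, m) for f in families for m in machines], cat="Binary")
cmax = pulp.LpVariable("Cmax", lowBound=0)
prob += cmax                                                       # minimize makespan

for j in jobs:                                # each job on exactly one qualified machine
    prob += pulp.lpSum(x[(j, m)] for m in machines if (j, m) in qualified) == 1
for (j, m) in qualified:                      # the job's family must be set up on its machine
    prob += x[(j, m)] <= y[(family[j], m)]
for m in machines:                            # machine load (processing + setups) bounds Cmax
    prob += (pulp.lpSum(proc[(j, m)] * x[(j, m)] for j in jobs if (j, m) in qualified)
             + pulp.lpSum(setup[f] * y[(f, m)] for f in families)) <= cmax

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("makespan:", pulp.value(cmax))
print("assignment:", sorted(jm for jm in qualified if x[jm].value() > 0.5))
```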
Contributors: Fu, Mengying (Author) / Askin, Ronald G. (Thesis advisor) / Zhang, Muhong (Thesis advisor) / Fowler, John W. (Committee member) / Pan, Rong (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Many products undergo several stages of testing, ranging from tests on individual components to end-item tests. Additionally, these products may be further "tested" via customer or field use. The later failure of a delivered product may in some cases be due to circumstances that have no correlation with the product's inherent quality. However, at times there may be cues in the upstream test data that, if detected, could serve to predict the likelihood of downstream failure or performance degradation induced by product use or environmental stresses. This study explores the use of downstream factory test data or product field reliability data to infer data mining or pattern recognition criteria onto manufacturing process or upstream test data by means of support vector machines (SVM) in order to provide reliability prediction models. In concert with a risk/benefit analysis, these models can be utilized to drive improvement of the product or, at least, to improve through screening the reliability of the product delivered to the customer. Such models can be used to aid in reliability risk assessment based on detectable correlations between product test performance and the sources of supply, test stands, or other factors related to product manufacture. As an enhancement to the usefulness of the SVM or hyperplane classifier within this context, L-moments and the Western Electric Company (WECO) rules are used to augment or replace the native process or test data used as inputs to the classifier. As part of this research, a generalizable binary classification methodology was developed that can be used to design and implement predictors of end-item field failure or downstream product performance based on upstream test data that may be composed of single-parameter, time-series, or multivariate real-valued data. Additionally, the methodology provides input parameter weighting factors that have proved useful in failure analysis and root cause investigations as indicators of which of several upstream product parameters have the greater influence on downstream failure outcomes.
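A minimal sketch (Python with scikit-learn; the data, failure rule, and features are invented for illustration) of the overall pattern: summarize upstream test traces into a few statistics plus a WECO-style out-of-control flag, then train an SVM to predict downstream failure. The dissertation's actual inputs use L-moments and the full set of WECO rules:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_units, n_samples = 300, 50
tests = rng.normal(0, 1, size=(n_units, n_samples))      # upstream test traces per unit
# toy ground truth: units whose test trace drifts upward fail more often downstream
fail = (tests.mean(axis=1) + 0.5 * rng.normal(size=n_units) > 0.3).astype(int)

def summarize(trace, target=0.0, sigma=1.0):
    # a few summary statistics plus a WECO-style flag (any point beyond 3 sigma)
    weco_flag = float(np.any(np.abs(trace - target) > 3 * sigma))
    return [trace.mean(), trace.std(), trace.max() - trace.min(), weco_flag]

X = np.array([summarize(row) for row in tests])
X_tr, X_te, y_tr, y_te = train_test_split(X, fail, random_state=0)
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))
```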
Contributors: Mosley, James (Author) / Morrell, Darryl (Committee member) / Cochran, Douglas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Roberts, Chell (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
The ever-changing economic landscape has forced many companies to re-examine their supply chains. Global resourcing and outsourcing of processes is a strategy many organizations have adopted to reduce cost and to increase their global footprint. This has, however, resulted in increased process complexity and reduced customer satisfaction. In order to meet and exceed customer expectations, many companies are forced to improve quality and on-time delivery, and have looked towards Lean Six Sigma as an approach to enable process improvement. The Lean Six Sigma literature is rich in deployment strategies; however, there is a general lack of a mathematical approach to deploying Lean Six Sigma in a global enterprise, including both project identification and prioritization. The research presented here is two-fold. First, a process characterization framework is presented to evaluate processes based on eight characteristics. An unsupervised learning technique, using clustering algorithms, is then utilized to group processes that are Lean Six Sigma conducive. The approach helps Lean Six Sigma deployment champions identify key areas within the business on which to focus a Lean Six Sigma deployment. A case study is presented in which 33% of the processes were found to be Lean Six Sigma conducive. Second, having identified the parts of the business that are Lean Six Sigma conducive, the next steps are to formulate and prioritize a portfolio of projects. Very often the deployment champion is faced with the decision of selecting a portfolio of Lean Six Sigma projects that meet multiple objectives, which could include maximizing productivity, customer satisfaction, or return on investment, while meeting certain budgetary constraints. A multi-period 0-1 knapsack problem is presented that maximizes the expected net savings of the Lean Six Sigma portfolio over the life cycle of the deployment. Finally, a case study is presented that demonstrates the application of the model in a large multinational company. Traditionally, Lean Six Sigma found its roots in manufacturing. The research presented in this dissertation also emphasizes the applicability of the methodology to the non-manufacturing space. Additionally, a comparison is conducted between manufacturing and non-manufacturing processes to highlight the challenges in deploying the methodology in both spaces.
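A minimal sketch of a multi-period 0-1 knapsack portfolio model (Python with PuLP; the projects, savings, costs, and budgets below are hypothetical, not from the case study): select the subset of Lean Six Sigma projects that maximizes expected net savings while respecting a budget in each planning period:

```python
import pulp

# hypothetical candidate projects: expected net savings over the deployment
# life cycle, and resource cost in each of three planning periods
projects = {"P1": {"savings": 120, "cost": [40, 10, 0]},
            "P2": {"savings": 90,  "cost": [20, 30, 10]},
            "P3": {"savings": 150, "cost": [50, 40, 20]},
            "P4": {"savings": 60,  "cost": [10, 10, 10]}}
budget = [70, 60, 25]                  # per-period budget (e.g., Black Belt hours)

prob = pulp.LpProblem("lss_portfolio", pulp.LpMaximize)
pick = pulp.LpVariable.dicts("pick", projects, cat="Binary")
prob += pulp.lpSum(projects[p]["savings"] * pick[p] for p in projects)
for t in range(len(budget)):           # multi-period 0-1 knapsack constraints
    prob += pulp.lpSum(projects[p]["cost"][t] * pick[p] for p in projects) <= budget[t]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected:", [p for p in projects if pick[p].value() > 0.5],
      "expected savings:", pulp.value(prob.objective))
```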
Contributors: Duarte, Brett Marc (Author) / Fowler, John W. (Thesis advisor) / Montgomery, Douglas C. (Thesis advisor) / Shunk, Dan (Committee member) / Borror, Connie (Committee member) / Konopka, John (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
In recent years, service-oriented computing (SOC) has become a widely accepted paradigm for the development of distributed applications such as web services, grid computing and cloud computing systems. In service-based systems (SBS), multiple service requests with specific performance requirements make services compete for system resources. IT service providers need to allocate resources to services so that the performance requirements of customers can be satisfied. Workload and performance models are required for efficient resource management and service performance assurance in SBS. This dissertation develops two methods to understand and model the cause-effect relations of service-related activities with resource workload and service performance. Part one presents an empirical method that requires the collection of system dynamics data and the application of statistical analyses. The results show that the method is able to: 1) uncover the impacts of services on resource workload and service performance, 2) identify interaction effects of multiple services running concurrently, 3) gain insights about resource and performance tradeoffs of services, and 4) build service workload and performance models. In part two, the empirical method is used to investigate the impacts of services, security mechanisms and cyber attacks on resource workload and service performance. The information obtained is used to: 1) uncover interaction effects of services, security mechanisms and cyber attacks, 2) identify tradeoffs within the limits of system resources, and 3) develop general and specific strategies for system survivability. Finally, part three presents a framework based on the usage profiles of services competing for resources and the resource-sharing schemes. The framework is used to: 1) uncover the impacts of service parameters (e.g., arrival distribution, execution time distribution, priority, workload intensity, scheduling algorithm) on workload and performance, and 2) build service workload and performance models at individual resources. The estimates obtained from service workload and performance models at individual resources can be aggregated to obtain overall estimates of services through multiple system resources. The workload and performance models of services obtained through both methods can be used for efficient resource management and service performance assurance in SBS.
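A minimal sketch of the empirical flavor of part one (Python with pandas/statsmodels; the services, workload numbers, and effect sizes are simulated for illustration): regress a resource workload metric on service activity indicators, with an interaction term to expose the extra load of services running concurrently:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
svc_a = rng.integers(0, 2, n)          # service A active on the host (0/1)
svc_b = rng.integers(0, 2, n)          # service B active on the host (0/1)
# simulated CPU workload: main effects plus an extra cost when both run concurrently
cpu = 10 + 25 * svc_a + 15 * svc_b + 12 * svc_a * svc_b + rng.normal(0, 3, n)
df = pd.DataFrame({"svc_a": svc_a, "svc_b": svc_b, "cpu": cpu})

# the svc_a:svc_b interaction term captures the effect of concurrent execution
model = smf.ols("cpu ~ svc_a * svc_b", data=df).fit()
print(model.params.round(2))
```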
Contributors: Martinez Aranda, Billibaldo (Author) / Ye, Nong (Thesis advisor) / Wu, Tong (Committee member) / Sarjoughian, Hessam S. (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
This research is motivated by a deterministic scheduling problem that is fairly common in manufacturing environments, where certain processes call for a machine working on multiple jobs at the same time. An example of such an environment is wafer fabrication in the semiconductor industry, where some stages can be modeled as batch processes. There has been significant past work on a single stage of parallel machines that process jobs in batches. The primary motivation behind this research is to extend that work to a two-stage flow shop where jobs arrive with unequal ready times and belong to incompatible job families, with the goal of minimizing total weighted tardiness. As a first step toward proposing solutions, a mixed-integer mathematical model is developed that tackles the problem at hand. The problem is NP-hard, and thus the developed mathematical program can only solve problem instances of smaller sizes in a reasonable amount of time. The next step is to build heuristics that can provide feasible solutions in polynomial time for larger problem instances. The basic nature of the proposed heuristics is time window decomposition, where jobs within a moving time frame are considered for batching each time a machine becomes available on either stage. The Apparent Tardiness Cost (ATC) rule is used to build batches, and is modified to calculate ATC indices at both the batch and the job level. An improvement to the above heuristic is proposed, where the heuristic is run iteratively, each time assigning the start times of jobs on the second stage as due dates for the jobs on the first stage. The underlying logic behind the iterative approach is to improve the way due dates are estimated for the first stage based on the assigned due dates for jobs in the second stage. An important study carried out as part of this research analyzes the bottleneck stage in terms of its location and how it affects the performance measure. Extensive experimentation is carried out to test how the quality of the solution varies when input parameters are varied between high and low values.
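A minimal sketch of the classic Apparent Tardiness Cost (ATC) index and a simple batch-building step (plain Python; the job data, look-ahead parameter k, and batch capacity are hypothetical, and the dissertation's batch-level modification may differ in detail):

```python
import math

# (job id, weight, processing time, due date, family)
jobs = [(1, 2.0, 4.0, 20.0, "A"), (2, 1.0, 3.0, 12.0, "A"),
        (3, 3.0, 5.0, 15.0, "B"), (4, 1.5, 4.0, 30.0, "A")]
k = 2.0                                            # look-ahead parameter
p_bar = sum(j[2] for j in jobs) / len(jobs)        # average processing time

def atc_index(w, p, d, t):
    # classic Apparent Tardiness Cost priority index at time t
    return (w / p) * math.exp(-max(d - p - t, 0.0) / (k * p_bar))

def build_batch(ready_jobs, t, fam, capacity):
    # fill a batch with the highest-index jobs of one (incompatible) family,
    # up to the batch machine's capacity
    cands = [j for j in ready_jobs if j[4] == fam]
    cands.sort(key=lambda j: atc_index(j[1], j[2], j[3], t), reverse=True)
    return cands[:capacity]

print(build_batch(jobs, t=5.0, fam="A", capacity=2))
```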
Contributors: Tewari, Anubha Alokkumar (Author) / Fowler, John W. (Thesis advisor) / Monch, Lars (Thesis advisor) / Gel, Esma S. (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
In the entire supply chain, demand planning is one of the crucial aspects of the production planning process. If the demand is not estimated accurately, it causes revenue loss. Past research has shown that forecasting can be used to help the demand planning process for production. However, accurate forecasting from historical data is difficult in today's complex, volatile market, and it is not the only factor that influences demand planning. Factors such as consumers' shifting interests and buying power also influence future demand. Hence, this research study focuses on the Just-In-Time (JIT) philosophy using a pull control strategy implemented with a Kanban control system to control the inventory flow. Two different product structures, a serial product structure and an assembly product structure, are considered for this research. Three different methods, namely the Toyota Production System model, a histogram model, and a cost minimization model, have been used to find the number of kanbans used in a computer-simulated Just-In-Time Kanban system. The simulation model was built to execute the designed scenarios for both the serial and assembly product structures. A test was performed to check the significance of the effects of various factors on system performance. Results of all three methods were collected and compared to indicate which method provides the most effective way to determine the number of kanbans under various conditions. It was inferred that the histogram and cost minimization models are more accurate in calculating the required kanbans for various manufacturing conditions. Method-1 fails to adjust the kanbans when the backorder cost increases or when the product structure changes. Among the product structures, the serial product structure proved to be effective when Method-2 or Method-3 is used to calculate the kanban numbers for the system. The experimental results also indicated that, for both serial and assembly product structures, a lower container capacity accumulates more backorders in the system, and hence a higher inventory cost, than a higher container capacity.
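For reference, a minimal sketch of the widely cited Toyota-style kanban count formula (plain Python; the demand, lead time, safety factor, and container size are hypothetical numbers, and it is only an assumption that this corresponds to the abstract's Method-1):

```python
import math

def kanban_count(demand_rate, lead_time, safety_factor, container_size):
    # N = D * L * (1 + alpha) / C, rounded up to a whole number of containers
    return math.ceil(demand_rate * lead_time * (1 + safety_factor) / container_size)

# hypothetical numbers: 120 units/day demand, 0.5-day replenishment lead time,
# 10% safety allowance, 25 units per container
print(kanban_count(demand_rate=120, lead_time=0.5, safety_factor=0.10,
                   container_size=25))            # -> 3 kanbans
```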
Contributors: Sahu, Pranati (Author) / Askin, Ronald G. (Thesis advisor) / Shunk, Dan L. (Thesis advisor) / Fowler, John (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Buildings (approximately half commercial and half residential) consume over 70% of the electricity among all consumption units in the United States. Buildings are also responsible for approximately 40% of CO2 emissions, which is more than any other industry sector. As a result, the smart building initiative, which aims not only to manage electrical consumption efficiently but also to reduce the damaging effect of greenhouse gases on the environment, has been launched. Another important technology being promoted by government agencies is the smart grid, which manages energy usage across a wide range of buildings in an effort to reduce cost and increase reliability and transparency. Although a great amount of effort has been devoted to these two initiatives, either by exploring smart grid designs or by developing technologies for smart buildings, research on how smart buildings and the smart grid coordinate to use energy more efficiently is currently lacking. In this dissertation, a "system-of-systems" approach is employed to develop an integrated building model which consists of a number of buildings (a building cluster) interacting with the smart grid. The buildings can function both as energy consumption units and as energy generation/storage units. Memetic Algorithm (MA) and Particle Swarm Optimization (PSO) based decision frameworks are developed for building operation decisions. In addition, a Particle Filter (PF) is explored as a means of fusing online sensor and meter data so that adaptive decisions can be made in response to a dynamic environment. The dissertation is divided into three inter-connected research components. First, an integrated building energy model, including consumption, storage, and generation sub-systems for the building cluster, is developed. Then a bi-level Memetic Algorithm (MA) based decentralized decision framework is developed to identify the Pareto optimal operation strategies for the building cluster. The Pareto solutions not only enable multi-dimensional tradeoff analysis, but also provide valuable insight for determining pricing mechanisms and power grid capacity. Second, a multi-objective PSO based decision framework is developed to reduce the computational effort of the MA based decision framework without sacrificing accuracy. With the improved performance, the decision time scale could be refined to make the framework capable of hourly operation decisions. Finally, by integrating the multi-objective PSO based decision framework with the PF, an adaptive framework is developed for adaptive operation decisions for the smart building cluster. The adaptive framework not only enables the development of a high-fidelity decision model but also enables the building cluster to respond to the dynamics and uncertainties inherent in the system.
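A minimal sketch of a particle swarm optimizer applied to a toy building-cluster operation objective (Python with NumPy; the cost function, bounds, and PSO parameters are invented for illustration and are far simpler than the multi-objective, adaptive framework described above):

```python
import numpy as np

rng = np.random.default_rng(2)

def cluster_cost(x):
    # toy stand-in for a building-cluster objective: time-varying grid purchase
    # cost plus a comfort penalty, over 24 hourly setpoint adjustments x (deg C)
    price = 0.10 + 0.05 * np.sin(np.linspace(0, 2 * np.pi, 24))
    energy = 50 - 4 * x                 # crude: relaxing setpoints saves energy
    discomfort = float((x ** 2).sum())
    return float((price * energy).sum() + 0.5 * discomfort)

def pso(f, dim=24, n_particles=30, iters=200, lb=-2.0, ub=2.0,
        w=0.7, c1=1.5, c2=1.5):
    x = rng.uniform(lb, ub, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

best_x, best_f = pso(cluster_cost)
print("best cost found:", round(best_f, 2))
```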
Contributors: Hu, Mengqi (Author) / Wu, Teresa (Thesis advisor) / Weir, Jeffery (Thesis advisor) / Wen, Jin (Committee member) / Fowler, John (Committee member) / Shunk, Dan (Committee member) / Arizona State University (Publisher)
Created: 2012