
Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasingly important in solving a variety of industry problems. Our initiative involves estimating resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. Achieving this goal is difficult because relevant consumption data is stored in different formats and because there is insufficient data about project attributes to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once preprocessing is complete, data mining techniques such as clustering are applied to find projects that involve resources of similar skill sets and that are of similar complexity and size. This results in "resource utilization templates" for groups of projects that are related from a resource consumption perspective. Then the project characteristics that generate this diversity in headcounts and skill sets are identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match product technical features with the resource requirements of past projects as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation of the training data, because past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast the resource demand of some future projects.
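As a rough illustration of linear regression with cross-validation on a small sample, the sketch below fits a one-variable least-squares line and scores it by leave-one-out cross-validation. It is not the thesis's model: the single predictor (a project size score), the effort figures, and the function names `fit_line` and `loocv_mae` are all hypothetical.

```python
# Hypothetical sketch: simple linear regression scored by leave-one-out
# cross-validation, which suits a small set of historical projects.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def loocv_mae(xs, ys):
    """Mean absolute error when each project is held out exactly once."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (a + b * xs[i])))
    return sum(errors) / len(errors)

# Made-up data: project size score vs. engineer-months for one skill set.
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
effort = [12.0, 19.0, 31.0, 42.0, 48.0, 61.0]

a, b = fit_line(size, effort)
forecast = a + b * 7.0           # forecast for a hypothetical new project
cv_error = loocv_mae(size, effort)
print(f"forecast={forecast:.1f}, leave-one-out MAE={cv_error:.2f}")
```

Leave-one-out is one common choice when only a handful of past projects exist, since every project serves as a test case once while the model still trains on nearly all the data.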

Contributors: Bhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2013

Production scheduling and system configuration for capacitated flow lines with application in the semiconductor backend process

Description

A good production schedule in a semiconductor back-end facility is critical for the on-time delivery of customer orders. Compared to the front-end process, which is dominated by re-entrant product flows, the back-end process is linear and therefore more suitable for scheduling. However, production scheduling of the back-end process is still very difficult due to the wide product mix, the large number of parallel machines, product-family-related setups, machine-product qualification, and weekly demand consisting of thousands of lots. In this research, a novel mixed-integer linear programming (MILP) model is proposed for the batch production scheduling of a semiconductor back-end facility. In the MILP formulation, the manufacturing process is modeled as a flexible flow line with bottleneck stages, unrelated parallel machines, product-family-related sequence-independent setups, and product-machine qualification considerations. However, this MILP formulation is difficult to solve for real-size problem instances. In a semiconductor back-end facility, production scheduling usually needs to be done every day while considering updated demand forecasts for a medium-term planning horizon. Due to the limitation on the solvable size of the MILP model, a deterministic scheduling system (DSS), consisting of an optimizer and a scheduler, is proposed to provide sub-optimal solutions in a short time for real-size problem instances. The optimizer generates a tentative production plan. Then the scheduler sequences each lot on each individual machine according to the tentative production plan and scheduling rules. Customized factory rules and additional resource constraints are included in the DSS, such as the preventive maintenance schedule, setup crew availability, and carrier limitations. Small problem instances are randomly generated to compare the performance of the MILP model and the deterministic scheduling system. Then experimental design is applied to understand the behavior of the DSS and identify its best configuration under different demand scenarios. Product-machine qualification decisions have a long-term and significant impact on production scheduling. A robust product-machine qualification matrix is critical for meeting demand when demand quantity or mix varies. In the second part of this research, a stochastic mixed-integer programming model is proposed to balance the tradeoff between current machine qualification costs and future backorder costs under uncertain demand. The L-shaped method and acceleration techniques are proposed to solve the stochastic model. Computational results are provided to compare the performance of different solution methods.
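To make the scheduling setting concrete, the sketch below shows a simple greedy dispatching rule for parallel machines with product-machine qualification and sequence-independent family setups. It is only an illustrative heuristic under assumed inputs, not the MILP model or the DSS described above; the function name `schedule`, the lot data, and the setup time are hypothetical.

```python
# Hypothetical sketch: assign lots to qualified parallel machines, charging a
# sequence-independent setup whenever a machine changes product family
# (including the first lot a machine runs).

def schedule(lots, qualified, setup_time):
    """lots: list of (lot_id, family, proc_time) in priority order.
    qualified: dict mapping family -> list of machines allowed to run it.
    Returns (assignment dict lot_id -> machine, makespan)."""
    free_at = {}       # machine -> time at which it becomes free
    last_family = {}   # machine -> family it is currently set up for
    assignment = {}
    for lot_id, family, proc_time in lots:
        best = None
        for machine in qualified[family]:
            start = free_at.get(machine, 0.0)
            if last_family.get(machine) != family:
                start += setup_time
            finish = start + proc_time
            # Keep the earliest finish; ties go to the first machine listed.
            if best is None or finish < best[0]:
                best = (finish, machine)
        finish, machine = best
        free_at[machine] = finish
        last_family[machine] = family
        assignment[lot_id] = machine
    return assignment, max(free_at.values())

# Made-up instance: family B is qualified on machine M2 only.
lots = [("L1", "A", 4.0), ("L2", "A", 4.0), ("L3", "B", 3.0), ("L4", "B", 3.0)]
qualified = {"A": ["M1", "M2"], "B": ["M2"]}
assignment, makespan = schedule(lots, qualified, setup_time=1.0)
print(assignment, makespan)
```

In this made-up instance the narrow qualification of family B concentrates load on M2, which hints at why a robust qualification matrix matters when the demand mix shifts.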

Contributors: Fu, Mengying (Author) / Askin, Ronald G. (Thesis advisor) / Zhang, Muhong (Thesis advisor) / Fowler, John W (Committee member) / Pan, Rong (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2011

Image-based process monitoring via generative adversarial autoencoder with applications to rolling defect detection

Description

Image-based process monitoring has recently attracted increasing attention due to advances in sensing technologies. However, existing process monitoring methods fail to fully utilize the spatial information of images because of their complex characteristics, including high dimensionality and complex spatial structures. Recent advances in unsupervised deep models such as the generative adversarial network (GAN) and the generative adversarial autoencoder (AAE) have made it possible to learn these complex spatial structures automatically. Inspired by these advances, we propose an AAE-based framework for unsupervised anomaly detection in images. AAE combines the power of GAN with the variational autoencoder, which serves as a nonlinear dimension reduction technique with regularization from the discriminator. Based on this, we propose a monitoring statistic that efficiently captures changes in the image data. The performance of the proposed AAE-based anomaly detection algorithm is validated through a simulation study and a real case study on rolling defect detection.

Contributors: Yeh, Huai-Ming (Author) / Yan, Hao (Thesis advisor) / Pan, Rong (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2019

Development of Tools for Planning and Coordinating the Production of Small Farmers as a Response to Market Opportunities

Description

For multiple reasons, the consumption of fresh fruits and vegetables in the United States has progressively increased. This has resulted in increased domestic production and importation of these products. The associated logistics is complex due to the perishability of these products, and most current logistics systems rely on marketing and supply chain practices that result in high levels of food waste and limited offer diversity. For instance, given the lack of critical mass, small growers are conspicuously absent from mainstream distribution channels. One way to obtain this critical mass is to use associative schemes such as co-ops. However, the success of traditional associative schemes has been mixed at best. This dissertation develops decision support tools to facilitate the formation of coalitions of small growers in complementary production regions to act as a single-like supplier. Thus, this dissertation demonstrates the benefits and efficiency that could be achieved by these coalitions, presents a methodology to efficiently distribute the value of a newly identified market opportunity among the growers participating in the coalition, and develops a negotiation framework between one or more buyers and the agent representing the coalition that results in a prototype contract.

There are four main areas of research contribution in this dissertation. The first is the development of optimization tools to allocate a market opportunity to potential production regions while considering consumer preferences for special denomination labels such as “local”, “organic”, etc. The second is the development of a stochastic optimization and revenue-distribution framework for the formation of coalitions of growers to maximize the captured value of a market opportunity. The framework considers the growers’ individual preferences and production characteristics (yields, resources, etc.) to develop supply contracts that entice their participation in the coalition. The third is the development of a negotiation mechanism to design contracts between buyers and groups of growers considering profit expectations and the variability of future demand. The final contribution is the integration of these models and tools into a framework capable of transforming new market opportunities into implementable production plans and contractual agreements between the different supply chain participants.

Contributors: Ulloa, Rodrigo (Author) / Villalobos, Jesus (Thesis advisor) / Fowler, John (Committee member) / Mac Cawley, Alejandro (Committee member) / Yan, Hao (Committee member) / Phelan, Patrick (Committee member) / Arizona State University (Publisher)

Created: 2022

A Disease Progression Modeling Framework for Nonalcoholic Steatohepatitis Using Multiparametric Serial Magnetic Resonance Imaging and Elastography

Description

Nonalcoholic Steatohepatitis (NASH) is a severe form of nonalcoholic fatty liver disease that is caused by excessive calorie intake and a sedentary lifestyle in the absence of severe alcohol consumption. It is widely prevalent in the United States and in many other developed countries, affecting up to 25 percent of the population. Because it is asymptomatic, it usually goes unnoticed and may lead to liver failure if not treated at the right time. Currently, liver biopsy is the gold standard for diagnosing NASH, but as an invasive procedure it comes with its own complications, along with the inconvenience of sampling repeated measurements over a period of time. Hence, noninvasive procedures to assess NASH are urgently required. Magnetic Resonance Elastography (MRE) based Shear Stiffness and Loss Modulus, along with Magnetic Resonance Imaging based proton density fat fraction, have been successfully combined to predict NASH stages. However, their role in the prediction of disease progression remains to be investigated. This thesis therefore combines features from serial MRE observations to develop statistical models that predict NASH progression. It utilizes data from an experiment conducted on male mice to develop progressive and regressive NASH and trains ordinal models, ordered probit regression and ordinal forest, on labels generated from a logistic regression model. The models are assessed on histological data collected at the end point of the experiment. The models developed provide a framework for using a noninvasive tool to predict NASH disease progression.

Contributors: Deshpande, Eeshan (Author) / Ju, Feng (Thesis advisor) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created: 2021

Extensions of the Assembly Line Balancing Problem Towards a General Assembly System Design Problem

Description

Assembly lines are low-cost production systems that manufacture similar finished units in large quantities. In response to consumer needs, manufacturers utilize mixed-model assembly lines to produce customized items that are not identical but share some general features. To maintain efficiency, the aim is to find the best feasible way to balance the lines, allocating each task to a workstation so that all restrictions are satisfied, all operational requirements are fulfilled, and the line achieves the highest performance and maximum throughput. The work to be done at each workstation and line depends on the precise product configuration and is not constant across all models. This research seeks to advance the subject of assembly line balancing by establishing a model for creating the most efficient assembly system. Several realistic characteristics are incorporated into efficient optimization techniques and mathematical models to provide a more comprehensive model for building assembly systems. This involves analyzing the learning growth by task, employing parallel line designs, and configuring mixed-model structures under particular constraints and criteria. This dissertation addresses a gap in the literature by utilizing exact and approximation modeling approaches. These methods are based on mathematical programming techniques, including integer and mixed-integer models, and heuristics. Heuristic approximations are employed to address the difficulties caused by the problem's combinatorial complexity. This study proposes a model that considers learning curve effects and dynamic demand. Such effects arise, for example, with a new assembly line, new employees, the introduction of new products, or simply the implementation of engineering change orders. To achieve a cost-based optimal solution, an integer mathematical formulation is proposed to minimize the production line's total cost under the impact of learning and demand fulfillment. The research further develops approaches to obtain a comprehensive model for single-model and mixed-model parallel-line systems. Optimization models and heuristics are developed under various aspects, such as cycle times by line and tooling considerations. Numerous extensions are explored to analyze the cost impact under certain constraints and implications. The implementation results demonstrate that the proposed models and heuristics provide valuable insights.
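The effect of learning on a station's workload can be sketched with a standard power-law learning curve. The abstract does not say which learning model the dissertation uses, so Wright's curve, the 80 percent learning rate, the task times, and the names `task_time` and `station_load` below are all assumptions made for illustration.

```python
import math

# Hypothetical sketch: Wright's power-law learning curve, under which the
# time for the n-th repetition of a task is T_n = T_1 * n ** log2(rate).

def task_time(first_unit_time, unit_index, learning_rate):
    """Time for the unit_index-th repetition (unit_index starts at 1)."""
    exponent = math.log(learning_rate, 2)   # negative when rate < 1
    return first_unit_time * unit_index ** exponent

def station_load(first_unit_times, unit_index, learning_rate):
    """Total time of all tasks assigned to one workstation for a given unit."""
    return sum(task_time(t, unit_index, learning_rate) for t in first_unit_times)

# Made-up station with three tasks and a 60-second cycle time.
tasks = [30.0, 25.0, 20.0]
cycle_time = 60.0
for n in (1, 2, 4, 8):
    load = station_load(tasks, n, learning_rate=0.8)
    status = "fits" if load <= cycle_time else "exceeds"
    print(f"unit {n}: load {load:.1f}s {status} the cycle time")
```

With an 80 percent rate, each doubling of cumulative output cuts task time by 20 percent, so an assignment that overloads a station for the first units may become feasible later. That is one reason balancing under learning differs from the static problem.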

Contributors: Alhomaidi, Esam (Author) / Askin, Ronald G (Thesis advisor) / Yan, Hao (Committee member) / Iquebal, Ashif (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)

Created: 2023

Hierarchical Sequential Event Prediction and Translation from Aviation Accident Report Data

Description

Sequential event prediction, or sequential pattern mining, is a well-studied topic in the literature. There are many real-world scenarios in which data is released sequentially. It is widely believed that repetitive patterns of event sequences exist, so that future events can be predicted. For example, many companies build recommender systems to predict the next likely product for users according to their purchase history. Healthcare systems discover the relationships among patients’ sequential symptoms to mitigate the adverse effects of a treatment (drugs or surgery). Modern engineering systems, such as aviation, distributed computing, and energy systems, diagnose failure event logs and take prompt action to avoid disaster when a similar failure pattern occurs. In this dissertation, I specifically focus on building a scalable algorithm for event prediction and extraction in the aviation domain. Understanding accident events is a major safety concern in the aviation system. A flight accident is often caused by a sequence of failure events. Accurate modeling of the failure event sequence and how it leads to the final accident is important for aviation safety. This work aims to study the relationships within the failure event sequence and to evaluate the risk of the final accident according to these failure events. I address three major challenges. (1) Modeling sequential events with hierarchical structure: I aim to improve prediction accuracy by taking advantage of the multi-level or hierarchical representation of these rare events. Specifically, I propose a sequential Encoder-Decoder framework with a hierarchical embedding representation of the events. (2) Lack of high-quality and consistent event log data: In order to acquire more accurate event data from aviation accident reports, I convert the problem into a multi-label classification task. An attention-based Bidirectional Encoder Representations from Transformers model is developed to achieve good performance and interpretability. (3) Ontology-based event extraction: In order to extract detailed events, I propose to solve the problem as a hierarchical classification task. I improve the model performance by incorporating event ontology. By solving these three challenges, I provide a framework to extract events from narrative reports and estimate the risk level of aviation accidents through event sequence modeling.

Contributors: Zhao, Xinyu (Author) / Yan, Hao (Thesis advisor) / Liu, Yongming (Committee member) / Ju, Feng (Committee member) / Iquebal, Ashif (Committee member) / Arizona State University (Publisher)

Created: 2022

Heuristics for Arc Routing Problems and Their Applications

Description

Arc Routing Problems (ARPs) are a type of routing problem that seeks routes of minimum total cost covering the edges or arcs in a graph representing street or road networks. They find application in many essential services such as residential waste collection, winter gritting, and others. Because these problems are NP-hard, solutions are usually found using heuristic methods. This dissertation contributes to heuristics for ARPs, with a focus on the Capacitated Arc Routing Problem (CARP) with additional constraints. In operations such as residential waste collection, vehicle breakdown disruptions occur frequently. A new variant, the Capacitated Arc Re-routing Problem for Vehicle Break-down (CARP-VB), is introduced to address the need to re-route using only the remaining vehicles to avoid missing services. A new heuristic, Probe, is developed to solve CARP-VB. Experiments on benchmark instances show that Probe is better at reducing the makespan and hence effective in reducing delays and avoiding missed services. In addition to total cost, operators are also interested in solutions that are attractive, that is, routes that are contiguous, compact, and non-overlapping, which makes the work easier to manage. Operators may not adopt a solution that is not attractive even if it is optimal. They are also interested in solutions that are balanced in workload to meet equity requirements. A new multi-objective memetic algorithm, MA-ABC, is developed that optimizes three objectives: attractiveness, makespan, and total cost. In tests on benchmark instances, MA-ABC was found to be effective in providing attractive and balanced route solutions without affecting the total cost. Changes in the problem specification, such as demand and topology, occur frequently in business operations. Machine learning can be applied to learn the distribution behind these changes and generate solutions quickly at inference time. Splice is a machine learning framework for CARP that quickly generates close-to-optimal solutions using a graph neural network and deep Q-learning. Splice can solve several variants of node and arc routing problems using the same architecture without any modification. Splice was trained and tested using randomly generated instances. It generated solutions faster than popular metaheuristics, and the solutions were also better.

Contributors: Ramamoorthy, Muhilan (Author) / Syrotiuk, Violet R. (Thesis advisor) / Forrest, Stephanie (Committee member) / Mirchandani, Pitu (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2022

Novel Semi-Supervised Learning Models to Balance Data Inclusivity and Usability in Healthcare Applications

Description

Semi-supervised learning (SSL) is a subfield of statistical machine learning that is useful for problems with only a few labeled instances, which have both predictor (X) and target (Y) information, and an abundance of unlabeled instances, which have only predictor (X) information. SSL harnesses the target information available in the limited labeled data, as well as the information in the abundant unlabeled data, to build strong predictive models. However, not all the included information is useful. For example, some features may correspond to noise, and including them will hurt predictive performance. Additionally, some instances may not be as relevant to model building, and their inclusion will increase training time and potentially hurt model performance. The objective of this research is to develop novel SSL models to balance data inclusivity and usability. My dissertation research focuses on applications of SSL in healthcare, driven by problems in brain cancer radiomics, migraine imaging, and Parkinson’s Disease telemonitoring.

The first topic introduces an integration of machine learning (ML) and a mechanistic model (PI) to develop an SSL model applied to predicting cell density of glioblastoma brain cancer using multi-parametric medical images. The proposed ML-PI hybrid model integrates imaging information from unbiopsied regions of the brain as well as underlying biological knowledge from the mechanistic model to predict spatial tumor density in the brain.

The second topic develops a multi-modality imaging-based diagnostic decision support system (MMI-DDS). MMI-DDS consists of modality-wise principal components analysis to incorporate imaging features at different aggregation levels (e.g., voxel-wise, connectivity-based, etc.), a constrained particle swarm optimization (cPSO) feature selection algorithm, and a clinical utility engine that utilizes inverse operators on chosen principal components for white-box classification models.

The final topic develops a new SSL regression model with integrated feature and instance selection called s2SSL (with “s2” referring to selection in two different ways: feature and instance). s2SSL integrates cPSO feature selection and graph-based instance selection to simultaneously choose the optimal features and instances and build accurate models for continuous prediction. s2SSL was applied to smartphone-based telemonitoring of Parkinson’s Disease patients.

Contributors: Gaw, Nathan (Author) / Li, Jing (Thesis advisor) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Hu, Leland (Committee member) / Arizona State University (Publisher)

Created: 2019

Real-time Analysis and Control for Smart Manufacturing Systems

Description

Recent advances in manufacturing systems, such as advanced embedded sensing, big data analytics, IoT, and robotics, promise a paradigm shift in the manufacturing industry towards smart manufacturing systems. Real-time data that reflects machine conditions and the production system’s operational performance is typically available in many industries, such as automotive, semiconductor, and food production. However, a major research gap still exists in how to utilize this real-time data to evaluate and predict production system performance and to facilitate timely decision making and production control on the factory floor. To tackle these challenges, this dissertation takes an integrated analytical approach, hybridizing data analytics, stochastic modeling, and decision-making-under-uncertainty methodology to solve practical manufacturing problems.

Specifically, this research considers the machine degradation process. It has been shown that machines working in different operating states may break down in different probabilistic manners. In addition, machines working in worse operating states are more likely to fail, causing more frequent downtime and reducing system throughput. However, there is still a lack of analytical methods to quantify the potential impact of machine condition degradation on overall system performance and so facilitate operational decision making on the factory floor. To address these issues, this dissertation considers a serial production line with finite buffers and multiple machines that follow Markovian degradation processes. An integrated model based on the aggregation method is built to quantify the overall system performance and its interactions with the machine condition process. Moreover, system properties are investigated to analyze the influence of system parameters on system performance. In addition, three types of bottlenecks are defined, and their corresponding indicators are derived to provide guidelines for improving system performance. These methods provide quantitative tools for modeling, analyzing, and improving manufacturing systems in which machine condition degradation and productivity are coupled, given real-time signals.
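A single machine with Markovian degradation can be illustrated with a small discrete-time Markov chain whose long-run state probabilities give the machine's availability. The three states, the transition probabilities, and the function name `stationary` below are assumptions chosen for illustration; the dissertation's aggregation-based model of a full serial line with buffers is not reproduced here.

```python
# Hypothetical sketch: a machine that is good, degraded, or down, modeled as
# a discrete-time Markov chain. Power iteration approximates the stationary
# distribution pi satisfying pi = pi * P.

def stationary(P, iterations=10_000, tol=1e-12):
    """Return the stationary distribution of row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Made-up transition probabilities per time step (each row sums to 1).
P = [
    [0.90, 0.08, 0.02],  # good     -> good / degraded / down
    [0.00, 0.85, 0.15],  # degraded -> fails more often than a good machine
    [0.60, 0.00, 0.40],  # down     -> repaired to good, or stays down
]
pi = stationary(P)
availability = 1.0 - pi[2]   # long-run fraction of time the machine is up
print([round(p, 3) for p in pi], round(availability, 3))
```

Raising the failure probability of the degraded state in this toy chain lowers availability, which is the single-machine version of the coupling between machine condition and throughput that the dissertation quantifies at the line level.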

Contributors: Kang, Yunyi (Author) / Ju, Feng (Thesis advisor) / Pedrielli, Giulia (Committee member) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created: 2020
