Search Content

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties like…

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties like data with relevant consumption information but stored in different format and insufficient data about project attributes to interpret consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the preprocessing on data is completed, different data mining techniques like clustering is applied to find projects which involve resources of similar skillsets and which involve similar complexities and size. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Then project characteristics are identified which generate this diversity in headcounts and skillsets. These characteristics are not currently contained in the data base and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match the product technical features with the resource requirement for projects in the past as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross validation of the training data as the past project execution are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results and the tool is applied to forecast some future projects' resource demand.

ContributorsBhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

A model fusion based framework for imbalanced classification problem with noisy dataset

Description

Data imbalance and data noise often coexist in real world datasets. Data imbalance affects the learning classifier by degrading the recognition power of the classifier on the minority class, while data noise affects the learning classifier by providing inaccurate information and thus misleads the classifier. Because of these differences, data…

Data imbalance and data noise often coexist in real world datasets. Data imbalance affects the learning classifier by degrading the recognition power of the classifier on the minority class, while data noise affects the learning classifier by providing inaccurate information and thus misleads the classifier. Because of these differences, data imbalance and data noise have been treated separately in the data mining field. Yet, such approach ignores the mutual effects and as a result may lead to new problems. A desirable solution is to tackle these two issues jointly. Noting the complementary nature of generative and discriminative models, this research proposes a unified model fusion based framework to handle the imbalanced classification with noisy dataset.

The phase I study focuses on the imbalanced classification problem. A generative classifier, Gaussian Mixture Model (GMM) is studied which can learn the distribution of the imbalance data to improve the discrimination power on imbalanced classes. By fusing this knowledge into cost SVM (cSVM), a CSG method is proposed. Experimental results show the effectiveness of CSG in dealing with imbalanced classification problems.

The phase II study expands the research scope to include the noisy dataset into the imbalanced classification problem. A model fusion based framework, K Nearest Gaussian (KNG) is proposed. KNG employs a generative modeling method, GMM, to model the training data as Gaussian mixtures and form adjustable confidence regions which are less sensitive to data imbalance and noise. Motivated by the K-nearest neighbor algorithm, the neighboring Gaussians are used to classify the testing instances. Experimental results show KNG method greatly outperforms traditional classification methods in dealing with imbalanced classification problems with noisy dataset.

The phase III study addresses the issues of feature selection and parameter tuning of KNG algorithm. To further improve the performance of KNG algorithm, a Particle Swarm Optimization based method (PSO-KNG) is proposed. PSO-KNG formulates model parameters and data features into the same particle vector and thus can search the best feature and parameter combination jointly. The experimental results show that PSO can greatly improve the performance of KNG with better accuracy and much lower computational cost.

ContributorsHe, Miao (Author) / Wu, Teresa (Thesis advisor) / Li, Jing (Committee member) / Silva, Alvin (Committee member) / Borror, Connie (Committee member) / Arizona State University (Publisher)

Created2014

Expanding data mining theory for industrial applications

Description

The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times, the scope of these problems has swollen and the methods are under scrutiny for applicability and relevance to real-world circumstances. At the…

The field of Data Mining is widely recognized and accepted for its applications in many business problems to guide decision-making processes based on data. However, in recent times, the scope of these problems has swollen and the methods are under scrutiny for applicability and relevance to real-world circumstances. At the crossroads of innovation and standards, it is important to examine and understand whether the current theoretical methods for industrial applications (which include KDD, SEMMA and CRISP-DM) encompass all possible scenarios that could arise in practical situations. Do the methods require changes or enhancements? As part of the thesis I study the current methods and delineate the ideas of these methods and illuminate their shortcomings which posed challenges during practical implementation. Based on the experiments conducted and the research carried out, I propose an approach which illustrates the business problems with higher accuracy and provides a broader view of the process. It is then applied to different case studies highlighting the different aspects to this approach.

ContributorsAnand, Aneeth (Author) / Liu, Huan (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created2012

Intervention Strategies for the DoD Acquisition Process Using Simulation

Description

The current Enterprise Requirements and Acquisition Model (ERAM), a discrete event simulation of the major tasks and decisions within the DoD acquisition system, identifies several what-if intervention strategies to improve program completion time. However, processes that contribute to the program acquisition completion time were not explicitly identified in the simulation…

The current Enterprise Requirements and Acquisition Model (ERAM), a discrete event simulation of the major tasks and decisions within the DoD acquisition system, identifies several what-if intervention strategies to improve program completion time. However, processes that contribute to the program acquisition completion time were not explicitly identified in the simulation study. This research seeks to determine the acquisition processes that contribute significantly to total simulated program time in the acquisition system for all programs reaching Milestone C. Specifically, this research examines the effect of increased scope management, technology maturity, and decreased variation and mean process times in post-Design Readiness Review contractor activities by performing additional simulation analyses. Potential policies are formulated from the results to further improve program acquisition completion time.

ContributorsWorger, Danielle Marie (Author) / Wu, Teresa (Thesis director) / Shunk, Dan (Committee member) / Wirthlin, J. Robert (Committee member) / Industrial, Systems (Contributor) / Barrett, The Honors College (Contributor)

Created2013-05

A Data Mining Approach to Modeling Customer Preference: A Case Study of Intel Corporation

Description

Understanding customer preference is crucial for new product planning and marketing decisions. This thesis explores how historical data can be leveraged to understand and predict customer preference. This thesis presents a decision support framework that provides a holistic view on customer preference by following a two-phase procedure. Phase-1 uses cluster…

Understanding customer preference is crucial for new product planning and marketing decisions. This thesis explores how historical data can be leveraged to understand and predict customer preference. This thesis presents a decision support framework that provides a holistic view on customer preference by following a two-phase procedure. Phase-1 uses cluster analysis to create product profiles based on which customer profiles are derived. Phase-2 then delves deep into each of the customer profiles and investigates causality behind their preference using Bayesian networks. This thesis illustrates the working of the framework using the case of Intel Corporation, world’s largest semiconductor manufacturing company.

ContributorsRam, Sudarshan Venkat (Author) / Kempf, Karl G. (Thesis advisor) / Wu, Teresa (Thesis advisor) / Ju, Feng (Committee member) / Arizona State University (Publisher)

Created2017

A Disease Progression Modeling Framework for Nonalcoholic Steatohepatitis Using Multiparametric Serial Magnetic Resonance Imaging and Elastography

Description

Nonalcoholic Steatohepatitis (NASH) is a severe form of Nonalcoholic fatty liverdisease, that is caused due to excessive calorie intake, sedentary lifestyle and in the absence of severe alcohol consumption. It is widely prevalent in the United States and in many other developed countries, affecting up to 25 percent of the population. Due to…

Nonalcoholic Steatohepatitis (NASH) is a severe form of Nonalcoholic fatty liverdisease, that is caused due to excessive calorie intake, sedentary lifestyle and in the absence of severe alcohol consumption. It is widely prevalent in the United States and in many other developed countries, affecting up to 25 percent of the population. Due to being asymptotic, it usually goes unnoticed and may lead to liver failure if not treated at the right time. Currently, liver biopsy is the gold standard to diagnose NASH, but being an invasive procedure, it comes with it's own complications along with the inconvenience of sampling repeated measurements over a period of time. Hence, noninvasive procedures to assess NASH are urgently required. Magnetic Resonance Elastography (MRE) based Shear Stiffness and Loss Modulus along with Magnetic Resonance Imaging based proton density fat fraction have been successfully combined to predict NASH stages However, their role in the prediction of disease progression still remains to be investigated. This thesis thus looks into combining features from serial MRE observations to develop statistical models to predict NASH progression. It utilizes data from an experiment conducted on male mice to develop progressive and regressive NASH and trains ordinal models, ordered probit regression and ordinal forest on labels generated from a logistic regression model. The models are assessed on histological data collected at the end point of the experiment. The models developed provide a framework to utilize a non-invasive tool to predict NASH disease progression.

ContributorsDeshpande, Eeshan (Author) / Ju, Feng (Thesis advisor) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created2021

Filtering by

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

A model fusion based framework for imbalanced classification problem with noisy dataset

Expanding data mining theory for industrial applications

Intervention Strategies for the DoD Acquisition Process Using Simulation

A Data Mining Approach to Modeling Customer Preference: A Case Study of Intel Corporation

A Disease Progression Modeling Framework for Nonalcoholic Steatohepatitis Using Multiparametric Serial Magnetic Resonance Imaging and Elastography