
A model fusion based framework for imbalanced classification problem with noisy dataset

Description

Data imbalance and data noise often coexist in real-world datasets. Data imbalance degrades the classifier's recognition power on the minority class, while data noise supplies inaccurate information and thus misleads the classifier. Because of these differences, data imbalance and data noise have been treated separately in the data mining field. Yet such an approach ignores their mutual effects and, as a result, may lead to new problems. A desirable solution is to tackle the two issues jointly. Noting the complementary nature of generative and discriminative models, this research proposes a unified model fusion based framework to handle imbalanced classification with noisy datasets.

The phase I study focuses on the imbalanced classification problem. A generative classifier, the Gaussian Mixture Model (GMM), is studied; it can learn the distribution of the imbalanced data to improve the discrimination power on imbalanced classes. By fusing this knowledge into cost SVM (cSVM), a CSG method is proposed. Experimental results show the effectiveness of CSG in dealing with imbalanced classification problems.

The phase II study expands the research scope to include noisy datasets in the imbalanced classification problem. A model fusion based framework, K Nearest Gaussian (KNG), is proposed. KNG employs a generative modeling method, GMM, to model the training data as Gaussian mixtures and form adjustable confidence regions that are less sensitive to data imbalance and noise. Motivated by the K-nearest neighbor algorithm, the neighboring Gaussians are used to classify the testing instances. Experimental results show that the KNG method greatly outperforms traditional classification methods in dealing with imbalanced classification problems with noisy datasets.
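The idea of classifying by neighboring Gaussians can be sketched as follows. This is only an illustrative reading of the abstract, not the thesis's actual KNG algorithm: the component representation (mean, per-dimension variance, class label), the use of squared Mahalanobis distance with diagonal covariance, the majority vote, and the function names are all assumptions.

```python
from collections import Counter

# Hypothetical representation: each fitted Gaussian component is
# (mean vector, per-dimension variances, class label).
def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance to a diagonal-covariance Gaussian."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

def classify_k_nearest_gaussians(x, components, k=3):
    """Assumed rule: majority vote over the k components closest to x."""
    ranked = sorted(components, key=lambda c: mahalanobis_sq(x, c[0], c[1]))
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy components standing in for GMMs fitted separately to each class.
components = [
    ((0.0, 0.0), (1.0, 1.0), "majority"),
    ((1.0, 0.5), (1.0, 1.0), "majority"),
    ((5.0, 5.0), (0.5, 0.5), "minority"),
    ((6.0, 5.5), (0.5, 0.5), "minority"),
]
```

Because the vote is taken over Gaussians rather than raw training points, a handful of mislabeled instances or a large majority class has less direct influence on the decision than in plain K-nearest neighbor.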

The phase III study addresses feature selection and parameter tuning for the KNG algorithm. To further improve the performance of KNG, a Particle Swarm Optimization based method (PSO-KNG) is proposed. PSO-KNG encodes model parameters and data features in the same particle vector and can thus search for the best combination of features and parameters jointly. The experimental results show that PSO can greatly improve the performance of KNG, with better accuracy and much lower computational cost.
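One way to put features and parameters in a single particle vector is sketched below. The layout is an assumption made for illustration (the abstract does not give the encoding): the first entries act as feature-selection switches and the rest are rescaled into parameter bounds. The function name and the 0.5 cutoff are hypothetical.

```python
def decode_particle(particle, n_features, param_bounds):
    """Split one particle into a feature mask and model parameters.

    Assumed layout: the first n_features entries lie in [0, 1] and select
    a feature when >= 0.5; the remaining entries lie in [0, 1] and are
    rescaled linearly into the given (low, high) bounds.
    """
    mask = [v >= 0.5 for v in particle[:n_features]]
    raw = particle[n_features:]
    params = [lo + v * (hi - lo) for v, (lo, hi) in zip(raw, param_bounds)]
    return mask, params

# Three candidate features plus one parameter bounded in [1, 9].
mask, params = decode_particle([0.9, 0.2, 0.5, 0.25], 3, [(1, 9)])
```

With such an encoding, a single fitness evaluation scores a feature subset and a parameter setting together, which is what allows the swarm to search both jointly.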

Contributors: He, Miao (Author) / Wu, Teresa (Thesis advisor) / Li, Jing (Committee member) / Silva, Alvin (Committee member) / Borror, Connie (Committee member) / Arizona State University (Publisher)

Created: 2014

Design and analysis of ambulance diversion policies

Description

Overcrowding of Emergency Departments (EDs) puts the safety of patients at risk. Decision makers implement Ambulance Diversion (AD) as a way to relieve congestion and ensure timely treatment delivery. However, ineffective design of AD policies reduces access to emergency care, and adverse events may arise. The objective of this dissertation is to propose methods to design and analyze effective AD policies that consider performance measures related to patient safety. First, a simulation-based methodology is proposed to evaluate the mean performance and variability of single-factor AD policies in a single-hospital environment, considering the trade-off between average waiting time and percentage of time spent on diversion. Regression equations are proposed to obtain parameters of AD policies that yield a desired performance level. The results suggest that policies based on the total number of patients waiting are more consistent and predict policy performance with high precision. Then, a Markov Decision Process model is proposed to obtain the optimal AD policy, assuming that information on the time to start treatment in a neighboring hospital is available. The model is designed to minimize the average tardiness per patient in the long run. Tardiness is defined as the time that patients have to wait beyond a safety time threshold to start receiving treatment. Theoretical and computational analyses show that there exists an optimal policy of threshold type, and that diversion can be a good alternative to decrease tardiness when ambulance patients cause excessive congestion in the ED. Furthermore, implementation of AD policies in a simulation model that relaxes several of the assumptions suggests that the model provides consistent policies under multiple scenarios. Finally, a genetic algorithm is combined with simulation to design effective policies for multiple hospitals simultaneously. The model has the objective of minimizing the time that patients spend in non-value-added activities, including transportation, waiting and boarding in the ED. Moreover, the AD policies are combined with simple ambulance destination policies to create ambulance flow control mechanisms. Results show that effective ambulance management can significantly reduce the time that patients have to wait to receive the appropriate level of care.
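The tardiness measure and a threshold-type diversion rule described above can be written down directly. This is a minimal sketch: the function names, the time units, and the choice of "number of patients waiting" as the state variable for the rule are assumptions for illustration, not the dissertation's model.

```python
def tardiness(wait_time, safety_threshold):
    """Time waited beyond the safety threshold (zero if treated in time)."""
    return max(0.0, wait_time - safety_threshold)

def average_tardiness(wait_times, safety_threshold):
    """Mean tardiness per patient over a list of observed waits."""
    total = sum(tardiness(w, safety_threshold) for w in wait_times)
    return total / len(wait_times)

def should_divert(patients_waiting, diversion_threshold):
    """Assumed threshold-type rule: divert once the queue reaches the threshold."""
    return patients_waiting >= diversion_threshold

# Three patients waiting 10, 40 and 70 minutes against a 30-minute threshold
# contribute tardiness 0, 10 and 40 minutes respectively.
avg = average_tardiness([10, 40, 70], 30)
```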

Contributors: Ramirez Nafarrate, Adrian (Author) / Fowler, John W. (Thesis advisor) / Wu, Teresa (Thesis advisor) / Gel, Esma S. (Committee member) / Limon, Jorge (Committee member) / Arizona State University (Publisher)

Created: 2011

Intervention Strategies for the DoD Acquisition Process Using Simulation

Description

The current Enterprise Requirements and Acquisition Model (ERAM), a discrete event simulation of the major tasks and decisions within the DoD acquisition system, identifies several what-if intervention strategies to improve program completion time. However, processes that contribute to the program acquisition completion time were not explicitly identified in the simulation study. This research seeks to determine the acquisition processes that contribute significantly to total simulated program time in the acquisition system for all programs reaching Milestone C. Specifically, this research examines the effect of increased scope management, technology maturity, and decreased variation and mean process times in post-Design Readiness Review contractor activities by performing additional simulation analyses. Potential policies are formulated from the results to further improve program acquisition completion time.

Contributors: Worger, Danielle Marie (Author) / Wu, Teresa (Thesis director) / Shunk, Dan (Committee member) / Wirthlin, J. Robert (Committee member) / Industrial, Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2013-05

Data and Predictive Analytics for Energy Use

Description

Overall energy consumption in the United States has not fallen despite the advancement of technology over the past decades. Gaps exist between design and actual energy performance. Energy Infrastructure Systems (EIS) are impacted when the amount of energy production cannot be forecast accurately and efficiently. Inaccurate engineering assumptions can result from a lack of understanding of how energy systems operate in real-world applications. Energy systems are complex, and because their structural system model is unknown, their behavior is unknown as well. Currently, reverse engineering lacks the data mining techniques needed to develop efficient structural system models. In this project, a new type of reverse engineering algorithm has been applied to a year's worth of energy data collected from an ASU research building called MacroTechnology Works, to identify the structural system model. Developing and understanding structural system models is the first step in creating accurate predictive analytics for energy production. The associative network of the building's data will be highlighted to accurately depict the structural model. This structural model will enhance the energy efficiency of energy infrastructure systems, reduce energy waste, and narrow the gaps between energy infrastructure design, planning, operation and management (DPOM).

Contributors: Camarena, Raquel Jimenez (Author) / Chong, Oswald (Thesis director) / Ye, Nong (Committee member) / Industrial, Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Optimization of surgery delivery systems

Description

Optimization of surgical operations is a challenging managerial problem for surgical suite directors. This dissertation presents modeling and solution techniques for operating room (OR) planning and scheduling problems. First, several sequencing and patient appointment time setting heuristics are proposed for scheduling an Outpatient Procedure Center. A discrete event simulation model is used to evaluate how scheduling heuristics perform with respect to the competing criteria of expected patient waiting time and expected surgical suite overtime for a single day compared to current practice. Next, a bi-criteria Genetic Algorithm is used to determine whether better solutions can be obtained for this single-day scheduling problem. The efficacy of the bi-criteria Genetic Algorithm when surgeries are allowed to be moved to other days is also investigated. Numerical experiments based on real data from a large health care provider are presented. The analysis provides insight into the best scheduling heuristics and the tradeoff between patient-based and provider-based criteria. Second, a multi-stage stochastic mixed integer programming formulation for the allocation of surgeries to ORs over a finite planning horizon is studied. The demand for surgery and surgical duration are random variables. The objective is to minimize two competing criteria: expected surgery cancellations and OR overtime. A decomposition method, Progressive Hedging, is implemented to find near-optimal surgery plans. Finally, properties of the model are discussed and methods are proposed to improve the performance of the algorithm based on the special structure of the model. It is found that simple rules can improve schedules used in practice. Sequencing surgeries from the longest to shortest mean duration causes high expected overtime and should be avoided, while sequencing from the shortest to longest mean duration performed quite well in the experiments. Expending greater computational effort with more sophisticated optimization methods does not lead to substantial improvements. However, controlling the daily procedure mix may achieve substantial improvements in performance. A novel stochastic programming model for a dynamic surgery planning problem is proposed in the dissertation. The efficacy of the progressive hedging algorithm is investigated. It is found that there is a significant correlation between the performance of the algorithm and the type and number of scenario bundles in a problem instance. The computational time spent solving scenario subproblems is among the most significant factors that impact the performance of the algorithm. The quality of the solutions can be improved by detecting and preventing cyclical behaviors.
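The shortest-to-longest sequencing rule and the two competing criteria can be illustrated with a small evaluator for one realized day. This is a sketch under simplifying assumptions that are not taken from the dissertation: a single operating room, patients who arrive exactly at their appointment times, and hypothetical function names.

```python
def sequence_shortest_first(surgeries):
    """Order (name, mean_duration) pairs from shortest to longest mean duration."""
    return sorted(surgeries, key=lambda s: s[1])

def evaluate_day(durations, appointment_times, session_length):
    """Return (total patient waiting, overtime) for one realized day.

    Assumes one room and punctual patients; all times share one unit.
    """
    clock = 0.0
    total_wait = 0.0
    for duration, appt in zip(durations, appointment_times):
        start = max(clock, appt)      # cannot start before the patient arrives
        total_wait += start - appt    # patient waits if the room is still busy
        clock = start + duration
    overtime = max(0.0, clock - session_length)
    return total_wait, overtime

order = sequence_shortest_first([("a", 90), ("b", 30), ("c", 60)])
wait, overtime = evaluate_day([60, 90, 30], [0, 50, 120], 170)
```

Averaging `evaluate_day` over many sampled duration vectors would give estimates of expected waiting time and expected overtime for a given sequence and set of appointment times.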

Contributors: Gul, Serhat (Author) / Fowler, John W. (Thesis advisor) / Denton, Brian T. (Thesis advisor) / Wu, Teresa (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created: 2010

Multi-objective operating room planning and scheduling

Description

Surgery is one of the most important functions in a hospital with respect to operational cost, patient flow, and resource utilization. Planning and scheduling the Operating Room (OR) is important for hospitals to improve efficiency and achieve high quality of service. At the same time, it is a complex task due to the conflicting objectives and the uncertain nature of surgeries. In this dissertation, three different methodologies are developed to address the OR planning and scheduling problem. First, a simulation-based framework is constructed to analyze the factors that affect the utilization of a catheterization lab and provide decision support for improving the efficiency of operations in a hospital with different priorities of patients. Both operational costs and patient satisfaction metrics are considered. Detailed parametric analysis is performed to provide generic recommendations. Overall, it is found that the 75th percentile of process duration is always on the efficient frontier and is a good compromise between the two objectives. Next, the general OR planning and scheduling problem is formulated as a mixed integer program. The objectives include reducing staff overtime, OR idle time and patient waiting time, as well as satisfying surgeon preferences and regulating patient flow from the OR to the Post Anesthesia Care Unit (PACU). Exact solutions are obtained using real data. Heuristics and a random keys genetic algorithm (RKGA) are used in the scheduling phase and compared with the optimal solutions. Interacting effects between planning and scheduling are also investigated. Lastly, a multi-objective simulation optimization approach is developed, which relaxes the deterministic assumption of the second study by integrating an optimization module, an RKGA implementation of the Non-dominated Sorting Genetic Algorithm II (NSGA-II) that searches for Pareto optimal solutions, with a simulation module that evaluates the performance of a given schedule. It is experimentally shown to be an effective technique for finding Pareto optimal solutions.
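Pareto optimality, the notion underlying the multi-objective search described above, can be shown with a short non-dominated filter. This sketch assumes every objective is minimized and uses made-up objective values; it illustrates the dominance test only and is not NSGA-II or the dissertation's implementation.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (overtime, waiting time) values for four candidate schedules.
front = pareto_front([(1, 5), (2, 2), (3, 3), (5, 1)])
```

Here the schedule scoring (3, 3) is dropped because (2, 2) is better on both objectives, while the remaining three represent different trade-offs.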

Contributors: Li, Qing (Author) / Fowler, John W (Thesis advisor) / Mohan, Srimathy (Thesis advisor) / Gopalakrishnan, Mohan (Committee member) / Askin, Ronald G. (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created: 2010

A Data Mining Approach to Modeling Customer Preference: A Case Study of Intel Corporation

Description

Understanding customer preference is crucial for new product planning and marketing decisions. This thesis explores how historical data can be leveraged to understand and predict customer preference. It presents a decision support framework that provides a holistic view of customer preference by following a two-phase procedure. Phase 1 uses cluster analysis to create product profiles, from which customer profiles are derived. Phase 2 then examines each customer profile in depth and investigates the causality behind its preference using Bayesian networks. The thesis illustrates the framework using the case of Intel Corporation, the world’s largest semiconductor manufacturing company.

Contributors: Ram, Sudarshan Venkat (Author) / Kempf, Karl G. (Thesis advisor) / Wu, Teresa (Thesis advisor) / Ju, Feng (Committee member) / Arizona State University (Publisher)

Created: 2017

A Disease Progression Modeling Framework for Nonalcoholic Steatohepatitis Using Multiparametric Serial Magnetic Resonance Imaging and Elastography

Description

Nonalcoholic Steatohepatitis (NASH) is a severe form of nonalcoholic fatty liver disease that is caused by excessive calorie intake and a sedentary lifestyle in the absence of severe alcohol consumption. It is widely prevalent in the United States and in many other developed countries, affecting up to 25 percent of the population. Because it is asymptomatic, it usually goes unnoticed and may lead to liver failure if not treated in time. Currently, liver biopsy is the gold standard for diagnosing NASH, but as an invasive procedure it comes with its own complications, along with the inconvenience of taking repeated measurements over a period of time. Hence, noninvasive procedures to assess NASH are urgently required. Magnetic Resonance Elastography (MRE) based Shear Stiffness and Loss Modulus, along with Magnetic Resonance Imaging based proton density fat fraction, have been successfully combined to predict NASH stages. However, their role in the prediction of disease progression remains to be investigated. This thesis therefore combines features from serial MRE observations to develop statistical models that predict NASH progression. It uses data from an experiment in which progressive and regressive NASH were induced in male mice, and trains ordinal models (ordered probit regression and ordinal forest) on labels generated from a logistic regression model. The models are assessed on histological data collected at the end point of the experiment. The resulting models provide a framework for using a noninvasive tool to predict NASH disease progression.

Contributors: Deshpande, Eeshan (Author) / Ju, Feng (Thesis advisor) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created: 2021

Data Driven Personalized Management of Hospital Inventory of Perishable and Substitutable Blood Units

Description

The use of Red Blood Cells (RBCs) is a pillar of modern health care. Annually, the lives of hundreds of thousands of patients are saved through ready access to safe, fresh, blood-type compatible RBCs. Worldwide, hospitals have the common goal to better utilize available blood units by maximizing patients served and reducing blood wastage. Managing blood is challenging because blood is perishable, its supply is stochastic and its demand pattern is highly uncertain. Additionally, RBCs are typed and patient compatibility is required.

This research focuses on improving blood inventory management at the hospital level. It explores the importance of hospital characteristics, such as demand rate and blood-type distribution in supply and demand, for improving RBC inventory management. Available inventory models make simplifying assumptions; they tend to be general and do not utilize available data that could improve blood delivery. This dissertation develops useful and realistic models that incorporate data characterizing the hospital inventory position, distribution of blood types of donors and the population being served.

The dissertation's contributions can be grouped into three areas. First, simulations are used to characterize the benefits of demand forecasting. They show that, in addition to forecast accuracy, characteristics such as forecast horizon, the age of replenishment units, and the percentage of demand that is forecastable influence the benefits resulting from demand variability reduction.

Second, it develops Markov decision models for improved allocation policies under emergency conditions, where only the units on the shelf are available for dispensing. In this situation the RBC perishability has no impact due to the short timeline for decision making. Improved location-specific policies are demonstrated via simulation models for two emergency event types: mass casualty events and pandemic influenza.

Third, improved allocation policies under normal conditions are found using Markov decision models that incorporate temporal dynamics. In this case, hospitals receive replenishment and units age and outdate. The models are solved using Approximate Dynamic Programming with model-free approximate policy iteration, using machine learning algorithms to approximate value or policy functions. These are the first stock- and age-dependent allocation policies that engage substitution between blood type groups to improve inventory performance.
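Substitution between blood types is constrained by compatibility, which can be expressed as a simple set test. The sketch below encodes the standard ABO/Rh rule for red cell transfusion (a unit is compatible when it carries no antigen the recipient lacks); the function names are hypothetical, and the code says nothing about the allocation policies themselves.

```python
def antigens(blood_type):
    """Map a type such as 'AB+' to its antigen set, e.g. {'A', 'B', 'Rh'}."""
    abo, rh = blood_type[:-1], blood_type[-1]
    found = set(abo) - {"O"}      # type O carries neither A nor B antigen
    if rh == "+":
        found.add("Rh")
    return found

def can_receive(recipient, donor):
    """RBC units are compatible when the donor has no antigen the recipient lacks."""
    return antigens(donor) <= antigens(recipient)

types = ["O-", "O+", "A-", "A+", "B-", "B+", "AB-", "AB+"]
# Donor types that could substitute for an exact match for an A+ patient.
substitutes_for_a_pos = [d for d in types if can_receive("A+", d)]
```

A stock- and age-dependent policy would choose among these compatible types based on what is on the shelf and how close each unit is to outdating.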

Contributors: Dumkrieger, Gina (Author) / Mirchandani, Pitu B. (Thesis advisor) / Fowler, John (Committee member) / Wu, Teresa (Committee member) / Ju, Feng (Committee member) / Arizona State University (Publisher)

Created: 2020
