A model fusion based framework for imbalanced classification problem with noisy dataset

Description

Data imbalance and data noise often coexist in real-world datasets. Data imbalance degrades the classifier's ability to recognize the minority class, while data noise supplies inaccurate information that misleads the classifier. Because of these differences, data imbalance and data noise have been treated separately in the data mining field. Yet such an approach ignores their mutual effects and may therefore lead to new problems. A desirable solution is to tackle the two issues jointly. Noting the complementary nature of generative and discriminative models, this research proposes a unified model-fusion-based framework to handle imbalanced classification with noisy datasets.

The phase I study focuses on the imbalanced classification problem. A generative classifier, the Gaussian Mixture Model (GMM), is studied; it can learn the distribution of the imbalanced data to improve the discrimination power on imbalanced classes. By fusing this knowledge into cost SVM (cSVM), a CSG method is proposed. Experimental results show the effectiveness of CSG in dealing with imbalanced classification problems.

The phase II study expands the research scope to imbalanced classification with noisy datasets. A model-fusion-based framework, K Nearest Gaussian (KNG), is proposed. KNG employs a generative modeling method, GMM, to model the training data as Gaussian mixtures and form adjustable confidence regions that are less sensitive to data imbalance and noise. Motivated by the K-nearest neighbor algorithm, the neighboring Gaussians are used to classify the testing instances. Experimental results show that the KNG method greatly outperforms traditional classification methods in dealing with imbalanced classification problems with noisy datasets.

The phase III study addresses feature selection and parameter tuning for the KNG algorithm. To further improve the performance of KNG, a Particle Swarm Optimization based method (PSO-KNG) is proposed. PSO-KNG encodes model parameters and data features in the same particle vector and can thus search for the best feature and parameter combination jointly. The experimental results show that PSO can greatly improve the performance of KNG with better accuracy and much lower computational cost.
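The joint encoding described above can be illustrated with a minimal sketch. The abstract does not give the actual particle layout, so everything here is an assumption: the helper name `decode_particle`, the convention that the first `n_params` entries hold continuous model parameters, and the use of a 0.5 threshold to turn the remaining entries into a feature-selection mask.

```python
def decode_particle(particle, n_params, threshold=0.5):
    """Split one particle vector into model parameters and a feature mask.

    Hypothetical layout (an assumption, not taken from the thesis):
    the first n_params entries are continuous model parameters, and each
    remaining entry selects a feature when it exceeds the threshold.
    """
    params = list(particle[:n_params])
    feature_mask = [value > threshold for value in particle[n_params:]]
    return params, feature_mask


# Two illustrative parameters followed by three feature-selection entries.
params, mask = decode_particle([3.0, 0.7, 0.9, 0.2, 0.6], n_params=2)
print(params, mask)
```

Because both parts live in one vector, a single swarm update moves the parameters and the feature subset together, which is what allows the joint search.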

Contributors: He, Miao (Author) / Wu, Teresa (Thesis advisor) / Li, Jing (Committee member) / Silva, Alvin (Committee member) / Borror, Connie (Committee member) / Arizona State University (Publisher)

Created: 2014

Design and analysis of ambulance diversion policies

Description

Overcrowding of Emergency Departments (EDs) puts the safety of patients at risk. Decision makers implement Ambulance Diversion (AD) as a way to relieve congestion and ensure timely treatment delivery. However, ineffective design of AD policies reduces accessibility to emergency care, and adverse events may arise. The objective of this dissertation is to propose methods to design and analyze effective AD policies that consider performance measures related to patient safety. First, a simulation-based methodology is proposed to evaluate the mean performance and variability of single-factor AD policies in a single-hospital environment, considering the trade-off between average waiting time and percentage of time spent on diversion. Regression equations are proposed to obtain the parameters of AD policies that yield a desired performance level. The results suggest that policies based on the total number of patients waiting are more consistent and provide high precision in predicting policy performance. Then, a Markov Decision Process model is proposed to obtain the optimal AD policy, assuming that information on the time to start treatment in a neighboring hospital is available. The model is designed to minimize the average tardiness per patient in the long run. Tardiness is defined as the time that patients have to wait beyond a safety time threshold to start receiving treatment. Theoretical and computational analyses show that there exists an optimal policy of threshold type, and that diversion can be a good alternative for decreasing tardiness when ambulance patients cause excessive congestion in the ED. Furthermore, implementing the AD policies in a simulation model that relaxes several of the assumptions suggests that the model provides consistent policies under multiple scenarios. Finally, a genetic algorithm is combined with simulation to design effective policies for multiple hospitals simultaneously. The objective of this model is to minimize the time that patients spend in non-value-added activities, including transportation, waiting and boarding in the ED. Moreover, the AD policies are combined with simple ambulance destination policies to create ambulance flow control mechanisms. Results show that effective ambulance management can significantly reduce the time that patients have to wait to receive the appropriate level of care.
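The tardiness measure defined in the abstract can be made concrete with a small sketch. The function name, the units (minutes), and the sample waiting times are illustrative assumptions; only the definition itself (waiting time beyond a safety threshold, averaged per patient) comes from the abstract.

```python
def average_tardiness(waits_min, safety_threshold_min):
    """Mean time patients wait beyond the safety threshold.

    A patient who starts treatment within the threshold contributes zero.
    """
    if not waits_min:
        return 0.0
    late = [max(0.0, wait - safety_threshold_min) for wait in waits_min]
    return sum(late) / len(late)


# Illustrative waits in minutes with a 15-minute threshold:
# tardiness per patient is 0, 5, 30, 0, so the average is 8.75.
print(average_tardiness([5, 20, 45, 10], 15))
```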

Contributors: Ramirez Nafarrate, Adrian (Author) / Fowler, John W. (Thesis advisor) / Wu, Teresa (Thesis advisor) / Gel, Esma S. (Committee member) / Limon, Jorge (Committee member) / Arizona State University (Publisher)

Created: 2011

Capacitated vehicle routing problem with time windows: a case study on pickup of dietary products in nonprofit organization

Description

This thesis presents a successful application of operations research techniques in a nonprofit distribution system to improve distribution efficiency and increase customer service quality. It focuses on truck routing problems faced by St. Mary’s Food Bank Distribution Center. The problem is modeled as a capacitated vehicle routing problem to improve distribution efficiency and is extended to a capacitated vehicle routing problem with time windows to increase customer service quality. Several heuristics are applied to solve these vehicle routing problems and are tested on well-known benchmark problems. The algorithms are also evaluated by comparing their results with the plan currently used by St. Mary’s Food Bank Distribution Center. The results suggest that the heuristics are quite competitive: on average, the heuristic solutions use 17% fewer trucks and 28.52% less travel time.

Contributors: Li, Xiaoyan (Author) / Askin, Ronald (Thesis advisor) / Wu, Teresa (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created: 2015

Development of Complementary Fresh-Food Systems Through the Exploration and Identification of Profit-Maximizing, Supply Chains

Description

One of the greatest 21st century challenges is meeting the needs of a growing world population expected to increase 35% by 2050 given projected trends in diets, consumption and income. This in turn requires a 70-100% improvement on current production capability, even as the world is undergoing systemic climate pattern changes. This growth not only translates to higher demand for staple products, such as rice, wheat, and beans, but also creates demand for high-value products such as fresh fruits and vegetables (FVs), fueled by better economic conditions and a more health-conscious consumer. It would seem that these trends present opportunities for the economic development of environmentally well-suited regions to produce high-value products. Interestingly, many regions with production potential still exhibit a considerable gap between their current and ‘true’ maximum capability, especially in places where poverty is more common. Paradoxically, high-value horticultural products could often be produced in these regions if relatively small capital investments are made and proper marketing and distribution channels are created. The hypothesis is that small farmers within local agricultural systems are well positioned to take advantage of existing sustainable and profitable opportunities, specifically in high-value agricultural production. Unearthing these opportunities can entice investments in small farming development and help small farmers enter the horticultural industry, thus expanding the volume, variety and/or quality of products available for global consumption.
In this dissertation, the objective is three-fold: (1) to demonstrate the hidden production potential that exists within local agricultural communities, (2) to highlight the importance of supply chain modeling tools in the strategic design of local agricultural systems, and (3) to demonstrate the application of optimization and machine learning techniques to strategize the implementation of protective agricultural technologies.

As part of this dissertation, a yield approximation method is developed and integrated with a mixed-integer program to estimate a region’s potential to produce non-perennial, vegetable items. This integration offers practical approximations that help decision-makers identify technologies needed to protect agricultural production, alter harvesting patterns to better match market behavior, and provide an analytical framework through which external investment entities can assess different production options.

Contributors: Flores, Hector M. (Author) / Villalobos, Rene (Thesis advisor) / Pan, Rong (Committee member) / Wu, Teresa (Committee member) / Parker, Nathan (Committee member) / Arizona State University (Publisher)

Created: 2017

Network maintenance and capacity management with applications in transportation

Description

This research develops heuristics to manage both mandatory and optional network capacity reductions to better serve the network flows. The main application discussed relates to transportation networks, and flow cost relates to travel cost of users of the network. Temporary mandatory capacity reductions are required by maintenance activities. The objective of managing maintenance activities and the attendant temporary network capacity reductions is to schedule the required segment closures so that all maintenance work can be completed on time, and the total flow cost over the maintenance period is minimized for different types of flows. The goal of optional network capacity reduction is to selectively reduce the capacity of some links to improve the overall efficiency of user-optimized flows, where each traveler takes the route that minimizes the traveler’s trip cost. In this dissertation, both managing mandatory and optional network capacity reductions are addressed with the consideration of network-wide flow diversions due to changed link capacities.

This research first investigates maintenance scheduling in transportation networks with service vehicles (e.g., truck fleets and passenger transport fleets), where these vehicles are assumed to take the system-optimized routes that minimize the total travel cost of the fleet. This problem is solved with a randomized fix-and-optimize heuristic developed in this work. This research also investigates maintenance scheduling in networks with multi-modal traffic that consists of (1) regular human-driven cars with user-optimized routing and (2) self-driving vehicles with system-optimized routing. An iterative mixed flow assignment algorithm is developed to obtain the multi-modal traffic assignment resulting from a maintenance schedule. A genetic algorithm with multi-point crossover is applied to obtain a good schedule.

Based on Braess’ paradox, which states that removing some links may alleviate the congestion of user-optimized flows, this research generalizes the paradox from link removal to capacity reduction: the capacity of selected links is reduced to improve the efficiency of the resulting user-optimized flows. A heuristic is developed to identify the links whose capacity should be reduced, and the corresponding reduction amounts, to obtain more efficient total flows. Experiments on real networks demonstrate that the generalized Braess’ paradox exists in reality, and that the heuristic solves real-world test cases even when commercial solvers fail.
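For readers unfamiliar with Braess’ paradox, the classic textbook four-node example can be checked numerically. This sketch shows the original link-removal form of the paradox, not the capacity-reduction generalization or the heuristic developed in the dissertation; the network, the 4000 drivers, and the cost functions are standard illustrative assumptions.

```python
N_DRIVERS = 4000  # illustrative demand from Start to End


def costs_without_link(n_top):
    """Route travel times when n_top drivers use Start->A->End.

    Start->A and B->End cost (load / 100) minutes; A->End and Start->B
    cost a fixed 45 minutes.
    """
    n_bottom = N_DRIVERS - n_top
    top = n_top / 100 + 45
    bottom = 45 + n_bottom / 100
    return top, bottom


def costs_with_link(n_top, n_bottom, n_cross):
    """Route travel times when a zero-cost A->B link is added.

    n_cross drivers use Start->A->B->End and load both variable links.
    """
    load_start_a = n_top + n_cross
    load_b_end = n_bottom + n_cross
    top = load_start_a / 100 + 45
    bottom = 45 + load_b_end / 100
    cross = load_start_a / 100 + load_b_end / 100
    return top, bottom, cross


# Without the link, an even split is an equilibrium: both routes take 65.
print(costs_without_link(2000))
# With the link, everyone on the cross route is an equilibrium: it takes 80,
# and switching to either original route would take 85.
print(costs_with_link(0, 0, 4000))
```

In this example, closing the zero-cost link lowers every driver's travel time from 80 to 65 minutes, which is the effect the dissertation extends to partial capacity reductions.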

Contributors: Peng, Dening (Author) / Mirchandani, Pitu B. (Thesis advisor) / Sefair, Jorge (Committee member) / Wu, Teresa (Committee member) / Zhou, Xuesong (Committee member) / Arizona State University (Publisher)

Created: 2017

Stochastic Modeling and Optimization to Improve Identification and Treatment of Alzheimer’s Disease

Description

Mathematical modeling and decision-making within the healthcare industry have provided the means to quantitatively evaluate the impact of decisions on the diagnosis, screening, and treatment of diseases. In this work, we look into a specific yet very important disease, Alzheimer’s Disease (AD). In the United States, AD is the 6th leading cause of death. Diagnosis of AD cannot be confidently confirmed until after death. This has heightened the importance of early diagnosis of AD based upon symptoms of cognitive decline. A symptom of early cognitive decline and an indicator of AD is Mild Cognitive Impairment (MCI). In addition to this qualitative test, biomarker tests have been proposed in the medical field, including p-Tau, FDG-PET, and hippocampal. These tests can be administered to patients as early detectors of AD, thus improving patients’ quality of life and potentially reducing the costs of the health structure. Preliminary work has been conducted on the development of a Sequential Tree Based Classifier (STC), which helps medical providers predict whether a patient will contract AD by sequentially administering these biomarker tests. The STC model, however, has its limitations, and a more complex, robust model is needed. In fact, STC assumes a general linear model for the status of the patient based upon the test results. We take a simulation perspective and try to define a more complex model that represents the patient’s evolution over time.

Specifically, this thesis focuses on the formulation of a Markov Chain model that is complex and robust. This Markov Chain model emulates the evolution of MCI patients based upon doctor visits and the sequential administration of biomarker tests. The data used to create this Markov Chain model were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The data lacked detailed information on the sequential administration of the biomarker tests; therefore, different analytical approaches were tried in order to calibrate the model. The resulting Markov Chain model made it possible to conduct experiments on different parameters of the Markov Chain and yielded different results for patients who contracted AD and those who did not, leading to important insights into the effect of thresholds and test sequence on patient prediction capability as well as health cost reduction.
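As a rough illustration of the kind of experiment such a model supports, the sketch below simulates a two-state Markov chain over doctor visits. It is not the thesis model: the states, the per-visit transition probabilities, and the function names are hypothetical and are not estimated from ADNI data, and biomarker tests are left out entirely.

```python
import random

# Hypothetical per-visit transition probabilities (NOT calibrated on ADNI).
TRANSITIONS = {
    "MCI": {"MCI": 0.85, "AD": 0.15},
    "AD": {"AD": 1.0},  # treated as absorbing in this sketch
}


def simulate_patient(n_visits, rng):
    """Return the state of one patient after n_visits, starting from MCI."""
    state = "MCI"
    for _ in range(n_visits):
        next_states = TRANSITIONS[state]
        state = rng.choices(
            list(next_states), weights=list(next_states.values())
        )[0]
    return state


def conversion_rate(n_patients, n_visits, seed=0):
    """Fraction of simulated patients who reach AD within n_visits."""
    rng = random.Random(seed)
    converted = sum(
        simulate_patient(n_visits, rng) == "AD" for _ in range(n_patients)
    )
    return converted / n_patients


# For this chain the exact answer is 1 - 0.85 ** n_visits, which gives a
# simple check on the simulation.
print(conversion_rate(20000, 6), 1 - 0.85 ** 6)
```

Changing the entries of `TRANSITIONS` and rerunning plays the role of the parameter experiments mentioned in the abstract, on a much smaller scale.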

The data in this thesis were provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI investigators did not contribute to any analysis or writing of this thesis. A list of the ADNI investigators can be found at: http://adni.loni.usc.edu/about/governance/principal-investigators/ .

Contributors: Camarena, Raquel (Author) / Pedrielli, Giulia (Thesis advisor) / Li, Jing (Thesis advisor) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created: 2018

Design and Mining of Health Information Systems for Process and Patient Care Improvement

Description

In healthcare facilities, health information systems (HISs) are used to serve different purposes. The radiology department adopts multiple HISs in managing their operations and patient care. In general, the HISs that touch radiology fall into two categories: tracking HISs and archive HISs. Electronic Health Records (EHR) is a typical tracking HIS, which tracks the care each patient receives at multiple encounters and facilities. Archive HISs are typically specialized databases to store large-size data collected as part of the patient care. A typical example of an archive HIS is the Picture Archive and Communication System (PACS), which provides economical storage and convenient access to diagnostic images from multiple modalities. How to integrate such HISs and best utilize their data remains a challenging problem due to the disparity of HISs as well as high-dimensionality and heterogeneity of the data. My PhD dissertation research includes three inter-connected and integrated topics and focuses on designing integrated HISs and further developing statistical models and machine learning algorithms for process and patient care improvement.

Topic 1: Design of super-HIS and tracking of quality of care (QoC). My research developed an information technology that integrates multiple HISs in radiology, and proposed QoC metrics defined upon the data that measure various dimensions of care. The DDD assisted the clinical practices and enabled an effective intervention for reducing lengthy radiologist turnaround times for patients.

Topic 2: Monitoring and change detection of QoC data streams for process improvement. With the super-HIS in place, high-dimensional data streams of QoC metrics are generated. I developed a statistical model for monitoring high-dimensional data streams that integrates Singular Value Decomposition (SVD) and process control. The algorithm was applied to QoC metrics data and additionally extended to another application, the monitoring of traffic data in communication networks.

Topic 3: Deep transfer learning of archive HIS data for computer-aided diagnosis (CAD). The novelty of the CAD system is the development of a deep transfer learning algorithm that combines the ideas of transfer learning and multi-modality image integration under the deep learning framework. Our system achieved high accuracy in breast cancer diagnosis compared with conventional machine learning algorithms.

Contributors: Wang, Kun (Author) / Li, Jing (Thesis advisor) / Wu, Teresa (Committee member) / Pan, Rong (Committee member) / Zwart, Christine M. (Committee member) / Arizona State University (Publisher)

Created: 2018

Intervention Strategies for the DoD Acquisition Process Using Simulation

Description

The current Enterprise Requirements and Acquisition Model (ERAM), a discrete event simulation of the major tasks and decisions within the DoD acquisition system, identifies several what-if intervention strategies to improve program completion time. However, processes that contribute to the program acquisition completion time were not explicitly identified in the simulation study. This research seeks to determine the acquisition processes that contribute significantly to total simulated program time in the acquisition system for all programs reaching Milestone C. Specifically, this research examines the effect of increased scope management, technology maturity, and decreased variation and mean process times in post-Design Readiness Review contractor activities by performing additional simulation analyses. Potential policies are formulated from the results to further improve program acquisition completion time.

Contributors: Worger, Danielle Marie (Author) / Wu, Teresa (Thesis director) / Shunk, Dan (Committee member) / Wirthlin, J. Robert (Committee member) / Industrial, Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2013-05

Fix-and-optimize heuristic and MP-based approaches for capacitated lot sizing problem with setup carryover, setup splitting and backlogging

Description

In this thesis, a single-level, multi-item capacitated lot sizing problem with setup carryover, setup splitting and backlogging is investigated. This problem typically arises in the tactical and operational planning stage, where it determines the optimal production quantities and sequencing for all the products in the planning horizon. Although capacitated lot sizing problems have been investigated by researchers with many different features, the simultaneous consideration of setup carryover and setup splitting is relatively new. Considering both helps reduce costs and produce feasible production schedules. Setup carryover allows the production setup to be continued between two adjacent periods without incurring extra setup costs and setup times. Setup splitting permits the setup to be partially finished in one period and continued in the next period, utilizing the capacity more efficiently and removing infeasibility from the production schedule.

The main approach is as follows. First, the simple plant location formulation is adopted to reformulate the original model. Furthermore, an extended formulation that redefines the idle period constraints is developed to make the formulation tighter. Then, to evaluate the quality of the heuristic solutions, three types of valid inequalities are added to the model. A fix-and-optimize heuristic with two-stage product decomposition and period decomposition strategies is proposed to solve the formulation. This generic heuristic rapidly solves for a small portion of the binary variables and all the continuous variables in each subproblem. In addition, the case with demand backlogging is incorporated to demonstrate that adding assumptions to the basic formulation does not require completely altering the heuristic.

This thesis makes several contributions. The computational results show the capability, flexibility and effectiveness of the approaches. The average optimality gap is 6% for data without backlogging and 8% for data with backlogging. In addition, when backlogging is not allowed, the performance of the fix-and-optimize heuristic is stable regardless of period length, which makes the approach advantageous for planning longer production schedules. Furthermore, the performance of the proposed solution approaches is analyzed so that later research on similar topics can compare results across different solution strategies.

Contributors: Chen, Cheng-Lung (Author) / Zhang, Muhong (Thesis advisor) / Mohan, Srimathy (Thesis advisor) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created: 2015

Unit commitment with uncertainty

Description

This dissertation carries out interdisciplinary research spanning operations research, statistics, power system engineering, and economics. Specifically, it focuses on a special power system scheduling problem, the unit commitment problem with uncertainty. This scheduling problem is a two-stage decision problem. In the first stage, the system operator determines the binary commitment status (on or off) of generators in advance. In the second stage, after the realization of uncertainty, the system operator determines the generation levels of the generators. The goal of this dissertation is to develop computationally tractable methodologies and algorithms to solve large-scale unit commitment problems with uncertainty.
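The two-stage structure described above matches the generic form of a two-stage stochastic program, sketched here for orientation. The symbols are assumptions introduced for illustration and do not come from the dissertation: $x$ is the vector of binary commitment decisions, $y$ the generation levels chosen after the uncertainty $\xi$ is realized, $c$ and $q$ are cost vectors, and $T$, $W$, $h(\xi)$ define the second-stage constraints.

```latex
\min_{x \in \{0,1\}^n} \; c^\top x + \mathbb{E}_{\xi}\!\left[ Q(x, \xi) \right],
\qquad
Q(x, \xi) = \min_{y \ge 0} \left\{ q^\top y \;:\; W y \ge h(\xi) - T x \right\}
```

A robust optimization variant would typically replace the expectation with a maximum over an uncertainty set of possible $\xi$.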

In the first part of this dissertation, two-stage models are studied to solve the problem. Two solution methods are studied and improved: stochastic programming and robust optimization. A scenario-based progressive hedging decomposition algorithm is applied. Several new hedging mechanisms and parameter selection rules are proposed and tested. A data-driven uncertainty set is proposed to improve the performance of robust optimization.

In the second part of this dissertation, a framework to reduce the two-stage stochastic program to a single-stage deterministic formulation is proposed. Most computation of the proposed approach can be done by offline studies. With the assistance of offline analysis, simulation, and data mining, the unit commitment problems with uncertainty can be solved efficiently.

Finally, the impacts of uncertainty on energy market prices are studied. A new component of locational marginal price, a marginal security component, which is the weighted shadow prices of the proposed security constraints, is proposed to better represent energy prices.

Contributors: Li, Chao (Author) / Hedman, Kory W (Thesis advisor) / Zhang, Muhong (Thesis advisor) / Mirchandani, Pitu B. (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created: 2016
