Description
With the increase in computing power and the availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data sets or the complexity of the information hidden within them. Knowledge discovery through machine learning is therefore necessary to better understand the information in data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms to address some of the problems in these areas. We also study variable selection for matched data sets and propose a solution for cases where the matched data exhibit non-linearity. The research is divided into three parts. The first part addresses the problem of asymmetric loss: an asymmetric support vector machine (aSVM) is proposed to predict specific classes with high accuracy, and it is shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets, where variables are predictive for only a subset of the classes; an Asymmetric Random Forest (ARF) is proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. A Matched Random Forest (MRF) is proposed to find variables that distinguish case from control without the restrictions that exist in linear models; MRF detects such variables even in the presence of interactions and qualitative variables.
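The abstract does not reproduce the aSVM formulation, but the core idea of asymmetric loss can be illustrated with a cost-weighted hinge loss. The sketch below is a generic linear SVM trained by subgradient descent in which the misclassification cost differs by class; the function name, default costs, and training scheme are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def asymmetric_svm(X, y, c_pos=1.0, c_neg=5.0, lr=0.01, lam=0.01, epochs=200):
    """Linear SVM with class-asymmetric hinge-loss costs (subgradient descent).

    Minimizes  lam/2 * ||w||^2 + (1/n) * sum_i cost_i * max(0, 1 - y_i (w.x_i + b)),
    where cost_i is c_pos for y_i = +1 and c_neg for y_i = -1.  Raising c_neg
    relative to c_pos penalizes false positives more, trading recall for
    precision on the positive class.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        cost = np.where(y > 0, c_pos, c_neg)
        active = margins < 1                      # margin-violating examples
        grad_w = lam * w - (cost * active * y) @ X / n
        grad_b = -np.mean(cost * active * y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On a separable toy problem, `np.sign(X @ w + b)` recovers the labels; sweeping `c_neg` upward shifts the decision boundary toward fewer false positives.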
Contributors: Koh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. In addition, two variable importance measures are proposed to reduce feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are therefore considered: RCSS prunes the rule conditions into a subset via feature selection, and the subset can then be summarized into rule-based classifiers. Experiments show that classifiers after RCSS substantially improve interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear or Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods on the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time ordering of the data to extract features and generates an effective and efficient classifier, referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted; these features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve this problem: one uses an out-of-bag sampling method, called OOBForest, and the other, based on the new concept of a partial permutation test, is called pForest. Experimental results show that existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
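The partial permutation test behind pForest is not detailed in the abstract, but it builds on the standard permutation-importance idea: a feature matters if shuffling it degrades the model's score. The sketch below is that generic baseline, written model-agnostically; the function name, the `n_repeats` parameter, and the stub model used in testing are illustrative assumptions, not the dissertation's method.

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=10, seed=0):
    """Importance of feature j = mean drop in score when column j is shuffled.

    `model` needs only a predict(X) method; `metric(y_true, y_pred)` is any
    score where larger is better (e.g. accuracy).  Irrelevant features get
    importance near zero, since shuffling them leaves predictions unchanged.
    """
    rng = np.random.default_rng(seed)
    base = metric(y, model.predict(X))
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])             # break the feature-target link
            drops.append(base - metric(y, model.predict(Xp)))
        imp[j] = np.mean(drops)
    return imp
```

A pForest-style refinement would permute within strata rather than globally, which is what controls the multi-valued-predictor bias the abstract describes; the global shuffle above is the simpler baseline.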
Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Healthcare operations have enjoyed reduced costs, improved patient safety, and innovation in healthcare policy across a wide variety of applications by tackling problems via the creation and optimization of descriptive mathematical models to guide decision-making. Despite these accomplishments, models are stylized representations of real-world applications, reliant on accurate estimations from historical data to justify their underlying assumptions. To protect against unreliable estimations, which can adversely affect the decisions generated from applications dependent on fully-realized models, techniques that are robust against misspecifications are utilized while still making use of incoming data for learning. Hence, new robust techniques are applied that (1) allow the decision-maker to express a spectrum of pessimism against model uncertainties while (2) still utilizing incoming data for learning. Two main applications are investigated with respect to these goals. The first is a percentile optimization technique for a multi-class queueing system, applied to hospital Emergency Departments. The second studies the use of robust forecasting techniques to improve developing countries' vaccine supply chains via (1) an innovative outside-of-cold-chain policy and (2) a district-managed approach to inventory control. Both research application areas utilize data-driven approaches that feature learning and pessimism-controlled robustness.
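The "spectrum of pessimism" idea can be made concrete with a toy percentile-optimization rule: instead of optimizing the worst-case or the average outcome across scenarios, optimize a chosen percentile of the outcome distribution. The sketch below is a generic illustration of that principle, not the dissertation's queueing formulation; the function name, signature, and the toy decisions in the usage example are assumptions for illustration.

```python
import numpy as np

def percentile_optimal(decisions, outcome, scenarios, q=10):
    """Choose the decision whose q-th percentile outcome (larger = better) is best.

    q near 0 recovers pure worst-case robustness; q = 50 optimizes the
    median.  The percentile level q is the knob by which the decision-maker
    expresses more or less pessimism about model uncertainty.
    """
    scores = [np.percentile([outcome(d, s) for s in scenarios], q)
              for d in decisions]
    return decisions[int(np.argmax(scores))]
```

For example, with a "safe" decision yielding a constant payoff and a "risky" one whose payoff varies by scenario, a low percentile (pessimistic) selects the safe option while a high percentile (optimistic) selects the risky one.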
Contributors: Bren, Austin (Author) / Saghafian, Soroush (Thesis advisor) / Mirchandani, Pitu (Thesis advisor) / Wu, Teresa (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created: 2018