Matching Items (7)
Filtering by

Clear all filters

152235-Thumbnail Image.png
Description
The ability to design high performance buildings has acquired great importance in recent years due to numerous federal, societal and environmental initiatives. However, this endeavor is much more demanding in terms of designer expertise and time. It requires a whole new level of synergy between automated performance prediction with the

The ability to design high performance buildings has acquired great importance in recent years due to numerous federal, societal and environmental initiatives. However, this endeavor is much more demanding in terms of designer expertise and time. It requires a whole new level of synergy between automated performance prediction with the human capabilities to perceive, evaluate and ultimately select a suitable solution. While performance prediction can be highly automated through the use of computers, performance evaluation cannot, unless it is with respect to a single criterion. The need to address multi-criteria requirements makes it more valuable for a designer to know the "latitude" or "degrees of freedom" he has in changing certain design variables while achieving preset criteria such as energy performance, life cycle cost, environmental impacts etc. This requirement can be met by a decision support framework based on near-optimal "satisficing" as opposed to purely optimal decision making techniques. Currently, such a comprehensive design framework is lacking, which is the basis for undertaking this research. The primary objective of this research is to facilitate a complementary relationship between designers and computers for Multi-Criterion Decision Making (MCDM) during high performance building design. It is based on the application of Monte Carlo approaches to create a database of solutions using deterministic whole building energy simulations, along with data mining methods to rank variable importance and reduce the multi-dimensionality of the problem. A novel interactive visualization approach is then proposed which uses regression based models to create dynamic interplays of how varying these important variables affect the multiple criteria, while providing a visual range or band of variation of the different design parameters. The MCDM process has been incorporated into an alternative methodology for high performance building design referred to as Visual Analytics based Decision Support Methodology [VADSM]. VADSM is envisioned to be most useful during the conceptual and early design performance modeling stages by providing a set of potential solutions that can be analyzed further for final design selection. The proposed methodology can be used for new building design synthesis as well as evaluation of retrofits and operational deficiencies in existing buildings.
ContributorsDutta, Ranojoy (Author) / Reddy, T Agami (Thesis advisor) / Runger, George C. (Committee member) / Addison, Marlin S. (Committee member) / Arizona State University (Publisher)
Created2013
151511-Thumbnail Image.png
Description
With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also studied variable selection of matched data sets and proposed a solution when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. Matched Random Forest (MRF) was proposed to find variables that are able to distinguish case and control without the restrictions that exists in linear models. MRF detects variables that are able to distinguish case and control even in the presence of interaction and qualitative variables.
ContributorsKoh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)
Created2013
149723-Thumbnail Image.png
Description
This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset then can be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of time series, and interpretable features can be extracted. These features can be further reduced, and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
ContributorsDeng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created2011
150555-Thumbnail Image.png
Description
Supply chains are increasingly complex as companies branch out into newer products and markets. In many cases, multiple products with moderate differences in performance and price compete for the same unit of demand. Simultaneous occurrences of multiple scenarios (competitive, disruptive, regulatory, economic, etc.), coupled with business decisions (pricing, product introduction,

Supply chains are increasingly complex as companies branch out into newer products and markets. In many cases, multiple products with moderate differences in performance and price compete for the same unit of demand. Simultaneous occurrences of multiple scenarios (competitive, disruptive, regulatory, economic, etc.), coupled with business decisions (pricing, product introduction, etc.) can drastically change demand structures within a short period of time. Furthermore, product obsolescence and cannibalization are real concerns due to short product life cycles. Analytical tools that can handle this complexity are important to quantify the impact of business scenarios/decisions on supply chain performance. Traditional analysis methods struggle in this environment of large, complex datasets with hundreds of features becoming the norm in supply chains. We present an empirical analysis framework termed Scenario Trees that provides a novel representation for impulse and delayed scenario events and a direction for modeling multivariate constrained responses. Amongst potential learners, supervised learners and feature extraction strategies based on tree-based ensembles are employed to extract the most impactful scenarios and predict their outcome on metrics at different product hierarchies. These models are able to provide accurate predictions in modeling environments characterized by incomplete datasets due to product substitution, missing values, outliers, redundant features, mixed variables and nonlinear interaction effects. Graphical model summaries are generated to aid model understanding. Models in complex environments benefit from feature selection methods that extract non-redundant feature subsets from the data. Additional model simplification can be achieved by extracting specific levels/values that contribute to variable importance. We propose and evaluate new analytical methods to address this problem of feature value selection and study their comparative performance using simulated datasets. We show that supply chain surveillance can be structured as a feature value selection problem. For situations such as new product introduction, a bottom-up approach to scenario analysis is designed using an agent-based simulation and data mining framework. This simulation engine envelopes utility theory, discrete choice models and diffusion theory and acts as a test bed for enacting different business scenarios. We demonstrate the use of machine learning algorithms to analyze scenarios and generate graphical summaries to aid decision making.
ContributorsShinde, Amit (Author) / Runger, George C. (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Villalobos, Rene (Committee member) / Janakiram, Mani (Committee member) / Arizona State University (Publisher)
Created2012
135328-Thumbnail Image.png
Description
Millennials are the group of people that make up the newer generation of the world's population and they are constantly surrounded by technology, as well as known for having different values than the previous generations. Marketers have to adapt to newer ways to appeal to millennials and secure their loyalty

Millennials are the group of people that make up the newer generation of the world's population and they are constantly surrounded by technology, as well as known for having different values than the previous generations. Marketers have to adapt to newer ways to appeal to millennials and secure their loyalty since millennials are always on the lookout for the next best thing and will "trade up for brands that matter, but trade down when brand value is weak", it poses a challenge for the marketing departments of companies (Fromm, J. & Parks, J.). The airline industry is one of the fastest growing sectors as "the total number of people flying on U.S. airlines will increase from 745.5 million in 2014 and grow to 1.15 billion in 2034," which shows that airlines have a wider population to market to, and will need to improve their marketing strategies to differentiate from competitors (Power). The financial sector also has a difficult time reaching out to millennials because "millennials are hesitant to take financial risks," as well as downing in college debt, while not making as much money as previous generations (Fromm, J. & Parks, J.). By looking into the marketing strategies, specifically using social media platforms, of the two industries, an understanding can be gathered of what millennials are attracted to. Along with looking at the marketing strategies of financial and airline industries, I looked at the perspectives of these industries in different countries, which is important to look at because then we can see if the values of millennials vary across different cultures. Countries chosen for research to further examine their cultural differences in terms of marketing practices are the United States and England. The main form of marketing that was used for this research were social media accounts of the companies, and seeing how they used the social networking platforms to reach and engage with their consumers, especially with those of the millennial generation. The companies chosen for further research for the airline industry from England were British Airways, EasyJet, and Virgin Atlantic, while for the U.S. Delta Airlines, Inc., Southwest Airlines, and United were chosen. The companies chosen to further examine within the finance industry from England include Barclay's, HSBC, and Lloyd's Bank, while for the U.S. the banks selected were Bank of America, JPMorgan Chase, and Wells Fargo. The companies for this study were chosen because they are among the top five in their industry, as well as all companies that I have had previous interactions with. It was meant to see what the companies at the top of the industry were doing that set them apart from their competitors in terms of social media marketing content and see if there were features they lacked that could be changed or improvements they could make. A survey was also conducted to get a better idea of the attitudes and behaviors of millennials when it comes to the airline and finance industries, as well as towards social media marketing practices.
ContributorsPathak, Krisha Hemanshu (Author) / Kumar, Ajith (Thesis director) / Arora, Hina (Committee member) / W. P. Carey School of Business (Contributor) / Department of Information Systems (Contributor) / Department of Marketing (Contributor) / Hugh Downs School of Human Communication (Contributor) / Barrett, The Honors College (Contributor)
Created2016-05
134871-Thumbnail Image.png
Description
This thesis, through a thorough literature and content review, discusses the various ways that data analytics and supply chain management intersect. Both fields have been around for a while, but are incredibly aided by the information age we live in today. Today's ERP systems and supply chain software packages use

This thesis, through a thorough literature and content review, discusses the various ways that data analytics and supply chain management intersect. Both fields have been around for a while, but are incredibly aided by the information age we live in today. Today's ERP systems and supply chain software packages use advanced analytic techniques and algorithms to optimize every aspect of supply chain management. This includes aspects like inventory optimization, portfolio management, network design, production scheduling, fleet planning, supplier evaluation, and others. The benefit of these analytic techniques is a reduction in costs as well as an improvement in overall supply chain performance and efficiencies. The paper begins with a short historical context on business analytics and optimization then moves on to the impact and application of analytics in the supply chain today. Following that the implications of big data are explored, along with how a company might begin to take advantage of big data and what challenges a firm may face along the way. The current tools used by supply chain professionals are then discussed. There is then a section on the most up and coming technologies; the internet of things, blockchain technology, additive manufacturing (3D printing), and machine learning; and how those technologies may further enable the successful use of analytics to improve supply chain management. Companies that do take advantage of analytics in their supply chains are sure to maintain a competitive advantage over those firms that fail to do so.
ContributorsCotton, Ryan Aaron (Author) / Taylor, Todd (Thesis director) / Arora, Hina (Committee member) / Department of Information Systems (Contributor) / Department of Supply Chain Management (Contributor) / Barrett, The Honors College (Contributor)
Created2016-12
155725-Thumbnail Image.png
Description
Random forest (RF) is a popular and powerful technique nowadays. It can be used for classification, regression and unsupervised clustering. In its original form introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent researches have proposed several methods based on RF

Random forest (RF) is a popular and powerful technique nowadays. It can be used for classification, regression and unsupervised clustering. In its original form introduced by Leo Breiman, RF is used as a predictive model to generate predictions for new observations. Recent researches have proposed several methods based on RF for feature selection and for generating prediction intervals. However, they are limited in their applicability and accuracy. In this dissertation, RF is applied to build a predictive model for a complex dataset, and used as the basis for two novel methods for biomarker discovery and generating prediction interval.

Firstly, a biodosimetry is developed using RF to determine absorbed radiation dose from gene expression measured from blood samples of potentially exposed individuals. To improve the prediction accuracy of the biodosimetry, day-specific models were built to deal with day interaction effect and a technique of nested modeling was proposed. The nested models can fit this complex data of large variability and non-linear relationships.

Secondly, a panel of biomarkers was selected using a data-driven feature selection method as well as handpick, considering prior knowledge and other constraints. To incorporate domain knowledge, a method called Know-GRRF was developed based on guided regularized RF. This method can incorporate domain knowledge as a penalized term to regulate selection of candidate features in RF. It adds more flexibility to data-driven feature selection and can improve the interpretability of models. Know-GRRF showed significant improvement in cross-species prediction when cross-species correlation was used to guide selection of biomarkers. The method can also compete with existing methods using intrinsic data characteristics as alternative of domain knowledge in simulated datasets.

Lastly, a novel non-parametric method, RFerr, was developed to generate prediction interval using RF regression. This method is widely applicable to any predictive models and was shown to have better coverage and precision than existing methods on the real-world radiation dataset, as well as benchmark and simulated datasets.
ContributorsGuan, Xin (Author) / Liu, Li (Thesis advisor) / Runger, George C. (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)
Created2017