System complexity reduction via feature selection

Description

This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset can then be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted. These features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called pForest. Experimental results show that the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.

Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created: 2011

Complexity measurement of cyber physical systems

Description

Modern automotive and aerospace products are large cyber-physical systems involving both software and hardware, composed of mechanical, electrical and electronic components. The increasing complexity of such systems is a major concern as it impacts development time and effort, as well as initial and operational costs. Towards the goal of measuring complexity, the first step is to determine factors that contribute to it and metrics to qualify it. These complexity components can further be used to (a) estimate the cost of cyber-physical systems, (b) develop methods that can reduce the cost of cyber-physical systems and (c) make decisions such as selecting one design from a set of possible solutions or variants. To determine the contributions to complexity, we conducted a survey at an aerospace company. We found three types of contribution to the complexity of the system: artifact complexity, design process complexity and manufacturing complexity. In all three domains, we found three types of metrics: size complexity, numeric complexity (degree of coupling) and technological complexity (solvability). We propose a formal representation for all three domains as graphs, but with different interpretations of entity (node) and relation (link) corresponding to the above three aspects. Complexities of these components are measured using algorithms defined in graph theory. Two experiments were conducted to check the meaningfulness and feasibility of the complexity metrics. The first experiment was on a mechanical transmission, and its scope was the component level. All the design stages, from concept to manufacturing, were considered in this experiment. The second experiment was conducted on hybrid powertrains. Its scope was the assembly level, and only artifact complexity was considered because of limited resources. Finally, the calibration of these complexity measures was conducted at an aerospace company, but the results cannot be included in this thesis.

Contributors: Gurpreet Singh (Author) / Shah, Jami (Thesis advisor) / Runger, George C. (Committee member) / Davidson, Joseph (Committee member) / Arizona State University (Publisher)

Created: 2011

Complexity studies of firm dynamics

Description

This thesis consists of three projects employing complexity economics methods to explore firm dynamics. The first is the Firm Ecosystem Model, which addresses the institutional conditions of capital access and entrenched competitive advantage. Larger firms will be more competitive than smaller firms due to efficiencies of scale, but the persistence of larger firms is also supported institutionally through mechanisms such as tax policy, capital access mechanisms and industry-favorable legislation. At the same time, evidence suggests that small firms innovate more than larger firms, and an aggressive firm-as-value perspective incentivizes early investment in new firms in an attempt to capture that value. The Ecological Firm Model explores the effects of the differences in innovation and investment patterns and persistence rates between large and small firms.

The second project is the Structural Inertia Model, which is intended to build theory around why larger firms may be less successful in capturing new market share than smaller firms, as well as to advance fitness landscape methods. The model explores the possibility that firms with larger scopes may be less effective in mitigating the costs of cooperation because conditions may arise that cause intrafirm conflicts. The model is implemented on structured fitness landscapes derived using the maximal order of interaction (NM) formulation and described using local optima networks (LONs), thus integrating these novel techniques.

Finally, firm dynamics can serve as a proxy for the ease with which people can voluntarily enter into the legal cooperative agreements that constitute firms. The third project, the Emergent Firm Model, explores how this dynamic of voluntary association may be affected by differing capital institutions, and examines the macroeconomic implications of the economies that emerge from the resulting firm populations.

Contributors: Applegate, Joffa Michele (Author) / Janssen, Marcus A (Thesis advisor) / Hoetker, Glenn (Committee member) / Johnston, Erik W., 1977- (Committee member) / Shutter, Shade (Committee member) / Arizona State University (Publisher)

Created: 2018

A study of accelerated Bayesian additive regression trees

Description

Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading algorithms, including BART. The results show that XBART maintains BART’s predictive power while reducing its computation time. The thesis also describes the development of a Python package implementing XBART.

Contributors: Yalov, Saar (Author) / Hahn, P. Richard (Thesis advisor) / McCulloch, Robert (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)

Created: 2019

Cognitive software complexity analysis

Description

A well-defined software complexity theory that captures the cognitive means of algorithmic information comprehension is needed in the domain of cognitive informatics and computing. The existing complexity heuristics are vague and empirical. Industrial software is a combination of implemented algorithms. However, it would be wrong to conclude that algorithmic space and time complexity is software complexity. An algorithm with many lines of pseudocode may sometimes be simpler to understand than one with fewer lines. It is therefore crucial to determine the understandability of an algorithm in order to better understand software complexity. This work deals with understanding software complexity from a cognitive angle. It is also vital to compute the effect of reducing cognitive complexity. The work aims to prove three statements. First, while algorithmic complexity is a part of software complexity, software complexity is not solely algorithmic complexity. Second, the work intends to bring to light the importance of the cognitive understandability of algorithms. Third, it examines the impact that reducing cognitive complexity would have on software design and development.

Contributors: Mannava, Manasa Priyamvada (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created: 2016
