Matching Items (41)

System complexity reduction via feature selection

Description

This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset can then be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear or Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted. These features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
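
As an illustration of the interval-based feature idea behind TSF, the following minimal sketch (not the dissertation's implementation; the interval choices and the helper name extract_interval_features are assumptions) summarizes each series by the mean, standard deviation, and slope of random intervals and feeds the resulting table to an off-the-shelf random forest.

```python
# Minimal sketch of interval-based feature extraction in the spirit of a
# time series forest (TSF); interval sampling here is illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def extract_interval_features(X, n_intervals=8):
    """X: array of shape (n_series, series_length)."""
    n, m = X.shape
    # Random intervals shared by all series (illustrative choice).
    starts = rng.integers(0, m - 3, size=n_intervals)
    widths = rng.integers(3, m // 2, size=n_intervals)
    feats = []
    for s, w in zip(starts, widths):
        seg = X[:, s:min(s + w, m)]
        t = np.arange(seg.shape[1])
        slope = np.array([np.polyfit(t, row, 1)[0] for row in seg])
        feats += [seg.mean(axis=1), seg.std(axis=1), slope]
    return np.column_stack(feats)

# Toy data: two classes of noisy series with different trends.
X = np.vstack([np.cumsum(rng.normal(0.1, 1, (50, 60)), axis=1),
               np.cumsum(rng.normal(-0.1, 1, (50, 60)), axis=1)])
y = np.repeat([0, 1], 50)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(extract_interval_features(X), y)
```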

Date Created
2011

Design, analytics and quality assurance for emerging personalized clinical diagnostics based on next-gen sequencing

Description

Major advancements in biology and medicine have been realized during recent decades, including massively parallel sequencing, which allows researchers to collect millions or billions of short reads from a DNA or RNA sample. This capability opens the door to a renaissance in personalized medicine if effectively deployed. Three projects that address major and necessary advancements in massively parallel sequencing are included in this dissertation. The first study involves a pair of algorithms to verify patient identity based on single nucleotide polymorphisms (SNPs). In brief, we developed a method that allows de novo construction of sample relationships, e.g., which samples are from the same individual and which are from different individuals. We also developed a method to confirm the hypothesis that a tumor came from a known individual. The second study derives an algorithm to multiplex multiple Polymerase Chain Reaction (PCR) reactions, while minimizing interference between reactions that would compromise results. PCR is a powerful technique that amplifies pre-determined regions of DNA and is often used to selectively amplify DNA and RNA targets that are destined for sequencing. It is highly desirable to multiplex reactions to save on reagent and assay setup costs as well as to equalize the effect of minor handling issues across gene targets. Our solution involves a binary integer program that minimizes events that are likely to cause interference between PCR reactions. The third study involves design and analysis methods required to analyze gene expression and copy number results against a reference range in a clinical setting for guiding patient treatments. Our goal is to determine which events are present in a given tumor specimen. These events may be mutation, DNA copy number or RNA expression. At the time of writing, all three techniques are being used for their intended purposes in major research and diagnostic projects. The SNP matching solution has been selected by The Cancer Genome Atlas to determine sample identity. Paradigm Diagnostics, Viomics, and the International Genomics Consortium use the PCR multiplexing technique for various types of PCR reactions on multi-million dollar projects. The reference range-based normalization method is used by Paradigm Diagnostics to analyze results from every patient.
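
For the SNP-based identity check, a generic genotype-concordance heuristic along the following lines (an assumed simplification, not the dissertation's algorithm) can flag sample pairs that likely come from the same individual; the 0/1/2 genotype coding and the 0.9 threshold are illustrative.

```python
# Generic illustration: pairs of samples whose fraction of matching genotype
# calls exceeds a threshold are flagged as likely coming from one individual.
# Genotypes are coded 0/1/2; -1 marks a missing call.
import numpy as np
from itertools import combinations

def concordance(g1, g2):
    """Fraction of SNPs with identical calls, ignoring missing genotypes."""
    ok = (g1 >= 0) & (g2 >= 0)
    return np.mean(g1[ok] == g2[ok]) if ok.any() else np.nan

def same_individual_pairs(genotypes, threshold=0.9):
    """genotypes: dict of sample_name -> 1-D array of genotype calls."""
    pairs = []
    for a, b in combinations(genotypes, 2):
        c = concordance(genotypes[a], genotypes[b])
        if c >= threshold:
            pairs.append((a, b, c))
    return pairs

# Toy example: tumor/normal from one patient plus an unrelated sample.
rng = np.random.default_rng(1)
normal = rng.integers(0, 3, 500)
tumor = normal.copy()
tumor[rng.integers(0, 500, 10)] = 1          # a few altered calls
other = rng.integers(0, 3, 500)
print(same_individual_pairs({"normal": normal, "tumor": tumor, "other": other}))
```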

Date Created
2014

Cluster metrics and temporal coherency in pixel based matrices

Description

In this thesis, the application of pixel-based vertical axes within parallel coordinate plots is explored in an attempt to improve how existing tools can explain complex multivariate interactions across temporal data. Several promising visualization techniques are combined, including visual boosting to allow quicker consumption of large data sets, the bond energy algorithm to reveal finer patterns and anomalies through contrast, multi-dimensional scaling, flow lines, user-guided clustering, and row-column ordering. User input is applied to precomputed data sets to provide real-time interaction. The general applicability of the techniques is tested on industrial trade, social networking, financial, and sparse data sets of varying dimensionality.
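
The bond energy algorithm mentioned above can be sketched as a greedy column-insertion procedure; the version below is a simplified illustration (column ordering only, inner-product bond measure), not the thesis implementation.

```python
# Simplified bond energy algorithm (BEA) sketch: each column is inserted into
# the position that maximizes the sum of inner products of adjacent columns,
# which tends to place similar columns next to each other and expose blocks.
import numpy as np

def bond(a, b):
    return float(np.dot(a, b))

def bea_column_order(M):
    """Return a column ordering of M chosen greedily to maximize bond energy."""
    order = [0]
    for j in range(1, M.shape[1]):
        best_pos, best_gain = 0, -np.inf
        for pos in range(len(order) + 1):
            left = bond(M[:, order[pos - 1]], M[:, j]) if pos > 0 else 0.0
            right = bond(M[:, j], M[:, order[pos]]) if pos < len(order) else 0.0
            lost = bond(M[:, order[pos - 1]], M[:, order[pos]]) if 0 < pos < len(order) else 0.0
            gain = left + right - lost
            if gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, j)
    return order

# Toy matrix with two interleaved column groups; BEA groups {1, 3} and {0, 2}.
M = np.array([[5, 0, 5, 0], [4, 1, 5, 0], [0, 5, 1, 4], [1, 4, 0, 5]], float)
print(bea_column_order(M))
```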

Date Created
2014

A visual analytics based decision support methodology for evaluating low energy building design alternatives

Description

The ability to design high performance buildings has acquired great importance in recent years due to numerous federal, societal and environmental initiatives. However, this endeavor is much more demanding in terms of designer expertise and time. It requires a whole new level of synergy between automated performance prediction and the human capability to perceive, evaluate and ultimately select a suitable solution. While performance prediction can be highly automated through the use of computers, performance evaluation cannot, unless it is with respect to a single criterion. The need to address multi-criteria requirements makes it more valuable for a designer to know the "latitude" or "degrees of freedom" available in changing certain design variables while achieving preset criteria such as energy performance, life cycle cost, and environmental impact. This requirement can be met by a decision support framework based on near-optimal "satisficing" as opposed to purely optimal decision making techniques. Currently, such a comprehensive design framework is lacking, which is the basis for undertaking this research. The primary objective of this research is to facilitate a complementary relationship between designers and computers for Multi-Criterion Decision Making (MCDM) during high performance building design. It is based on the application of Monte Carlo approaches to create a database of solutions using deterministic whole-building energy simulations, along with data mining methods to rank variable importance and reduce the multi-dimensionality of the problem. A novel interactive visualization approach is then proposed which uses regression-based models to create dynamic interplays of how varying these important variables affects the multiple criteria, while providing a visual range or band of variation of the different design parameters. The MCDM process has been incorporated into an alternative methodology for high performance building design referred to as the Visual Analytics based Decision Support Methodology (VADSM). VADSM is envisioned to be most useful during the conceptual and early design performance modeling stages by providing a set of potential solutions that can be analyzed further for final design selection. The proposed methodology can be used for new building design synthesis as well as for evaluation of retrofits and operational deficiencies in existing buildings.
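
The overall analysis pattern (Monte Carlo sampling of design variables, a regression surrogate for the simulated performance, and a variable-importance ranking) can be sketched roughly as below; the stand-in energy function and variable names are assumptions, not outputs of a whole-building simulation.

```python
# Hedged sketch: Monte Carlo design samples, a random-forest surrogate for the
# (stand-in) energy model, and a permutation-importance ranking of variables.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
# Monte Carlo samples of illustrative design variables.
X = np.column_stack([
    rng.uniform(0.1, 0.6, n),    # window-to-wall ratio
    rng.uniform(1.0, 6.0, n),    # wall insulation R-value (arbitrary units)
    rng.uniform(0.2, 0.9, n),    # glazing solar heat gain coefficient
    rng.uniform(0.0, 1.0, n),    # shading fraction
])
# Stand-in "simulation" of annual energy use, with noise.
y = 120 + 80 * X[:, 0] - 10 * X[:, 1] + 40 * X[:, 2] - 15 * X[:, 3] + rng.normal(0, 5, n)

surrogate = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
imp = permutation_importance(surrogate, X, y, n_repeats=10, random_state=0)
for name, score in zip(["WWR", "R-value", "SHGC", "shading"], imp.importances_mean):
    print(f"{name}: {score:.3f}")
```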

Date Created
2013

Towards haptic intelligence for artificial hands: development and use of deformable, fluidic tactile sensors to relate action and perception

Description

Human fingertips contain thousands of specialized mechanoreceptors that enable effortless physical interactions with the environment. Haptic perception capabilities enable grasp and manipulation in the absence of visual feedback, as when reaching into one's pocket or wrapping a belt around oneself. Unfortunately, state-of-the-art artificial tactile sensors and processing algorithms are no match for their biological counterparts. Tactile sensors must not only meet stringent practical specifications for everyday use, but their signals must be processed and interpreted within hundreds of milliseconds. Control of artificial manipulators, ranging from prosthetic hands to bomb defusal robots, requires constant reliance on visual feedback, which is not entirely practical. To address this, we conducted three studies aimed at advancing artificial haptic intelligence. First, we developed a novel, robust, microfluidic tactile sensor skin capable of measuring normal forces on flat or curved surfaces, such as a fingertip. The sensor consists of microchannels in an elastomer filled with a liquid metal alloy. The fluid serves as both electrical interconnects and tunable capacitive sensing units, and enables functionality despite substantial deformation. The second study investigated the use of a commercially available, multimodal tactile sensor (BioTac sensor, SynTouch) to characterize edge orientation with respect to a body-fixed reference frame, such as a fingertip. Trained on data from a robot testbed, a support vector regression model was developed to relate haptic exploration actions to perception of edge orientation. The model performed comparably to humans for estimating edge orientation. Finally, the robot testbed was used to perceive small, finger-sized geometric features. The efficiency and accuracy of different haptic exploratory procedures and supervised learning models were assessed for estimating feature properties such as type (bump, pit), order of curvature (flat, conical, spherical), and size. This study highlights the importance of tactile sensing in situations where other modalities fail, such as when the finger itself blocks line of sight. Insights from this work could be used to advance tactile sensor technology and haptic intelligence for artificial manipulators that improve quality of life, such as prosthetic hands and wheelchair-mounted robotic hands.
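
The support vector regression step of the second study can be approximated by a sketch like the following; the synthetic tactile features and angle range are assumptions, not BioTac data.

```python
# Minimal sketch: support vector regression relating tactile feature vectors
# to edge orientation. Features and angles here are synthetic stand-ins.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 400
angles = rng.uniform(-90, 90, n)                      # edge orientation (degrees)
# Stand-in tactile features: noisy functions of the true orientation plus
# one irrelevant channel.
X = np.column_stack([
    np.sin(np.radians(angles)) + rng.normal(0, 0.05, n),
    np.cos(np.radians(angles)) + rng.normal(0, 0.05, n),
    rng.normal(0, 1, n),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, angles, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_tr, y_tr)
print("MAE (deg):", mean_absolute_error(y_te, model.predict(X_te)))
```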

Date Created
2013

Holistic learning for multi-target and network monitoring problems

Description

Technological advances have enabled the generation and collection of various data from complex systems, thus creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and, thereby, improves decision making.

The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm created in this dissertation. Specifically, a novel tree-based ensemble that leverages the relationships between target attributes towards constructing a diverse, yet strong, model is proposed. The method is justified through its connection to existing methods and experimental evaluations on synthetic and real data.
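
As a point of reference for the multi-target setting (not the proposed ensemble itself), an off-the-shelf multi-output random forest can be applied to several correlated targets sharing one predictor set, as sketched below with synthetic data.

```python
# Baseline illustration of multi-target prediction: two correlated targets
# predicted from a shared set of predictors by a multi-output random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 1000, 10
X = rng.normal(size=(n, p))
# Two correlated targets driven by overlapping subsets of the predictors.
y1 = X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.3, n)
y2 = 0.8 * y1 + X[:, 2] + rng.normal(0, 0.3, n)
Y = np.column_stack([y1, y2])

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:800], Y[:800])
print("held-out R^2 (averaged over targets):", model.score(X[800:], Y[800:]))
```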

The second topic pertains to monitoring complex systems that are modeled as networks. Such systems present a rich set of attributes and relationships for which holistic learning is important. In social networks, for example, in addition to friendship ties, various attributes such as the users' gender, age, message topics, and message times are collected. A restricted form of monitoring fails to take the relationships of multiple attributes into account, whereas the holistic view embeds such relationships in the monitoring methods. The focus is on the difficult task of detecting a change that might impact only a small subset of the network and occur only in a sub-region of the high-dimensional space of the network attributes. One contribution is a monitoring algorithm based on a network statistical model. Another contribution is a transactional model that transforms the task into an expedient structure for machine learning, along with a generalizable algorithm to monitor the attributed network. A learning step in this algorithm adapts to changes that may be local to sub-regions (with broader potential for other learning tasks). Diagnostic tools to interpret the change are provided. This robust, generalizable, holistic monitoring method is demonstrated on synthetic and real networks.
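
A generic baseline for the monitoring task (not the dissertation's network-model or transactional approach) is to summarize each time window of the attributed network and track the summaries with a Hotelling T^2 chart; such a global chart would likely miss the localized changes the dissertation targets, which is what motivates the proposed methods.

```python
# Generic Hotelling T^2 baseline: per-window network summaries are monitored
# against a control limit estimated from in-control (baseline) windows.
import numpy as np

def t2_statistics(window_summaries, baseline_end):
    """window_summaries: (n_windows, n_features) array of per-window summaries."""
    base = window_summaries[:baseline_end]
    mu = base.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(base, rowvar=False))
    diff = window_summaries - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Toy summaries: [edge count, mean node attribute, fraction of topic-A messages]
rng = np.random.default_rng(0)
summaries = rng.normal([100, 0.0, 0.3], [5, 0.1, 0.02], size=(60, 3))
summaries[45:, 2] += 0.08            # a shift in one attribute after window 45
t2 = t2_statistics(summaries, baseline_end=30)
limit = np.quantile(t2[:30], 0.99)   # empirical control limit from baseline
print(np.where(t2 > limit)[0])       # windows flagged as out of control
```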

Date Created
2014

Dynamic management of inspection effort allocation in an international port of entry (POE)

Description

Every year, more than 11 million maritime containers and 11 million commercial trucks arrive in the United States, carrying all types of imported goods. As it would be costly to inspect every container, only a fraction of them are inspected before being allowed to proceed into the United States. This dissertation proposes a decision support system that aims to allocate the scarce inspection resources at a land POE (L-POE) in order to minimize the different costs associated with the inspection process, including those associated with delaying the entry of legitimate imports. Given the ubiquity of sensors in all aspects of the supply chain, it is necessary to have automated decision systems that incorporate the information provided by these sensors and other possible channels into the inspection planning process. The inspection planning system proposed in this dissertation decomposes the inspection effort allocation process into two phases: primary and detailed inspection planning. The former helps decide what to inspect, and the latter how to conduct the inspections. A multi-objective optimization (MOO) model is developed for primary inspection planning. This model balances the direct and expected costs of conducting inspections against the waiting time of the trucks. The resulting model is exploited in two different ways: one is to construct a complete or a partial efficient frontier for the MOO model with the diversity of Pareto-optimal solutions maximized; the other is to evaluate a given inspection plan and provide possible suggestions for improvement. The methodologies are described in detail and case studies are provided. The case studies show that this MOO-based primary planning model can effectively pick out the non-conforming trucks to inspect, while balancing the costs and waiting time.
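
A toy weighted-sum version of the primary-inspection trade-off might look like the following (illustrative only; the risk estimates, weights, and capacity are assumptions, and the dissertation's MOO model is richer). It requires the PuLP package.

```python
# Toy weighted-sum formulation: choose which trucks to inspect so as to balance
# the expected risk of skipped trucks against inspection cost, under a capacity.
import pulp

risk = [0.9, 0.1, 0.4, 0.05, 0.7]    # estimated non-conformance risk per truck
cost = [1.0, 1.0, 1.5, 1.0, 2.0]     # inspection cost per truck
capacity = 3                         # at most this many inspections
w_risk, w_cost = 1.0, 0.2            # weights sweeping out the efficient frontier

prob = pulp.LpProblem("primary_inspection", pulp.LpMinimize)
x = [pulp.LpVariable(f"inspect_{i}", cat="Binary") for i in range(len(risk))]

# Objective: residual risk of skipped trucks plus weighted inspection cost.
prob += w_risk * pulp.lpSum(r * (1 - xi) for r, xi in zip(risk, x)) \
        + w_cost * pulp.lpSum(c * xi for c, xi in zip(cost, x))
prob += pulp.lpSum(x) <= capacity

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([i for i, xi in enumerate(x) if xi.value() == 1])   # trucks selected
```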

Date Created
2012

Learning from asymmetric models and matched pairs

Description

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus, knowledge discovery through machine learning techniques is necessary if we want to better understand the information in data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also study variable selection for matched data sets and propose a solution for non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. An asymmetric support vector machine (aSVM) is proposed to predict specific classes with high accuracy; the aSVM is shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. An Asymmetric Random Forest (ARF) is proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. A Matched Random Forest (MRF) is proposed to find variables that distinguish case and control without the restrictions that exist in linear models. MRF detects variables that distinguish case and control even in the presence of interactions and qualitative variables.
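
A related, generic way to encode asymmetric loss (not the proposed aSVM) is to reweight classes in a standard SVM so that errors on one class cost more, which tends to raise precision on the class of interest, as in the sketch below.

```python
# Generic illustration of asymmetric loss via class weights in a standard SVM;
# a heavier penalty on misclassifying class 0 reduces false positives for
# class 1 and hence tends to raise precision on class 1.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = SVC().fit(X_tr, y_tr)
weighted = SVC(class_weight={0: 5.0, 1: 1.0}).fit(X_tr, y_tr)

for name, m in [("plain SVM", plain), ("asymmetric-cost SVM", weighted)]:
    print(name, "precision on class 1:", precision_score(y_te, m.predict(X_te)))
```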

Date Created
2013

Integrative analyses of diverse biological data sources

Description

The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance, transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are some highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), the lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins for Desulfovibrio vulgaris and Shewanella oneidensis for further biological exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at Arizona State University is pursuing single-cell studies by developing novel technologies. This research assembled and applied a statistical framework that tackles the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves. These curves were characterized with good empirical fit using nonlinear models with simple structures, which allowed extraction of a large number of features. Application of a supervised classification model to these features and the integration of experimental factors allowed for identification of subtle patterns among different cell types visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach, which showed promising results towards less complex approximations. Also, the benefits of external information were explored to improve the image representation.
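
The stochastic gradient boosted tree step for relating transcript-level and functional features to protein abundance can be sketched as follows; the synthetic features (mrna, half_life, codon_bias) are stand-ins, not the dissertation's data.

```python
# Hedged sketch of the first integration scenario: a stochastic gradient
# boosted tree regressor mapping transcript and functional features to a
# (synthetic) protein abundance response.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1500
mrna = rng.lognormal(2.0, 0.8, n)              # transcript abundance
half_life = rng.uniform(0.5, 10.0, n)          # illustrative functional feature
codon_bias = rng.uniform(0.0, 1.0, n)
# Nonlinear stand-in for protein abundance with noise.
protein = np.log(mrna) * (1 + codon_bias) + 0.3 * np.sqrt(half_life) + rng.normal(0, 0.3, n)

X = np.column_stack([mrna, half_life, codon_bias])
X_tr, X_te, y_tr, y_te = train_test_split(X, protein, random_state=0)
# subsample < 1.0 is what makes the boosting "stochastic".
model = GradientBoostingRegressor(n_estimators=400, subsample=0.7, max_depth=3, random_state=0)
model.fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```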

Date Created
2011

Opportunistic fresh-produce commercialization under two-market disintegration

Description

This thesis develops a low-investment marketing strategy that allows low- to mid-level farmers to extend their commercialization reach by strategically sending containers of fresh produce items to secondary markets that present temporary arbitrage opportunities. The methodology aims at identifying time windows of opportunity in which the price differential between two markets creates an arbitrage opportunity for a transaction; a transaction involves buying a fresh produce item at a base market and then shipping and selling it at the secondary market price. A decision-making tool is developed that gauges the individual arbitrage opportunities and determines the specific price differential (or threshold level) that is most beneficial to the farmer under particular market conditions. For this purpose, two approaches are developed: a pragmatic approach that uses historical price information of the products to find the optimal price differential that maximizes earnings, and a theoretical one that optimizes an expected profit model of the shipments to identify this optimal threshold. This thesis also develops risk management strategies that further reduce profit variability during a particular two-market transaction. In this case, financial engineering concepts are used to determine a shipment configuration strategy that minimizes the overall variability of the profits. For this, a Markowitz model is developed to determine the weight assigned to each component of a particular shipment. Based on the results of the analysis, it is deemed possible to formulate a shipment policy that not only increases the farmer's commercialization reach, but also produces profitable operations. In general, the observed rates of return under the pragmatic and theoretical approaches hovered between 0.072 and 0.616 for important two-market structures. Second, it is demonstrated that the level of return and risk can be manipulated by varying the strictness of the shipping policy to meet the overall objectives of the decision-maker. Finally, it is found that one can minimize the risk of a particular two-market transaction by strategically grouping the product shipments.
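
The Markowitz-style shipment weighting can be illustrated with the closed-form minimum-variance portfolio, w = S^{-1} 1 / (1^T S^{-1} 1); the profit-rate data below are synthetic, and short positions are allowed for simplicity, unlike a real shipment configuration.

```python
# Minimal Markowitz-style sketch: given the covariance of historical per-product
# profit rates, compute the minimum-variance weight assignment (weights sum to 1).
import numpy as np

rng = np.random.default_rng(0)
# Synthetic historical per-unit profit rates for three produce items.
returns = rng.multivariate_normal(
    mean=[0.08, 0.05, 0.06],
    cov=[[0.010, 0.002, 0.001],
         [0.002, 0.006, 0.000],
         [0.001, 0.000, 0.012]],
    size=200,
)
cov = np.cov(returns, rowvar=False)

# Closed-form minimum-variance portfolio: w = S^{-1} 1 / (1^T S^{-1} 1).
ones = np.ones(cov.shape[0])
w = np.linalg.solve(cov, ones)
w /= ones @ w
print("weights:", np.round(w, 3), "portfolio variance:", w @ cov @ w)
```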

Date Created
2011