Integrative analyses of diverse biological data sources

Description

The technology expansion seen in the last decade for genomics research has permitted the generation of large-scale data sources pertaining to molecular biological assays, genomics, proteomics, transcriptomics and other modern omics catalogs. New methods to analyze, integrate and visualize these data types are essential to unveil relevant disease mechanisms. Towards these objectives, this research focuses on data integration within two scenarios: (1) transcriptomic, proteomic and functional information and (2) real-time sensor-based measurements motivated by single-cell technology. To assess relationships between protein abundance, transcriptomic and functional data, a nonlinear model was explored at static and temporal levels. The successful integration of these heterogeneous data sources through the stochastic gradient boosted tree approach and its improved predictability are some highlights of this work. Through the development of an innovative validation subroutine based on a permutation approach and the use of external information (i.e., operons), lack of a priori knowledge for undetected proteins was overcome. The integrative methodologies allowed for the identification of undetected proteins for Desulfovibrio vulgaris and Shewanella oneidensis for further biological exploration in laboratories towards finding functional relationships. In an effort to better understand diseases such as cancer at different developmental stages, the Microscale Life Science Center headquartered at Arizona State University is pursuing single-cell studies by developing novel technologies. This research arranged and applied a statistical framework that tackled the following challenges: random noise, heterogeneous dynamic systems with multiple states, and understanding cell behavior within and across different Barrett's esophageal epithelial cell lines using oxygen consumption curves.
These curves were characterized with good empirical fit using nonlinear models with simple structures which allowed extraction of a large number of features. Application of a supervised classification model to these features and the integration of experimental factors allowed for identification of subtle patterns among different cell types visualized through multidimensional scaling. Motivated by the challenges of analyzing real-time measurements, we further explored a unique two-dimensional representation of multiple time series using a wavelet approach which showcased promising results towards less complex approximations. Also, the benefits of external information were explored to improve the image representation.

Contributors: Torres Garcia, Wandaliz (Author) / Meldrum, Deirdre R. (Thesis advisor) / Runger, George C. (Thesis advisor) / Gel, Esma S. (Committee member) / Li, Jing (Committee member) / Zhang, Weiwen (Committee member) / Arizona State University (Publisher)

Created: 2011

System complexity reduction via feature selection

Description

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset then can be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of time series, and interpretable features can be extracted. These features can be further reduced, and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. 
Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.

Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created: 2011

Opportunistic fresh-produce commercialization under two-market disintegration

Description

This thesis develops a low-investment marketing strategy that allows low-to-mid level farmers to extend their commercialization reach by strategically sending containers of fresh produce items to secondary markets that present temporary arbitrage opportunities. The methodology aims at identifying time windows of opportunity in which the price differential between two markets creates an arbitrage opportunity for a transaction; a transaction involves buying a fresh produce item at a base market, and then shipping and selling it at secondary market price. A decision-making tool is developed that gauges the individual arbitrage opportunities and determines the specific price differential (or threshold level) that is most beneficial to the farmer under particular market conditions. For this purpose, two approaches are developed: a pragmatic approach that uses historical price information of the products in order to find the optimal price differential that maximizes earnings, and a theoretical one, which optimizes an expected profit model of the shipments to identify this optimal threshold. This thesis also develops risk management strategies that further reduce profit variability during a particular two-market transaction. In this case, financial engineering concepts are used to determine a shipment configuration strategy that minimizes the overall variability of the profits. For this, a Markowitz model is developed to determine the weight assignment of each component for a particular shipment. Based on the results of the analysis, it is deemed possible to formulate a shipment policy that not only increases the farmer's commercialization reach, but also produces profitable operations. In general, the observed rates of return under the pragmatic and theoretical approaches hovered between 0.072 and 0.616 within important two-market structures.
Secondly, it is demonstrated that the level of return and risk can be manipulated by varying the strictness of the shipping policy to meet the overall objectives of the decision-maker. Finally, it was found that one can minimize the risk of a particular two-market transaction by strategically grouping the product shipments.

Contributors: Flores, Hector M (Author) / Villalobos, Rene (Thesis advisor) / Runger, George C. (Committee member) / Maltz, Arnold (Committee member) / Arizona State University (Publisher)

Created: 2011

Chaos computing: from theory to application

Description

In this thesis I introduce a new direction to computing using nonlinear chaotic dynamics. The main idea is that the rich dynamics of a chaotic system enable us to (1) build better computers that have a flexible instruction set, and (2) carry out computation that conventional computers are not good at. Here I start from the theory, explaining how one can build a computing logic block using a chaotic system, and then I introduce a new theoretical analysis for chaos computing. Specifically, I demonstrate how unstable periodic orbits and a model based on them explain and predict how and how well a chaotic system can do computation. Furthermore, since unstable periodic orbits and their stability measures in terms of eigenvalues are extractable from experimental time series, I develop a time series technique for modeling and predicting chaos computing from a given time series of a chaotic system. After building a theoretical framework for chaos computing, I proceed to the architecture of these chaos-computing blocks to build a sophisticated computing system out of them. I describe how one can arrange and organize these chaos-based blocks to build a computer. I propose a brand new computer architecture using chaos computing, which shifts the limits of conventional computers by introducing a flexible instruction set. Our new chaos-based computer has a flexible instruction set, meaning that the user can load a desired instruction set to reconfigure the computer to be an implementation of that instruction set. Apart from direct application of chaos theory in generic computation, the application of chaos theory to speech processing is explained and a novel application for chaos theory in speech coding and synthesizing is introduced. More specifically, it is demonstrated how a chaotic system can model the natural turbulent flow of the air in the human speech production system and how chaotic orbits can be used to excite a vocal tract model.
Also, as another approach to building computing systems based on nonlinear systems, the idea of Logical Stochastic Resonance is studied and adapted to an autoregulatory gene network in the bacteriophage λ.

Contributors: Kia, Behnam (Author) / Ditto, William (Thesis advisor) / Huang, Liang (Committee member) / Lai, Ying-Cheng (Committee member) / Helms Tillery, Stephen (Committee member) / Arizona State University (Publisher)

Created: 2011

Characterizing retinotopic mapping using conformal geometry and Beltrami coefficient: a preliminary study

Description

Functional magnetic resonance imaging (fMRI) has been widely used to measure the retinotopic organization of early visual cortex in the human brain. Previous studies have identified multiple visual field maps (VFMs) based on statistical analysis of fMRI signals, but the resulting geometry has not been fully characterized with mathematical models. This thesis explores using concepts from computational conformal geometry to create a custom software framework for examining and generating quantitative mathematical models for characterizing the geometry of early visual areas in the human brain. The software framework includes a graphical user interface built on top of a selected core conformal flattening algorithm and various software tools compiled specifically for processing and examining retinotopic data. Three conformal flattening algorithms were implemented and evaluated for speed and how well they preserve the conformal metric. All three algorithms performed well in preserving the conformal metric but the speed and stability of the algorithms varied. The software framework performed correctly on actual retinotopic data collected using the standard travelling-wave experiment. Preliminary analysis of the Beltrami coefficient for the early data set shows that selected regions of V1 that contain reasonably smooth eccentricity and polar angle gradients do show significant local conformality, warranting further investigation of this approach for analysis of early and higher visual cortex.

Contributors: Ta, Duyan (Author) / Wang, Yalin (Thesis advisor) / Maciejewski, Ross (Committee member) / Wonka, Peter (Committee member) / Arizona State University (Publisher)

Created: 2013

Combining thickness information with surface tensor-based morphometry for the 3D statistical analysis of the corpus callosum

Description

In blindness research, the corpus callosum (CC) is the most frequently studied sub-cortical structure, due to its important involvement in visual processing. While most callosal analyses from brain structural magnetic resonance images (MRI) are limited to the 2D mid-sagittal slice, we propose a novel framework to capture a complete set of 3D morphological differences in the corpus callosum between two groups of subjects. The CCs are segmented from whole brain T1-weighted MRI and modeled as 3D tetrahedral meshes. The callosal surface is divided into superior and inferior patches on which we compute a volumetric harmonic field by solving Laplace's equation with Dirichlet boundary conditions. We adopt a refined tetrahedral mesh to compute the Laplacian operator, so our computation can achieve sub-voxel accuracy. Thickness is estimated by tracing the streamlines in the harmonic field. We combine areal changes found using surface tensor-based morphometry and thickness information into a vector at each vertex to be used as a metric for the statistical analysis. Group differences are assessed on this combined measure through Hotelling's T² test. The method is applied to statistically compare three groups: congenitally blind (CB), late blind (LB; onset > 8 years old) and sighted (SC) subjects. Our results reveal significant differences in several regions of the CC between both blind groups and the sighted group, and to a lesser extent between the LB and CB groups. These results demonstrate the crucial role of visual deprivation during the developmental period in reshaping the structural architecture of the CC.

Contributors: Xu, Liang (Author) / Wang, Yalin (Thesis advisor) / Maciejewski, Ross (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

3D rooftop detection and modeling using orthographic aerial images

Description

Automatic detection of extruded features like rooftops and trees in aerial images is a very active area of research. Elevated features identified from aerial imagery have potential applications in urban planning and in identifying cover in military training or flight training. Detection of such features using commonly available geospatial data like orthographic aerial imagery is very challenging because rooftop and tree textures are often camouflaged by similar-looking features like roads, ground and grass. So, additional data such as LIDAR, multispectral imagery and multiple viewpoints are exploited for more accurate detection. However, such data is often not available, or may be improperly registered or inaccurate. In this thesis, we discuss a novel framework that only uses orthographic images for detection and modeling of rooftops. A segmentation scheme is proposed that initializes by assigning either foreground (rooftop) or background labels to certain pixels in the image based on shadows. It then employs GrabCut to assign one of those two labels to the rest of the pixels based on the initial labeling. Parametric model fitting is performed on the segmented results in order to create a 3D scene and to facilitate roof-shape and height estimation. The framework can also benefit from additional geospatial data such as streetmaps and LIDAR, if available.

Contributors: Khanna, Kunal (Author) / Femiani, John (Thesis advisor) / Wonka, Peter (Thesis advisor) / Razdan, Anshuman (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created: 2013

Spatial and temporal patterns of population genetic diversity in the fynbos plant, Leucadendron salignum, in the Cape Floral Region of South Africa

Description

The Cape Floral Region (CFR) in southwestern South Africa is one of the most diverse in the world, with >9,000 plant species, 70% of which are endemic, in an area of only ~90,000 km². Many have suggested that the CFR's heterogeneous environment, with respect to landscape gradients, vegetation, rainfall, elevation, and soil fertility, is responsible for the origin and maintenance of this biodiversity. While studies have struggled to link species diversity with these features, no study has attempted to associate patterns of gene flow with environmental data to determine how CFR biodiversity evolves on different scales. Here, a molecular population genetic dataset is presented for a widespread CFR plant, Leucadendron salignum, across 51 locations with 5 kb of chloroplast (cpDNA) and 6 kb of unlinked nuclear (nuDNA) DNA sequences in a dataset of 305 individuals. In the cpDNA dataset, significant genetic structure was found to vary on temporal and spatial scales, separating Western and Eastern Capes - the latter of which appears to be recently derived from the former - with the highest diversity in the heart of the CFR in a central region. A second study applied a statistical model using vegetation and soil composition and found that fine-scale genetic divergence is better explained by this landscape resistance model than by a geographic distance model. Finally, a third analysis contrasted cpDNA and nuDNA datasets, and revealed very little geographic structure in the latter, suggesting that seed and pollen dispersal can have different evolutionary genetic histories of gene flow on even small CFR scales. These three studies together caution that different genomic markers need to be considered when modeling the geographic and temporal origin of CFR groups. From a greater perspective, the results here are consistent with the hypothesis that landscape heterogeneity is one driving influence in limiting gene flow across the CFR that can lead to species diversity on fine scales.
Nonetheless, while this pattern may be true of the widespread L. salignum, the extension of this approach is now warranted for other CFR species with varying ranges and dispersal mechanisms to determine how universal these patterns of landscape genetic diversity are.

Contributors: Tassone, Erica (Author) / Verrelli, Brian C (Thesis advisor) / Dowling, Thomas (Committee member) / Cartwright, Reed (Committee member) / Rosenberg, Michael S. (Committee member) / Wojciechowski, Martin (Committee member) / Arizona State University (Publisher)

Created: 2013

Regression tree-based methodology for customizing building energy benchmarks to individual commercial buildings

Description

According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United States' energy consumption, of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy benchmarking offers initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between the energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Subsequently, significant correlations were identified between EUIs and CBECS variables. Other than floor area, some of the important variables were number of workers, location, number of PCs and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers, namely ENERGY STAR's Portfolio Manager. This tool relies on standard linear regression methods, which are only able to handle continuous variables. The proposed model uses a data mining technique and was found to perform slightly better than the Portfolio Manager.
The broader impact of the proposed benchmarking methodology is that it allows for identifying important categorical variables and then incorporating them in a local, as opposed to a global, model framework for EUI pertinent to the building type. The ability to identify and rank the important variables is of great importance in the practical implementation of benchmarking tools that rely on query-based building and HVAC variable filters specified by the user.

Contributors: Kaskhedikar, Apoorva Prakash (Author) / Reddy, T. Agami (Thesis advisor) / Bryan, Harvey (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)

Created: 2013

A visual analytics based decision support methodology for evaluating low energy building design alternatives

Description

The ability to design high performance buildings has acquired great importance in recent years due to numerous federal, societal and environmental initiatives. However, this endeavor is much more demanding in terms of designer expertise and time. It requires a whole new level of synergy between automated performance prediction and the human capabilities to perceive, evaluate and ultimately select a suitable solution. While performance prediction can be highly automated through the use of computers, performance evaluation cannot, unless it is with respect to a single criterion. The need to address multi-criteria requirements makes it more valuable for a designer to know the "latitude" or "degrees of freedom" he has in changing certain design variables while achieving preset criteria such as energy performance, life cycle cost, environmental impacts, etc. This requirement can be met by a decision support framework based on near-optimal "satisficing" as opposed to purely optimal decision making techniques. Currently, such a comprehensive design framework is lacking, which is the basis for undertaking this research. The primary objective of this research is to facilitate a complementary relationship between designers and computers for Multi-Criterion Decision Making (MCDM) during high performance building design. It is based on the application of Monte Carlo approaches to create a database of solutions using deterministic whole building energy simulations, along with data mining methods to rank variable importance and reduce the multi-dimensionality of the problem. A novel interactive visualization approach is then proposed which uses regression-based models to create dynamic interplays of how varying these important variables affects the multiple criteria, while providing a visual range or band of variation of the different design parameters.
The MCDM process has been incorporated into an alternative methodology for high performance building design referred to as Visual Analytics based Decision Support Methodology [VADSM]. VADSM is envisioned to be most useful during the conceptual and early design performance modeling stages by providing a set of potential solutions that can be analyzed further for final design selection. The proposed methodology can be used for new building design synthesis as well as evaluation of retrofits and operational deficiencies in existing buildings.

Contributors: Dutta, Ranojoy (Author) / Reddy, T Agami (Thesis advisor) / Runger, George C. (Committee member) / Addison, Marlin S. (Committee member) / Arizona State University (Publisher)

Created: 2013