Search Content

Computational methods for knowledge integration in the analysis of large-scale biological networks

Description

As we migrate into an era of personalized medicine, understanding how bio-molecules interact with one another to form cellular systems is one of the key focus areas of systems biology. Several challenges such as the dynamic nature of cellular systems, uncertainty due to environmental influences, and the heterogeneity between individual…

As we migrate into an era of personalized medicine, understanding how bio-molecules interact with one another to form cellular systems is one of the key focus areas of systems biology. Several challenges such as the dynamic nature of cellular systems, uncertainty due to environmental influences, and the heterogeneity between individual patients render this a difficult task. In the last decade, several algorithms have been proposed to elucidate cellular systems from data, resulting in numerous data-driven hypotheses. However, due to the large number of variables involved in the process, many of which are unknown or not measurable, such computational approaches often lead to a high proportion of false positives. This renders interpretation of the data-driven hypotheses extremely difficult. Consequently, a dismal proportion of these hypotheses are subject to further experimental validation, eventually limiting their potential to augment existing biological knowledge. This dissertation develops a framework of computational methods for the analysis of such data-driven hypotheses leveraging existing biological knowledge. Specifically, I show how biological knowledge can be mapped onto these hypotheses and subsequently augmented through novel hypotheses. Biological hypotheses are learnt in three levels of abstraction -- individual interactions, functional modules and relationships between pathways, corresponding to three complementary aspects of biological systems. The computational methods developed in this dissertation are applied to high throughput cancer data, resulting in novel hypotheses with potentially significant biological impact.

ContributorsRamesh, Archana (Author) / Kim, Seungchan (Thesis advisor) / Langley, Patrick W (Committee member) / Baral, Chitta (Committee member) / Kiefer, Jeffrey (Committee member) / Arizona State University (Publisher)

Created2012

Tree Ensemble Algorithms for Causal Machine Learning

Description

This dissertation centers on treatment effect estimation in the field of causal inference, and aims to expand the toolkit for effect estimation when the treatment variable is binary. Two new stochastic tree-ensemble methods for treatment effect estimation in the continuous outcome setting are presented. The Accelerated Bayesian Causal Forrest (XBCF)…

This dissertation centers on treatment effect estimation in the field of causal inference, and aims to expand the toolkit for effect estimation when the treatment variable is binary. Two new stochastic tree-ensemble methods for treatment effect estimation in the continuous outcome setting are presented. The Accelerated Bayesian Causal Forrest (XBCF) model handles variance via a group-specific parameter, and the Heteroskedastic version of XBCF (H-XBCF) uses a separate tree ensemble to learn covariate-dependent variance. This work also contributes to the field of survival analysis by proposing a new framework for estimating survival probabilities via density regression. Within this framework, the Heteroskedastic Accelerated Bayesian Additive Regression Trees (H-XBART) model, which is also developed as part of this work, is utilized in treatment effect estimation for right-censored survival outcomes. All models have been implemented as part of the XBART R package, and their performance is evaluated via extensive simulation studies with appropriate sets of comparators. The contributed methods achieve similar levels of performance, while being orders of magnitude (sometimes as much as 100x) faster than comparator state-of-the-art methods, thus offering an exciting opportunity for treatment effect estimation in the large data setting.

ContributorsKrantsevich, Nikolay (Author) / Hahn, P Richard (Thesis advisor) / McCulloch, Robert (Committee member) / Zhou, Shuang (Committee member) / Lan, Shiwei (Committee member) / He, Jingyu (Committee member) / Arizona State University (Publisher)

Created2023

Elliptic Fourier Features for Robustness to Rotations and Translations in Neural Networks

Description

In image classification tasks, images are often corrupted by spatial transformationslike translations and rotations. In this work, I utilize an existing method that uses the Fourier series expansion to generate a rotation and translation invariant representation of closed contours found in sketches, aiming to attenuate the effects of distribution shift caused…

In image classification tasks, images are often corrupted by spatial transformationslike translations and rotations. In this work, I utilize an existing method that uses the Fourier series expansion to generate a rotation and translation invariant representation of closed contours found in sketches, aiming to attenuate the effects of distribution shift caused by the aforementioned transformations. I use this technique to transform input images into one of two different invariant representations, a Fourier series representation and a corrected raster image representation, prior to passing them to a neural network for classification. The architectures used include convolutional neutral networks (CNNs), multi-layer perceptrons (MLPs), and graph neural networks (GNNs). I compare the performance of this method to using data augmentation during training, the standard approach for addressing distribution shift, to see which strategy yields the best performance when evaluated against a test set with rotations and translations applied. I include experiments where the augmentations applied during training both do and do not accurately reflect the transformations encountered at test time. Additionally, I investigate the robustness of both approaches to high-frequency noise. In each experiment, I also compare training efficiency across models. I conduct experiments on three data sets, the MNIST handwritten digit dataset, a custom dataset (QD-3) consisting of three classes of geometric figures from the Quick, Draw! hand-drawn sketch dataset, and another custom dataset (QD-345) featuring sketches from all 345 classes found in Quick, Draw!. On the smaller problem space of MNIST and QD-3, the networks utilizing the Fourier-based technique to attenuate distribution shift perform competitively with the standard data augmentation strategy. On the more complex problem space of QD-345, the networks using the Fourier technique do not achieve the same test performance as correctly-applied data augmentation. However, they still outperform instances where train-time augmentations mis-predict test-time transformations, and outperform a naive baseline model where no strategy is used to attenuate distribution shift. Overall, this work provides evidence that strategies which attempt to directly mitigate distribution shift, rather than simply increasing the diversity of the training data, can be successful when certain conditions hold.

ContributorsWatson, Matthew (Author) / Yang, Yezhou YY (Thesis advisor) / Kerner, Hannah HK (Committee member) / Yang, Yingzhen YY (Committee member) / Arizona State University (Publisher)

Created2023

Relational Macrostate Theory for Understanding and Designing Complex Systems

Description

Scientific research encompasses a variety of objectives, including measurement, making predictions, identifying laws, and more. The advent of advanced measurement technologies and computational methods has largely automated the processes of big data collection and prediction. However, the discovery of laws, particularly universal ones, still heavily relies on human intellect. Even…

Scientific research encompasses a variety of objectives, including measurement, making predictions, identifying laws, and more. The advent of advanced measurement technologies and computational methods has largely automated the processes of big data collection and prediction. However, the discovery of laws, particularly universal ones, still heavily relies on human intellect. Even with human intelligence, complex systems present a unique challenge in discerning the laws that govern them. Even the preliminary step, system description, poses a substantial challenge. Numerous metrics have been developed, but universally applicable laws remain elusive. Due to the cognitive limitations of human comprehension, a direct understanding of big data derived from complex systems is impractical. Therefore, simplification becomes essential for identifying hidden regularities, enabling scientists to abstract observations or draw connections with existing knowledge. As a result, the concept of macrostates -- simplified, lower-dimensional representations of high-dimensional systems -- proves to be indispensable. Macrostates serve a role beyond simplification. They are integral in deciphering reusable laws for complex systems. In physics, macrostates form the foundation for constructing laws and provide building blocks for studying relationships between quantities, rather than pursuing case-by-case analysis. Therefore, the concept of macrostates facilitates the discovery of regularities across various systems. Recognizing the importance of macrostates, I propose the relational macrostate theory and a machine learning framework, MacroNet, to identify macrostates and design microstates. The relational macrostate theory defines a macrostate based on the relationships between observations, enabling the abstraction from microscopic details. In MacroNet, I propose an architecture to encode microstates into macrostates, allowing for the sampling of microstates associated with a specific macrostate. My experiments on simulated systems demonstrate the effectiveness of this theory and method in identifying macrostates such as energy. Furthermore, I apply this theory and method to a complex chemical system, analyzing oil droplets with intricate movement patterns in a Petri dish, to answer the question, ``which combinations of parameters control which behavior?'' The macrostate theory allows me to identify a two-dimensional macrostate, establish a mapping between the chemical compound and the macrostate, and decipher the relationship between oil droplet patterns and the macrostate.

ContributorsZhang, Yanbo (Author) / Walker, Sara I (Thesis advisor) / Anbar, Ariel (Committee member) / Daniels, Bryan (Committee member) / Das, Jnaneshwar (Committee member) / Davies, Paul (Committee member) / Arizona State University (Publisher)

Created2023

Filtering by

Computational methods for knowledge integration in the analysis of large-scale biological networks

Tree Ensemble Algorithms for Causal Machine Learning

Elliptic Fourier Features for Robustness to Rotations and Translations in Neural Networks

Relational Macrostate Theory for Understanding and Designing Complex Systems