Drosophila melanogaster is an established model organism for investigating developmental gene interactions. Its spatio-temporal gene expression patterns can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images seriously impedes the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework for annotating Drosophila gene expression images with anatomical keywords. Local image patches are represented by spatial sparse coding, and this representation is compared with the well-known bag-of-words (BoW) method. Three pooling functions, namely max pooling, average pooling, and Sqrt pooling (the square root of the mean of the squared values), are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, undersampling is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
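The three pooling functions can be illustrated with a minimal sketch. It assumes that each image patch has already been encoded as a list of sparse-code coefficients of equal length; the function name `pool`, the `method` argument, and the toy values are hypothetical and are not taken from the framework itself. Whether max pooling acts on raw or absolute coefficients is not stated here, so the sketch uses raw values as an assumption.

```python
import math

def pool(codes, method="max"):
    """Collapse per-patch sparse codes (equal-length lists) into one feature vector.

    Assumption: pooling is applied independently to each code dimension.
    """
    if not codes:
        raise ValueError("at least one patch code is required")
    columns = list(zip(*codes))  # one tuple of coefficients per code dimension
    if method == "max":
        return [max(col) for col in columns]
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    if method == "sqrt":
        # Square root of the mean of the squared coefficients.
        return [math.sqrt(sum(v * v for v in col) / len(col)) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")

# Toy input: two patches, three code dimensions (illustrative values only).
patch_codes = [[0.0, 3.0, 0.0], [4.0, 0.0, 0.0]]
max_features = pool(patch_codes, "max")
avg_features = pool(patch_codes, "average")
sqrt_features = pool(patch_codes, "sqrt")
```

Whatever the number of patches, each pooling function returns a vector whose length equals the code dimension, which is why pooling yields a fixed-size image feature.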
Conclusion
In our experiments, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation.
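Undersampling with majority vote can be sketched as follows. This is an illustration under stated assumptions, not the classifier used in the study: a nearest-centroid rule stands in for the base classifier, and the names `undersample`, `train_nearest_centroid`, and `ensemble_predict`, as well as the toy feature vectors, are hypothetical.

```python
import random
from collections import Counter

def undersample(pos, neg, n_models, seed=0):
    """Yield n_models balanced training sets; the larger class is sampled down."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))
    for _ in range(n_models):
        yield rng.sample(pos, k), rng.sample(neg, k)

def centroid(vectors):
    return [sum(col) / len(col) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_nearest_centroid(pos, neg):
    """Stand-in base classifier (assumption): label 1 if closer to the positive centroid."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: 1 if sq_dist(x, c_pos) < sq_dist(x, c_neg) else 0

def ensemble_predict(models, x):
    """Return the label predicted by most models (use an odd count to avoid ties)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy imbalanced data: 2 images carry the keyword, 6 do not (illustrative only).
positives = [[5.0, 5.0], [6.0, 5.0]]
negatives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 2.0]]
models = [train_nearest_centroid(p, n) for p, n in undersample(positives, negatives, 5)]
label_near_positive = ensemble_predict(models, [5.5, 5.0])
label_near_negative = ensemble_predict(models, [0.2, 0.3])
```

Each model sees a balanced subset, so no single model is dominated by the majority class, while the vote across models makes use of more of the majority-class images than any one subset contains.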
Multicellular organisms consist of cells of many different types that are established during development. Each cell type is characterized by a unique combination of expressed gene products resulting from spatiotemporal gene regulation. A fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the fruit fly Drosophila melanogaster, a model organism. Enhancing existing qualitative methods with quantitative analysis based on the computational tools presented in this paper offers a promising way to address key scientific questions.
Results
We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and to account for embryonic shape variations, we introduce a mesh generation method that deforms a meshed generic ellipse to fit each individual embryo. We then formulate a co-clustering problem over the genes and the mesh elements, which identifies co-expressed embryonic domains and the associated genes at the same time. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/.
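The idea of clustering genes and mesh elements together can be illustrated with a generic block-mean co-clustering sketch. This is not the formulation developed in this work: it is a simple alternating squared-error scheme, and the function names, the farthest-first initialization, and the toy matrix (rows as genes, columns as mesh elements) are all assumptions made for illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_labels(vectors, k):
    """Farthest-first seeding followed by one nearest-seed assignment."""
    seeds = [vectors[0]]
    while len(seeds) < k:
        seeds.append(max(vectors, key=lambda v: min(sq_dist(v, s) for s in seeds)))
    return [min(range(k), key=lambda a: sq_dist(v, seeds[a])) for v in vectors]

def block_means(X, rows, cols, k, l):
    """Mean expression of every (row cluster, column cluster) block."""
    total = [[0.0] * l for _ in range(k)]
    count = [[0] * l for _ in range(k)]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            total[rows[i]][cols[j]] += value
            count[rows[i]][cols[j]] += 1
    # Assumption: an empty block is given mean 0.0.
    return [[total[a][b] / count[a][b] if count[a][b] else 0.0
             for b in range(l)] for a in range(k)]

def cocluster(X, k, l, n_iter=20):
    """Alternately reassign rows and columns to the best-fitting block means."""
    Xt = [list(col) for col in zip(*X)]
    rows, cols = init_labels(X, k), init_labels(Xt, l)
    for _ in range(n_iter):
        M = block_means(X, rows, cols, k, l)
        new_rows = [min(range(k), key=lambda a: sum(
            (v - M[a][cols[j]]) ** 2 for j, v in enumerate(row))) for row in X]
        M = block_means(X, new_rows, cols, k, l)
        new_cols = [min(range(l), key=lambda b: sum(
            (v - M[new_rows[i]][b]) ** 2 for i, v in enumerate(col))) for col in Xt]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols

# Toy expression matrix: 4 genes (rows) by 4 mesh elements (columns).
expression = [
    [9.0, 8.0, 0.0, 1.0],
    [8.0, 9.0, 1.0, 0.0],
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 8.0, 9.0],
]
gene_labels, mesh_labels = cocluster(expression, 2, 2)
```

In this toy matrix the first two genes are expressed in the first two mesh elements and the last two genes in the remaining elements, so each gene cluster pairs with one cluster of mesh elements, which is the kind of co-expressed domain the text describes.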
Conclusions
Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods.
This project used computational tools to analyze large data sets and interpreted the results from historical and philosophical perspectives. Tools derived from scientometrics, corpus linguistics, text-based analysis, network analysis, and GIS analysis were applied to more than 9000 articles (metadata and text) on systems biology. The application of these tools to an HPS project represents a novel approach.
The dissertation shows that systems biology has transitioned from a more mathematical, computational, and engineering-oriented discipline focusing on modeling to a more biology-oriented discipline that uses modeling as a means to address real biological problems. The results also show that bioengineering and medical research have increased within systems biology. This is reflected in the increasing centrality over time of biology-related concepts such as cancer. The dissertation also compares the development of systems biology in China with that in some other parts of the world and reveals regional differences, such as a unique trajectory of systems biology in China related to a focus on traditional Chinese medicine.
This dissertation adds to the historiography of modern biology, in which few studies have focused on systems biology compared with molecular biology and evolutionary biology.
The first chapter revisits some of the key experiments that contributed to the development of the repression model of genetic regulation in the lac operon and concludes that the early research on gene expression and genetic regulation depicts an iterative and integrative process, which was neither reductionist nor holist. In doing so, it challenges a common application of a conceptual framework in the history of biology and offers an alternative framework. The second chapter argues that the concept of emergence in the history and philosophy of biology is too ambiguous to account for current research in post-genomic molecular biology and that it is often erroneously used to argue against some reductionist theses. The third chapter investigates the use of network representations of gene expression in developmental evolution research and takes up some of the conceptual and methodological problems it has generated. The concluding comments present potential avenues for future research arising from each substantive chapter.
In sum, this dissertation argues that the epistemic practices of gene expression research are an iterative and integrative process, which produces theoretical representations of the complex interactions in gene expression as networks. Moreover, conceptualizing these interactions as networks constrains empirical research strategies by the limited number of ways in which gene expression can be controlled through general rules of network interactions. Making these strategies explicit helps to clarify how they can explain the dynamic and adaptive features of genomes.
Recently, we employed phylogenetics to predict that the cellular interpretation of TGF-β signals is modulated by monoubiquitylation cycles affecting the Smad4 signal transducer/tumor suppressor. This prediction was subsequently validated by experiments in flies, frogs, and mammalian cells. Here we apply a phylogenetic approach to the Hippo pathway and predict that two of its signal transducers, Salvador and Merlin/Nf2 (also a tumor suppressor), are regulated by monoubiquitylation. This regulatory mechanism does not lead to protein degradation but instead serves as a highly efficient “off/on” switch when the protein is subsequently deubiquitylated. Overall, our study shows that the creative application of phylogenetics can predict new roles for pathway components and new mechanisms for regulating intercellular signaling pathways.