Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. Its spatio-temporal gene expression patterns can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images seriously impedes the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. A spatial sparse coding approach is used to represent local image patches and is compared with the well-known bag-of-words (BoW) method. Three pooling functions, namely max pooling, average pooling and Sqrt pooling (the square root of the mean of the squared statistics), are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, undersampling is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
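The three pooling functions named above can be illustrated with a minimal sketch. It assumes the sparse codes of one image are given as a list of per-patch vectors of equal length, and that max and average pooling operate on absolute values; both are assumptions made for illustration, and the function name `pool` is hypothetical rather than taken from the paper.

```python
import math

def pool(codes, method="max"):
    """Pool per-patch sparse codes into one image-level feature vector.

    codes  : list of equal-length lists, one sparse code per image patch
             (assumed layout, not specified in the abstract)
    method : "max", "average" or "sqrt"
    """
    if not codes:
        raise ValueError("codes must contain at least one patch")
    # Transpose so that each tuple holds the responses of one dictionary atom.
    columns = list(zip(*codes))
    if method == "max":
        # Largest absolute response per atom (absolute value is an assumption).
        return [max(abs(v) for v in col) for col in columns]
    if method == "average":
        # Mean absolute response per atom (absolute value is an assumption).
        return [sum(abs(v) for v in col) / len(col) for col in columns]
    if method == "sqrt":
        # Square root of the mean of the squared responses per atom.
        return [math.sqrt(sum(v * v for v in col) / len(col)) for col in columns]
    raise ValueError("unknown pooling method: " + method)
```

Whatever the number of patches, each call returns a vector whose length equals the dictionary size, so images with different patch counts end up with features of the same dimension.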
Conclusion
In our experiment, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation.
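Undersampling with majority vote can be sketched as follows, under stated assumptions: each model is trained on a balanced subset drawn at random from the two classes, and the final label is the one most models agree on. The nearest-centroid classifier is a stand-in chosen only to keep the example self-contained; the abstract does not say which classifier was used, and all function names here are hypothetical.

```python
import random
from collections import Counter

def undersample(pos, neg, rng):
    """Draw a balanced subset by sampling both classes down to the smaller size."""
    n = min(len(pos), len(neg))
    return rng.sample(pos, n), rng.sample(neg, n)

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_centroid(pos, neg):
    """Stand-in classifier (assumption): label by the nearer class centroid."""
    cp, cn = centroid(pos), centroid(neg)
    def predict(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return 1 if dp <= dn else 0
    return predict

def train_ensemble(pos, neg, n_models=5, seed=0):
    """Train one model per balanced subset; an odd n_models avoids tied votes."""
    rng = random.Random(seed)
    return [train_centroid(*undersample(pos, neg, rng)) for _ in range(n_models)]

def majority_vote(models, x):
    """Return the label predicted by most models in the ensemble."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

A short usage example with made-up feature vectors, where the keyword is present in only two of eight images:

```python
pos = [[1.0, 1.0], [1.2, 0.9]]
neg = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1], [0.05, 0.0], [0.2, 0.2], [-0.1, -0.1]]
models = train_ensemble(pos, neg)
label = majority_vote(models, [1.1, 1.0])
```

Because every subset is balanced, no single model is dominated by the majority class, and the vote smooths out the variation introduced by discarding different majority-class samples in each draw.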
Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by a unique combination of expressed gene products that results from spatiotemporal gene regulation. A fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the fruit fly Drosophila melanogaster, a model organism. Enhancing existing qualitative methods with quantitative analysis, based on the computational tools we present in this paper, offers a promising way to address key scientific questions.
Results
We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/.
Conclusions
Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods.
George Wells Beadle studied corn, fruit flies, and fungi in the US during the twentieth century. These studies helped Beadle earn the 1958 Nobel Prize in Physiology or Medicine. Beadle shared the prize with Edward Tatum for their discovery that genes help regulate chemical processes in and between cells. This finding, initially termed the one gene-one enzyme hypothesis, helped scientists develop new techniques to study genes and DNA as molecules, not just as units of heredity between generations of organisms. Beadle's work on Drosophila and Neurospora, in which he induced mutations in organisms at different embryonic stages, led to the analysis of the cell cycle and of embryonic development processes.
St. George Jackson Mivart studied animals and worked in England during the nineteenth century. He also proposed a theory of organismal development that he called individuation, and he critiqued Charles Darwin's argument for evolution by natural selection. His work on prosimians, a group of primates excluding apes and monkeys, helped scientists better investigate the Primate group. In his work On the Genesis of Species, Mivart argued that Darwin's theory couldn't explain how specific organismal forms developed and varied, explanations Mivart argued were necessary before Darwin could invoke the mechanism of natural selection to explain the evolution of species. To provide those explanations Mivart proposed theories of individuation and of instinct.
Boris Ephrussi and George Wells Beadle developed a transplantation technique for the fruit fly, Drosophila melanogaster, which they described in their 1936 article A Technique of Transplantation for Drosophila. The technique, in which a tissue from one fly larva is injected with a micropipette into another larva so that the tissue grows in the second larva, was a means for investigating the development of Drosophila. Through this technique, Beadle and Ephrussi studied the role of genes in embryological processes. Beadle and Ephrussi were the first to apply the transplantation method, which had previously been used in the study of larger insects, to the smaller Drosophila. They used this method to determine whether parts of the optic disc, the section of a larva that later becomes the eye buds in the adult, could be extracted from one larva and transplanted into another. They later built upon this research to relate the production of molecules in cells to gene function.
The one gene-one enzyme hypothesis, proposed by George Wells Beadle in the US in 1941, is the theory that each gene directly produces a single enzyme, which consequently affects an individual step in a metabolic pathway. In 1941, Beadle demonstrated that one gene in a fruit fly controlled a single, specific chemical reaction in the fruit fly, which one enzyme controlled. In the 1950s, the theory that genes produce enzymes that control a single metabolic step was dubbed the one gene-one enzyme hypothesis by Norman Horowitz, a professor at the California Institute of Technology (Caltech) and an associate of Beadle's. This concept helped researchers characterize genes as chemical molecules, and it helped them identify the functions of those molecules.
George Wells Beadle and Edward Lawrie Tatum's 1941 article Genetic Control of Biochemical Reactions in Neurospora detailed their experiments on how genes regulated chemical reactions, and how the chemical reactions in turn affected development in the organism. Beadle and Tatum experimented on Neurospora, a type of bread mold, and they concluded that mutations to genes affected the enzymes of organisms, a result that biologists later generalized to proteins, not just enzymes. Beadle and Tatum's experiments provided an early link between genetics and the field of molecular biology.
Boris Ephrussi studied fruit flies, yeast, and mouse genetics and development while working in France and the US during the twentieth century. In yeast, Ephrussi studied how mutations in the cytoplasm persisted across generations. In mice he studied the genetics of hybrids and the development of cancer. Ephrussi's research with George Wells Beadle on the causes of different eye colors in fruit flies helped establish the one gene-one enzyme hypothesis. Ephrussi helped create new embryological techniques and contributed to theories of genetics and development.