A Bag-of-Words Approach for Drosophila Gene Expression Pattern Annotation

Description

Background:
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.

Results:
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.

Conclusion:
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
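The bag-of-words idea summarized above can be illustrated with a minimal sketch. It assumes that local descriptors have already been extracted from each image and that a codebook of "visual words" is available; the abstract does not specify either step, so the names `bow_histogram` and `group_histogram`, the toy 2-D descriptors, and the nearest-codeword assignment are all illustrative assumptions rather than the authors' implementation.

```python
from math import dist


def bow_histogram(descriptors, codebook):
    """Return a normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Assign the descriptor to its closest codeword (Euclidean distance).
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def group_histogram(image_descriptor_sets, codebook):
    """Represent a group of images by pooling all of their descriptors."""
    pooled = [d for descs in image_descriptor_sets for d in descs]
    return bow_histogram(pooled, codebook)


# Hypothetical toy data: a two-word codebook and two small images.
codebook = [(0.0, 0.0), (1.0, 1.0)]
image_a = [(0.1, 0.0), (0.9, 1.1)]
image_b = [(1.0, 0.8), (1.2, 1.0)]

print(bow_histogram(image_a, codebook))
print(group_histogram([image_a, image_b], codebook))
```

A fixed-length histogram of this kind lets a group containing any number of images be fed to a standard classifier, which is one plausible reading of how the method annotates images "individually or in groups".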

Contributors: Ji, Shuiwang (Author) / Li, Ying-Xin (Author) / Zhou, Zhi-Hua (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Ira A. Fulton Schools of Engineering (Contributor) / School of Electrical, Computer and Energy Engineering (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)

Created: 2009-04-21

A composite genome approach to identify phylogenetically informative data from next-generation sequencing

Description

Background
Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results
In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.

Contributors: Schwartz, Rachel (Author) / Harkins, Kelly (Author) / Stone, Anne (Author) / Cartwright, Reed (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / School of Life Sciences (Contributor)

Created: 2015-06-11

GRASP [Genomic Resource Access for Stoichioproteomics]: comparative explorations of the atomic content of 12 Drosophila proteomes

Description

Background
“Stoichioproteomics” relates the elemental composition of proteins and proteomes to variation in the physiological and ecological environment. To help harness and explore the wealth of hypotheses made possible under this framework, we introduce GRASP (http://www.graspdb.net), a public bioinformatic knowledgebase containing information on the frequencies of 20 amino acids and atomic composition of their side chains. GRASP integrates comparative protein composition data with annotation data from multiple public databases. Currently, GRASP includes information on proteins of 12 sequenced Drosophila (fruit fly) proteomes, which will be expanded to include increasingly diverse organisms over time. In this paper we illustrate the potential of GRASP for testing stoichioproteomic hypotheses by conducting an exploratory investigation into the composition of 12 Drosophila proteomes, testing the prediction that protein atomic content is associated with species ecology and with protein expression levels.
Results
Elements varied predictably along multivariate axes. Species were broadly similar, with the D. willistoni proteome a clear outlier. As expected, individual protein atomic content within proteomes was influenced by protein function and amino acid biochemistry. Evolution in elemental composition across the phylogeny followed less predictable patterns, but was associated with broad ecological variation in diet. Using expression data available for D. melanogaster, we found evidence consistent with selection for efficient usage of elements within the proteome: as expected, nitrogen content was reduced in highly expressed proteins in most tissues, most strongly in the gut, where nutrients are assimilated, and least strongly in the germline.
Conclusions
The patterns identified here using GRASP provide a foundation on which to base future research into the evolution of atomic composition in Drosophila and other taxa.

Contributors: Gilbert, James D. J. (Author) / Acquisti, Claudia (Author) / Martinson, Holly M. (Author) / Elser, James (Author) / Kumar, Sudhir (Author) / Fagan, William F. (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)

Created: 2013-09-04

Image-level and group-level models for Drosophila gene expression pattern annotation

Description

Background
Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will seriously impede the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. A spatial sparse coding approach is used to represent local image patches and is compared with the well-known bag-of-words (BoW) method. Three pooling functions (max pooling, average pooling, and Sqrt pooling, the square root of mean squared statistics) are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, undersampling is applied together with majority voting. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
Conclusion
In our experiment, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority voting is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation.
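The three pooling functions named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: each image is taken to be a list of sparse code vectors (one per local patch), pooling is applied per code dimension, and the function names and toy values are hypothetical. Whether the original work pools raw or absolute code values is not stated in the abstract; raw values are used here.

```python
from math import sqrt


def max_pool(codes):
    """Max pooling: largest value of each code dimension across patches."""
    return [max(column) for column in zip(*codes)]


def average_pool(codes):
    """Average pooling: mean of each code dimension across patches."""
    return [sum(column) / len(column) for column in zip(*codes)]


def sqrt_pool(codes):
    """Sqrt pooling: square root of the mean of squared values per dimension."""
    return [sqrt(sum(v * v for v in column) / len(column)) for column in zip(*codes)]


# Hypothetical sparse codes for two patches over a two-atom dictionary.
codes = [[0.0, 3.0],
         [4.0, 0.0]]

print(max_pool(codes))      # [4.0, 3.0]
print(average_pool(codes))  # [2.0, 1.5]
print(sqrt_pool(codes))
```

Each function maps a variable number of patch codes to one fixed-length vector per image, which is what allows images with different numbers of patches to share a common feature space.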

Contributors: Sun, Qian (Author) / Muckatira, Sherin (Author) / Yuan, Lei (Author) / Ji, Shuiwang (Author) / Newfeld, Stuart (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2013-12-03

Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval

Description

Background
Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns may be made more accurate using keywords.
Results
In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations integrating spatial information and sparse features, overcoming the limitations of prior schemes.
Conclusions
We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that the integration of spatial information and sparse features leads to consistent performance improvement in image annotation, while for the task of retrieval, sparse features alone yield better results.

Contributors: Yuan, Lei (Author) / Woodard, Alexander (Author) / Ji, Shuiwang (Author) / Jiang, Yuan (Author) / Zhou, Zhi-Hua (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / Ira A. Fulton Schools of Engineering (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)

Created: 2012-05-23

A mesh generation and machine learning framework for Drosophila gene expression pattern image analysis

Description

Background
Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods, enhanced by quantitative analysis based on the computational tools we present in this paper, would provide promising ways to address key scientific questions.
Results
We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/.
Conclusions
Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods.

Contributors: Zhang, Wenlu (Author) / Feng, Daming (Author) / Li, Rongjian (Author) / Chernikov, Andrey (Author) / Chrisochoides, Nikos (Author) / Osgood, Christopher (Author) / Konikoff, Charlotte (Author) / Newfeld, Stuart (Author) / Kumar, Sudhir (Author) / Ji, Shuiwang (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)

Created: 2013-12-28

Bacterial Expression, Correct Membrane Targeting, and Functional Folding of the HIV-1 Membrane Protein Vpu Using a Periplasmic Signal Peptide

Description

Viral protein U (Vpu) is a type-III integral membrane protein encoded by Human Immunodeficiency Virus-1 (HIV-1). It is expressed in infected host cells and plays several roles in viral progeny escape from infected cells, including down-regulation of CD4 receptors. However, key structure/function questions remain regarding the mechanisms by which the Vpu protein contributes to HIV-1 pathogenesis. Here we describe the expression of Vpu in bacteria and its purification and characterization. We report the successful expression of PelB-Vpu in Escherichia coli using the leader peptide pectate lyase B (PelB) from Erwinia carotovora. The protein was detergent extractable and could be isolated in a very pure form. We demonstrate that the PelB signal peptide successfully targets Vpu to the cell membranes and inserts it as a type I membrane protein. PelB-Vpu was biophysically characterized by circular dichroism and dynamic light scattering experiments and was shown to be an excellent candidate for elucidating structural models.

Contributors: Deb, Arpan (Author) / Johnson, William (Author) / Kline, Alexander (Author) / Scott, Boston (Author) / Meador, Lydia (Author) / Srinivas, Dustin (Author) / Martin Garcia, Jose Manuel (Author) / Dorner, Katerina (Author) / Borges, Chad (Author) / Misra, Rajeev (Author) / Hogue, Brenda (Author) / Fromme, Petra (Author) / Mor, Tsafrir (Author) / ASU Biodesign Center Immunotherapy, Vaccines and Virotherapy (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / Biodesign Institute (Contributor) / School of Molecular Sciences (Contributor) / Applied Structural Discovery (Contributor) / Personalized Diagnostics (Contributor)

Created: 2017-02-22

Pyrosequencing Analysis Yields Comprehensive Assessment of Microbial Communities in Pilot-Scale Two-Stage Membrane Biofilm Reactors

Description

We studied the microbial community structure of pilot two-stage membrane biofilm reactors (MBfRs) designed to reduce nitrate (NO₃⁻) and perchlorate (ClO₄⁻) in contaminated groundwater. The groundwater also contained oxygen (O₂) and sulfate (SO₄²⁻), which became important electron sinks that affected the NO₃⁻ and ClO₄⁻ removal rates. Using pyrosequencing, we elucidated how important phylotypes of each “primary” microbial group, i.e., denitrifying bacteria (DB), perchlorate-reducing bacteria (PRB), and sulfate-reducing bacteria (SRB), responded to changes in electron-acceptor loading. UniFrac, principal coordinate analysis (PCoA), and diversity analyses documented that the microbial community of biofilms sampled when the MBfRs had a high acceptor loading were phylogenetically distant from and less diverse than the microbial community of biofilm samples with lower acceptor loadings. Diminished acceptor loading led to SO₄²⁻ reduction in the lag MBfR, which allowed Desulfovibrionales (an SRB) and Thiothrichales (sulfur-oxidizers) to thrive through S cycling. As a result of this cooperative relationship, they competed effectively with DB/PRB phylotypes such as Xanthomonadales and Rhodobacterales. Thus, pyrosequencing illustrated that while DB, PRB, and SRB responded predictably to changes in acceptor loading, a decrease in total acceptor loading led to important shifts within the “primary” groups, the onset of other members (e.g., Thiothrichales), and overall greater diversity.

Contributors: Ontiveros-Valencia, Aura (Author) / Tang, Youneng (Author) / Zhao, Heping (Author) / Friese, David (Author) / Overstreet, Ryan (Author) / Smith, Jennifer (Author) / Evans, Patrick (Author) / Rittmann, Bruce (Author) / Krajmalnik-Brown, Rosa (Author) / Biodesign Institute (Contributor) / Swette Center for Environmental Biotechnology (Contributor) / Julie Ann Wrigley Global Institute of Sustainability (Contributor) / School of Sustainability (Contributor)

Created: 2014-07-01

How UV photolysis accelerates the biodegradation and mineralization of sulfadiazine (SD)

Description

Sulfadiazine (SD), a broad-spectrum antibiotic, exhibits limited biodegradation in wastewater treatment due to its chemical structure, which requires initial mono-oxygenation reactions to initiate its biodegradation. Intimately coupling UV photolysis with biodegradation, realized with the internal loop photobiodegradation reactor, accelerated SD biodegradation and mineralization by 35% and 71%, respectively. The main organic products from photolysis were 2-aminopyrimidine (2-AP), p-aminobenzenesulfonic acid (ABS), and aniline (An), and an SD-photolysis pathway could be identified using C, N, and S balances. Adding An or ABS (but not 2-AP) into the SD solution during biodegradation experiments (no UV photolysis) gave SD removal and mineralization rates similar to those of intimately coupled photolysis and biodegradation. An SD biodegradation pathway, based on a diverse set of the experimental results, explains how the mineralization of ABS and An (but not 2-AP) provided internal electron carriers that accelerated the initial mono-oxygenation reactions of SD biodegradation. Thus, multiple lines of evidence support that the mechanism by which intimately coupled photolysis and biodegradation accelerated SD removal and mineralization was through producing co-substrates whose oxidation produced electron equivalents that stimulated the initial mono-oxygenation reactions for SD biodegradation.

Contributors: Pan, Shihui (Author) / Yan, Ning (Author) / Liu, Xinyue (Author) / Wang, Wenbing (Author) / Zhang, Yongming (Author) / Liu, Rui (Author) / Rittmann, Bruce (Author) / Biodesign Institute (Contributor) / Swette Center for Environmental Biotechnology (Contributor)

Created: 2014-11-01

Effects of pulsed electric field treatment on enhancing lipid recovery from the microalga, Scenedesmus

Description

Chloroform and methanol are superior solvents for lipid extraction from photosynthetic microorganisms, because they can overcome the resistance offered by the cell walls and membranes, but they are too toxic and expensive to use for large-scale fuel production. Biomass from the photosynthetic microalga Scenedesmus, subjected to a commercially available pre-treatment technology called Focused-Pulsed® (FP), yielded 3.1-fold more crude lipid and fatty acid methyl ester (FAME) after extraction with a range of solvents. FP treatment increased the FAME-to-crude-lipid ratio for all solvents, which means that the extraction of non-lipid materials was minimized, while the FAME profile itself was unchanged compared to the control. FP treatment also made it possible to use only a small proportion of chloroform and methanol, along with isopropanol, to obtain yields of lipid and FAME equivalent to those obtained with 100% chloroform plus methanol.

Contributors: Lai, Yenjung Sean (Author) / Parameswaran, Prathap (Author) / Li, Ang (Author) / Baez, Maria (Author) / Rittmann, Bruce (Author) / Biodesign Institute (Contributor) / Swette Center for Environmental Biotechnology (Contributor)

Created: 2014-12-01