Resolving Prokaryotic Taxonomy without rRNA: Longer Oligonucleotide Word Lengths Improve Genome and Metagenome Taxonomic Classification

Description

Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism’s inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomic samples, as many of these samples contain organisms from poorly classified phyla that cannot be easily identified using traditional homology methods such as NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of fragment sizes relevant to high-throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the taxonomic binning performance of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) on metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that applying longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
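As a minimal sketch of how an oligonucleotide (k-mer) word-usage signature can be computed and used to bin a fragment, the Python below counts every overlapping word of length k, converts the counts to relative frequencies over all 4^k possible words, and assigns a fragment to the reference whose signature is nearest by Euclidean distance. The function names, the distance metric, the nearest-reference rule, and the toy sequences are illustrative assumptions, not the classifier used in the study.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq: str, k: int) -> list[float]:
    """Relative frequency of each of the 4**k DNA words of length k (overlapping windows)."""
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only words made of A/C/G/T are counted; windows containing N or other symbols are ignored.
    total = sum(counts[w] for w in words) or 1
    return [counts[w] / total for w in words]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(fragment: str, references: dict[str, str], k: int) -> str:
    """Return the name of the reference whose k-mer signature is closest to the fragment's."""
    frag_sig = kmer_signature(fragment, k)
    ref_sigs = {name: kmer_signature(genome, k) for name, genome in references.items()}
    return min(ref_sigs, key=lambda name: euclidean(frag_sig, ref_sigs[name]))


# Hypothetical toy "genomes"; real use would load complete genome sequences.
references = {"gc_rich": "GCGCGGCCGC" * 20, "at_rich": "ATATTAAATA" * 20}
print(classify("GCGGCCGCGC" * 3, references, k=4))
```

The signature vector grows as 4^k (256 entries for tetranucleotides, 16,384 for heptanucleotides), which is one concrete source of the accuracy-versus-computational-time trade-off described above.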

Date Created
  • 2013-07-01

Korarchaeota Diversity, Biogeography, and Abundance in Yellowstone and Great Basin Hot Springs and Ecological Niche Modeling Based on Machine Learning

Description

Over 100 hot spring sediment samples were collected from 28 sites in 12 areas/regions, while recording as many coincident geochemical properties as feasible (>60 analytes). PCR was used to screen samples for Korarchaeota 16S rRNA genes. Over 500 Korarchaeota 16S rRNA genes were screened by RFLP analysis and 90 were sequenced, resulting in identification of novel Korarchaeota phylotypes and exclusive geographical variants. Korarchaeota diversity was low, as in other terrestrial geothermal systems, suggesting a marine origin for Korarchaeota with subsequent niche-invasion into terrestrial systems. Korarchaeota endemism is consistent with endemism of other terrestrial thermophiles and supports the existence of dispersal barriers. Korarchaeota were found predominantly in >55°C springs at pH 4.7–8.5 at concentrations up to 6.6 × 10⁶ 16S rRNA gene copies g⁻¹ wet sediment. In Yellowstone National Park (YNP), Korarchaeota were most abundant in springs with a pH range of 5.7 to 7.0. High sulfate concentrations suggest these fluids are influenced by contributions from hydrothermal vapors that may be neutralized to some extent by mixing with water from deep geothermal sources or meteoric water. In the Great Basin (GB), Korarchaeota were most abundant at spring sources of pH < 7.2 with high particulate C content and high alkalinity, which are likely to be buffered by the carbonic acid system. It is therefore likely that at least two different geological mechanisms in YNP and GB springs create the neutral to mildly acidic pH that is optimal for Korarchaeota. A classification support vector machine (C-SVM) trained on single analytes, two analyte combinations, or vectors from non-metric multidimensional scaling models was able to predict springs as Korarchaeota-optimal or sub-optimal habitats with accuracies up to 95%.
To our knowledge, this is the most extensive analysis of the geochemical habitat of any high-level microbial taxon and the first application of a C-SVM to microbial ecology.
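The C-SVM idea can be illustrated with a soft-margin linear SVM. The sketch below minimizes the primal objective ½‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)) by full-batch subgradient descent in plain Python. The linear kernel, the training procedure, the hyperparameter values, and the toy two-analyte data (label +1 for a hypothetical Korarchaeota-optimal spring, −1 for sub-optimal) are all assumptions for illustration and do not reproduce the study's model or data.

```python
def decision(w: list[float], b: float, x: list[float]) -> float:
    """Signed score w·x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_csvm(X, y, C=1.0, epochs=200, lr=0.01):
    """Soft-margin linear SVM trained by full-batch subgradient descent on the primal hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # point inside the margin or misclassified
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    return 1 if decision(w, b, x) >= 0 else -1


# Hypothetical standardized analytes (e.g. scaled pH and temperature) with made-up labels.
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [-1.0, -1.0], [-1.1, -0.9], [-0.9, -1.2]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_csvm(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"w={w}, b={b:.3f}, training accuracy={accuracy:.0%}")
```

In practice one would standardize each analyte, use an established SVM library (often with a nonlinear kernel), and choose C by cross-validation; this loop only shows what C trades off, namely margin width against training violations.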

Date Created
  • 2012-05-04

Coordinating Environmental Genomics and Geochemistry Reveals Metabolic Transitions in a Hot Spring Ecosystem

Description

We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ∼75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. 
The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability.
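Markov clustering (MCL), named above as the basis of the improved annotation, alternates expansion (squaring a column-stochastic matrix, which spreads random-walk flow) with inflation (raising entries to a power r and renormalizing columns, which strengthens strong flows and prunes weak ones) until the matrix stops changing. The plain-Python sketch below assumes a symmetric similarity matrix, for example one derived from hypothetical BLAST scores between sequences; the function names and parameter defaults are illustrative rather than those used in the study, and real datasets would call for a dedicated sparse MCL implementation.

```python
def normalize_columns(m: list[list[float]]) -> list[list[float]]:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-6):
    """Return clusters (sorted node-index lists) found by Markov clustering."""
    n = len(similarity)
    # Add self-loops so every node keeps some flow on itself, then normalize.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row of the converged matrix is an attractor; its non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Hypothetical similarity graph: two groups of mutually similar sequences.
similarity = [[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]]
print(mcl(similarity))
```

Raising the inflation parameter generally yields more, smaller clusters; lowering it toward 1 merges clusters.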

Date Created
  • 2012-06-04

Merging metagenomics and geochemistry reveals environmental controls on biological diversity and evolution

Description

Background
The metabolic strategies employed by microbes inhabiting natural systems are, in large part, dictated by the physical and geochemical properties of the environment. This study sheds light on the complex relationship between biology and environmental geochemistry using forty-three metagenomes collected from geochemically diverse and globally distributed natural systems. It is widely hypothesized that many uncommonly measured geochemical parameters affect community dynamics, and this study develops and applies multidimensional biogeochemical metrics to examine correlations between geochemistry and microbial ecology. Analysis techniques such as a Markov cluster-based measure of the evolutionary distance between whole communities and a principal component analysis (PCA) of the geochemical gradients between environments allow correlations between microbial community dynamics and environmental geochemistry to be determined, and provide insight into which geochemical parameters most strongly influence microbial biodiversity.
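A PCA of geochemical gradients can be sketched in a few steps: standardize each measured parameter across samples, form their covariance matrix (which, after standardization, is the correlation matrix), and extract its leading eigenvector. The loadings show which parameters dominate the main gradient, and the projected scores order the samples along it. The Python below finds only the first component, by power iteration; the site values (temperature and pH at four hypothetical sites) are invented, and a full analysis would use a linear-algebra library to obtain every component.

```python
import math


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Center each column (parameter) and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(z: list[list[float]]) -> list[list[float]]:
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]


def first_principal_component(rows, iters=500):
    """Return (loadings, sample scores) for the leading principal component."""
    z = standardize(rows)
    cov = covariance(z)
    p = len(cov)
    # Non-uniform start vector; power iteration fails only if it is orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to the zero vector")
        v = [x / norm for x in w]
    scores = [sum(zj * vj for zj, vj in zip(r, v)) for r in z]
    return v, scores


# Hypothetical sites: [temperature (°C), pH]
sites = [[90.0, 7.0], [80.0, 7.5], [70.0, 8.0], [60.0, 8.5]]
loadings, scores = first_principal_component(sites)
print(loadings, scores)
```

The sign of a principal component is arbitrary, so only the relative signs of the loadings and the ordering of the scores are meaningful.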

Results
By progressively building from samples taken along well-defined geochemical gradients to samples widely dispersed in geochemical space, this study reveals strong links between environmental geochemistry and the extent of taxonomic and functional diversification of resident communities, and identifies temperature and pH as the primary factors that have shaped the evolution of these communities. Moreover, including extensive geochemical data in the analyses reveals new links between geochemical parameters (e.g. oxygen and trace element availability) and the distribution and taxonomic diversification of communities at the functional level. Further, an overall geochemical gradient between natural systems (derived from multivariate analyses) provides one of the most complete predictions of microbial taxonomic and functional composition.

Conclusions
Clustering based on the frequency with which orthologous proteins occur among metagenomes facilitated accurate prediction of the ordering of community functional composition along geochemical gradients, despite the absence of any geochemical input. The consistency of the results obtained by applying Markov clustering and multivariate methods to distinct natural systems underscores their utility for predicting the functional potential of microbial communities within a natural system from system geochemistry alone, allowing geochemical measurements to be used to predict purely biological metrics such as microbial community composition and metabolism.

Date Created
  • 2014-05-28