Genomes from Terabase-scale Metagenomic Projects Used to Explore Metabolic Potential of Uncultivated and Novel Microorganisms

Benjamin J Tully, University of Southern California, Center for Dark Energy Biosphere Investigations, Los Angeles, CA, United States; Chris Neely, University of Southern California, Department of Biological Sciences, United States; and Elaina D Graham, University of Southern California, Marine & Environmental Biology, CA, United States
Abstract:
The vast majority of microbial life belongs to uncultivated groups whose roles in biogeochemical cycles remain poorly understood. Environmental metagenomics lets researchers capture a snapshot of in situ microbial diversity and use it to gain insight into the metabolisms of uncultivated and novel microorganisms. Large-scale, global efforts to apply metagenomic techniques have generated multiple publicly available datasets, Tara Oceans and bioGEOTRACES, that consist of hundreds of samples and more than 5 terabase pairs (Tbp) of data. Thousands of metagenome-assembled genomes (MAGs) have been generated from the Tara Oceans datasets. Our analysis of these MAGs has led to the discovery of a novel lineage of phototrophic Alphaproteobacteria (Candidatus Luxescamonaceae) and expanded the characterization of the Marine Group II Euryarchaea (Candidatus Poseidoniales). Ongoing analysis explores the role of uncultivated heterotrophs from the Planctomycetes and the candidate phylum Dadabacteria. These groups occupy opposite ends of the marine heterotroph genome size spectrum. The Dadabacteria have exceptionally small, streamlined genomes, similar to SAR11 (~1.2 Mbp), while the Planctomycetes have large genomes (>3 Mbp). Together they provide perspective on organisms involved in extracellular organic matter degradation through the lens of genome specialists versus generalists. Further, new analysis has started on MAGs reconstructed from the 500+ samples available in the bioGEOTRACES metagenomic dataset. Modifications to the assembly and binning protocols were designed to enhance the recovery of high-abundance, high-diversity microorganisms that are resistant to recovery through standard binning methods (e.g., SAR11, Marine Group I Thaumarchaea, and the Cyanobacteria) and to increase the number of MAGs recovered per megabase of sequencing effort. These MAGs have been paired with the high-resolution geochemistry collected by the GEOTRACES program to discern ecological trends between microbes and global nutrient concentrations.