Towards a FAIR-compliant ocean and environmental genome database

Sean Jungbluth (1), Benjamin J Tully (2), Adam P Arkin (3) and Elisha M Wood-Charlson (1); (1) Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (2) University of Southern California, Center for Dark Energy Biosphere Investigations, Los Angeles, CA, United States, (3) Lawrence Berkeley National Laboratory, Berkeley, CA, United States
Abstract:
Ocean and environmental scientists are producing genome sequence data at an unprecedented rate. The high pace of data generation, combined with novel data types, is straining data archiving efforts at authoritative databases such as NCBI. These challenges are ultimately leading data generators to submit their data to alternative repositories, thereby undermining FAIR (Findable, Accessible, Interoperable, and Reusable) access to resources that were intended to be public. I will describe how we are leveraging the DOE Systems Biology Knowledge Base (KBase) to make genome sequence data derived from ocean and environmental metagenome-assembled genomes (MAGs) and single-cell assembled genomes (SAGs) FAIR compliant.

KBase currently provides access to all genomes available through NCBI RefSeq. To augment these genome data, we are constructing Narratives that provide access to MAG and SAG data on a per-publication basis, and we encourage the community to start doing the same. The generated Narratives live in a public KBase organization to allow convenient access by users. In addition to providing access to recently published MAG and SAG data not currently in NCBI, the KBase Narratives allow users to explore and analyze the data within the KBase system while maintaining provenance back to the original Narrative, which contains a list of authors and a link to the original publication. We envision that a user-driven, centralized, and publicly available MAG/SAG database within KBase, accessible to both users and machines, will democratize access and help improve the infrastructure supporting the reuse of ocean- and environmental-derived genome sequence data.