Planet Microbe: An Ontology-Enriched Cyberinfrastructure System for FAIR Marine ‘Omics Data

Kai Blumberg1, Alise Ponsero2, Matthew D Bomhoff3, Elisha M Wood-Charlson4, Pier Luigi Buttigieg5 and Bonnie L Hurwitz2, (1)University of Arizona, Biosystems Engineering, Tucson, AZ, United States, (2)University of Arizona, Agricultural & Biosystems Engineering, Tucson, AZ, United States, (3)University of Arizona, Agricultural and Biosystems Engineering, Tucson, AZ, United States, (4)Lawrence Berkeley National Laboratory, Berkeley, CA, United States, (5)Alfred Wegener Institut, Helmholtz Zentrum für Polar- und Meeresforschung, HGF-MPG Joint Research Group for Deep Sea Ecology and Technology, Bremerhaven, Germany
Abstract:
Published scientific data about critically important marine ecosystems are typically heterogeneous. The metadata of marine ‘omics datasets are currently not interoperable, which hinders their reuse in meta-analyses. Interoperability, the ability of data to be used together, allows disparate datasets to be combined and computed on, enabling discoveries that might otherwise be impossible. It is a crucial but difficult-to-achieve component of the FAIR guiding principles for scientific data management and stewardship.

Here we present Planet Microbe, a web-accessible platform for the discovery, integration and analysis of ocean ‘omics data and their metadata. Planet Microbe brings together large-scale historical marine datasets, such as the Hawaii Ocean Time-series (HOT), the Bermuda Atlantic Time-series Study (BATS), the Global Ocean Sampling Expedition (GOS), the Tara Oceans Expedition, and the Ocean Sampling Day (OSD), in a common platform that uses a standardized semantic layer to annotate dataset metadata.

This semantic layer is composed of ontologies, which are hierarchically structured, machine- and human-readable representations of expert knowledge used to describe real-world entities. Several ontologies from the Open Biological and Biomedical Ontologies (OBO) Foundry and Library, including the Environment Ontology (ENVO), the Ontology for Biomedical Investigations (OBI) and the Units of Measurement Ontology (UO), are used and extended to annotate dataset metadata, enabling semantic search and discovery across datasets.
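
As a rough illustration of why a hierarchical ontology enables semantic search, the following minimal Python sketch expands a query term to all of its subclasses before matching sample annotations. It is not Planet Microbe's implementation: the toy hierarchy, the term labels, the sample identifiers and the function names are all assumptions made up for this example, and the labels are not identifiers taken from ENVO.

```python
from collections import defaultdict

# Toy "is_a" hierarchy: child term -> parent term.
# Labels are illustrative placeholders, not identifiers copied from ENVO.
IS_A = {
    "ocean water": "sea water",
    "coastal sea water": "sea water",
    "sea water": "saline water",
    "saline water": "water",
    "fresh water": "water",
}

# Hypothetical sample annotations: sample ID -> environmental material term.
SAMPLES = {
    "sample_A": "ocean water",
    "sample_B": "coastal sea water",
    "sample_C": "fresh water",
}


def descendants(term, is_a=IS_A):
    """Return the term plus every term that is transitively a subclass of it."""
    # Invert the child -> parent map so we can walk downwards.
    children = defaultdict(set)
    for child, parent in is_a.items():
        children[parent].add(child)

    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def semantic_search(query, samples=SAMPLES):
    """Find samples annotated with the query term or any of its subclasses."""
    wanted = descendants(query)
    return sorted(s for s, term in samples.items() if term in wanted)


print(semantic_search("sea water"))  # ['sample_A', 'sample_B']
```

A plain keyword match on "sea water" would miss the sample annotated only as "ocean water"; walking the hierarchy retrieves it because the subclass relationship is stated explicitly.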

Ongoing work will leverage the Gene Ontology and the NCBI Taxonomy, expressed in an ontology format, to enable deeper interrogation of the functional and taxonomic information contained in the Planet Microbe genomic data. Combining the knowledge represented in these ontologies with computational pipelines is intended to guide users toward potential new insights.