Provenance through Time
Abstract: The ability to reproduce scientific results is a cornerstone of the scientific method, and access to the data upon which the results are based is essential to reproducibility. Access to the data alone is not enough, however; research communities have recognized the importance of metadata (data documentation) to enable discovery and data access and to facilitate interpretation and accurate reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) was first funded in late 2006 by the National Science Foundation (NSF) Division of Ocean Sciences (OCE) Biology and Chemistry Sections to help ensure that data generated during NSF OCE-funded research would be preserved and available for future use. BCO-DMO was formed by combining the formerly independent data management offices of two marine research programs: the United States Joint Global Ocean Flux Study (US JGOFS) and the US GLOBal Ocean ECosystems Dynamics (US GLOBEC) program.
Since the 1990s, when the US JGOFS and US GLOBEC programs were both active, there have been significant changes in all aspects of the research data life cycle, and the staff at BCO-DMO has modified the way in which we manage data contributed to the office. The supporting documentation that describes each dataset was originally displayed as a human-readable text file retrievable via a Web browser. BCO-DMO still offers that form because our primary audience is marine researchers using Web browser clients; however, we are seeing increased demand for machine-client access. Metadata records from the BCO-DMO data system are now extracted and published in a variety of formats. The system supports ISO 19115, FGDC, GCMD DIF, the schema.org Dataset extension, formal publication with a DOI, and RDF with semantic markup including PROV-O, FOAF, and more.
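As an illustration only, a machine-readable dataset description of the kind listed above might look like the following minimal sketch, which builds a JSON-LD record combining the schema.org Dataset type with PROV-O terms. This is not BCO-DMO's actual export code or record layout: the function names, the field selection, and every value (title, DOI, URLs, creator) are hypothetical placeholders chosen for the example.

```python
import json

# Namespace IRIs for the vocabularies named in the text.
SCHEMA_ORG = "https://schema.org/"
PROV = "http://www.w3.org/ns/prov#"


def build_dataset_record(title, description, doi, creator_name, source_iri):
    """Return a JSON-LD dict describing one dataset.

    Hypothetical helper: the choice of fields is an assumption made for
    illustration, not the layout used by any particular repository.
    """
    return {
        "@context": {"@vocab": SCHEMA_ORG, "prov": PROV},
        # Typed both as a schema.org Dataset and as a PROV-O Entity.
        "@type": ["Dataset", "prov:Entity"],
        "name": title,
        "description": description,
        "identifier": doi,
        "creator": {"@type": "Person", "name": creator_name},
        # PROV-O link from this dataset to the source it was derived from.
        "prov:wasDerivedFrom": {"@id": source_iri},
    }


def to_jsonld(record):
    """Serialize the record as indented JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # All values below are placeholders, not real identifiers.
    example = build_dataset_record(
        title="Example CTD profiles",
        description="Placeholder description of a cruise dataset.",
        doi="doi:10.0000/example",
        creator_name="A. Researcher",
        source_iri="https://example.org/dataset/original",
    )
    print(to_jsonld(example))
```

Expressing provenance as a typed link (here `prov:wasDerivedFrom`) rather than as free text is what allows a machine client to follow the chain from a published dataset back to its source.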
In the 1990s, data documentation helped researchers locate data of interest and understand its provenance well enough to determine fitness for purpose. Today, providing data documentation in a machine-interpretable form enables researchers to make more effective use of machine clients to discover and access data. This presentation will describe the challenges associated with, and the benefits realized from, layering modern Semantic Web technologies on top of a legacy data system.