S53A-2768
SEIS-PROV: Practical Provenance for Seismological Data

Friday, 18 December 2015
Poster Hall (Moscone South)
Lion Krischer, Ludwig Maximilians University of Munich, Munich, Germany, James A Smith, Princeton University, Geosciences, Princeton, NJ, United States and Jeroen Tromp, Princeton University, Princeton, NJ, United States
Abstract:
It is widely recognized that reproducibility is crucial to advancing science, yet it is very hard to achieve in practice. As a result, a large fraction of the community acknowledges its importance but mostly ignores it. A key ingredient of full reproducibility is capturing and describing the history of data, an issue known as provenance. We present SEIS-PROV, a practical format and data model for storing provenance information for seismological data.

In a seismological context, provenance can be seen as information about the processes that generated and modified a particular piece of data. For synthetic waveforms, the provenance information describes which solver, and which settings within it, were used to generate them. For processed seismograms, the provenance conveys information about the time series analysis steps that led to them. Additional uses include the description of derived data types, such as cross-correlations and adjoint sources, enabling their proper storage and exchange.

SEIS-PROV is based on W3C PROV (http://www.w3.org/TR/prov-overview/), a standard for generic provenance information, and applies an additional set of constraints to make it suitable for seismology. We present a definition of the SEIS-PROV format, a way to check whether any given file is a valid SEIS-PROV document, and two sample implementations: one in SPECFEM3D GLOBE (https://geodynamics.org/cig/software/specfem3d_globe/) to store the provenance information of synthetic seismograms, and another as part of the ObsPy (http://obspy.org) framework, enabling automatic tracking of provenance information during a series of analysis and transformation stages.

Together with tools to visualize and interpret provenance graphs, the format, validator, and implementations offer a description of data history that can be readily tracked, stored, and exchanged.
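
As an illustration of the processing history described in the abstract, the following minimal sketch records a chain of analysis steps using the W3C PROV notions of entities (data), activities (processing steps), and the "used" and "was generated by" relations. It is not the SEIS-PROV format or the ObsPy implementation: the ProvGraph class, its methods, and all identifiers and parameter names are hypothetical, and the sketch assumes each activity has exactly one input and one output.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Tiny in-memory stand-in for a provenance document (illustrative only)."""

    entities: dict = field(default_factory=dict)    # entity id -> attributes
    activities: dict = field(default_factory=dict)  # activity id -> parameters
    used: list = field(default_factory=list)        # (activity id, input entity id)
    generated: list = field(default_factory=list)   # (output entity id, activity id)

    def apply(self, activity_id, input_id, output_id, **params):
        """Record that an activity used one entity and generated another."""
        self.activities[activity_id] = params
        self.entities.setdefault(input_id, {})
        self.entities.setdefault(output_id, {})
        self.used.append((activity_id, input_id))
        self.generated.append((output_id, activity_id))

    def history(self, entity_id):
        """Return the activities that led to an entity, oldest first."""
        made_by = dict(self.generated)  # entity -> activity that produced it
        consumed = dict(self.used)      # activity -> entity it read
        steps = []
        while entity_id in made_by:
            activity = made_by[entity_id]
            steps.append(activity)
            entity_id = consumed[activity]
        return steps[::-1]


# Hypothetical two-step processing chain for a single seismogram.
graph = ProvGraph()
graph.apply("detrend_1", "waveform_raw", "waveform_detrended", type="linear")
graph.apply("lowpass_1", "waveform_detrended", "waveform_filtered", corner_hz=0.5)

print(graph.history("waveform_filtered"))
```

Walking the graph backwards from the final waveform recovers the ordered list of steps and their parameters, which is the kind of data history that a provenance record is meant to preserve alongside the data.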