S53A-2769
ASDF: An Adaptable Seismic Data Format with Full Provenance
Abstract:
In order for seismologists to maximize their knowledge of how the Earth works, they must extract the maximum amount of useful information from all recorded seismic data available for their research. This requires assimilating large sets of waveform data, keeping track of vast amounts of metadata, using validated standards for quality control, and automating the workflow in a careful and efficient manner. In addition, there is a growing gap between CPU/GPU speeds and disk access speeds that leads to an I/O bottleneck in seismic workflows. This is made even worse by existing seismic data formats that were not designed for performance and are limited to a few fixed headers for storing metadata.The Adaptable Seismic Data Format (ASDF) is a new data format for seismology that solves the problems with existing seismic data formats and integrates full provenance into the definition. ASDF is a self-describing format that features parallel I/O using the parallel HDF5 library. This makes it a great choice for use on HPC clusters. The format integrates the standards QuakeML for seismic sources and StationXML for receivers. ASDF is suitable for storing earthquake data sets, where all waveforms for a single earthquake are stored in a one file, ambient noise cross-correlations, and adjoint sources. The format comes with a user-friendly Python reader and writer that gives seismologists access to a full set of Python tools for seismology. There is also a faster C/Fortran library for integrating ASDF into performance-focused numerical wave solvers, such as SPECFEM3D_GLOBE. Finally, a GUI tool designed for visually exploring the format exists that provides a flexible interface for both research and educational applications. ASDF is a new seismic data format that offers seismologists high-performance parallel processing, organized and validated contents, and full provenance tracking for automated seismological workflows.