IN31C-3734:
Provenance for Runtime Workflow Steering and Validation in Computational Seismology

Wednesday, 17 December 2014
Alessandro Spinuso (1), Lion Krischer (2), Amy Krause (3), Rosa Filgueira (3), Federica Magnoni (4), Visakh Muraleedharan (5) and Mario David (5); (1) Royal Netherlands Meteorological Institute, De Bilt, 3730, Netherlands, (2) Ludwig Maximilian University of Munich, Munich, Germany, (3) University of Edinburgh, Edinburgh, United Kingdom, (4) National Institute of Geophysics and Volcanology, Rome, Italy, (5) Institut de Physique du Globe de Paris, Sismologie, Paris, France
Abstract:
Modern workflow engines may offer provenance systems that collect metadata about data transformations at runtime. Combined with effective visualisation and monitoring interfaces, these provenance recordings can speed up the validation of an experiment and suggest interactive or automated interventions that take immediate effect on the lifecycle of a workflow run.

For instance, in computational seismology, research applications perform long-lasting cross-correlation analyses and high-resolution simulations. For such applications, the immediate notification of logical errors and rapid access to intermediate results can prompt reactions that make the research progress more efficiently.

These applications are often executed on secured and sophisticated HPC and HTC infrastructures. This highlights the need for a comprehensive framework that facilitates the extraction of fine-grained provenance and the development of provenance-aware components, while leveraging the scalability of the adopted workflow engines, whose enactment can be mapped to different technologies (MPI, Storm clusters, etc.).

This work looks at the adoption of the W3C-PROV concepts and data model within a user-driven processing and validation framework for seismic data, which also supports computational and data management steering. Validation needs to balance automation with user intervention, considering the scientist as part of the archiving process. Therefore, the provenance data is enriched with community-specific metadata vocabularies and control messages, making an experiment reproducible and its description consistent with the community's understanding. Moreover, it can contain user-defined terms and annotations.
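
As an illustration only, the enrichment described above can be sketched with two W3C-PROV notions, entities and activities, linked by the "used" and "wasGeneratedBy" relations. This is a minimal in-memory sketch and not the system's actual implementation: the class names, the metadata keys (station, channel, freqmax, max_lag_s), the station code STA1 and the processing step names are all hypothetical and chosen only to show where community-specific metadata and user-defined annotations could sit in a lineage trace.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """A PROV-style entity: one data product of the workflow (hypothetical class)."""
    id: str
    metadata: Dict[str, object] = field(default_factory=dict)  # community-specific terms
    annotations: Dict[str, str] = field(default_factory=dict)  # user-defined terms


@dataclass
class Activity:
    """A PROV-style activity: one data transformation (hypothetical class)."""
    id: str
    name: str
    used: List[str] = field(default_factory=list)       # ids of input entities
    generated: List[str] = field(default_factory=list)  # ids of output entities


class Trace:
    """In-memory lineage trace; a real system would persist this in a store."""

    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}
        self.activities: Dict[str, Activity] = {}

    def add_entity(self, entity: Entity) -> None:
        """Register an entity that no recorded activity generated (e.g. raw input)."""
        self.entities[entity.id] = entity

    def record(self, activity: Activity, outputs: List[Entity]) -> None:
        """Record an activity together with the entities it generated."""
        self.activities[activity.id] = activity
        for entity in outputs:
            self.entities[entity.id] = entity
            activity.generated.append(entity.id)

    def lineage(self, entity_id: str) -> List[str]:
        """Return the ids of all entities that entity_id was derived from."""
        result: List[str] = []
        seen = set()
        stack = [entity_id]
        while stack:
            current = stack.pop()
            for activity in self.activities.values():
                if current in activity.generated:
                    for source in activity.used:
                        if source not in seen:
                            seen.add(source)
                            result.append(source)
                            stack.append(source)
        return result


# Hypothetical two-step run: band-pass filtering followed by cross-correlation.
trace = Trace()
trace.add_entity(Entity("raw-1", {"station": "STA1", "channel": "BHZ"}))

filtered = Entity("filt-1", {"station": "STA1", "channel": "BHZ", "freqmax": 1.0})
trace.record(Activity("act-1", "bandpass", used=["raw-1"]), [filtered])

xcorr = Entity("xc-1", {"max_lag_s": 120})
xcorr.annotations["quality"] = "check visually"  # user-defined annotation
trace.record(Activity("act-2", "cross_correlate", used=["filt-1"]), [xcorr])

print(trace.lineage("xc-1"))  # ['filt-1', 'raw-1']
```

Keeping community metadata and user annotations in separate fields is one possible design choice; it lets a validation interface search on agreed vocabulary terms while still showing free-form notes left by the scientist.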

The current implementation of the system is supported by the EU-funded VERCE project (http://verce.eu). In addition to the provenance generation mechanisms, it provides a prototype browser-based user interface and a web API built on top of a NoSQL storage technology, experimenting with ways to ensure rapid and flexible access to the lineage traces. It supports users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime into dedicated data archives.