Towards a common provenance model for research publications

Wednesday, 17 December 2014
Linyun Fu (1), Xiaogang Ma (1), Patrick West (1), Stace E Beaulieu (2), Massimo Di Stefano (3) and Peter Arthur Fox (4); (1) Rensselaer Polytechnic Institute, Troy, NY, United States, (2) Woods Hole Oceanographic Institution, Woods Hole, MA, United States, (3) Rensselaer Polytechnic Institute, Woods Hole, MA, United States, (4) Rensselaer Polytechnic Institute, Troy, NY, United States
Provenance is information about the entities, activities, and people involved in producing a piece of data or a thing, which can be used to assess its quality, reliability, or trustworthiness. In a research publication, provenance covers the entities, activities, and people involved in the process that produced each part of the publication, such as its figures, tables, and paragraphs. Readers often need this information to interpret the content correctly. It also lets them evaluate the credibility of the reported results by examining the software used, the source data, and the responsible agents, or even by reproducing the results themselves.

In this presentation, we will describe an ontology we designed to model the preparation process of research publications. It draws on our experience from two projects, both focused on capturing provenance for research publications. The first captures provenance information for a National Climate Assessment (NCA) report of the US Global Change Research Program (USGCRP); the second captures provenance information for an Ecosystem Status Report (ESR) of the Northeast Fisheries Science Center (NEFSC). Both projects base their provenance modeling on the W3C Provenance Ontology (PROV-O), which has proved to be an effective foundation for building provenance-capture models. We will illustrate the commonalities and differences between the use cases of the two projects and show how we derive a common model from the models designed specifically for each project.
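As a rough illustration of the kind of PROV-O statements such a model builds on, the following minimal sketch records how one figure in a report might be linked to its source data and to a responsible agent. It is not the ontology described in this abstract: the `http://example.org/` namespace, the resource names (`figure1`, `dataset1`, `plotting`, `author1`) and the helper functions are all hypothetical, and triples are held in a plain Python set rather than an RDF store. Only the `prov:` terms (`Entity`, `Activity`, `Agent`, `wasGeneratedBy`, `used`, `wasAssociatedWith`, `wasDerivedFrom`) come from PROV-O itself.

```python
# Minimal sketch: PROV-O style triples for one figure in a publication.
# All example.org names are hypothetical; only the prov: terms are from PROV-O.

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # placeholder namespace, an assumption


def prov(term):
    """Return the full IRI of a PROV-O term."""
    return PROV + term


# Each triple is (subject, predicate, object).
triples = {
    (EX + "figure1", RDF_TYPE, prov("Entity")),
    (EX + "dataset1", RDF_TYPE, prov("Entity")),
    (EX + "plotting", RDF_TYPE, prov("Activity")),
    (EX + "author1", RDF_TYPE, prov("Agent")),
    # The figure was produced by a plotting activity...
    (EX + "figure1", prov("wasGeneratedBy"), EX + "plotting"),
    # ...which used a source dataset...
    (EX + "plotting", prov("used"), EX + "dataset1"),
    # ...and was carried out by a responsible agent.
    (EX + "plotting", prov("wasAssociatedWith"), EX + "author1"),
    # Shortcut relation from the figure to its source data.
    (EX + "figure1", prov("wasDerivedFrom"), EX + "dataset1"),
}


def objects(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def sources_of(entity):
    """Entities used by the activities that generated the given entity."""
    inputs = set()
    for activity in objects(entity, prov("wasGeneratedBy")):
        inputs.update(objects(activity, prov("used")))
    return sorted(inputs)


def agents_for(entity):
    """Agents associated with the activities that generated the given entity."""
    agents = set()
    for activity in objects(entity, prov("wasGeneratedBy")):
        agents.update(objects(activity, prov("wasAssociatedWith")))
    return sorted(agents)


if __name__ == "__main__":
    print("sources:", sources_of(EX + "figure1"))
    print("agents:", agents_for(EX + "figure1"))
```

A reader starting from the figure can follow `wasGeneratedBy` to the activity and then `used` or `wasAssociatedWith` to reach the source data or the responsible agent, which is the kind of tracing the abstract describes for evaluating reported results.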