IN31C-3738:
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework

Wednesday, 17 December 2014
Patrick West1, James Michaelis1, Tim Lebot1, Deborah L McGuinness1 and Peter Arthur Fox2, (1)Rensselaer Polytechnic Institute, Troy, NY, United States, (2)Rensselaer Polytechnic Inst., Troy, NY, United States
Abstract:
Providing proper citation and attribution for published data, derived data products, and the software tools used to generate them, has always been an important aspect of scientific research. However, It is often the case that this type of detailed citation and attribution is lacking. This is in part because it often requires manual markup since dynamic generation of this type of provenance information is not typically done by the tools used to access, manipulate, transform, and visualize data. In addition, the tools themselves lack the information needed to be properly cited themselves.

The OPeNDAP Hyrax Software Framework is a tool that provides access to and the ability to constrain, manipulate, and transform, different types of data from different data formats, into a common format, the DAP (Data Access Protocol), in order to derive new data products. A user, or another software client, specifies an HTTP URL in order to access a particular piece of data, and appropriately transform it to suit a specific purpose of use. The resulting data products, however, do not contain any information about what data was used to create it, or the software process used to generate it, let alone information that would allow the proper citing and attribution to down stream researchers and tool developers.

We will present our approach to provenance capture in Hyrax including a mechanism that can be used to report back to the hosting site any derived products, such as publications and reports, using the W3C PROV recommendation pingback service. We will demonstrate our utilization of Semantic Web and Web standards, the development of an information model that extends the PROV model for provenance capture, and the development of the pingback service. We will present our findings, as well as our practices for providing provenance information, visualization of the provenance information, and the development of pingback services, to better enable scientists and tool developers to be recognized and properly cited for their contributions.