IN13C-1852
Data Citation Concept for CMIP6

Monday, 14 December 2015
Poster Hall (Moscone South)
Martina Stockhause (1), Frank Toussaint (1), Michael Lautenschlager (1) and Bryan Lawrence (2); (1) DKRZ German Climate Computing Centre, Data Management, Hamburg, Germany, (2) University of Reading, Reading, RG6, United Kingdom
Abstract:
There is a broad consensus among data centers and scientific publishers on FORCE11's 'Joint Declaration of Data Citation Principles'. Putting these principles into practice, however, is not always straightforward.

The focus of CMIP6 data citation is on citing data that were created by others and used in the analysis underlying an article. For such source data, usually no article by the data creators is available ('stand-alone data publication'). The planned data citation granularities are model data (data collections containing all datasets provided for the project by a single model) and experiment data (data collections containing all datasets for a scientific experiment run by a single model).
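The two citation granularities can be illustrated with a minimal sketch. It assumes each dataset is described by a record naming its model and experiment; the record layout, identifiers and the function name `group_by_granularity` are hypothetical and not taken from the CMIP6 or ESGF specifications.

```python
from collections import defaultdict

# Hypothetical dataset records; identifiers and field names are illustrative only.
datasets = [
    {"id": "ds-001", "model": "MODEL-A", "experiment": "historical"},
    {"id": "ds-002", "model": "MODEL-A", "experiment": "piControl"},
    {"id": "ds-003", "model": "MODEL-B", "experiment": "historical"},
    {"id": "ds-004", "model": "MODEL-A", "experiment": "historical"},
]


def group_by_granularity(records):
    """Build the two kinds of citable collections described in the text.

    Returns (model_collections, experiment_collections):
      - model_collections maps a model name to all of its dataset ids
      - experiment_collections maps (model, experiment) to the dataset ids
        of that experiment as run by that single model
    """
    model_collections = defaultdict(list)
    experiment_collections = defaultdict(list)
    for rec in records:
        model_collections[rec["model"]].append(rec["id"])
        experiment_collections[(rec["model"], rec["experiment"])].append(rec["id"])
    return dict(model_collections), dict(experiment_collections)


model_collections, experiment_collections = group_by_granularity(datasets)
```

In this sketch every dataset belongs to exactly one model collection and one experiment collection, so a citation at either granularity resolves to a well-defined list of datasets.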

In large international projects or activities like CMIP, the data are commonly stored and disseminated by multiple repositories in a federated data infrastructure such as the Earth System Grid Federation (ESGF). The individual repositories are subject to different institutional and national policies. A Data Management Plan (DMP) will define a common standard for the repositories, including data handling procedures.

Another aspect of CMIP data relevant to data citation is its dynamic nature. In such large data collections, datasets are added, revised and retracted for years before the collection becomes stable enough to serve as a citation entity covering all data of a model or simulation. Thus, a critical issue for ESGF is data consistency, which requires thorough dataset versioning so that a data collection can be identified in the version that was cited. Currently, the ESGF is designed for accessing the latest dataset versions. Data citation makes it necessary to support older and retracted dataset versions as well, by storing their metadata even after the data themselves are no longer available (data unpublished in ESGF).
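The versioning requirement can be sketched as follows. This is an illustration under stated assumptions, not the ESGF implementation: the class `DatasetVersion`, the function `collection_as_of`, the version labels and all dates are hypothetical. The idea shown is that version metadata is never deleted, so the state of a collection at the time of citation can be reconstructed even after datasets have been revised or retracted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    """Metadata for one dataset version (hypothetical layout).

    The record is kept even after the data are retracted, so that
    older citations remain resolvable.
    """
    dataset_id: str
    version: str
    published: date
    retracted: Optional[date] = None


def collection_as_of(versions, cited_on):
    """Return {dataset_id: version} as the collection looked on `cited_on`.

    For each dataset, the most recently published version that was
    already published and not yet retracted on that date is selected.
    """
    snapshot = {}
    for v in sorted(versions, key=lambda v: v.published):
        if v.published > cited_on:
            continue  # not yet published at citation time
        if v.retracted is not None and v.retracted <= cited_on:
            continue  # already retracted at citation time
        snapshot[v.dataset_id] = v.version  # later versions override earlier ones
    return snapshot


# Hypothetical version history of a small collection.
history = [
    DatasetVersion("ds-001", "v1", date(2016, 1, 10), retracted=date(2016, 6, 1)),
    DatasetVersion("ds-001", "v2", date(2016, 6, 1)),
    DatasetVersion("ds-002", "v1", date(2016, 3, 1)),
]

early_citation = collection_as_of(history, date(2016, 4, 1))
late_citation = collection_as_of(history, date(2016, 7, 1))
```

Here `early_citation` still resolves to the first version of `ds-001` although it was retracted later, which is possible only because the retracted version's metadata stays in `history`.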

Apart from ESGF, other infrastructure components exist for CMIP that provide information which has to be connected to the CMIP6 data. Examples are ES-DOC, which provides information on models and simulations, and the IPCC Data Distribution Centre (DDC), which stores a subset of the data together with the available metadata (ES-DOC) for long-term reuse by the interdisciplinary community. Other connections exist to standard project vocabularies, to personal identifiers (e.g. ORCID), and to data products (including provenance information).