Metrics and Citations for Data and Software

Jessica Hausman1, Lewis John McGibbney2, Suresh Vannan1, Sara Bond1 and Dudee Chiang3, (1)Jet Propulsion Laboratory, California Institute of Technology, Pasadena, United States, (2)NASA Jet Propulsion Laboratory, Pasadena, CA, United States, (3)Jet Propulsion Laboratory, United States
Abstract:
The Physical Oceanography Distributed Active Archive Center (PO.DAAC) is NASA’s data repository and archive for physical oceanographic data, including winds, ocean surface topography, salinity, sea surface temperature, gravity, ocean circulation and more. Datasets at PO.DAAC have Digital Object Identifiers (DOIs) and citations, so users can easily comply with journal requirements to cite the data used in their research. Citations serve other purposes as well: when a dataset is cited and has a DOI, collecting metrics about it becomes much easier. Those metrics show what impact the data have in the science community, let creators know whether their data are being used, let users know who has used the data, and improve transparency and provenance, because the exact version and type of data are known. PO.DAAC developed an API that gathers dataset metrics based on citations, and that information is reported on each dataset’s landing page. PO.DAAC currently displays metrics for data cited in 2016-2017 and is working on other years. For example, the GHRSST mission page at PO.DAAC shows that 37 publications cited GHRSST data and lists those publications. This way creators can easily see which papers are using their data, users can see what data are being used for various topics, and program managers can see what impact the data are having.
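As a rough illustration of citation-based metrics, the sketch below groups publications by the dataset DOIs they cite and reports a count and a publication list per dataset. It is not PO.DAAC's API: the `Publication` record, the `normalize_doi` and `citation_metrics` helpers, and the example DOIs are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical record of a paper and the dataset DOIs it cites."""
    title: str
    year: int
    cited_dois: tuple[str, ...]


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def citation_metrics(publications, years=None):
    """Return {dataset DOI: {"count": n, "publications": [titles]}}.

    If `years` is given, only publications from those years are counted.
    """
    by_dataset = defaultdict(list)
    for pub in publications:
        if years is not None and pub.year not in years:
            continue
        # Use a set so a paper citing the same DOI twice is counted once.
        for doi in {normalize_doi(d) for d in pub.cited_dois}:
            by_dataset[doi].append(pub.title)
    return {
        doi: {"count": len(titles), "publications": sorted(titles)}
        for doi, titles in by_dataset.items()
    }


if __name__ == "__main__":
    # Made-up publications and placeholder DOIs, for illustration only.
    pubs = [
        Publication("Paper A", 2016, ("doi:10.0000/EXAMPLE-SST",)),
        Publication("Paper B", 2017, ("https://doi.org/10.0000/example-sst",)),
    ]
    print(citation_metrics(pubs, years={2016, 2017}))
```

Normalizing the DOI before grouping matters in practice because the same identifier can appear in papers as a bare DOI, with a `doi:` prefix, or as a resolver URL.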

The next step is to apply this to software. Citing software makes sense because software is part of the research workflow, and citing it helps with reproducibility. The move toward open source software for big data makes citing software more important, as researchers increasingly rely on community-contributed software rather than developing everything themselves. However, citing software poses different challenges than citing data: software is more dynamic than data, and it can depend on other software. PO.DAAC is developing a system to handle DOIs and citations for software in a consistent manner.

This presentation will cover how PO.DAAC manages DOIs and citations for data and software and how it gathers dataset citation metrics.