Data Curation in the Long Tail of Science: Preparing Community Land Model Validation Data for Reuse and Preservation

Thursday, 18 December 2014
Madison L Langseth (1), Gary Strand (2) and William R Wieder (2); (1) University of Tennessee, Knoxville, TN, United States; (2) National Center for Atmospheric Research, Boulder, CO, United States
Long tail science is argued to account for the majority of scientific output. Long tail research tends to be conducted by small teams with limited budgets, which limits their ability to properly curate data for reuse and preservation. The data set curated in this project is a small data set of global soil properties that was processed for use with the Community Land Model (CLM). The Data Curation Profile Toolkit was used to work with the scientist to determine his data curation needs. Through a number of formal interviews, it was established that the scientist required assistance in documenting the data workflow, updating the metadata, and eventually archiving the data in an appropriate repository. The data curator worked with the scientist to document the data workflow both as a visual diagram and in a README file for the scientist’s future reference. The existing metadata were verified against the original data set documentation, and additional metadata were added to capture provenance and detailed data descriptions. The authors determined that the most appropriate repository for the data was the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC). The authors then appraised, selected, and submitted the data to the ORNL DAAC. The project’s final results and lessons learned will be discussed.