IN31C-3731:
NCAR Earth Observing Laboratory's Data Tracking System

Wednesday, 17 December 2014
Linda E Cully and Steven F Williams, NCAR, Boulder, CO, United States
Abstract:
The NCAR Earth Observing Laboratory (EOL) maintains an extensive collection of complex, multi-disciplinary datasets from national and international, current and historical projects accessible through field project web pages (https://www.eol.ucar.edu/all-field-projects-and-deployments). Data orders are processed through the EOL Metadata Database and Cyberinfrastructure (EMDAC) system. Behind the scenes is the institutionally created EOL Computing, Data, and Software/Data Management Group (CDS/DMG) Data Tracking System (DTS) tool. The DTS is used to track the complete life cycle (from ingest to long term stewardship) of the data, metadata, and provenance for hundreds of projects and thousands of data sets.

The DTS is an EOL internal only tool which consists of three subsystems: Data Loading Notes (DLN), Processing Inventory Tool (IVEN), and Project Metrics (STATS). The DLN is used to track and maintain every dataset that comes to the CDS/DMG. The DLN captures general information such as title, physical locations, responsible parties, high level issues, and correspondence. When the CDS/DMG processes a data set, IVEN is used to track the processing status while collecting sufficient information to ensure reproducibility. This includes detailed “How To” documentation, processing software (with direct links to the EOL Subversion software repository), and descriptions of issues and resolutions. The STATS subsystem generates current project metrics such as archive size, data set order counts, “Top 10” most ordered data sets, and general information on who has ordered these data.

The DTS was developed over many years to meet the specific needs of the CDS/DMG, and it has been successfully used to coordinate field project data management efforts for the past 15 years. This paper will describe the EOL CDS/DMG Data Tracking System including its basic functionality, the provenance maintained within the system, lessons learned, potential improvements, and future developments.