CruisePack: A Tool to Facilitate Submission and Archiving of Cruise-Based Data

Charles Anderson, Cooperative Institute for Research in Environmental Sciences (CIRES), Boulder, CO, United States and Brian Meyer, CIRES, Boulder, CO, United States
Abstract:
The ever-increasing diversity and volume of complex oceanographic data being collected pose a multitude of data management challenges. Not only do these large volumes complicate data analysis and increase processing time, they also significantly affect data preservation, management, and reuse. Effective archiving of these voluminous data requires efficient systems throughout the data submission, ingest, and management process. To facilitate data submission, the NOAA National Centers for Environmental Information (NCEI) Center for Coasts, Oceans, and Geophysics developed CruisePack, a data packaging tool that lets data providers efficiently create uniform data packages containing the data files and metadata required to archive a diverse array of cruise-based geophysical data. The uniform nature of CruisePack’s data packages facilitates automated archiving systems that perform checksum validation to ensure file integrity, archive the data, populate domain-specific metadata databases, and update data discovery and delivery web portals. CruisePack is a platform-agnostic, stand-alone executable with a simple user interface to control packager operation and facilitate the entry of metadata by the data provider. Once the data provider enters the metadata necessary to describe the datasets, data packaging is fully automated, and the metadata is saved for future data compiling efforts. The packager copies the data, generates standards-compliant metadata records, and creates a checksum manifest file, all contained in a structured data package conforming to the Library of Congress BagIt specification. The resulting data package can be delivered by diverse means, including network transmission and external drives. CruisePack is in use by a number of data providers and is proving highly effective in facilitating data submission to NCEI, making it a valuable resource for both data providers and archives.
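
As an illustration of the packaging and validation steps described above, the following is a minimal Python sketch of a BagIt-style package. It is not CruisePack's implementation: the function names (sha256_of, make_bag, validate_bag) are hypothetical, the choice of SHA-256 is an assumption, and the metadata records that CruisePack generates are omitted. It only shows the general pattern of copying data into a "data" payload directory, writing the "bagit.txt" declaration, and recording one checksum per file in a manifest that an archive could later re-verify.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write a BagIt-style manifest.

    Hypothetical helper for illustration; bag_dir/data must not exist yet.
    """
    bag_dir = Path(bag_dir)
    payload_dir = bag_dir / "data"
    shutil.copytree(source_dir, payload_dir)

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One "<checksum>  <path relative to the bag root>" line per payload file.
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def validate_bag(bag_dir):
    """Return the manifest paths that are missing or whose checksum differs."""
    bag_dir = Path(bag_dir)
    failures = []
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative)
    return failures
```

In this sketch, a receiving archive would call validate_bag on the delivered package and accept it only if the returned list is empty; any file altered or lost in transit appears in the list by its path within the package.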