The Surface Ocean CO2 Atlas: Stewarding Underway Carbon Data from Collection to Archival

Kevin O'Brien1, Karl Matthew Smith2, Benjamin Pfeil3, Camilla Landa3, Dorothee C E Bakker4, Are Olsen3, Steve Jones5, Biva Shrestha6, Alexander Kozyr6, Ansley B Manke7, Roland Schweitzer8 and Eugene F Burger9, (1)University of Washington Seattle Campus, JISAO, Seattle, WA, United States, (2)JISAO, Univ. of Washington, Seattle, WA, United States, (3)Bjerknes Centre for Climate Research, Bergen, Norway, (4)University of East Anglia, School of Environmental Sciences, Norwich, United Kingdom, (5)University of Exeter, Exeter, United Kingdom, (6)Oak Ridge National Lab, Oak Ridge, TN, United States, (7)NOAA/PMEL, Seattle, WA, United States, (8)Weathertop Consulting, LLC, College Station, TX, United States, (9)NOAA Seattle, Seattle, WA, United States
Abstract:
The Surface Ocean CO2 Atlas (SOCAT, www.socat.info) is a quality-controlled, global surface ocean carbon dioxide (CO2) data set gathered on research vessels, ships of opportunity (SOOP) and buoys. To the degree feasible, SOCAT is comprehensive: it draws together all such observations made across the international community and applies uniform quality control (QC) procedures to them. The first version of SOCAT (version 1.5) was publicly released in September 2011 (Bakker et al., 2011) with 6.3 million observations. It was followed in June 2013 by SOCAT version 2, expanded to over 10 million observations (Bakker et al., 2013). Most recently, in September 2015, SOCAT version 3 was released, containing over 14 million observations spanning almost 60 years.


Assembling, quality controlling and publishing SOCAT versions 1.5 and 2 required an unsustainable level of manual effort. To ease the burden on data managers and data providers, the SOCAT community agreed to embark on an automated data ingestion process that would streamline the workflow and improve data stewardship from ingestion to quality control and from publication to archival. To that end, for version 3 and beyond, the SOCAT automation team created a framework that is based on standards and conventions, yet still allows scientists to work in the data formats they are most comfortable with (i.e., CSV files). This automated workflow provides several advantages: 1) data are ingested into uniform, standards-based file formats; 2) data are easily integrated into a standard quality control system; 3) data ingestion and quality control can be performed in parallel; and 4) carbon data are archived by a uniform method and assigned digital object identifiers (DOIs).
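To illustrate the idea of letting providers submit CSV files in their own layout while still producing uniform records for quality control, here is a minimal Python sketch. It is not the SOCAT dashboard's implementation: the header aliases, the standardized column names (time, latitude, longitude, sea_surface_temperature, fco2_water) and the position range check are all hypothetical assumptions chosen for illustration. Unrecognized columns are reported rather than silently dropped, and blank fields become missing values.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from header spellings a provider might use to one
# standardized set of column names. Illustrative only; these are NOT the
# actual SOCAT column names.
HEADER_ALIASES = {
    "date/time": "time",
    "datetime": "time",
    "lat": "latitude",
    "latitude": "latitude",
    "lon": "longitude",
    "long": "longitude",
    "longitude": "longitude",
    "sst": "sea_surface_temperature",
    "temp": "sea_surface_temperature",
    "fco2": "fco2_water",
    "fco2_sw": "fco2_water",
}


def standardize_header(name):
    """Return the standardized column name, or None if unrecognized."""
    return HEADER_ALIASES.get(name.strip().lower())


def ingest_csv(text):
    """Parse provider CSV text into records keyed by standardized names.

    Returns (records, unknown_headers); unknown headers are reported so a
    data manager can resolve them instead of losing the column silently.
    """
    reader = csv.DictReader(io.StringIO(text))
    mapping = {h: standardize_header(h) for h in reader.fieldnames}
    unknown = [h for h, std in mapping.items() if std is None]
    records = []
    for row in reader:
        rec = {}
        for raw, std in mapping.items():
            if std is None:
                continue
            value = (row[raw] or "").strip()  # short rows yield None
            if value == "":
                rec[std] = None  # treat blank fields as missing
            elif std == "time":
                rec[std] = datetime.fromisoformat(value)
            else:
                rec[std] = float(value)
        records.append(rec)
    return records, unknown


def range_flags(records):
    """Flag each record's position: 'ok', 'missing_position' or 'bad_position'.

    Longitude accepts both the -180..180 and 0..360 conventions.
    """
    flags = []
    for rec in records:
        lat, lon = rec.get("latitude"), rec.get("longitude")
        if lat is None or lon is None:
            flags.append("missing_position")
        elif -90.0 <= lat <= 90.0 and -180.0 <= lon <= 360.0:
            flags.append("ok")
        else:
            flags.append("bad_position")
    return flags
```

For example, a file whose headers read `Date/Time,Lat,Lon,SST,fCO2_sw,Comment` is mapped onto the standardized names, `Comment` is returned as an unknown header, and a row with latitude 95.0 is flagged `bad_position`. Keeping the mapping in one table is a design choice that lets new provider spellings be supported without touching the parsing code.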

In this presentation, we will discuss and demonstrate the SOCAT data ingestion dashboard and the quality control system. We will also discuss the standards, conventions, and tools that were leveraged to create a workflow that allows scientists to work in their own formats, yet provides a framework for creating high quality data products on an annual basis, while meeting or exceeding data requirements for access, documentation and archival.