Bridging the OA Data Processing and Quality Control Workflow Gap
Eugene F Burger, NOAA Seattle, Seattle, WA, United States; Kevin O'Brien, University of Washington, JISAO, Seattle, WA, United States; Karl Matthew Smith, JISAO, University of Washington, Seattle, WA, United States; Roland Schweitzer, Weathertop Consulting, LLC, College Station, TX, United States; Ansley B Manke, NOAA/PMEL, Seattle, WA, United States; and Liqing Jiang, National Centers for Environmental Information, Silver Spring, MD, United States
Abstract:
To effectively use data collected by the ocean acidification community for analysis and for generating synthesis products, the data must be quality controlled, documented, and accessible through the applications scientists prefer to use. Processing requirements and growing data volumes demand significant effort from OAP collaborators, as second-level data processing and quality control are time-consuming. Federal and NOAA data directives now require our scientific data to be documented, publicly available, and archived within two years or less, further adding to the scientists' data management burden. Time spent on these data processing activities reduces the resources scientists have available for their research. This data-workflow gap between initial (level-one) data processing and submission of contextually quality-controlled data to the National Data Centers has not been addressed for a significant amount of OA data.
We propose tools and processes that will streamline OA data processing and contextual quality control. This vision relies on a combination of extending existing tools and developing new ones that will allow users to span this data-workflow gap and streamline the processing, quality control, and archive submission of biogeochemical OA data and metadata. The workflow established by this software will reduce the data management burden on scientists while producing quality-controlled data in interoperable, standards-based formats that promote easier use of these high-value data. The time saved through this streamlined data processing will also help scientists meet their obligations for data submission to the National Data Centers. This talk will present this vision and highlight the existing applications and tools that, if extended, can meet these requirements at a much reduced development cost.
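As a concrete illustration of what "interoperable, standards-based formats" can mean in practice, the minimal Python sketch below writes a quality-controlled OA time series to a CF-style netCDF file. The variable names, quality-flag convention, and attributes shown are illustrative assumptions for this sketch, not the specific conventions used by the OAP tools or required by the National Data Centers.

```python
# Minimal sketch (illustrative only, not the authors' tool): package a
# quality-controlled OA time series as a CF-style netCDF file so it is
# ready for standards-based access and archive submission.
import numpy as np
import xarray as xr

# Hypothetical example values; real workflows would read these from
# level-one processed files.
time = np.array(["2016-06-01T00:00", "2016-06-01T01:00"], dtype="datetime64[ns]")

ds = xr.Dataset(
    {
        "pH_total": ("time", [8.05, 8.02], {
            "long_name": "pH on the total scale",
            "units": "1",
        }),
        # WOCE-style flags are assumed here purely for illustration.
        "pH_total_qc": ("time", np.array([2, 2], dtype="int8"), {
            "long_name": "pH quality flag (2 = good)",
            "flag_values": np.array([2, 3, 4], dtype="int8"),
            "flag_meanings": "good questionable bad",
        }),
    },
    coords={"time": time},
    attrs={"Conventions": "CF-1.6", "title": "Example QC'd OA time series"},
)

# Write a self-describing netCDF file that standards-aware applications can read.
ds.to_netcdf("oa_timeseries_qc.nc")
```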