The Ocean Observatories Initiative: Data Pre-Processing Diagnostic Tools to Prepare Data for QA/QC Processing

Leila Belabbassi, Lori M. Garzio, Michael J. Smith, Friedrich Knuth, Michael Vardaro and John Kerfoot, Rutgers University, Department of Marine and Coastal Sciences, New Brunswick, NJ, United States
Abstract:
The Ocean Observatories Initiative (OOI), funded by the National Science Foundation, provides users with access to long-term datasets from a variety of deployed oceanographic sensors. The Pioneer Array, in the Atlantic Ocean off the coast of New England, hosts 10 moorings and 6 gliders. Each mooring is outfitted with 6 to 19 different instruments telemetering more than 1,000 data streams. These data are available to science users collaborating on common scientific goals such as water quality monitoring and quantifying the scales of variability of continental shelf processes and coastal-open ocean exchanges. To serve this purpose, the acquired datasets undergo an iterative, multi-step quality assurance and quality control (QA/QC) procedure automated to work with all data types. Data processing involves several stages, including a fundamental pre-processing step in which the data are prepared for processing. This step consumes a considerable amount of processing time and is often given too little attention in development efforts. The volume and complexity of OOI data necessitate the development of a systematic diagnostic tool to enable the management of a comprehensive data information system for the OOI arrays.

We present two examples to demonstrate the current OOI pre-processing diagnostic tool. First, Data Filtering identifies incomplete, incorrect, or irrelevant parts of the data and then replaces, modifies, or deletes the flagged records, making each dataset consistent with similar datasets in the system (a sketch of this step follows the abstract). Second, Data Normalization organizes the database into fields and tables that minimize redundancy and dependency. At the end of this step, each datum is stored in one place, which reduces the risk of data inconsistency and promotes easy, efficient mapping to the database (see the second sketch below).
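To make the Data Filtering step concrete, the Python snippet below is a minimal sketch of flagging incomplete or out-of-range samples in a hypothetical telemetered stream and then masking or dropping them. The column names, gross-range limits, and fill value are illustrative assumptions, not the actual OOI stream schema or QC limits.

```python
# Hypothetical sketch of the Data Filtering step: flag incomplete,
# out-of-range, or fill-value records in a telemetered stream, then
# replace (mask) or delete them. Names and limits are illustrative,
# not the OOI schema or its gross-range test parameters.
import numpy as np
import pandas as pd

# Example telemetered stream: one mooring CTD record per row.
stream = pd.DataFrame({
    "time": pd.date_range("2017-06-01", periods=6, freq="h"),
    "sea_water_temperature": [12.1, 11.8, np.nan, 45.0, 12.0, 11.9],   # degC
    "sea_water_practical_salinity": [34.2, 34.1, 34.3, 34.2, -9999.0, 34.1],
})

VALID_RANGES = {                        # assumed gross-range limits
    "sea_water_temperature": (-5.0, 35.0),
    "sea_water_practical_salinity": (0.0, 42.0),
}
FILL_VALUE = -9999.0                    # assumed instrument fill value

def filter_stream(df: pd.DataFrame) -> pd.DataFrame:
    """Mask fill values and out-of-range samples with NaN, then drop
    rows where every science variable is missing."""
    out = df.copy()
    for col, (lo, hi) in VALID_RANGES.items():
        bad = (out[col] == FILL_VALUE) | ~out[col].between(lo, hi)
        out.loc[bad, col] = np.nan      # modify: mask incorrect values
    science_cols = list(VALID_RANGES)
    return out.dropna(subset=science_cols, how="all")  # delete empty rows

clean = filter_stream(stream)
print(clean)
```

Masking suspect values to NaN before dropping all-empty rows preserves the sampling record while removing values that would otherwise mislead the downstream QA/QC tests.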
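The second snippet is a minimal sketch of the Data Normalization step, assuming a simple two-table layout; the table and column names are hypothetical and do not reflect the actual OOI database schema. Instrument metadata are stored once in a reference table and linked to observations by a key, so each fact lives in one place and a join reassembles the flat view on demand.

```python
# Hypothetical sketch of the Data Normalization step: split a
# denormalized dump (instrument metadata repeated on every row) into
# two related tables to minimize redundancy and dependency.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Reference table: one row per deployed instrument.
cur.execute("""
    CREATE TABLE instruments (
        instrument_id INTEGER PRIMARY KEY,
        reference_designator TEXT UNIQUE,  -- e.g. mooring/node/sensor code
        mooring TEXT,
        depth_m REAL
    )""")

# Observation table: many rows per instrument, linked by a foreign key,
# so the mooring name and depth are never repeated on every sample.
cur.execute("""
    CREATE TABLE observations (
        obs_id INTEGER PRIMARY KEY,
        instrument_id INTEGER REFERENCES instruments(instrument_id),
        time TEXT,
        parameter TEXT,
        value REAL
    )""")

cur.execute("INSERT INTO instruments VALUES "
            "(1, 'CP01CNSM-CTD', 'Central Surface Mooring', 7.0)")
cur.executemany(
    "INSERT INTO observations (instrument_id, time, parameter, value) "
    "VALUES (?, ?, ?, ?)",
    [(1, "2017-06-01T00:00:00Z", "sea_water_temperature", 12.1),
     (1, "2017-06-01T01:00:00Z", "sea_water_temperature", 11.8)])
conn.commit()

# A join reassembles the flat view without storing metadata redundantly.
for row in cur.execute("""
        SELECT i.reference_designator, o.time, o.parameter, o.value
        FROM observations o JOIN instruments i USING (instrument_id)"""):
    print(row)
```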