B33A-0631
Generation and use of observational data patterns in the evaluation of data quality for AmeriFlux and FLUXNET

Wednesday, 16 December 2015
Poster Hall (Moscone South)
Gilberto Pastorello, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
Abstract:
The fluxes-measuring sites that are part of AmeriFlux are operated and maintained in a fairly independent fashion, both in terms of scientific goals and operational practices. This is also the case for most sites from other networks in FLUXNET. This independence leads to a degree of heterogeneity in the data sets collected at the sites, which is also reflected in data quality levels. The generation of derived data products and data synthesis efforts, two of the main goals of these networks, are directly affected by the heterogeneity in data quality. In a collaborative effort between AmeriFlux and ICOS, a series of quality checks are being conducted for the data sets before any network-level data processing and product generation take place. From these checks, a set of common data issues were identified, and are being cataloged and classified into data quality patterns. These patterns are now being used as a basis for implementing automation for certain data quality checks, speeding up the process of applying the checks and evaluating the data. Currently, most data checks are performed individually in each data set, requiring visual inspection and inputs from a data curator. This manual process makes it difficult to scale the quality checks, creating a bottleneck for the data processing. One goal of the automated checks is to free up time of data curators so they can focus on new or less common issues. As new issues are identified, they can also be cataloged and classified, extending the coverage of existing patterns or potentially generating new patterns, helping both improve existing automated checks and create new ones. This approach is helping make data quality evaluation faster, more systematic, and reproducible. Furthermore, these patterns are also helping with documenting common causes and solutions for data problems. This can help tower teams with diagnosing problems in data collection and processing, and also in correcting historical data sets. In this presentation, using AmeriFlux fluxes and micrometeorological data, we discuss our approach to creating observational data patterns, and how we are using them to implement new automated checks. We also detail examples of these observational data patterns, illustrating how they are being used.