Provenance of Earth Science Datasets – How Deep Should One Go?
Abstract: Transparency and reproducibility are essential to the credibility of scientific research. This fundamental tenet has been emphasized for centuries and has received increased attention in recent years. The Office of Management and Budget (2002) addressed reproducibility and other aspects of the quality and utility of information from federal agencies. Specific guidelines from NASA (2002) are derived from that guidance. According to these guidelines, “NASA requires a higher standard of quality for information that is considered influential. Influential scientific, financial, or statistical information is defined as NASA information that, when disseminated, will have or does have clear and substantial impact on important public policies or important private sector decisions.” For information to be compliant, “the information must be transparent and reproducible to the greatest possible extent.”
We present how the principles of transparency and reproducibility have been applied to NASA data supporting the Third National Climate Assessment (NCA3). How deeply the provenance of data used to derive conclusions in NCA3 must be traced depends on how the data were used (e.g., qualitatively or quantitatively). Because the information is diligently maintained in the agency archives, it is possible to trace from a figure in the publication back through the datasets, specific files, algorithm versions, instruments used for data collection, and satellites, as well as the individuals and organizations involved in each step. Such a traceback enables transparency and reproducibility.
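
The trace described above can be pictured as a chain of linked records, one per step, walked from the figure back toward the satellite. The following is only a minimal sketch under stated assumptions: the class `ProvenanceNode`, the function `trace`, the `max_depth` parameter, and every name in the example chain are hypothetical placeholders, not the structure of any NASA archive or of the NCA3 provenance system. It also simplifies each step to a single upstream source, whereas a real figure may draw on several datasets.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ProvenanceNode:
    """One step in a provenance chain (hypothetical structure)."""
    kind: str          # e.g. "figure", "dataset", "file"
    name: str          # placeholder identifier for this step
    responsible: str   # individual or organization involved in this step
    source: Optional["ProvenanceNode"] = None  # next step upstream, if any


def trace(node: Optional[ProvenanceNode],
          max_depth: Optional[int] = None) -> List[Tuple[str, str, str]]:
    """Walk upstream from `node`, stopping after `max_depth` steps if given."""
    steps = []
    depth = 0
    while node is not None and (max_depth is None or depth < max_depth):
        steps.append((node.kind, node.name, node.responsible))
        node = node.source
        depth += 1
    return steps


# Placeholder chain following the order given in the text:
# figure -> dataset -> file -> algorithm version -> instrument -> satellite.
satellite = ProvenanceNode("satellite", "example satellite", "example agency")
instrument = ProvenanceNode("instrument", "example instrument",
                            "example instrument team", satellite)
algorithm = ProvenanceNode("algorithm version", "example algorithm v1",
                           "example science team", instrument)
data_file = ProvenanceNode("file", "example_file_001",
                           "example data center", algorithm)
dataset = ProvenanceNode("dataset", "example dataset",
                         "example data center", data_file)
figure = ProvenanceNode("figure", "example figure", "example author", dataset)

full_trace = trace(figure)                   # all six steps
shallow_trace = trace(figure, max_depth=2)   # figure and dataset only

for kind, name, responsible in full_trace:
    print(f"{kind}: {name} ({responsible})")
```

The `max_depth` argument is one way to express the question in the title: the same chain can be followed all the way to the satellite or cut off earlier, depending on how the data were used.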