Reproducibility and Transparency in Ocean-Climate Modeling
Friday, 18 December 2015
Poster Hall (Moscone South)
Reproducibility is a cornerstone of the scientific method. In geophysical modeling and simulation, achieving reproducibility can be difficult, especially given the complexity of numerical codes, the enormous and disparate data sets, and the variety of supercomputing technology. We have made progress on this problem in the context of a large project: the development of new ocean and sea ice models, MOM6 and SIS2. Here we present useful techniques and experience.
We use version control not only for code but also for the entire experiment working directory, including configuration (run-time parameters, component versions), input data, and checksums on experiment output. This allows us to document when the solutions to experiments change, whether due to code updates or changes in input data. To avoid distributing large input datasets, we provide the tools for generating them from their sources rather than the raw input data itself.
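As an illustration of tracking checksums on experiment output, the following is a minimal sketch and not the project's actual tooling. The function names `checksum_manifest` and `changed_files`, the choice of SHA-256, and the directory layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def checksum_manifest(output_dir):
    """Map each file under output_dir (relative POSIX path) to its SHA-256 hex digest.

    Hypothetical helper: the hash algorithm and manifest layout are assumptions.
    """
    root = Path(output_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large output files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def changed_files(old, new):
    """Return sorted names whose checksum differs or that appear in only one manifest."""
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

Committing such a manifest alongside the configuration would let a diff between two commits show which output files changed, and therefore at which code or input update the solution changed.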
Bugs can be a source of non-determinism and hence irreproducibility, for example, reading from or branching on uninitialized memory. To expose these we routinely run system tests using a memory debugger, multiple compilers and different machines. Additional confidence in the code comes from specialized tests, for example automated dimensional analysis and domain transformations. This testing has entailed adopting a code style in which we deliberately restrict what a compiler can do when re-arranging mathematical expressions.
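The sketch below illustrates, under stated assumptions, why the order of evaluation matters and one way a dimensional test could work. It is not taken from the MOM6 or SIS2 sources: all function names are hypothetical, and the rescaling check (multiplying inputs by a power of two, which is exact in binary floating point barring overflow or underflow) is an assumed form of such a test.

```python
def sum_left_to_right(a, b, c):
    # Parentheses fix the order of evaluation: (a + b) + c.
    return (a + b) + c


def sum_right_to_left(a, b, c):
    # The re-associated form a compiler might choose if permitted: a + (b + c).
    return a + (b + c)


def kinetic_energy(u, v):
    """Kinetic energy per unit mass; every term has units of velocity squared."""
    return 0.5 * ((u * u) + (v * v))


def inconsistent_energy(u, v):
    """Deliberately wrong: adds a velocity to a velocity squared."""
    return 0.5 * ((u * u) + v)


def passes_rescaling_test(func, u, v, power=7):
    """Rescale velocities by 2**power and compare the unscaled result bit for bit.

    A dimensionally consistent expression in velocity squared scales by
    exactly (2**power)**2, so dividing that factor out recovers the
    reference answer exactly.
    """
    scale = 2.0 ** power
    reference = func(u, v)
    rescaled = func(u * scale, v * scale) / (scale * scale)
    return rescaled == reference
```

With the inputs 0.1, 0.2 and 0.3, the two summation orders give different floating-point results, which is why explicit parentheses are needed if answers are to match across compilers. The rescaling check passes for `kinetic_energy` and fails for `inconsistent_energy`, flagging the dimensional error without any reference data.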
In the spirit of open science, all development is in the public domain. This leads to a positive feedback, where increased transparency and reproducibility make using the model easier for external collaborators, who in turn provide valuable contributions. To help users install and run the model, we provide (version controlled) digital notebooks that illustrate and record analysis of output. These notebooks have the dual role of providing a gross, platform-independent testing capability and a means to document model output and analysis.