IN21A-3690:
Light-Weight Parallel Python Tools for Climate Model Workflows

Tuesday, 16 December 2014
Sheri A Mickelson, Kevin Paul, John Dennis and Gary Strand, National Center for Atmospheric Research, Boulder, CO, United States
Abstract:
The data required for the next Intergovernmental Panel on Climate Change (IPCC) Assessment Report (AR6) is expected to increase by more than a factor of 10, to an estimated 25 terabytes per model. Experience from the last Coupled Model Intercomparison Project (CMIP5), which assembled the data used for the last IPCC Assessment Report (AR5), showed that the processing, archiving, and post-run diagnostic operations required on such large model output took almost as long to complete as the model runs themselves. As a result, we have been investigating and developing light-weight Python-based tools to parallelize the time-intensive post-run steps in the climate model workflow. In particular, we have developed a parallel Python tool for converting time-slice model output to time-series format, and we have more recently developed a parallel Python tool to perform fast time-averaging of time-series data, an operation needed for many diagnostic computations. These tools are designed to be light-weight and easy to install, to have very few dependencies, and to be easily inserted into the climate model workflow with negligible disruption. In this work, we present the motivation, approach, and results of the two light-weight parallel Python tools that we have developed, as well as our plans for future research and development.
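The two operations named in the abstract can be illustrated with a minimal sketch. It is not the authors' tools or their API. It assumes hypothetical in-memory data (one dictionary per time step standing in for a time-slice file), hypothetical function names, and a standard-library thread pool standing in for whatever parallel mechanism the actual tools use. It shows only the idea: each variable is gathered independently across all time slices into one time series, so the work can be divided by variable, and the time average is then a reduction over the time axis.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def make_time_slices(n_steps, variables, shape=(2, 3)):
    """Build hypothetical time-slice output: one dict per time step,
    each holding every variable on a small grid."""
    return [
        {name: np.full(shape, float(t + 10 * i)) for i, name in enumerate(variables)}
        for t in range(n_steps)
    ]


def to_time_series(slices, name):
    """Gather one variable from every time slice into a (time, ...) array."""
    return np.stack([s[name] for s in slices], axis=0)


def convert(slices, variables, workers=4):
    """Convert all variables, one independent task per variable.
    The thread pool is an assumption made for this sketch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        series = pool.map(lambda v: to_time_series(slices, v), variables)
        return dict(zip(variables, series))


def time_average(series):
    """Average a time series over its leading (time) axis."""
    return series.mean(axis=0)


variables = ["T", "U", "V"]  # hypothetical variable names
slices = make_time_slices(4, variables)
series = convert(slices, variables)
means = {name: time_average(data) for name, data in series.items()}
```

Dividing the work by variable keeps the tasks independent, which is why this conversion parallelizes without coordination between workers in the sketch.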