Distributed Computation Resources for Earth System Grid Federation (ESGF)

Friday, 19 December 2014
Daniel Duffy (1), Charles Doutriaux (2), and Dean Norman Williams (2); (1) NASA Center for Climate Simulation, Greenbelt, MD, United States; (2) Lawrence Livermore National Laboratory, Livermore, CA, United States
The Intergovernmental Panel on Climate Change (IPCC), prompted by the United Nations General Assembly, published in 2013 a series of papers in its Fifth Assessment Report (AR5) on the processes, impacts, and mitigation of climate change. The science used in these reports was generated by an international group of domain experts, who studied various climate change scenarios using highly complex computer models that simulate the Earth’s climate over long periods of time. The resulting data, approximately five petabytes in total, are stored in a distributed data grid known as the Earth System Grid Federation (ESGF). Through the ESGF, consumers can find and download the data, but capabilities for server-side processing are limited.

The Sixth Assessment Report (AR6) is already in the planning stages and is estimated to create as much as two orders of magnitude more data than the AR5 distributed archive. The data analysis capabilities currently in use will clearly be inadequate for the science that needs to be done with AR6 data; the data will simply be too big. A major paradigm shift is required: instead of downloading data to local systems for analysis, the analysis routines must move to the data, and the computations must run on distributed platforms. In preparation for this need, the ESGF has started a Compute Working Team (CWT) to create solutions that allow users to perform distributed, high-performance data analytics on the AR6 data. The team will design and develop a general Application Programming Interface (API) to enable highly parallel, server-side processing throughout the ESGF data grid. This API will be integrated with multiple analysis and visualization tools, such as the Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT), the netCDF Operators (NCO), and others.
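As a rough illustration of moving the analysis to the data, the sketch below computes a global mean across several sites by having each site reduce its own data and return only a small summary. It is not the CWT API, which this abstract does not specify; the names `DataNode`, `partial_mean`, and `federated_mean`, as well as the sample values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DataNode:
    """Hypothetical stand-in for one site in a federated data grid."""
    name: str
    values: Sequence[float]  # data held locally; never shipped to the client

    def partial_mean(self) -> Tuple[float, int]:
        """Run the reduction next to the data; return only (sum, count)."""
        return float(sum(self.values)), len(self.values)


def federated_mean(nodes: List[DataNode]) -> float:
    """Combine the small per-node summaries into one global mean."""
    partials = [node.partial_mean() for node in nodes]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no data on any node")
    return total / count


# Illustrative values only (assumed, not taken from any archive).
nodes = [
    DataNode("site_a", [280.0, 282.0, 284.0]),
    DataNode("site_b", [286.0]),
]
result = federated_mean(nodes)
```

In this pattern only two numbers per site cross the network, regardless of how much data each site holds, which is the property that makes server-side processing attractive for an archive too large to download.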

This presentation will provide an update on the ESGF CWT’s overall approach toward enabling the storage-proximal computational capabilities needed to study climate change using the AR6 extreme-scale distributed data archive. It will also report on the status of the API and survey the computational approaches being reviewed and studied by members of the ESGF CWT.