Integration and Exposure of Large Scale Computational Resources Across the Earth System Grid Federation (ESGF)

Wednesday, 16 December 2015
Poster Hall (Moscone South)
Daniel Duffy (1), Thomas Patrick Maxwell (2), Charles Doutriaux (3), Dean Norman Williams (3), Aashish Chaudhary (4) and Sasha Ames (3); (1) NASA Center for Climate Simulation, Greenbelt, MD, United States, (2) NASA Goddard Space Flight Center, Greenbelt, MD, United States, (3) Lawrence Livermore National Laboratory, Livermore, CA, United States, (4) Kitware Inc., Clifton Park, NY, United States
As remote sensing observations and model output grow in size, the volume of data has become overwhelming, even to many scientific experts. As societies are forced to better understand, mitigate, and adapt to climate change, the combination of Earth observation data and global climate model projections is crucial not only to scientists but also to policy makers, downstream applications, and even the public. Scientific progress on understanding climate depends critically on a reliable infrastructure that promotes data access, management, and provenance. The Earth System Grid Federation (ESGF) has created such an environment for the Intergovernmental Panel on Climate Change (IPCC). ESGF provides a federated global cyberinfrastructure for access to and management of the model outputs generated for the IPCC Assessment Reports (AR).

The current generation of the ESGF federated grid allows data consumers to find and download data but offers only limited capabilities for server-side processing. Since the amount of data for future ARs is expected to grow dramatically, ESGF is working on integrating server-side analytics throughout the federation. The ESGF Compute Working Team (CWT) has created a Web Processing Service (WPS) Application Programming Interface (API) to enable access to scalable computational resources. The API is the exposure point for high-performance computing resources across the federation. Specifically, it allows users to execute simple operations, such as maximum, minimum, average, and anomaly calculations, on ESGF data without having to download the data. These operations are executed at the ESGF data node site, with access to substantial parallel computing resources. This presentation will highlight the WPS API and its capabilities, provide implementation details, and discuss future developments.
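
To make the idea of a server-side operation concrete, the following is a minimal sketch of how a client might compose a WPS request for an average. It assumes the generic OGC WPS 1.0.0 key-value-pair encoding of an Execute request; the abstract does not describe the actual ESGF CWT endpoint, process identifiers, or input names, so the host, the identifier "example.average", and the inputs "variable" and "axes" are all hypothetical placeholders rather than the real API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the abstract does not name a real service URL.
EXAMPLE_ENDPOINT = "https://esgf-node.example.org/wps"


def build_execute_url(endpoint, identifier, inputs):
    """Build a generic OGC WPS 1.0.0 key-value-pair Execute request URL."""
    # In the WPS 1.0.0 KVP encoding, inputs are name=value pairs joined by ';'.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical process name and input names, chosen only for illustration.
url = build_execute_url(
    EXAMPLE_ENDPOINT,
    identifier="example.average",
    inputs={"variable": "tas", "axes": "time"},
)
print(url)
```

The point of the sketch is that the client sends only a short description of the operation and receives a reduced result, so the full dataset never has to leave the data node.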