IN53A-3792:
Addressing (some) Big Data Challenges in Climate Science: Cross-Sciences Collaborative Efforts Driven By EUDAT Emerging Services
Abstract:
As the horizontal and spatial resolutions of climate models increase, in line with the growing computing power available on High Performance Computing (HPC) systems, the amount of data generated by climate simulations is becoming very large. The road toward exascale will further increase the data volumes to be analyzed, even when model output is reduced to coarser grids before storage and analysis. This problem is not confined to the climate science community; it is shared by several other scientific fields, such as particle physics, linguistics, and seismology.
Within the framework of the European EUDAT project, several emerging services are being developed and deployed operationally to enhance collaborative and federated infrastructures that can scale to very large data volumes. This work is driven by the needs of scientific communities and by international collaborations, notably with the Research Data Alliance (RDA) and through Working Groups involving EUDAT partners and international experts. One of these Working Groups focuses on workflows and their execution near the data storage in a federated infrastructure; these workflows will also use EUDAT services.
Current and upcoming EUDAT services will be presented, with a focus on how they can help the climate community's ESGF infrastructure in the Big Data era perform data analyses that are not hampered by data volumes too large for today's tools and infrastructures. A generic interface/protocol that abstracts community-specific federated data environments, enabling cross-community data sharing and collaboration, will also be presented.
This study was funded by the EU project EUDAT, which is supported by the European Commission’s Seventh Framework Research Programme under grant agreement 283304.