IN31A-1749
A WPS Based Architecture for Climate Data Analytic Services (CDAS) at NASA

Wednesday, 16 December 2015
Poster Hall (Moscone South)
Thomas Patrick Maxwell (1), Mark McInerney (2), Daniel Duffy (3), Laura Carriere (4), Gerald L Potter (1) and Charles Doutriaux (5); (1) NASA Goddard Space Flight Center, Greenbelt, MD, United States, (2) NASA Goddard, Greenbelt, MD, United States, (3) NASA Center for Climate Simulation, Greenbelt, MD, United States, (4) NCCS, NASA Goddard, Greenbelt, MD, United States, (5) Lawrence Livermore National Laboratory, Livermore, CA, United States
Abstract:
Faced with unprecedented growth in the Big Data domain of climate science, NASA has developed the Climate Data Analytic Services (CDAS) framework. This framework enables scientists to execute trusted and tested analysis operations in a high-performance environment close to the massive data stores at NASA. The data is accessed in standard formats (NetCDF, HDF, etc.) in a POSIX file system and processed using trusted climate data analysis tools (ESMF, CDAT, NCO, etc.). The framework is structured as a set of interacting modules, allowing maximal flexibility in deployment choices. The current set of module managers includes:
  • Staging Manager: Runs the computation locally on the WPS server or remotely using tools such as Celery or SLURM.
  • Compute Engine Manager: Runs the computation serially or distributed over nodes using a parallelization framework such as Celery or Spark.
  • Decomposition Manager: Manages strategies for distributing the data over nodes.
  • Data Manager: Handles the import of domain data from long term storage and manages the in-memory and disk-based caching architectures.
  • Kernel Manager: A kernel is an encapsulated computational unit that executes a processor’s compute task. Each kernel is implemented in Python, exploiting existing analysis packages (e.g. CDAT), and is compatible with all CDAS compute engines and decompositions.
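
The division of labor among the decomposition, compute engine, and kernel modules can be illustrated with a minimal Python sketch. It is not CDAS code: every name in it (decompose, mean_kernel, combine, run_serial, run_threaded) is hypothetical, a thread pool from the standard library stands in for a framework such as Celery or Spark, and a plain list of numbers stands in for a climate variable. It assumes a non-empty input and at least one partition. The point it shows is that one kernel can run unchanged under different engines and decompositions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Partial = Tuple[float, int]  # (sum, count) computed from one chunk


def decompose(values: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Hypothetical decomposition strategy: contiguous, near-equal chunks."""
    size = max(1, -(-len(values) // n_parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def mean_kernel(chunk: Sequence[float]) -> Partial:
    """Hypothetical kernel: an encapsulated unit of work on one chunk."""
    return (sum(chunk), len(chunk))


def combine(partials: List[Partial]) -> float:
    """Merge the per-chunk partial results into a global mean."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


def run_serial(kernel: Callable[[Sequence[float]], Partial],
               chunks: List[Sequence[float]]) -> List[Partial]:
    """Serial engine: apply the kernel to each chunk in turn."""
    return [kernel(chunk) for chunk in chunks]


def run_threaded(kernel: Callable[[Sequence[float]], Partial],
                 chunks: List[Sequence[float]],
                 workers: int = 4) -> List[Partial]:
    """Parallel engine stand-in: apply the kernel to chunks in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, chunks))


# Made-up sample values; not taken from any dataset.
temps = [280.0, 282.0, 284.0, 286.0, 288.0, 290.0]
chunks = decompose(temps, 3)
serial_mean = combine(run_serial(mean_kernel, chunks))
threaded_mean = combine(run_threaded(mean_kernel, chunks))
```

Because the kernel returns a (sum, count) pair rather than a per-chunk mean, the combined result does not depend on how the data was split, which is what lets the same kernel work with any decomposition in this sketch.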

CDAS services are accessed via a WPS API being developed in collaboration with the ESGF Compute Working Team to support server-side analytics for ESGF. The API can be invoked through direct web service calls, a Python script or application, or a JavaScript-based web application. Client packages in Python or JavaScript contain everything needed to make CDAS requests.
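
As a rough illustration of a direct web service call, the sketch below builds a WPS 1.0.0 Execute request in key-value-pair form using only the Python standard library. The service, version, request, identifier, and datainputs keys follow the OGC WPS 1.0.0 convention; the endpoint URL, the process identifier, and the input names are placeholders invented for this example and are not the actual CDAS or ESGF API. The helper build_execute_url is likewise hypothetical. The code only constructs the URL and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_execute_url(base_url: str, identifier: str, inputs: dict) -> str:
    """Build a WPS 1.0.0 KVP Execute URL (hypothetical helper)."""
    # WPS KVP encoding joins the inputs as name=value pairs separated by ';'.
    datainputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "datainputs": datainputs,
    })
    return f"{base_url}?{query}"


# Placeholder endpoint, process name, and inputs (assumptions, not CDAS values).
url = build_execute_url(
    "https://example.org/wps",
    "hypothetical.average",
    {"variable": "tas", "axes": "t"},
)

# Decode the query string again to inspect what a server would receive.
decoded = parse_qs(urlsplit(url).query)
```

A client package would typically wrap this kind of URL construction, submit the request, and parse the XML response, so that users do not assemble query strings by hand.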

The CDAS architecture brings together the tools, data storage, and high-performance computing required for timely analysis of large-scale data sets where the data resides, ultimately producing societal benefits. It is currently deployed at NASA in support of the Collaborative REAnalysis Technical Environment (CREATE) project, which centralizes numerous global reanalysis datasets onto a single advanced data analytics platform. This service permits decision makers to investigate climate changes around the globe, inspect model trends, compare multiple reanalysis datasets, and examine variability.