From “Inspiration-driven” Research to “Industrial-strength” Research: Applying User-developed Climate Analytics at Large scale

Friday, 19 December 2014: 2:40 PM
Aparna Radhakrishnan (1,2), Erik E Mason (1,2), Amy R Langenhorst (1), V. Balaji (1,3) and Serguei Nikonov (1,3); (1) Geophysical Fluid Dynamics Laboratory, Princeton, NJ, United States; (2) Engility, Chantilly, VA, United States; (3) Princeton University, Princeton, NJ, United States
Numerous climate models, and the many parameters output from a vast range of climate scenarios, motivate climate scientists to analyze a suite of available data in order to address a plethora of scientific questions, e.g., the occurrence of El Niño events, or simply to validate and compute specialized metrics for a specific climate field. A key goal addressed in this presentation is to provide a platform that lets our scientists work with data from different models in-house, and to extend a similar approach to climate analysis on data from different modeling centers.

Model intercomparison projects, the Earth System Grid Federation, and knowledge exchange within the climate science community have together enabled the successful establishment of “data standards and controlled vocabulary”. This opens key possibilities for techniques that “explore” datasets in the Big-Data archive and perform climate analyses following a simple, standardized, templated approach.

In a typical pattern of use, the scientist works with a few datasets interactively to refine and extract the signal of a particular climate phenomenon. At this point data access patterns are random, as the analysis is exploratory. We call this the “inspiration-driven” phase of research. Subsequently, the scientist needs to apply her analysis to a much wider set of data, for example different models and scenarios from CMIP5. This can be thought of as the “industrial” phase of research.

We provide a pathway for user-developed analyses to transition from inspiration to industry. We will illustrate techniques being adopted at GFDL to develop analyses through interactive computational exploration on selected data, and to provide analysis capabilities both in batch workflows (using the Flexible Runtime Environment) and on the web, with data exploration mechanisms tapped from GFDL’s Curator infrastructure. Comparison of climate data at both the inter- and intra-laboratory level is simplified by the use of “unified analysis templates” and data standards.
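
As a rough illustration of the templated approach described above, the following minimal Python sketch shows one analysis function being developed against a single dataset and then applied unchanged across a larger collection selected by standardized metadata. It is not GFDL's implementation: the `Dataset` record, `anomaly_range`, `run_template`, the model names, and all numeric values are hypothetical placeholders chosen for illustration, and no real FRE or Curator interface is used.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record: one time series tagged with standardized metadata."""
    model: str
    experiment: str
    variable: str
    values: Tuple[float, ...]


def anomaly_range(ds: Dataset) -> float:
    """Example user analysis: spread of anomalies about the series mean."""
    baseline = mean(ds.values)
    anomalies = [v - baseline for v in ds.values]
    return max(anomalies) - min(anomalies)


def run_template(
    analysis: Callable[[Dataset], float],
    datasets: Iterable[Dataset],
    variable: str,
) -> Dict[Tuple[str, str], float]:
    """Apply one analysis to every dataset that carries the requested variable."""
    results = {}
    for ds in datasets:
        if ds.variable != variable:
            continue  # standardized variable names make this selection trivial
        results[(ds.model, ds.experiment)] = analysis(ds)
    return results


# Synthetic placeholder values, not real model output.
ARCHIVE = [
    Dataset("model_a", "historical", "tos", (1.0, 2.0, 3.0)),
    Dataset("model_b", "historical", "tos", (2.0, 2.0, 6.0)),
    Dataset("model_a", "historical", "pr", (0.5, 0.7, 0.9)),
]

# "Inspiration-driven" phase: refine the analysis on one dataset interactively.
single = anomaly_range(ARCHIVE[0])

# "Industrial" phase: run the same function over every matching dataset.
batch = run_template(anomaly_range, ARCHIVE, "tos")
```

The design point is that the analysis function has the same signature in both phases, so nothing in the user's code has to change when it moves from interactive exploration to a batch run; only the set of datasets handed to it grows.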

This work was partly funded by the International ExArch project under the G8 initiative by NSF Award 1119308. Thanks to Paul Kushner (University of Toronto) and Bruce Wyman (GFDL).