IN53A-3789:
Climate Model Evaluation in Distributed Environments.
Friday, 19 December 2014
Amy J Braverman, NASA Jet Propulsion Laboratory, Pasadena, CA, United States
Abstract:
As the volume of climate-model-generated and observational data grows, it has become infeasible to perform large-scale comparisons of model output against observations by moving the data to a central location. Data reduction techniques, such as gridding or subsetting, can reduce data volume, but they also sacrifice information about spatial and temporal variability that may be important for the comparison. An alternative, generally recognized as more efficient for leveraging large data sets, is to "move the computation to the data." In the spirit of this approach, we describe a new methodology for comparing the structure of model-generated and observational time series when the two are stored on different computers. The method simulates the sampling distribution of the difference between a statistic computed from the model output and the same statistic computed from the observed data. To do this, each time series is decomposed with its own wavelet transform on its local machine, and only a very small set of information computed from the wavelet coefficients is transmitted. The smaller that set, the cheaper it is to transmit, but the less accurate the result. From the standpoint of distributed data analysis, the main question is the nature of this trade-off. In this talk, we describe the comparison methodology and present results from preliminary studies of the cost-accuracy trade-off.
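The abstract does not name the wavelet, the statistic, or the summary that is transmitted, so the following Python sketch is a hypothetical illustration of the general idea, not the authors' method. It assumes a Haar wavelet, uses the per-scale wavelet variance (mean squared detail coefficient) as the statistic, and has each machine send only three numbers per retained scale. The central comparison assumes the detail coefficients at each scale are independent Gaussian draws, so each wavelet variance is approximately a scaled chi-square variable; real climate series are autocorrelated, so this is a simplification. The function names `haar_dwt`, `local_summary`, and `compare` and the `keep` parameter (standing in for the cost-accuracy knob) are all illustrative.

```python
import math
import random


def haar_dwt(x, levels):
    """Haar DWT: return (detail coefficients per level, finest first; final approximation)."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx = list(x)
    details = []
    s = math.sqrt(2.0)
    for _ in range(levels):
        half = len(approx) // 2
        # Details are computed from the current approximation before it is replaced.
        d = [(approx[2 * i] - approx[2 * i + 1]) / s for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / s for i in range(half)]
        details.append(d)
    return details, approx


def local_summary(x, levels, keep):
    """Runs on the machine holding x. Returns only (level, wavelet variance, count)
    for the `keep` finest levels: 3 * keep numbers are transmitted, not len(x)."""
    details, _ = haar_dwt(x, levels)
    return [(j + 1, sum(c * c for c in d) / len(d), len(d))
            for j, d in enumerate(details[:keep])]


def compare(model_summary, obs_summary, n_sim=5000, seed=0):
    """Runs centrally. For each level, simulate the null sampling distribution of
    (model variance - observed variance), ASSUMING i.i.d. Gaussian coefficients,
    so each variance ~ sigma^2 * chi2_n / n. Returns {level: (observed diff, p-value)}."""
    rng = random.Random(seed)
    results = {}
    for (j, v_m, n_m), (_, v_o, n_o) in zip(model_summary, obs_summary):
        pooled = (n_m * v_m + n_o * v_o) / (n_m + n_o)  # common variance under the null
        observed = v_m - v_o
        extreme = 0
        for _ in range(n_sim):
            # chi2_n / n is Gamma(shape=n/2, scale=2/n), which has mean 1.
            sim_m = pooled * rng.gammavariate(n_m / 2, 2 / n_m)
            sim_o = pooled * rng.gammavariate(n_o / 2, 2 / n_o)
            if abs(sim_m - sim_o) >= abs(observed):
                extreme += 1
        results[j] = (observed, (extreme + 1) / (n_sim + 1))  # two-sided Monte Carlo p-value
    return results


if __name__ == "__main__":
    r = random.Random(42)
    model = [r.gauss(0, 1) for _ in range(1024)]  # stands in for data on machine A
    obs = [r.gauss(0, 2) for _ in range(1024)]    # stands in for data on machine B
    print(compare(local_summary(model, 4, 2), local_summary(obs, 4, 2)))
```

Lowering `keep` shrinks each machine's payload (3 × `keep` numbers) but leaves the comparison blind to the scales that were dropped. This is one simple way to make the cost-accuracy trade-off concrete.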