A43D-3305:
Evaluating Lossy Data Compression on Climate Simulation Data within a Large Ensemble
Abstract:
High-resolution climate simulations, such as those in the CESM-CAM5
Large Ensemble (CESM-LE) Community Project, often require tremendous
computing resources and can generate massive datasets. Archiving the
output of these simulations consumes vast amounts of storage, and
limited storage availability can constrain scientific objectives. For
example, the CESM-LE project consists of 30 simulations of 160 years
each, and storage constraints influenced its output frequency and
necessitated deleting the raw monthly mean output files (only the
processed time-series data were kept).
To mitigate this problem, we are investigating lossy data compression
techniques for climate simulation data from the Community Earth System
Model. In preliminary work, we developed an approach for verifying
compressed climate data based on characterizing the natural
variability of the system, and we used it to evaluate several
compression algorithms. The underlying idea is that the effects of
compression on the original data should be statistically
indistinguishable from the natural variability of the climate system,
as represented by an ensemble of simulations. Ultimately, convincing
climate scientists that lossy data compression is acceptable requires
direct experience with compressed data. Output from the CESM-LE
project is therefore an ideal venue for further evaluating data
compression.
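The ensemble-based verification idea can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
the authors' method: the ensemble is synthetic Gaussian noise rather
than CESM output, the compressor is a hypothetical uniform quantizer
with a fixed error bound, and the statistic is a root-mean-square
Z-score (RMSZ) of one member against the remaining members, which is
one plausible choice among many. A reconstructed member is accepted if
its RMSZ falls within the range spanned by the uncompressed members'
scores.

```python
import math
import random
import statistics


def quantize(values, error_bound):
    """Toy lossy compressor (hypothetical): snap every value to a uniform grid of
    spacing 2 * error_bound, so each reconstructed value is within error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) * step for v in values]


def rmsz(member, others):
    """Root-mean-square Z-score of `member` against the ensemble `others`:
    z = (x - mean) / stdev at each grid point, then RMS over all points."""
    total = 0.0
    for i, x in enumerate(member):
        column = [m[i] for m in others]
        z = (x - statistics.fmean(column)) / statistics.stdev(column)
        total += z * z
    return math.sqrt(total / len(member))


def ensemble_test(ensemble, index, reconstructed):
    """Score every uncompressed member against the remaining members, then accept
    the reconstruction of member `index` if its score lies inside that range."""
    scores = [rmsz(m, ensemble[:j] + ensemble[j + 1:]) for j, m in enumerate(ensemble)]
    score = rmsz(reconstructed, ensemble[:index] + ensemble[index + 1:])
    return min(scores) <= score <= max(scores), score, scores


# Synthetic stand-in for an ensemble (assumption, not CESM output):
# 30 members, 50 grid points each, independent N(0, 1) "natural variability".
rng = random.Random(0)
ensemble = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(30)]

# Mild compression: error bound far below the ensemble spread.
ok_small, score_small, scores = ensemble_test(ensemble, 0, quantize(ensemble[0], 1e-4))
# Aggressive compression: the grid is so coarse that every value collapses to 0.0.
ok_large, score_large, _ = ensemble_test(ensemble, 0, quantize(ensemble[0], 10.0))

print(f"member 0 original RMSZ:    {scores[0]:.4f}")
print(f"mild reconstruction:       {score_small:.4f} accepted={ok_small}")
print(f"aggressive reconstruction: {score_large:.4f} accepted={ok_large}")
```

With the small error bound, the reconstructed member's RMSZ is
essentially identical to the original's. With the coarse grid, the
field collapses to zero and its RMSZ drops well below every
uncompressed member's score, so a range test of this kind flags
excessive smoothing as well as large pointwise errors.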
We report on the results of a blind experiment evaluating the impact
of data compression on output from the CESM-LE project. We contribute
three additional runs to the large ensemble, after compressing and
reconstructing the output of one or two of these new runs. The
challenge for the climate scientists is then to identify which of the
additional ensemble members have been compressed and reconstructed. We
will present the details of our approach and the results of this
blind experiment.