Organizing Data to Support Diverse Access Patterns
Monday, 15 December 2014
Many Earth Science archives are currently structured the way the data are collected or calculated: each file holds results for a single time slice. This layout is easy to write as data are collected or calculated, and it is optimized for viewing the data as latitude/longitude slices (maps or images) or as animations of a series of such maps. It also works well for spatial comparisons of values at the same time. Several groups have recently explored reorganizing data to optimize access for analysis of climate variations, i.e., temporal changes at a given location. One approach, termed “data rods”, goes to the opposite end of the organization spectrum, with one file holding the time series for each grid cell. An alternative in the middle of the spectrum offers flexible access that supports both map and time series views. It uses chunking to divide the data into three-dimensional blocks, i.e., two-dimensional latitude/longitude tiles extended along the time dimension. Chunking is a core capability of the HDF5 file format and its tools, and it offers a number of advantages: better access across multiple use cases, integrated data compression, the ability to expand or shrink data dimensions and to add or delete data, and simpler data management because each collection contains fewer files. We will provide real-world examples of these benefits using data products from current data archives.
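The trade-off between the three layouts can be made concrete by counting how many storage blocks a read has to touch. The sketch below is illustrative only and is not taken from the presentation: the grid size (365 daily steps on a 1-degree global grid), the three chunk shapes, and the helper name `chunks_touched` are all assumptions, and the code does plain arithmetic rather than calling HDF5. It treats a file-per-time-slice archive as chunks that span one full map, and "data rods" as chunks that span one full time series.

```python
from math import prod

def chunks_touched(chunk_shape, selection):
    """Count the chunks that intersect a hyperslab selection.

    chunk_shape: chunk size along each dimension (time, lat, lon).
    selection:   (start, stop) index pairs, stop exclusive, one per dimension.
    """
    counts = []
    for size, (start, stop) in zip(chunk_shape, selection):
        first = start // size          # index of first chunk touched
        last = -(-stop // size)        # ceiling division: one past last chunk
        counts.append(last - first)
    return prod(counts)

# Hypothetical grid: 365 time steps x 180 latitudes x 360 longitudes.
N_TIME, N_LAT, N_LON = 365, 180, 360

# Assumed chunk shapes standing in for the three organizations.
layouts = {
    "time slices": (1, N_LAT, N_LON),   # one block per map
    "data rods": (N_TIME, 1, 1),        # one block per grid-cell time series
    "3-D tiles": (73, 30, 30),          # lat/lon tiles with a time dimension
}

# Two access patterns: one full map, and one full time series at a grid cell.
one_map = ((0, 1), (0, N_LAT), (0, N_LON))
one_series = ((0, N_TIME), (90, 91), (180, 181))

results = {
    name: (chunks_touched(shape, one_map), chunks_touched(shape, one_series))
    for name, shape in layouts.items()
}

for name, (n_map, n_series) in results.items():
    print(f"{name:12s} map: {n_map:6d} chunks   time series: {n_series:4d} chunks")
```

Under these assumed shapes, each extreme layout serves one access pattern with a single block and the other with hundreds or tens of thousands, while the tiled layout needs a modest number of blocks for both. Chunk counts are only a rough proxy for cost; actual performance also depends on chunk size in bytes, compression, and caching.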