IN53B-1849
Object Storage for Geophysical Data

Friday, 18 December 2015
Poster Hall (Moscone South)
John Readey, HDF Group, Champaign, IL, United States
Abstract:
Object storage systems (such as Amazon S3 or Ceph) have been shown to be cost-effective and highly scalable for data repositories in the petabyte range and larger. However, data storage for geophysical software systems has traditionally centered on file-based systems and libraries such as NetCDF and HDF5. In this session we’ll discuss the advantages and challenges of moving to an object store-based model for geophysical data. We’ll review a proposed model for a geophysical data service that provides an API-compatible library for traditional NetCDF and HDF5 applications while offering high scalability and performance. A further advantage of this approach is that any dataset or dataset selection can be referenced as a URI. With versioning, the data a URI references can be guaranteed to be unmodified, which enables reproducibility of referenced data.
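The versioned-URI idea can be illustrated with a small sketch. It assumes a hypothetical `obj://` URI scheme with `version` and `select` query parameters, and an in-memory `VersionedStore` class standing in for a versioned object store. None of these names come from the proposed service. The point shown is that a URI pinned to a version id keeps resolving to the same values after the object is overwritten.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlparse


class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned object store.

    Each put() keeps earlier versions, so a version id always
    resolves to exactly the data written under it.
    """

    def __init__(self):
        self._objects = {}  # key -> {version_id: data}
        self._latest = {}   # key -> most recent version_id

    def put(self, key, data):
        version_id = uuid.uuid4().hex  # opaque id, like those real stores assign
        self._objects.setdefault(key, {})[version_id] = tuple(data)
        self._latest[key] = version_id
        return version_id

    def get(self, key, version_id=None):
        if version_id is None:
            version_id = self._latest[key]
        return self._objects[key][version_id]


def make_uri(key, version_id, start, stop):
    """Build a hypothetical URI naming a 1-D selection of one dataset version.

    The key is "bucket/path"; the bucket becomes the URI's host part.
    """
    query = urlencode({"version": version_id, "select": f"{start}:{stop}"})
    return f"obj://{key}?{query}"


def resolve(store, uri):
    """Return the values a URI refers to; a pinned version never changes."""
    parts = urlparse(uri)
    params = parse_qs(parts.query)
    key = parts.netloc + parts.path  # rebuild "bucket/path"
    start, stop = (int(x) for x in params["select"][0].split(":"))
    return store.get(key, params["version"][0])[start:stop]


store = VersionedStore()
v1 = store.put("demo/sst", [10.5, 11.0, 11.2, 11.8])
uri = make_uri("demo/sst", v1, 1, 3)
store.put("demo/sst", [0.0, 0.0, 0.0, 0.0])  # dataset later overwritten
print(resolve(store, uri))  # (11.0, 11.2)
```

Because the version id is part of the URI, a citation of that URI still refers to the original values even though the latest version of `demo/sst` has changed.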