Data federation in a large international satellite project
Abstract:
The Group for High Resolution Sea Surface Temperature (GHRSST) is a large, coordinated international activity to distribute and archive standardized sea surface temperature (SST) datasets from a diverse set of data producers, including national space agencies, meteorological agencies, and other scientific and operational satellite groups. It currently has over a dozen data producers and more than 90 unique datasets, many of them available with an observational latency of less than 6 hours. In the current distribution model, data flows through at least three pipelines en route to one or more centralized archives for further distribution and data stewardship. This centralized model of GHRSST data stewardship, including services, archiving, and distribution, has worked well in the past but does not scale appropriately or meet the requirements of other data producers that could potentially contribute to GHRSST. Recently the GHRSST project undertook an exercise to design a more decentralized model for GHRSST data production and distribution. This principally includes a role for data producers and/or distributors to act as access nodes that distribute data through standardized services and also provide archiving and data discovery capabilities. In short, the future GHRSST data management structure will be more decentralized, with a federated approach (more individual distributed components) to data production, distribution, and data inventory search and access. In our presentation we will describe this structure in detail, focusing not only on the architecture but also on the necessary services stack and the search and dataset discovery capabilities that are required, as well as the international metadata and protocol standards it will be built on.