IN13C-1851
Standardised online data access and publishing for Earth Systems and Climate data in Australia
Abstract:
The National Computational Infrastructure (NCI) hosts Australia’s largest repository (10+ PB) of research data collections, spanning a wide range of fields from climate, coasts, oceans, and geophysics through to astronomy, bioinformatics, and the social sciences. Spatial scales range from global to local ultra-high resolution, requiring storage volumes from MB to PB. The data have been organised to be highly connected to both the NCI HPC and cloud resources (e.g., interactive visualisation and analysis environments). Researchers can log in to use the high-performance infrastructure for these data collections, or access the data via standards-based web services. Our aim is to provide a trusted platform to support interdisciplinary research across all the collections, as well as services for use of the data within individual communities.
We thus cater to a wide range of researcher needs while maintaining a consistent approach to data management and publishing. All research data collections hosted at NCI are governed by a data management plan before being published through a variety of platforms and web services such as OPeNDAP, HTTP, and WMS. The data management plan ensures the use of standard formats (when available) that comply with relevant data conventions (e.g., CF-Convention) and metadata standards (e.g., ISO19115). Digital Object Identifiers (DOIs) can be minted at NCI and assigned to datasets and collections. Large-scale data growth and use across a variety of research fields have led to a rise in, and acceptance of, open spatial data formats such as NetCDF4/HDF5, prompting a need to extend these data conventions to fields such as geophysics and satellite Earth observations.
Combining DOI-minted data with discovery and access via metadata and web services creates a complete picture of data hosting, discovery, use, and citation, and enables standardised and reproducible data analysis.
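As an illustration of the standards-based web service access described above, the following minimal sketch builds a WMS 1.3.0 GetMap request URL using the standard request parameters. The endpoint, layer name, and the helper function build_wms_getmap_url are hypothetical placeholders chosen for this example; they are not actual NCI service addresses or layer names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_wms_getmap_url(endpoint, layer, bbox, width=512, height=256,
                         crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL (hypothetical helper for illustration).

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Assumed values: a placeholder endpoint and layer, with a bounding box
# roughly covering Australia.
url = build_wms_getmap_url(
    "https://example.org/wms",
    "sea_surface_temperature",
    (112.0, -44.0, 154.0, -10.0),
)
print(url)

# Parse the query string back to confirm which layer is requested.
print(parse_qs(urlsplit(url).query)["LAYERS"])
```

Because the request is a plain URL, the same map image can be retrieved from a browser, a script, or a GIS client without any service-specific software.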