IN53E-01:
Computational Environments and Analysis methods available on the NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform

Friday, 19 December 2014: 1:40 PM
Ben James Kingston Evans (1), Clinton Foster (2), Stuart A Minchin (3), Tim Pugh (4), Adam Lewis (3), Lesley A Wyborn (2), Bradley John Evans (5) and Alf Uhlherr (6); (1) Australian National University, Canberra, ACT, Australia, (2) Geoscience Australia, Canberra, ACT, Australia, (3) Geoscience Australia, Canberra, Australia, (4) Bureau of Meteorology, Melbourne, Australia, (5) Macquarie University, South Turramurra, Australia, (6) Commonwealth Scientific and Industrial Research Organisation (CSIRO), Clayton, Australia
Abstract:
The National Computational Infrastructure (NCI) has established a powerful in-situ computational environment to enable both high performance computing and data-intensive science across a wide spectrum of national environmental data collections, in particular climate data, observational data and geoscientific assets. This paper examines 1) the computational environments that support the modelling and data processing pipelines, 2) the analysis environments and methods that support data analysis, and 3) the progress in harmonising the underlying data collections for future transdisciplinary research that enables accurate climate projections.

NCI makes available more than 10 PB of major data collections from both the government and research sectors, organised into six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI’s partners (which include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations.

The data is accessible within an integrated HPC-HPD environment: a 1.2 PFlop supercomputer (Raijin), an HPC-class, 3000-core OpenStack cloud system, and several highly connected, large-scale, high-bandwidth Lustre filesystems. This computational environment supports a catalogue of integrated, reusable software and workflows for earth system and ecosystem modelling, weather research, and the processing and analysis of satellite and other observed data.

To enable transdisciplinary research on this scale, data needs to be harmonised so that researchers can readily apply techniques and software across the whole corpus of available data rather than being constrained to work within artificial disciplinary boundaries. Future challenges will involve further integrating and analysing this data across the social sciences to help assess impacts across the societal domain, including timely analysis to forecast future climate and environmental state more accurately.