The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections

Evans, Ben

IN53A-3798

Friday, 19 December 2014

Ben James Kingston Evans¹, Tim Pugh², Lesley A Wyborn³, David Porter⁴, Chris Allen⁴, Jon Smillie⁴, Joseph Antony⁴, Claire Trenham⁴, Bradley John Evans⁵, Duan Beckett², Tim Erwin⁶, Edward King⁷, Jonathan Hodge⁸, Robert Woodcock⁹, Ryan Fraser¹⁰ and David Tondl Lescinsky¹¹, (1) Australian National University, Canberra, ACT, Australia, (2) Bureau of Meteorology, Melbourne, Australia, (3) Geoscience Australia, Canberra, ACT, Australia, (4) Australian National University, Canberra, Australia, (5) Macquarie University, South Turramurra, Australia, (6) Commonwealth Scientific and Industrial Research Organisation - CSIRO, Melbourne, Australia, (7) CSIRO Marine and Atmospheric Research Hobart, Hobart, Australia, (8) Commonwealth Scientific and Industrial Research Organisation - CSIRO, Brisbane, Australia, (9) CSIRO Land and Water Canberra, Canberra, Australia, (10) CSIRO, Kensington, WA, Australia, (11) Geoscience Australia, Canberra, Australia

Abstract:

The National Computational Infrastructure (NCI) has co-located a priority set of national data assets within an HPC research platform. This powerful in-situ computational platform was created to help serve and analyse massive amounts of data across the spectrum of environmental collections, in particular the climate, observational data and geoscientific domains. This paper examines the infrastructure, innovations and opportunities offered by this significant research platform.

NCI currently manages nationally significant data collections (10+ PB), categorised as 1) earth system sciences, climate and weather model data assets and products, 2) earth and marine observations and products, 3) geosciences, 4) terrestrial ecosystems, 5) water management and hydrology, and 6) astronomy, social sciences and biosciences. The data is largely sourced from the NCI partners (who include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. Co-locating these large, valuable data assets has created new opportunities to harmonise the data collections, yielding a powerful transdisciplinary research platform.

The data is accessible within an integrated HPC-HPD environment: a 1.2 PFlop supercomputer (Raijin), an HPC-class 3000-core OpenStack cloud system, and several highly connected, large-scale, high-bandwidth Lustre filesystems.

New scientific software, cloud-scale techniques, server-side visualisation and data services have been harnessed and integrated into the platform, so that analysis can be performed seamlessly across the traditional boundaries of the underlying data domains. Characterising these techniques and profiling their performance ensures that each software component scales, and any component can be enhanced or replaced as future improvements become available.

A Development-to-Operations (DevOps) framework has also been implemented to manage the sheer scale of the software complexity. This ensures that software is both upgradable and maintainable, can be readily reused within complex integrated systems, and can become part of the growing global suite of trusted community tools for cross-disciplinary research.
