Enabling Data Intensive Science through Service Oriented Science: Virtual Laboratories and Science Gateways

Tuesday, 16 December 2014
David Tondl Lescinsky (1), Lesley A Wyborn (2), Ben James Kingston Evans (3), Chris Allen (4), Ryan Fraser (5) and Terry Rankine (5); (1) Geoscience Australia, Canberra, Australia; (2) Geoscience Australia, Canberra, ACT, Australia; (3) Australian National University, Canberra, ACT, Australia; (4) Australian National University, Canberra, Australia; (5) CSIRO, Kensington, WA, Australia
We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services, supporting multiple data-intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to more than 10 PB of earth science data on Lustre filesystems at the Research Data Storage Infrastructure (RDSI) node of Australia’s National Computational Infrastructure (NCI), co-located with NCI’s 1.2 PFlop Raijin supercomputer and a 3000-CPU-core research cloud.

The development, maintenance and sustainability of VLs are best accomplished through modularisation and standardisation of the interfaces between components. Our approach has been to break up tightly coupled, specialised application packages into modules, with the best techniques and algorithms identified and repackaged either as data services or as scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types, whilst the scientific tools can be used in concert with multiple scientific codes.

We are currently designing a scalable, generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid, easy deployment of new codes or new versions of existing codes. The goal is to build open source libraries and collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include provenance, publication of results, monitoring and workflow tools. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public or private cloud, or HPC).
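As a rough illustration of handling scientific code as modularised services behind a standard interface, the minimal Python sketch below registers modules by name and version and chains them into a deployment. Everything in it is an assumption made for illustration: the `ModuleRegistry` class, the dict-in/dict-out interface, and the `subset_service` and `mean_tool` modules are hypothetical and are not taken from the VL, VGL or VHIRL code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical common interface: every module maps a dict of named
# inputs to a dict of named outputs, so modules can be chained freely.
Module = Callable[[dict], dict]


@dataclass
class ModuleRegistry:
    """Illustrative registry keyed by (name, version); not a real VL API."""

    _modules: Dict[Tuple[str, str], Module] = field(default_factory=dict)

    def register(self, name: str, version: str, module: Module) -> None:
        # A new version of a code is deployed by adding another entry.
        self._modules[(name, version)] = module

    def get(self, name: str, version: str) -> Module:
        return self._modules[(name, version)]

    def run_pipeline(self, steps: List[Tuple[str, str]], payload: dict) -> dict:
        # Pass the payload through each named module in order.
        for name, version in steps:
            payload = self.get(name, version)(payload)
        return payload


def subset_service(payload: dict) -> dict:
    """Toy 'data service': keep only values inside the requested bounds."""
    lo, hi = payload["bounds"]
    return {**payload, "values": [v for v in payload["values"] if lo <= v <= hi]}


def mean_tool(payload: dict) -> dict:
    """Toy 'scientific tool': compute the mean of the current values."""
    values = payload["values"]
    return {**payload, "mean": sum(values) / len(values)}


registry = ModuleRegistry()
registry.register("subset", "1.0", subset_service)
registry.register("mean", "1.0", mean_tool)

result = registry.run_pipeline(
    [("subset", "1.0"), ("mean", "1.0")],
    {"values": [1.0, 5.0, 9.0, 20.0], "bounds": (0.0, 10.0)},
)
print(result["mean"])
```

Because each module sees only the shared payload format, a deployment in this sketch is just an ordered list of (name, version) pairs, and replacing one version of a code with another does not affect the other modules.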

The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At the same time, we are scoping seven new VLs and, in the process, identifying other common components in order to prioritise and focus development.