A23E-3309:
Access to the NCAR Research Data Archive via the Globus Data Transfer Service

Tuesday, 16 December 2014
Thomas Cram (1), Douglas Schuster (2), Zaihua Ji (3) and Steven J Worley (3); (1) National Center for Atmospheric Research, Boulder, CO, United States, (2) Nat'l Ctr for Atmospheric Research, Boulder, CO, United States, (3) NCAR, Boulder, CO, United States
Abstract:
The NCAR Research Data Archive (RDA; http://rda.ucar.edu) contains a large and diverse collection of meteorological and oceanographic observations, operational and reanalysis outputs, and remote sensing datasets to support atmospheric and geoscience research. The RDA contains more than 600 dataset collections, which support the varying needs of a diverse user community. The number of RDA users is increasing annually, and the most popular way to access the RDA data holdings is web-based download, typically through scripts built on tools such as wget and cURL. In 2013, 10,000 unique users downloaded more than 820 terabytes of data from the RDA, and customized data products were prepared for more than 29,000 user-driven requests.

To further support this growth in web download usage, the RDA is implementing the Globus data transfer service (www.globus.org) to provide a GridFTP data transfer option for the user community. The Globus service is broadly scalable, has an easy-to-install client, is sustainably supported, and provides a robust, efficient, and reliable data transfer option for RDA users. This paper highlights the main functionality and usefulness of the Globus data transfer service for accessing the RDA holdings.

The Globus data transfer service, developed and supported by the Computation Institute at The University of Chicago and Argonne National Laboratory, uses GridFTP as a fast, secure, and reliable method for transferring data between two endpoints. A Globus user account is required to use this service, and data transfer endpoints are defined on the Globus web interface. In the RDA use cases, the access endpoint is created on the RDA data server at NCAR. The data user defines the receiving endpoint for the data transfer, which can be the main file system at a host institution, a personal workstation, or a laptop. Once initiated, the data transfer is run by Globus as an unattended background process, and Globus ensures that the transfer completes accurately. Users can monitor the data transfer progress on the Globus web interface and optionally receive an email notification once it is complete. Globus also provides a command-line interface to support scripted transfers, which can be useful when embedded in data processing workflows.
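As a rough illustration of a scripted transfer, the sketch below assembles a command line in the `ENDPOINT:PATH` form used by the present-day `globus transfer` command. This is an assumption on our part: the abstract does not describe the command-line syntax, and the current `globus` command-line tool may differ from the interface available when this abstract was written. The endpoint identifiers, paths, and the helper name `build_globus_transfer_command` are hypothetical placeholders, not actual RDA values. The code only builds the argument list and does not contact Globus.

```python
import shlex


def build_globus_transfer_command(source_endpoint, source_path,
                                  dest_endpoint, dest_path,
                                  label=None, recursive=False):
    """Return an argv list for a `globus transfer` call (hypothetical helper).

    Assumes the `globus` command-line tool is installed and the user is
    already logged in. Nothing is executed here.
    """
    argv = ["globus", "transfer"]
    if recursive:
        # Transfer a whole directory tree rather than a single file.
        argv.append("--recursive")
    if label:
        # Human-readable label shown alongside the transfer task.
        argv += ["--label", label]
    # Source and destination are each given as ENDPOINT_ID:PATH.
    argv.append(f"{source_endpoint}:{source_path}")
    argv.append(f"{dest_endpoint}:{dest_path}")
    return argv


if __name__ == "__main__":
    # Placeholder endpoint IDs and paths; substitute real values before use.
    argv = build_globus_transfer_command(
        "RDA_ENDPOINT_ID", "/path/on/rda/dataset/",
        "MY_ENDPOINT_ID", "/path/on/my/system/dataset/",
        label="RDA dataset pull", recursive=True,
    )
    # Print the shell-quoted command for inspection instead of running it.
    print(shlex.join(argv))
```

In a data processing workflow, the returned list could be passed to `subprocess.run` once the placeholders are replaced. Keeping command construction separate from execution makes the step easy to inspect and test before any data is moved.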