IN43B-1736
LVFS: A Big Data File Storage Bridge for the HPC Community
Abstract:
Merging Big Data capabilities into High Performance Computing architecture starts at the file storage level. Heterogeneous storage systems are emerging which offer enhanced features for dealing with Big Data such as the IBM GPFS storage system’s integration into Hadoop Map-Reduce. Taking advantage of these capabilities requires file storage systems to be adaptive and accommodate these new storage technologies. We present the extension of the Lightweight Virtual File System (LVFS) currently running as the production system for the MODIS Level 1 and Atmosphere Archive and Distribution System (LAADS) to incorporate a flexible plugin architecture which allows easy integration of new HPC hardware and/or software storage technologies without disrupting workflows, system architectures and only minimal impact on existing tools.We consider two essential aspects provided by the LVFS plugin architecture needed for the future HPC community. First, it allows for the seamless integration of new and emerging hardware technologies which are significantly different than existing technologies such as Segate’s Kinetic disks and Intel’s 3DXPoint non-volatile storage. Second is the transparent and instantaneous conversion between new software technologies and various file formats. With most current storage system a switch in file format would require costly reprocessing and nearly doubling of storage requirements.
We will install LVFS on UMBC’s IBM iDataPlex cluster with a heterogeneous storage architecture utilizing local, remote, and Seagate Kinetic storage as a case study. LVFS merges different kinds of storage architectures to show users a uniform layout and, therefore, prevent any disruption in workflows, architecture design, or tool usage. We will show how LVFS will convert HDF data produced by applying machine learning algorithms to Xco2 Level 2 data from the OCO-2 satellite to produce CO2 surface fluxes into GeoTIFF for visualization.