The Multi-View Lightweight Virtual File System: Simplifying Collaboration with In-line Data Format Conversion and Big Data Volume Streaming

Friday, 19 December 2014: 3:10 PM
Navid Golpayegani (1), Milton Halem (2), Gregory Ederer (3) and Edward Mauoka (1); (1) NASA Goddard Space Flight Center, Greenbelt, MD, United States, (2) University of Maryland Baltimore County, Computer Science, Baltimore, MD, United States, (3) Sigma Space Corporation, Lanham, MD, United States
Effective collaboration starts with the ability to easily and reliably access data across data center holdings regardless of data format. Combining a seamless data format conversion system with multiple organizational views of the data presents a formidable challenge, given the many different file formats and the varied ways in which data centers organize their data for discovery and file retrieval. We will present results on how the Multi-View Lightweight Virtual File System (MLVFS) can be used by a broad range of machines to seamlessly access data from multiple data centers in client-requested formats and views.

We consider two aspects of effective collaboration: first, seeing what data are available, and second, accessing and converting the data on the fly. Many tools address these problems. For example, protocols such as OGC-WCS and OPeNDAP provide standardized methods of data discovery, retrieval, and format conversion. Custom tools developed in-house are used to provide multiple views of available data. These solutions, however, require separate data search, retrieval, or conversion stages, and usually require local storage to hold the converted data. Additionally, these solutions require custom tools to be developed specifically for accessing a particular data center.

With MLVFS, we take a radically different approach to this problem. Data centers can provide their datasets in any desired format or organizational view using well-established protocols. Collaborators will see all datasets of interest to them appear on their local desktops without having to retrieve the data before using them. Additionally, MLVFS will perform on-the-fly conversion of the datasets stored at the data center to the desired format. MLVFS will free users from unnecessary tasks, such as data retrieval and conversion, and allow them to concentrate on tasks related to using the datasets. MLVFS will also free up users' resources, as they will not have to store local copies of the data, and no data will be transferred unless accessed. MLVFS will do this without requiring the user to learn a new protocol or rewrite any code.
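
The access pattern described above (several organizational views over the same holdings, with transfer and conversion deferred until a file is actually read) can be illustrated with a minimal Python sketch. This is a toy model and not the MLVFS implementation: the class LazyVirtualView, the in-memory REMOTE_STORE, the virtual paths, and the placeholder converter are all hypothetical names introduced for illustration, and the "conversion" only relabels a prefix rather than performing a real HDF4 to HDF5 translation.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[bytes], bytes]


class LazyVirtualView:
    """Toy stand-in for a virtual file view (hypothetical, not MLVFS itself)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # assumed remote-read function
        self._entries: Dict[str, Tuple[str, Converter]] = {}
        self.transfers = 0  # number of remote reads actually performed

    def add(self, virtual_path: str, remote_name: str, convert: Converter) -> None:
        # Registering a path records metadata only; no data moves.
        self._entries[virtual_path] = (remote_name, convert)

    def listdir(self) -> List[str]:
        # Listing shows what is available without touching the remote store.
        return sorted(self._entries)

    def read(self, virtual_path: str) -> bytes:
        # Data is fetched and converted only when a file is read.
        remote_name, convert = self._entries[virtual_path]
        self.transfers += 1
        return convert(self._fetch(remote_name))


# Hypothetical remote holdings: one granule stored in its native format.
REMOTE_STORE = {"granule.hdf4": b"hdf4:payload"}


def placeholder_hdf4_to_hdf5(raw: bytes) -> bytes:
    # Placeholder only: swaps a prefix. A real converter would use HDF libraries.
    return b"hdf5:" + raw.split(b":", 1)[1]


view = LazyVirtualView(fetch=REMOTE_STORE.__getitem__)
# Two organizational views pointing at the same remote granule.
view.add("by_format/hdf5/granule.h5", "granule.hdf4", placeholder_hdf4_to_hdf5)
view.add("by_date/2014/granule.h5", "granule.hdf4", placeholder_hdf4_to_hdf5)

names = view.listdir()
transfers_after_listing = view.transfers  # still zero: nothing was read
data = view.read("by_date/2014/granule.h5")
```

In this sketch, listing the views costs no transfers, and a single read triggers one fetch plus one conversion. A mounted file system would expose the same behavior through ordinary file operations, which is why existing user code would not need to change.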

We will use MODAPS datasets as a case study for this approach, converting the native HDF4 datasets to HDF5 for processing.