IN51C-07
Between a Map and a Data Rod

Friday, 18 December 2015: 09:28
2020 (Moscone West)
William L Teng (1), Hualan Rui (2), Richard F Strub (3) and Bruce Vollmer (3); (1) NASA GSFC (ADNET Systems), Greenbelt, MD, United States; (2) ADNET Systems Inc., Greenbelt, MD, United States; (3) NASA Goddard Space Flight Center, Greenbelt, MD, United States
Abstract:
A “Digital Divide” has long stood between how NASA and other satellite-derived data are typically archived (time-step arrays, or “maps”) and how hydrology and other point-time-series-oriented communities prefer to access those data. In essence, the desired method of data access is orthogonal to the way the data are archived. Our approach to bridging the Divide is part of a larger NASA-supported “data rods” project to enhance access to and use of NASA and other data by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic Information System (HIS) and the larger hydrology community. Our main objective was to determine a way of reorganizing data that is optimal for these communities. Two related objectives were to reorganize the data in a way that (1) is operational, fitting into and leveraging the existing Goddard Earth Sciences Data and Information Services Center (GES DISC) operational environment, and (2) scales up the set of data available as time series, from those archived at the GES DISC to, potentially, those from other Earth Observing System Data and Information System (EOSDIS) data archives. Through several prototype efforts and the lessons learned from them, we arrived at a non-database solution that satisfied our objectives and constraints. In this presentation, we describe how we implemented the operational production of pre-generated data rods and, after weighing the tradeoffs among length of time series (number of time steps), resources needed, and performance, how we implemented the operational production of on-the-fly (“virtual”) data rods. For the virtual data rods, we leveraged a number of existing resources, including the NASA Giovanni Cache and NetCDF Operators (NCO), and used data cubes processed in parallel. Our current benchmark performance for virtual generation of data rods is about a year’s worth of time series for hourly data (~9,000 time steps) in ~90 seconds.
Our approach is a specific implementation of the general optimal strategy of reorganizing data to match the desired means of access. Results from our project have already significantly extended access to NASA data for the large and important hydrology user community, which has heretofore been mostly unable to easily access and use those data.
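The orthogonality between archive layout and desired access can be illustrated with a minimal sketch. It is not the GES DISC implementation, which the abstract describes only as a non-database solution built on the Giovanni Cache, NCO, and parallel data cubes. The function names (`make_maps`, `extract_rod`, `reorganize_to_rods`), the grid size, and the synthetic values are all assumptions chosen for illustration.

```python
from typing import List

# One archived time step ("map"): rows are latitude, columns are longitude.
Grid = List[List[float]]


def make_maps(n_steps: int, n_lat: int, n_lon: int) -> List[Grid]:
    """Build synthetic time-step arrays; each value encodes its (t, i, j) index."""
    return [
        [[t * 100.0 + i * 10.0 + j for j in range(n_lon)] for i in range(n_lat)]
        for t in range(n_steps)
    ]


def extract_rod(maps: List[Grid], i: int, j: int) -> List[float]:
    """Read one grid cell from every map: access orthogonal to the archive order."""
    return [grid[i][j] for grid in maps]


def reorganize_to_rods(maps: List[Grid]) -> List[List[List[float]]]:
    """Reorder (time, lat, lon) into (lat, lon, time) so each cell's series is stored together."""
    n_lat, n_lon = len(maps[0]), len(maps[0][0])
    return [[extract_rod(maps, i, j) for j in range(n_lon)] for i in range(n_lat)]


# Hypothetical tiny archive: 4 time steps on a 2 x 3 grid.
maps = make_maps(n_steps=4, n_lat=2, n_lon=3)
rods = reorganize_to_rods(maps)

# After reorganization, the full time series at cell (1, 2) is a single lookup.
print(rods[1][2])  # [12.0, 112.0, 212.0, 312.0]
```

In the archived layout, a point time series touches every time-step array once; in the reorganized layout, it is one contiguous read. For scale, a 365-day year of hourly data is 8,760 time steps, consistent with the ~9,000 quoted above, and the stated benchmark corresponds to roughly 100 time steps per second.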