IN21C-3714:
Environmental Data Store (EDS): A multi-node Data Storage Facility for diverse sets of Geoscience Data
Tuesday, 16 December 2014
Michael Piasecki, City College New York, New York City, NY, United States and Peng Ji, The City College of New York, New York, NY, United States
Abstract:
Geoscience data comes in many flavors that are determined by type of data such as continous on a grid or mesh or discrete colelcted at point either as one time samples or a stream of data coming of sensors, but coudl also encompass digital files of any time type such text files, WORD or EXCEL documents, or audio and video files. We present a storage facility that is comprsed of 6 nodes each of speciaized to host a certain data type: grid based data (netCDF on a THREDDS server), GIS data (shapefiles using GeoServer), point time series data (CUAHSI ODM), sample data (EDBS), and any digital data (RAMADAA) plus a server fro Remote sensing data and its products. While there is overlap in data type storage capabilities (rasters can go into several of these nodes) we prefer to use dedicated storage facilities that are a) freeware, and b) have a good degree of maturity, and c) have shown their utility for stroing a cetain type. In addition it allows to place these commonly used software stacks and storage solutiosn side-by-side to develop interoprability strategies. We have used a DRUPAL based system to handle user regoistration and authentication, and also use the system for data submission and data search. In support for tis system we developed an extensive controlled vocabulary system that is an amalgamation of various CVs used in the geosciecne community in order to achieve as high a degree of recognition, such the CF conventions, CUAHSI Cvs, , NASA (GCMD), EPA and USGS taxonomies, GEMET, in addition to ontological representations such as SWEET.