Data System Architectures: Recent Experiences from Data Intensive Projects

Thursday, 18 December 2014
Giriprakash Palanisamy1, Mike Thomas Frame2, Tom Boden3, Ranjeet Devarakonda1, Lisa Zolly4, Viv Hutchison4, Natalie Latysh4, Misha Krassovski3, Terri Killeffer5 and Leslie Hook3, (1)Oak Ridge National Laboratory, Oak Ridge, TN, United States, (2)USGS Headquarters, Reston, VA, United States, (3)Oak Ridge National Laboratory, Carbon Dioxide Information Analysis Center, Oak Ridge, TN, United States, (4)USGS Core Science Analytics, Synthesis, and Libraries, Denver, United States, (5)Oak Ridge National Lab, Oak Ridge, TN, United States
U.S. Federal agencies frequently face new data-intensive projects that require next-generation data system architectures. This presentation focuses on two such architectures: USGS's Science Data Catalog (SDC) and the data system of DOE's Next Generation Ecological Experiments (NGEE) Arctic project.

The U.S. Geological Survey (USGS) developed the Science Data Catalog (SDC) to include records describing datasets, data collections, and observational or remotely sensed data. The system was built using a service-oriented architecture and allows USGS scientists and data providers either to create and register their data using a standards-based metadata creation form or simply to register already-created metadata records with the USGS SDC Dashboard. The dashboard compiles the harvested metadata records and sends them, in JSON format, to a post-processing and indexing service. The post-processing service, with the help of various ontologies and geospatial validation services, auto-enhances the harvested metadata records and creates a Lucene index using the Solr enterprise search platform. Ultimately, the metadata is made available via the SDC search interface.
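The post-processing step described above can be sketched as follows. This is an illustrative outline, not the actual SDC implementation: the ontology table, the function names (`enhance_record`, `to_solr_doc`), the record fields, and the geospatial check are all hypothetical stand-ins for the services the abstract mentions.

```python
"""Hypothetical sketch of SDC-style post-processing: a harvested JSON
metadata record is auto-enhanced with ontology-derived keywords and a
geospatial validity flag, then flattened into a document shaped for a
Solr/Lucene index. All names and fields here are illustrative."""

import json

# Toy stand-in for the ontology services used during auto-enhancement.
THEME_ONTOLOGY = {
    "streamflow": ["hydrology", "surface water"],
    "permafrost": ["cryosphere", "soil temperature"],
}

def valid_bbox(bbox):
    """Basic geospatial validation for bbox = (west, south, east, north)."""
    west, south, east, north = bbox
    return -180 <= west <= east <= 180 and -90 <= south <= north <= 90

def enhance_record(record):
    """Expand keywords via the ontology table and flag geospatial validity."""
    enhanced = dict(record)
    keywords = set(record.get("keywords", []))
    for kw in list(keywords):
        keywords.update(THEME_ONTOLOGY.get(kw, []))
    enhanced["keywords"] = sorted(keywords)
    enhanced["bbox_valid"] = valid_bbox(record["bbox"])
    return enhanced

def to_solr_doc(record):
    """Flatten an enhanced record into a Solr-style document."""
    return {
        "id": record["id"],
        "title": record["title"],
        "keywords": record["keywords"],
        "bbox_valid": record["bbox_valid"],
    }

# A harvested record arrives as JSON, as in the dashboard-to-indexer handoff.
harvested = json.loads(
    '{"id": "usgs:001", "title": "Alaska streamflow 2013",'
    ' "keywords": ["streamflow"], "bbox": [-170, 51, -130, 72]}'
)
doc = to_solr_doc(enhance_record(harvested))
```

In a production pipeline the resulting documents would be posted to Solr in batches; the sketch stops at the document shape to stay self-contained.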

DOE’s Next Generation Ecological Experiments (NGEE) Arctic project deployed a data system that allows scientists to prepare, publish, archive, and distribute data from field collections, lab experiments, sensors, and simulation model outputs. The architecture includes a metadata registration form, a data upload and sharing tool, a Digital Object Identifier (DOI) tool, a Drupal-based content management tool, and a data search and access tool based on ORNL’s Mercury software. The team also developed web-metrics tools and a data ingest service to visualize geospatial and temporal observations.
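To make the DOI tool's role concrete, the sketch below assembles the kind of minimal, DataCite-style metadata payload a DOI registration typically requires. This is not the NGEE Arctic code: the function name, field names, the "10.12345" prefix, and the example.org landing-page host are all placeholder assumptions.

```python
"""Hypothetical sketch of what a dataset DOI tool assembles before
registration: minimal DataCite-style metadata plus a landing-page URL
served by the project's content management system. All identifiers,
prefixes, and hosts here are illustrative placeholders."""

def build_doi_request(dataset, prefix="10.12345"):  # prefix is a placeholder
    """Derive a DOI suffix from the dataset id and build the payload."""
    suffix = dataset["id"].lower().replace(":", ".")
    return {
        "doi": f"{prefix}/{suffix}",
        "creators": dataset["authors"],
        "title": dataset["title"],
        "publisher": "NGEE Arctic",
        "publicationYear": dataset["year"],
        # Landing page on the content management system (placeholder host).
        "url": f"https://example.org/data/{dataset['id']}",
    }

request = build_doi_request({
    "id": "NGA:100",
    "authors": ["Hook, L."],
    "title": "Soil temperature observations, Barrow, AK",
    "year": 2014,
})
```

An actual deployment would submit such a payload to a registration agency's API; the point of the sketch is only the separation between the archived dataset record and the citable identifier minted for it.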