Design of Community Resource Inventories as a Component of Scalable Earth Science Infrastructure: Experience of the EarthCube CINERGI Project

Tuesday, 16 December 2014: 5:44 PM
Ilya Zaslavsky1, Stephen M Richard2, David W Valentine Jr1, Jeffrey S Grethe3, Leslie Hsu4, Tanu Malik5, Luis E Bermudez6, Amarnath Gupta1, Kerstin A Lehnert7, Thomas Whitenack1, Ibrahim Burak Ozyurt3, Christopher Condit1, Raquel Calderon1 and Leah Musil2, (1)University of California San Diego, San Diego Supercomputer Center, La Jolla, CA, United States, (2)Arizona Geological Survey, Tucson, AZ, United States, (3)University of California San Diego, La Jolla, CA, United States, (4)Lamont-Doherty Earth Obs, Palisades, NY, United States, (5)University of Chicago, Chicago, IL, United States, (6)Open Geospatial Consortium, Gaithersburg, MD, United States, (7)Columbia University, Palisades, NY, United States
EarthCube is envisioned as a cyberinfrastructure that fosters new, transformational geoscience by enabling the sharing, understanding, and scientifically sound, efficient re-use of formerly unconnected data resources, software, models, repositories, and computational power. Its purpose is to support the science enterprise and workforce development through an extensible and adaptable framework for collaboration and resource integration. A key component of this vision is the development of comprehensive inventories that support resource discovery and re-use across geoscience domains. The goal of the EarthCube CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project is to develop a methodology for, and assemble, a large inventory of high-quality information resources with standard metadata descriptions and traceable provenance.

The inventory is compiled from metadata catalogs maintained by geoscience data facilities, as well as from user contributions. The latter mechanism relies on community resource viewers: online applications that support the updating and curation of metadata records. Once harvested into CINERGI, metadata records from domain catalogs and community resource viewers are loaded into a staging database implemented in MongoDB and validated for compliance with the ISO 19139 metadata schema. Metadata defects of several types detected by the validation engine are either corrected automatically with the help of information extractors or flagged for manual curation. The metadata harvesting, validation, and processing components generate provenance statements in W3C PROV notation, which are stored in a Neo4j database. The curated metadata, along with its provenance information, is then re-published and made accessible both programmatically and through a CINERGI online application.
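The abstract does not say which protocol is used to harvest records from domain catalogs. As one illustration only: many geoscience catalogs expose an OGC Catalogue Service for the Web (CSW 2.0.2) endpoint, from which ISO 19139 records can be requested through paged GetRecords calls. The sketch below builds such request URLs under that assumption; the endpoint URL, page size, and the function name `getrecords_urls` are hypothetical.

```python
from urllib.parse import urlencode

# ISO 19139 (gmd) namespace, requested as the output schema
GMD_NS = "http://www.isotc211.org/2005/gmd"


def getrecords_urls(endpoint, total_records, page_size=50):
    """Yield CSW 2.0.2 GetRecords (KVP) URLs that page through a catalog.

    `endpoint` is a hypothetical catalog URL; `total_records` would normally
    come from the numberOfRecordsMatched attribute of a first response.
    """
    # CSW startPosition is 1-based
    for start in range(1, total_records + 1, page_size):
        params = {
            "service": "CSW",
            "version": "2.0.2",
            "request": "GetRecords",
            "typeNames": "csw:Record",
            "resultType": "results",
            "elementSetName": "full",
            "outputSchema": GMD_NS,
            "startPosition": start,
            "maxRecords": page_size,
        }
        yield f"{endpoint}?{urlencode(params)}"


urls = list(getrecords_urls("https://catalog.example.org/csw", 120))
```

With 120 matching records and pages of 50, this produces three URLs with `startPosition` values 1, 51, and 101; the last page simply returns the remaining 20 records.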
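Full ISO 19139 compliance checking validates records against the official XML Schemas, which needs a schema-aware parser such as lxml, and the abstract does not list the rules the CINERGI validation engine applies. The sketch below is a simplified standard-library stand-in: it checks that three elements catalogs commonly expect (file identifier, title, abstract) are present and non-empty, and, when a record has no keywords, uses a toy vocabulary-matching "extractor" to suggest some, flagging the record for manual curation if nothing can be extracted. The choice of elements, `VOCABULARY`, and `check_record` are illustrative assumptions, not the project's implementation.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# ISO 19139 paths, relative to the gmd:MD_Metadata root element
ID_INFO = "gmd:identificationInfo/gmd:MD_DataIdentification"
REQUIRED = {
    "fileIdentifier": "gmd:fileIdentifier/gco:CharacterString",
    "title": f"{ID_INFO}/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
    "abstract": f"{ID_INFO}/gmd:abstract/gco:CharacterString",
}
KEYWORDS = f"{ID_INFO}/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString"

# Hypothetical controlled vocabulary used by the toy keyword extractor
VOCABULARY = {"hydrology", "seismology", "geochemistry", "sediment"}


def text_of(root, path):
    """Stripped text of the first element matching `path`, or "" if absent."""
    el = root.find(path, NS)
    return (el.text or "").strip() if el is not None else ""


def check_record(xml_text):
    """Return (defects, suggested_keywords) for one ISO 19139 record."""
    root = ET.fromstring(xml_text)
    defects = [name for name, path in REQUIRED.items() if not text_of(root, path)]
    suggested = []
    if not any((el.text or "").strip() for el in root.findall(KEYWORDS, NS)):
        # Toy extractor: match words of the abstract against the vocabulary
        words = {w.strip(".,;:").lower() for w in text_of(root, REQUIRED["abstract"]).split()}
        suggested = sorted(words & VOCABULARY)
        if not suggested:
            defects.append("keywords")  # nothing extracted: flag for manual curation
    return defects, suggested


SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier><gco:CharacterString>rec-001</gco:CharacterString></gmd:fileIdentifier>
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:citation><gmd:CI_Citation>
      <gmd:title><gco:CharacterString>Stream gauge records</gco:CharacterString></gmd:title>
    </gmd:CI_Citation></gmd:citation>
    <gmd:abstract><gco:CharacterString>Daily discharge and sediment load for hydrology studies.</gco:CharacterString></gmd:abstract>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

defects, keywords = check_record(SAMPLE)
print(defects, keywords)  # prints: [] ['hydrology', 'sediment']
```

Returning suggestions rather than rewriting the XML keeps the correction step separate from detection, so each automatic change can be recorded (for example, as a provenance statement) before it is applied.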
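To make the provenance step concrete, the sketch below emits a W3C PROV-N document for a single pipeline step (such as validating one record), using the standard `entity`, `activity`, `agent`, `used`, `wasGeneratedBy`, `wasDerivedFrom`, and `wasAssociatedWith` statements. The `cinergi:` prefix and its URI, the identifiers, and the helper names `prov_n_for_step` and `prov_edges` are hypothetical; the abstract does not describe the project's actual PROV vocabulary or identifiers.

```python
from datetime import datetime, timezone


def prov_n_for_step(step, input_id, output_id, agent, when=None):
    """Return a PROV-N document describing one pipeline step.

    All identifiers are local names in a hypothetical `cinergi:` namespace;
    `when` is the step's end time (defaults to now, in UTC).
    """
    when = when or datetime.now(timezone.utc)
    ts = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return "\n".join([
        "document",
        "  prefix cinergi <http://example.org/cinergi/>",  # hypothetical namespace
        f"  entity(cinergi:{input_id})",
        f"  entity(cinergi:{output_id})",
        f"  activity(cinergi:{step}, -, {ts})",  # start time unknown, end time ts
        f"  agent(cinergi:{agent})",
        f"  used(cinergi:{step}, cinergi:{input_id}, -)",
        f"  wasGeneratedBy(cinergi:{output_id}, cinergi:{step}, {ts})",
        f"  wasDerivedFrom(cinergi:{output_id}, cinergi:{input_id})",
        f"  wasAssociatedWith(cinergi:{step}, cinergi:{agent}, -)",
        "endDocument",
    ])


def prov_edges(step, input_id, output_id, agent):
    """The same relations as (source, relation, target) triples for a graph store."""
    return [
        (step, "used", input_id),
        (output_id, "wasGeneratedBy", step),
        (output_id, "wasDerivedFrom", input_id),
        (step, "wasAssociatedWith", agent),
    ]


doc = prov_n_for_step("validate_rec001", "rec001_raw", "rec001_validated",
                      "iso_validator", datetime(2014, 1, 1, tzinfo=timezone.utc))
```

Storing such statements in a graph database like Neo4j would typically mean one node per entity, activity, or agent and one relationship per PROV relation; the triples from `prov_edges` are shaped for that, although how CINERGI actually models PROV in Neo4j is not described here.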

This presentation focuses on the role of resource inventories in a scalable and adaptable information infrastructure, and on the CINERGI metadata pipeline and its implementation challenges. Key project components are described on the project's website, which also provides access to the initial resource inventory, the inventory metadata model, metadata entry forms, and a collection of community resource viewers.