Exposing USGS sample collections for broader discovery and access: collaboration between ScienceBase, IEDA:SESAR, and Paleobiology Database

Tuesday, 16 December 2014
Leslie Hsu (1), Sky Bristol (2), Kerstin A. Lehnert (1), Robert A. Arko (1), Shanan E. Peters (3), Mark D. Uhen (4), and Lulin Song (1)
(1) Columbia University, Palisades, NY, United States; (2) USGS Core Science Systems, Denver, United States; (3) University of Wisconsin-Madison, Geoscience, Madison, WI, United States; (4) George Mason University, Fairfax, VA, United States
The U.S. Geological Survey (USGS) exemplifies the need for improved cyberinfrastructure to manage vast holdings of invaluable physical geoscience data. Millions of discrete paleobiological and geological specimens lie in USGS warehouses and at the Smithsonian Institution. These specimens underpin many geologic maps and geochemical databases and are a potential treasure trove of new scientific knowledge. Yet the extent of this treasure is virtually unknown outside a small group of paleogeoscientists and geochemists, and the specimens are inaccessible to everyone else. A team from the USGS, the Integrated Earth Data Applications (IEDA) facility, and the Paleobiology Database (PBDB) is working to expose information on paleontological and geochemical specimens for discovery by scientists and citizens. The project builds on the existing infrastructure of the System for Earth Sample Registration (SESAR) and PBDB, which already contain many of the fundamental data schemas needed to accommodate USGS records. The project is also developing a new Linked Data interface for the USGS National Geochemical Database (NGDB). The International Geo Sample Number (IGSN) is the identifier that links samples across all of these systems.
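To illustrate how a shared identifier lets records held in separate systems be joined, the following Python sketch groups sample metadata, fossil occurrences, and geochemical analyses under a common IGSN key. All record structures, field names, identifier values, and the `link_by_igsn` helper are hypothetical placeholders; they do not reflect the actual SESAR, PBDB, or NGDB schemas or interfaces.

```python
from typing import Dict, List

# Hypothetical placeholder records. Field names and identifiers are
# illustrative only, not the real SESAR, PBDB, or NGDB schemas.
sesar_samples = [
    {"igsn": "EXAMPLE0001", "sample_name": "Specimen A", "collector": "USGS"},
    {"igsn": "EXAMPLE0002", "sample_name": "Specimen B", "collector": "USGS"},
]
pbdb_occurrences = [
    {"igsn": "EXAMPLE0001", "taxon": "Taxon X"},
]
ngdb_analyses = [
    {"igsn": "EXAMPLE0002", "analyte": "Cu", "value_ppm": 12.0},
    {"igsn": "EXAMPLE0002", "analyte": "Zn", "value_ppm": 40.0},
]


def link_by_igsn(
    samples: List[dict], occurrences: List[dict], analyses: List[dict]
) -> Dict[str, dict]:
    """Group records from each system under the shared IGSN key.

    Registered samples define the set of known IGSNs; occurrence and
    analysis records whose IGSN is not registered are skipped.
    """
    linked = {
        s["igsn"]: {"sample": s, "occurrences": [], "analyses": []}
        for s in samples
    }
    for occ in occurrences:
        if occ["igsn"] in linked:
            linked[occ["igsn"]]["occurrences"].append(occ)
    for analysis in analyses:
        if analysis["igsn"] in linked:
            linked[analysis["igsn"]]["analyses"].append(analysis)
    return linked


linked = link_by_igsn(sesar_samples, pbdb_occurrences, ngdb_analyses)
```

In this sketch the registered sample record is the anchor and the other systems attach to it by IGSN alone; records with an unregistered IGSN are silently dropped, whereas a production integration would more likely report them for follow-up.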

For paleontological specimens, SESAR and PBDB will be the primary repositories for USGS records, with a data synchronization process that archives the records within the USGS ScienceBase system. The work began by mapping the metadata fields required for USGS collections to the existing SESAR and PBDB data structures, while aligning them with the Observations & Measurements and Darwin Core standards. New functionality needed in SESAR included links to a USGS locality registry, fossil classifications, a spatial qualifier attribute for samples with sensitive locations, and acknowledgement of data and metadata licensing. The team is developing a harvesting mechanism to periodically transfer USGS records from PBDB and SESAR to ScienceBase. For the NGDB, samples are being registered with IGSNs in SESAR, and their geochemical data are being published as Linked Data. This arrangement lets the USGS collections benefit from the disciplinary and institutional strengths of each participating resource while increasing the discoverability, accessibility, and citation of USGS physical collection holdings.
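The field-mapping step can be pictured as a crosswalk from collection-specific field names to standard terms. The Python sketch below renames fields in a USGS-style record to Darwin Core terms (`catalogNumber`, `materialSampleID`, `locationID`, `scientificName`, `decimalLatitude`, `decimalLongitude`, `dataGeneralizations`, and the Dublin Core `license` term that Darwin Core reuses) and reports any fields it cannot map. The USGS field names, the specific pairings, and the `to_darwin_core` function are assumptions for illustration only, not the crosswalk the project actually adopted.

```python
from typing import Dict, List, Tuple

# Hypothetical USGS field names (keys) mapped to Darwin Core terms (values).
# These pairings are an illustrative assumption, not the project's mapping.
USGS_TO_DWC: Dict[str, str] = {
    "usgs_catalog_no": "catalogNumber",
    "igsn": "materialSampleID",
    "usgs_locality_id": "locationID",          # link to a locality registry
    "fossil_name": "scientificName",           # fossil classification
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "location_qualifier": "dataGeneralizations",  # note on sensitive locations
    "license": "license",                      # Dublin Core term used by DwC
}


def to_darwin_core(record: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Rename mapped fields to Darwin Core terms.

    Returns the mapped record and the names of any fields that have no
    entry in the crosswalk, in their original order.
    """
    mapped: Dict[str, object] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        term = USGS_TO_DWC.get(field)
        if term is None:
            unmapped.append(field)
        else:
            mapped[term] = value
    return mapped, unmapped
```

Returning the unmapped field names keeps gaps in the crosswalk visible, which is the information a mapping exercise needs when deciding whether a target schema requires new fields.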