IEDA Thesaurus: A Controlled Vocabulary for IEDA Systems to Advance Integration

Wednesday, 17 December 2014
Peng Ji (1), Kerstin A Lehnert (1), Robert A Arko (1), Lulin Song (1), Leslie Hsu (1), Megan R Carter (1), Vicki Lynn Ferrini (1) and Jason Ash (2); (1) Columbia University, Palisades, NY, United States; (2) University of Kansas, Lawrence, KS, United States
Integrated Earth Data Applications (IEDA) is a community-based facility that supports, sustains, and advances the geosciences by providing data services for observational geoscience data from the Ocean, Earth, and Polar Sciences. Many dedicated systems under the umbrella of the IEDA framework, such as the Petrological Database (PetDB), the Marine Geoscience Data System (MGDS), the System for Earth Sample Registration (SESAR), and the Data Coordination Center for the U.S. Antarctic Program (USAP-DCC), were developed to support the preservation, discovery, retrieval, and analysis of a wide range of observational field and analytical data types from diverse communities. However, because each system maintains its own vocabularies, hierarchies, authority files, or sub-taxonomies, it is currently difficult to index content consistently within the IEDA schema and to perform unified or precise searches across these diverse systems. We present here the IEDA Thesaurus, a system that combines the existing, separate controlled vocabularies of the different systems under the IEDA schema into a single master controlled vocabulary, and that also introduces some new top facets for long-term use. The IEDA Thesaurus contains structured terminology for petrology, geochemistry, sedimentology, oceanography, geochronology, and volcanology, as well as other general metadata fields. Eighteen top facets (also called ‘top categories’) are defined, including equipment, geographic gazetteer, geologic ages, geologic units, and materials. The terms of the thesaurus are cross-validated against other popular geoscience vocabularies such as the GeoRef Thesaurus, the U.S. Geological Survey Library Classification System, the Global Change Master Directory (GCMD), and the Semantic Web for Earth and Environmental Terminology (SWEET) ontologies.
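The merging step described above can be illustrated with a minimal Python sketch. It assumes each source system exposes its terms as plain string lists grouped by top facet. The system names come from the text, but the terms, the facet assignments, and the rule that the first spelling encountered becomes the preferred label are hypothetical simplifications, not the IEDA Thesaurus's actual procedure.

```python
from collections import defaultdict

# Hypothetical per-system vocabularies. The system names are from the abstract;
# the terms and facet assignments are illustrative only.
SOURCE_VOCABULARIES = {
    "PetDB": {"materials": ["Basalt", "Peridotite", "Volcanic glass"]},
    "SESAR": {"materials": ["basalt", "Sediment", "Volcanic Glass"]},
    "MGDS": {"equipment": ["Multibeam sonar", "Piston corer"]},
}


def normalize(label: str) -> str:
    """Case- and whitespace-insensitive key used to detect duplicate terms."""
    return " ".join(label.lower().split())


def merge_vocabularies(sources: dict) -> dict:
    """Merge per-system vocabularies into one master vocabulary.

    Returns {facet: {normalized_term: {"label": preferred_label, "systems": set}}}.
    Assumption: the first spelling seen becomes the preferred label; a real
    thesaurus would choose preferred terms editorially.
    """
    master = defaultdict(dict)
    for system, facets in sources.items():
        for facet, terms in facets.items():
            for label in terms:
                key = normalize(label)
                entry = master[facet].setdefault(key, {"label": label, "systems": set()})
                entry["systems"].add(system)  # remember every system that uses the term
    return {facet: dict(terms) for facet, terms in master.items()}


if __name__ == "__main__":
    merged = merge_vocabularies(SOURCE_VOCABULARIES)
    for facet, terms in sorted(merged.items()):
        for key, entry in sorted(terms.items()):
            print(f"{facet}: {entry['label']} <- {sorted(entry['systems'])}")
```

Keying on a normalized label is only the simplest way to spot duplicates such as "Basalt" and "basalt"; a production thesaurus would also need editorial review to pick preferred terms, record non-preferred synonyms, and separate homographs, none of which this sketch attempts.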
The thesaurus is organized according to ANSI/NISO Z39.19-2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, and is published in the Simple Knowledge Organization System (SKOS) format. The IEDA Thesaurus server, built on open-source technologies, provides standard Semantic Web features such as a SPARQL endpoint, RESTful web services, and unique URIs.
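To make the SKOS publication concrete, the following Python sketch (standard library only) serializes a small concept hierarchy as SKOS in Turtle syntax, giving each concept its own URI. The base namespace `http://example.org/ieda-thesaurus/`, the concept identifiers, and the convention that parentless concepts become top facets via `skos:topConceptOf` are assumptions for illustration; only the SKOS properties themselves (`skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:inScheme`, `skos:topConceptOf`) come from the W3C SKOS vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical base namespace: the real IEDA Thesaurus URIs are not given here.
BASE = "http://example.org/ieda-thesaurus/"
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    f"@prefix ieda: <{BASE}> .\n"
)


@dataclass
class Concept:
    local_id: str                   # appended to BASE to form the concept's unique URI
    pref_label: str
    broader: Optional[str] = None   # local_id of the parent concept, if any
    alt_labels: List[str] = field(default_factory=list)


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(scheme_id: str, concepts: List[Concept]) -> str:
    """Serialize a concept scheme as SKOS in Turtle syntax."""
    lines = [PREFIXES, f"ieda:{scheme_id} a skos:ConceptScheme ."]
    for c in concepts:
        lines.append(f"ieda:{c.local_id} a skos:Concept ;")
        lines.append(f"    skos:inScheme ieda:{scheme_id} ;")
        if c.broader is None:
            # Assumption: concepts without a parent are the scheme's top facets.
            lines.append(f"    skos:topConceptOf ieda:{scheme_id} ;")
        else:
            lines.append(f"    skos:broader ieda:{c.broader} ;")
        for alt in c.alt_labels:
            lines.append(f'    skos:altLabel "{escape(alt)}"@en ;')
        lines.append(f'    skos:prefLabel "{escape(c.pref_label)}"@en .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical three-level hierarchy under a "materials" top facet.
    concepts = [
        Concept("materials", "Materials"),
        Concept("rock", "Rock", broader="materials"),
        Concept("basalt", "Basalt", broader="rock", alt_labels=["basaltic rock"]),
    ]
    print(to_turtle("ieda_thesaurus", concepts))
```

Once such a file is loaded into a triple store, a SPARQL 1.1 property-path query such as `SELECT ?c WHERE { ?c skos:broader+ ieda:materials }` (with the same prefixes declared) would return every concept below the hypothetical `materials` facet, which is the kind of hierarchical search a shared vocabulary makes possible.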