Components for Maintaining and Publishing Earth Science Vocabularies

Tuesday, 16 December 2014
Simon J D Cox and Jonathan Yu, CSIRO, Land and Water, Highett, Vic, Australia
Shared vocabularies are an important aid to geoscience data interoperability. Many organizations maintain useful vocabularies, with geological surveys having a particularly long history of vocabulary and lexicon development. However, the mode of publication is heterogeneous, ranging from PDFs and HTML web pages, through spreadsheets and CSV, to various user interfaces and APIs. Update and maintenance practices range from tightly governed and externally opaque, through various community processes, all the way to crowd-sourcing (‘folksonomies’). A general expectation, however, is for greater harmonization and vocabulary re-use. To be successful, this requires (a) standardized content formalization and APIs, and (b) transparent content maintenance and versioning. We have been trialling a combination of software components dealing with registration, search and linking.

SKOS is designed for formalizing multi-lingual, hierarchical vocabularies, and has been widely adopted in earth and environmental sciences. SKOS is an RDF vocabulary, for which SPARQL is the standard low-level API. However, for interoperability between SKOS vocabulary sources, a SKOS-based API (i.e. one based on the SKOS predicates prefLabel, broader, narrower, etc.) is required. We have developed SISSvoc for this purpose, and used it to deploy a number of vocabularies on behalf of the IUGS, ICS, NERC, OGC, the Australian Government, and CSIRO projects. SISSvoc Search provides a simple search UI on top of one or more SISSvoc sources.
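To make the idea of a SKOS-based API concrete, the following is a minimal sketch in Python of the kind of operations such an API exposes: look up a concept by its preferred label, then navigate broader and narrower relations. It is not SISSvoc's implementation or interface. The `SkosStore` class, its method names, the `http://example.org/vocab/` namespace and the sample concepts are all assumptions made for illustration; only the SKOS namespace and the predicate names prefLabel, broader and narrower come from SKOS itself.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"  # hypothetical namespace, for illustration only

# Toy (subject, predicate, object) triples; labels are (text, language) pairs.
TRIPLES = [
    (EX + "rock", SKOS + "prefLabel", ("rock", "en")),
    (EX + "igneous", SKOS + "prefLabel", ("igneous rock", "en")),
    (EX + "igneous", SKOS + "prefLabel", ("roche ignée", "fr")),
    (EX + "igneous", SKOS + "broader", EX + "rock"),
    (EX + "granite", SKOS + "prefLabel", ("granite", "en")),
    (EX + "granite", SKOS + "broader", EX + "igneous"),
    (EX + "basalt", SKOS + "prefLabel", ("basalt", "en")),
    (EX + "basalt", SKOS + "broader", EX + "igneous"),
]


class SkosStore:
    """Hypothetical in-memory store exposing SKOS-level operations."""

    def __init__(self, triples):
        self._labels = defaultdict(dict)    # concept URI -> {lang: label}
        self._broader = defaultdict(set)    # concept URI -> broader concepts
        self._narrower = defaultdict(set)   # concept URI -> narrower concepts
        for s, p, o in triples:
            if p == SKOS + "prefLabel":
                text, lang = o
                self._labels[s][lang] = text
            elif p == SKOS + "broader":
                self._broader[s].add(o)
                self._narrower[o].add(s)    # broader and narrower are inverses
            elif p == SKOS + "narrower":
                self._narrower[s].add(o)
                self._broader[o].add(s)

    def concept_by_label(self, text, lang="en"):
        """Return URIs of concepts whose prefLabel in `lang` matches `text`."""
        wanted = text.lower()
        return sorted(
            uri for uri, labels in self._labels.items()
            if labels.get(lang, "").lower() == wanted
        )

    def pref_label(self, concept, lang="en"):
        return self._labels.get(concept, {}).get(lang)

    def broader(self, concept):
        return sorted(self._broader.get(concept, ()))

    def narrower(self, concept):
        return sorted(self._narrower.get(concept, ()))

    def narrower_transitive(self, concept):
        """Return every concept below `concept` in the hierarchy."""
        found, stack = set(), [concept]
        while stack:
            for child in self._narrower.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return sorted(found)


store = SkosStore(TRIPLES)
igneous = store.concept_by_label("igneous rock")[0]
narrower_labels = [store.pref_label(c) for c in store.narrower(igneous)]
print(narrower_labels)
```

A client written against operations like these does not need to know how each source stores its RDF or how its SPARQL endpoint is configured, which is the interoperability argument made above.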

Content maintenance is composed of many elements, including content formalization, definition updates, and mappings to related vocabularies. Typically a degree of expert judgement is required. To give users confidence, two requirements are paramount: (i) once published, a URI that denotes a vocabulary item must remain dereferenceable; (ii) the history and status of the content denoted by a URI must be available. These requirements match the standard ‘registration’ paradigm, as implemented in the Linked Data Registry, which is currently used by WMO and the UK Environment Agency for publication of vocabularies.
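The two requirements can be illustrated with a small Python sketch of a register in which entries are never deleted and every change is appended to a history. This is not the Linked Data Registry's data model or API. The `Register` and `Entry` classes, the status names and the example URI are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative status values (an assumption, not taken from any registry).
STATUSES = ("submitted", "valid", "superseded", "retired")


@dataclass
class Entry:
    uri: str
    definition: str
    status: str = "submitted"
    history: list = field(default_factory=list)


class Register:
    """Hypothetical register: items can change status but are never removed."""

    def __init__(self):
        self._entries = {}

    def _log(self, entry, note):
        # Requirement (ii): keep a record of every state the item has had.
        entry.history.append(
            (datetime.now(timezone.utc), entry.status, entry.definition, note)
        )

    def register(self, uri, definition):
        if uri in self._entries:
            raise ValueError(f"{uri} is already registered")
        entry = Entry(uri, definition)
        self._entries[uri] = entry
        self._log(entry, "registered")
        return entry

    def update(self, uri, definition=None, status=None, note=""):
        entry = self._entries[uri]
        if status is not None:
            if status not in STATUSES:
                raise ValueError(f"unknown status: {status}")
            entry.status = status
        if definition is not None:
            entry.definition = definition
        self._log(entry, note)

    def dereference(self, uri):
        # Requirement (i): there is deliberately no delete operation, so a
        # registered URI always resolves, even after the item is retired.
        return self._entries[uri]


reg = Register()
uri = "http://example.org/def/lithology/granite"  # hypothetical URI
reg.register(uri, "Coarse-grained felsic igneous rock.")
reg.update(uri, status="valid", note="accepted after review")
reg.update(uri, status="retired", note="replaced by a newer term")

item = reg.dereference(uri)
status_trail = [status for _, status, _, _ in item.history]
print(item.status, status_trail)
```

Retiring an item changes its status rather than removing it, so existing data that cites the URI can still be interpreted.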

Together, these components form a powerful and flexible system for publishing earth science vocabularies to the community, consistent with semantic web and linked-data principles.