IN53B-1842
Sharing meanings: developing interoperable semantic technologies to enhance reproducibility in earth and environmental science research

Friday, 18 December 2015
Poster Hall (Moscone South)
Mark Schildhauer, National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, United States
Abstract:
Earth and environmental scientists are familiar with the entities, processes, and theories germane to their field of study, and comfortable collecting and analyzing data in their area of interest. Yet, while there appears to be consistency and agreement as to the scientific “terms” used to describe features in their data and analyses, aside from a few fundamental physical characteristics such as mass or velocity, there can be broad tolerances, if not considerable ambiguity, in the way many earth science “terms” map to the underlying “concepts” they actually represent. This ambiguity in meanings, or “semantics”, creates major problems for scientific reproducibility. It greatly impedes the replication of results by making it difficult to determine the intended meaning of terms such as deforestation or carbon flux with respect to scope, composition, magnitude, and so on. In addition, semantic ambiguity complicates the assembly of comparable data for reproducing results, because measurements may carry ambiguous or idiosyncratic labels: for example, percent cover of forest, where the term “forest” is undefined, or a reported output of “total carbon-emissions” that might include CO2 emissions but not methane emissions.

In this talk, we describe how the NSF-funded DataONE repository for earth and environmental science data (http://dataone.org) is using W3C-standard languages (RDF/OWL) to build an ontology that clarifies the concepts embodied in heterogeneous data and model outputs. With an initial focus on carbon cycling concepts, using terrestrial biospheric model outputs and LTER productivity data, we describe how we are achieving interoperability with “semantic vocabularies” (or ontologies) from aligned earth and life science domains, including OBO Foundry ontologies such as ENVO and BCO, the ISO/OGC Observations and Measurements (O&M) standard, and the NSF EarthCube GeoLink project.

Our talk will also discuss best practices that may help other groups interested in constructing their own ontologies. Interoperability of community vocabularies will facilitate reproducibility in the earth and environmental sciences by clarifying when data and results are comparable, as opposed to when we might be using the same term to mean different things, or different terms to mean the same thing.
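A minimal sketch, in Python, of the comparability idea above: two data columns are judged comparable by the concept each is annotated with, not by its free-text label. The Concept and Column classes, the compare and missing_constituents functions, and the "ex:" identifiers are all hypothetical illustrations; the work described here uses RDF/OWL ontologies, not Python objects, and a real annotation would point to an ontology IRI.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-in for an ontology concept. In practice the
# concept would be an IRI in an OWL ontology, not a Python object.
@dataclass(frozen=True)
class Concept:
    concept_id: str            # illustrative identifier, not a real ontology IRI
    constituents: frozenset    # e.g. which gases an emission total includes

# A data column: the label a scientist used, plus the concept it denotes.
@dataclass(frozen=True)
class Column:
    dataset: str
    label: str
    concept: Concept

def compare(a: Column, b: Column) -> str:
    """Decide comparability from the annotated concepts, not the labels."""
    same_label = a.label.strip().lower() == b.label.strip().lower()
    if a.concept == b.concept:
        return "comparable" if same_label else "comparable: different terms, same concept"
    if same_label:
        return "not comparable: same term, different concepts"
    return "not comparable"

def missing_constituents(a: Column, b: Column) -> list:
    """Constituents included in b's concept but absent from a's."""
    return sorted(b.concept.constituents - a.concept.constituents)

# Hypothetical concepts for the "total carbon-emissions" example.
CO2_ONLY = Concept("ex:TotalCarbonEmission_CO2", frozenset({"CO2"}))
CO2_AND_CH4 = Concept("ex:TotalCarbonEmission_CO2_CH4", frozenset({"CO2", "CH4"}))

model_a = Column("model_A", "total carbon-emissions", CO2_ONLY)
site_b = Column("site_B", "total carbon-emissions", CO2_AND_CH4)
site_c = Column("site_C", "carbon emissions (CO2 + CH4)", CO2_AND_CH4)

if __name__ == "__main__":
    print(compare(model_a, site_b))               # not comparable: same term, different concepts
    print(compare(site_b, site_c))                # comparable: different terms, same concept
    print(missing_constituents(model_a, site_b))  # ['CH4']
```

The design choice worth noting is that equality is checked on the concept, so identical labels with different meanings are flagged, while differently worded labels that denote the same concept are recognized as comparable.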