IN33A-3760:
Linking descriptive geology and quantitative machine learning through an ontology of lithological concepts

Wednesday, 17 December 2014
Jens F Klump (1), Robert Huber (2), Jess Robertson (1), Simon J D Cox (3) and Robert Woodcock (4); (1) CSIRO Mineral Resources Flagship, Kensington, WA, Australia, (2) MARUM - University of Bremen, Bremen, Germany, (3) CSIRO Land and Water, Highett, Australia, (4) CSIRO Digital Productivity & Services, Acton, ACT, Australia
Abstract:
Despite the recent explosion of quantitative geological data, geology remains a fundamentally qualitative science. Numerical data make up only part of the data collected in the geosciences. In many cases, geological observations are compiled as text in reports and as annotations on drill cores, thin sections or drawings of outcrops. These observations are classified into concepts such as lithology, stratigraphy and geological structure. The descriptions are semantically rich and are generally supported by more quantitative observations from geochemical analyses, XRD, hyperspectral scanning and similar methods, but the goal remains geological semantics. In practice it has been difficult to bring the different observations together, either because human observers differ in perception or in the granularity of their classifications, or because quantitative sensors observe only some characteristics.

In recent years many geological classification schemas have been transferred into ontologies and vocabularies, formalized using RDF and OWL, and published through SPARQL endpoints. Several lithological ontologies were compiled by stratigraphy.net and published through a SPARQL endpoint. This work is complemented by the development of a Python API that integrates this vocabulary into Python-based text mining applications.
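As a minimal sketch of how a vocabulary published this way might be read from Python, the example below builds a SPARQL Protocol GET request and parses a result document in the SPARQL JSON results format. It is not the Python API mentioned above. The endpoint URL, the concept URIs and the use of SKOS (skos:Concept, skos:prefLabel) are assumptions made for illustration, and the sample response is hand-written so that the code runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint URL; the abstract does not give the real address.
ENDPOINT = "http://example.org/sparql"

# Assumes the vocabulary is modelled with SKOS; the real ontologies may differ.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
}
"""


def build_request_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query as a parameter."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(response_text):
    """Turn a SPARQL JSON result document into {lower-cased label: concept URI}."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return {row["label"]["value"].lower(): row["concept"]["value"] for row in bindings}


# Hand-written sample in the SPARQL JSON results format, standing in for
# what a live endpoint would return. The URIs are placeholders.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["concept", "label"]},
    "results": {"bindings": [
        {"concept": {"type": "uri", "value": "http://example.org/lithology/basalt"},
         "label": {"type": "literal", "value": "Basalt"}},
        {"concept": {"type": "uri", "value": "http://example.org/lithology/sandstone"},
         "label": {"type": "literal", "value": "Sandstone"}},
    ]},
})

url = build_request_url(ENDPOINT, QUERY)
vocabulary = parse_bindings(SAMPLE_RESPONSE)
```

Against a live endpoint, the URL would be fetched with an Accept header of application/sparql-results+json and the response body passed to parse_bindings.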

The applications for the lithological vocabulary and Python API are

  1. automated semantic tagging of geochemical data and descriptions of drill cores,
  2. machine learning of geochemical compositions that are diagnostic for lithological classifications, and
  3. text mining for lithological concepts in reports and geological literature.
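The tagging and text mining applications listed above could, in their simplest form, be sketched as dictionary matching of vocabulary labels against free text. The example below is an illustration only and does not reproduce the authors' implementation. The vocabulary entries, their URIs and the sample core description are hypothetical.

```python
import re

# Hypothetical vocabulary: label -> concept URI (placeholders, not real identifiers).
VOCABULARY = {
    "basalt": "http://example.org/lithology/basalt",
    "sandstone": "http://example.org/lithology/sandstone",
    "granite": "http://example.org/lithology/granite",
}


def build_pattern(labels):
    """Compile one case-insensitive pattern matching any label as a whole word."""
    # Longest labels first, so longer terms are tried before shorter ones.
    ordered = sorted(labels, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def tag_text(text, vocabulary):
    """Return (matched text, start offset, concept URI) for each label found."""
    pattern = build_pattern(vocabulary)
    return [
        (match.group(0), match.start(), vocabulary[match.group(0).lower()])
        for match in pattern.finditer(text)
    ]


# Invented example of a drill core description.
description = "Fine-grained basalt overlying weathered Sandstone at 12.4 m."
tags = tag_text(description, VOCABULARY)
```

Exact matching of this kind misses plurals, synonyms and misspellings. Handling those would need stemming or the alternative labels of the ontology, which this sketch leaves out.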

This combination of applications can be used to identify anomalies in databases where the recorded lithological classification does not match the composition. It can also be used to identify lithological concepts in the literature and infer quantitative values. The resulting semantic tagging opens new possibilities for linking these diverse sources of data.
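One way to picture the anomaly check is to predict a lithology from composition and flag records whose stored label disagrees. The sketch below uses a nearest-centroid classifier as a simple stand-in. The abstract does not name the machine learning method used, so this choice is an assumption. The sample values are invented for illustration and are not measurements.

```python
from collections import defaultdict
from math import dist

# Invented samples: (recorded lithology, SiO2 wt%, MgO wt%). Not real data.
SAMPLES = [
    ("basalt", 49.0, 7.5),
    ("basalt", 50.5, 6.8),
    ("basalt", 48.2, 8.1),
    ("granite", 72.0, 0.6),
    ("granite", 70.5, 0.9),
    ("granite", 73.1, 0.4),
    ("basalt", 71.8, 0.7),  # recorded label disagrees with composition
]


def class_centroids(samples):
    """Compute the mean composition of each recorded lithology."""
    groups = defaultdict(list)
    for label, *features in samples:
        groups[label].append(features)
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def flag_mismatches(samples):
    """Return (index, recorded label, nearest-centroid label) where they differ."""
    centroids = class_centroids(samples)
    flagged = []
    for index, (label, *features) in enumerate(samples):
        predicted = min(centroids, key=lambda name: dist(features, centroids[name]))
        if predicted != label:
            flagged.append((index, label, predicted))
    return flagged


anomalies = flag_mismatches(SAMPLES)
```

The mislabelled record is included when the centroids are computed, so it pulls the basalt centroid toward granite. With many such records, a robust estimate or a held-out training set would be a safer basis for the comparison.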