IN41C-1709
Digital Crust: Information architecture for heterogeneous data integration

Thursday, 17 December 2015
Poster Hall (Moscone South)
Stephen M Richard, Arizona Geological Survey, Tucson, AZ, United States
Abstract:
The Digital Crust EarthCube Building Block addresses the problem of multiple, heterogeneous but related datasets, characteristic of field- and sample-based research, using a ‘loose-schema’ approach with linked entity and attribute definitions in an information model (ontology) registry (IMR). Data entities (RDA ‘data types’) are defined by mapping their entity and attribute definitions to definitions in the IMR. At the simplest level, loading new data can bring in entities that are not registered, but these will not be ‘integratable’ with other data until their schema has been matched to the IMR. New datasets can instead be designed using registered entities and attributes, so that they are integrated into the system from the beginning (similar to the approach used by the National Information Exchange Model). The fundamental abstract components of this system are 1) a data repository (datastore) that allows storage of key-value structured data objects; and 2) a registry that documents information models (the base data types, attributes and entities) and the mappings from the data types in the datastore to the registered items. Together these constitute the data repository subsystem. Data access is enabled by caching views of data aggregated from the datastore according to the semantics of the registered items in the IMR, and by creating indexes based on those registered items. Contributing data to this system is greatly facilitated by using existing, documented information models. The system can also accept datasets that are not ‘standardized’, but those data will not be integratable with other existing data until the entities and attributes in the data have been documented and mapped to existing registered types.
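As an illustration only, the two components and the cached, indexed views described in the abstract might be sketched as follows. This is a minimal in-memory sketch, not the Digital Crust implementation: the class and function names (`InformationModelRegistry`, `Datastore`, `build_view`, `build_index`), the registered item identifiers, and the sample datasets are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InformationModelRegistry:
    """Hypothetical IMR: registered item definitions plus per-dataset mappings."""
    definitions: dict = field(default_factory=dict)  # registered id -> definition
    mappings: dict = field(default_factory=dict)     # (dataset, local key) -> registered id

    def register(self, item_id, definition):
        self.definitions[item_id] = definition

    def map_key(self, dataset, local_key, item_id):
        # Schema matching: tie a dataset-local key to a registered item.
        if item_id not in self.definitions:
            raise KeyError(f"{item_id!r} is not registered in the IMR")
        self.mappings[(dataset, local_key)] = item_id

    def resolve(self, dataset, local_key):
        # Returns None for keys that have not been mapped yet.
        return self.mappings.get((dataset, local_key))


class Datastore:
    """Hypothetical store of key-value structured data objects."""
    def __init__(self):
        self.records = []  # list of (dataset name, record dict)

    def load(self, dataset, record):
        # Loading never fails on unregistered keys (the 'loose-schema' idea).
        self.records.append((dataset, dict(record)))


def build_view(store, imr):
    """Aggregate records under registered item ids; unmapped keys are left out."""
    view = []
    for dataset, record in store.records:
        row = {}
        for key, value in record.items():
            item_id = imr.resolve(dataset, key)
            if item_id is not None:
                row[item_id] = value
        if row:  # records with no mapped keys are not yet integratable
            row["_source"] = dataset
            view.append(row)
    return view


def build_index(view, item_id):
    """Index view rows by the value of one registered item."""
    index = defaultdict(list)
    for position, row in enumerate(view):
        if item_id in row:
            index[row[item_id]].append(position)
    return dict(index)


# Illustrative data: three datasets using different local keys.
imr = InformationModelRegistry()
imr.register("sample:lithology", "Rock type of a physical sample")
imr.register("sample:identifier", "Identifier assigned to a physical sample")

store = Datastore()
store.load("survey_a", {"rock_type": "basalt", "id": "A-1"})
store.load("survey_b", {"lith": "basalt", "sample_no": "B-7"})
store.load("survey_c", {"gestein": "granite"})  # loaded, but not mapped yet

imr.map_key("survey_a", "rock_type", "sample:lithology")
imr.map_key("survey_a", "id", "sample:identifier")
imr.map_key("survey_b", "lith", "sample:lithology")
imr.map_key("survey_b", "sample_no", "sample:identifier")

view = build_view(store, imr)
index = build_index(view, "sample:lithology")
```

In this sketch the `survey_c` record is stored but stays out of the aggregated view until someone calls `map_key` for its local key, which mirrors the abstract's point that unregistered data can be loaded but is not integratable until the schema matching is done.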