IN33A-3758:
Geo-Semantic Framework for Integrating Long-Tail Data and Model Resources for Advancing Earth System Science

Wednesday, 17 December 2014
Praveen Kumar and Mostafa Elag, University of Illinois, Urbana, IL, United States
Abstract:
Scientists and small research groups often collect data that target specific issues and cover a limited geographic or temporal range. Taken together, a large number of such collections constitute a database of immense value to Earth science studies. Such data have been termed long-tail data, and integrating them is complicated by heterogeneity in dimensions, coordinate systems, scales, variables, providers, users and contexts. Similarly, we use “long-tail models” to describe a heterogeneous collection of models and/or modules developed for targeted problems by individuals and small groups, which together form a large and valuable collection. Integrating across these models is complicated by differing variable names and units for the same concept, model runs at different time steps and spatial resolutions, differing naming and reference conventions, and so on. The ability to “integrate long-tail models and data” will enable interoperability and reuse of community resources: not only can models be combined in a workflow, but each model will be able to discover and (re)use data in the application-specific context of space, time and question. This capability is essential for representing, understanding, predicting and managing heterogeneous, interconnected processes and activities by harnessing the complex, heterogeneous and extensive set of distributed resources. Because advances in computational, sensing and information technologies are producing long-tail models and data at a staggering rate, an important challenge arises: how can geoinformatics bring these resources together seamlessly, given the inherent complexity among model and data resources that span various domains? We will present a semantic-based framework to support the integration of “long-tail” models and data.
The framework builds on existing technologies, including: (i) SEAD (Sustainable Environmental Actionable Data), which supports curation and preservation of long-tail data throughout its life cycle; (ii) BrownDog, which enhances the machine interpretability of large unstructured and uncurated data; and (iii) CSDMS (Community Surface Dynamics Modeling System), which “componentizes” models by providing a plug-and-play environment for model integration.
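As a minimal illustration of two integration problems named in the abstract (differing variable names and units for the same concept, and model runs at different time steps), the Python sketch below maps model-specific variables onto shared concepts in common units and averages a fine time series to a coarser step. All model names, variable names, concept names and the helpers `REGISTRY`, `to_canonical` and `resample_mean` are hypothetical. This is not the authors' framework; a real semantic framework would draw its mappings from a shared ontology or naming convention rather than a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    concept: str         # hypothetical shared concept name
    scale: float         # multiply the source value by this factor...
    offset: float = 0.0  # ...then add this offset to reach canonical units


# Hypothetical registry: (model, model-specific variable) -> shared concept.
# Assumed canonical units: kelvin for temperature, m/s for precipitation rate.
REGISTRY = {
    ("model_a", "T_air"): Mapping("air_temperature", 1.0, 273.15),        # degC -> K
    ("model_b", "tas"): Mapping("air_temperature", 1.0),                  # already K
    ("model_a", "precip_mm_hr"): Mapping("precipitation_rate", 1e-3 / 3600.0),  # mm/h -> m/s
    ("model_b", "pr_m_s"): Mapping("precipitation_rate", 1.0),            # already m/s
}


def to_canonical(model, variable, value):
    """Return (shared concept, value converted to that concept's canonical units)."""
    m = REGISTRY[(model, variable)]
    return m.concept, value * m.scale + m.offset


def resample_mean(series, factor):
    """Coarsen a time series by averaging consecutive blocks of `factor` values."""
    if factor <= 0 or len(series) % factor != 0:
        raise ValueError("series length must be a positive multiple of factor")
    return [sum(series[i:i + factor]) / factor for i in range(0, len(series), factor)]


if __name__ == "__main__":
    # Two models name and measure air temperature differently but map to one concept.
    print(to_canonical("model_a", "T_air", 25.0))
    print(to_canonical("model_b", "tas", 298.15))
    # Hourly values averaged to 3-hourly steps to match a coarser model.
    print(resample_mean([1, 2, 3, 4, 5, 6], 3))
```

Keying the registry on the (model, variable) pair keeps each model's own naming intact while letting a workflow ask for a shared concept, which is the kind of reuse the abstract describes.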