IN33C-3785:
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web

Wednesday, 17 December 2014
Benno Man-ho Lee (1), Sumit Purohit (2), William Smith (2), Jesse Weaver (2), Alan Chappell (2), Patrick West (1) and Peter Arthur Fox (3); (1) Rensselaer Polytechnic Institute, Troy, NY, United States, (2) Pacific Northwest National Laboratory, Richland, WA, United States, (3) Rensselaer Polytechnic Inst., Troy, NY, United States
Abstract:
The volume and variety of data generated in science are rapidly increasing. Geophysical science is no exception: various independent projects produce disparate, heterogeneous datasets. While researchers typically make these data available to others, there is a need to make these valuable resources more discoverable and understandable to user communities in order to accelerate scientific research. The cost of making data discoverable and understandable depends on how the original data were curated, transformed, generated, and published. User interfaces and visualizations that support exploration of and interaction with the data further enhance understanding of the available content.
This presentation describes research and development conducted under the Resource Discovery for Extreme Scale Collaboration (RDESC) project. As part of RDESC, we curate, clean, publish, and visualize scientific data following Linked Data principles. To enable discovery and understanding, we curated data from multiple, interdisciplinary science domains and represented the metadata using standard Semantic Web and Web technologies. This transformation generated some 1.4 billion RDF triples that describe these existing data resources. These efforts led us to formulate a number of suggested best practices that help data publishers reduce the cost of, and barriers to, making data discoverable and understandable to research communities. Additionally, we developed a set of tools that provide scalable visualizations of this large-scale metadata, making the data resources easier for prospective users to understand.