IN23E-01
The DataBridge: A System For Optimizing The Use Of Dark Data From The Long Tail Of Science

Tuesday, 15 December 2015: 13:40
2020 (Moscone West)
Howard Lander, Renaissance Computing Institute, Chapel Hill, NC, United States
Abstract:
The DataBridge is a National Science Foundation funded collaborative project (OCI-1247652, OCI-1247602, OCI-1247663) designed to assist in the discovery of dark data sets from the long tail of science. The DataBridge aims to to build queryable communities of datasets using sociometric network analysis. This approach is being tested to evaluate the ability to leverage various forms of metadata to facilitate discovery of new knowledge. Each dataset in the Databridge has an associated name space used as a first level partitioning. In addition to testing known algorithms for SNA community building, the DataBridge project has built a message-based platform that allows users to provide their own algorithms for each of the stages in the community building process. The stages are:
  • Signature Generation (SG): An SG algorithm creates a metadata signature for a dataset. Signature algorithms might use text metadata provided by the dataset creator or derive metadata.
  • Relevance Algorithm (RA): An RA compares a pair of datasets and produces a similarity value between 0 and 1 for the two datasets.
  • Sociometric Network Analysis (SNA): The SNA will operate on a similarity matrix produced by an RA to partition all of the datasets in the name space into a set of clusters. These clusters represent communities of closely related datasets. 

The DataBridge also includes a web application that produces a visual representation of the clustering. Future work includes a more complete application that will allow different types of searching of the network of datasets.

The DataBridge approach is relevant to geoscience research and informatics. In this presentation we will outline the project, illustrate the deployment of the approach, and discuss other potential applications and next steps for the research such as applying this approach to models. In addition we will explore the relevance of DataBridge to other geoscience projects such as various EarthCube Building Blocks and DIBBS projects.