IN33F-06
Evaluation of data analytic approaches to generating cross-domain mappings of controlled science vocabularies

Wednesday, 16 December 2015: 14:55
2020 (Moscone West)
Stephan Zednik, Rensselaer Polytechnic Institute, Troy, NY, United States
Abstract:
Recent data publication practices have made an increasing number of diverse datasets available online for the general research community to explore and integrate. Even with the abundance of data online, relevant data discovery and successful integration are still highly dependent upon the data being published with well-formed and understandable metadata. Tagging a dataset with well-known or controlled community terms is a common mechanism to indicate the intended purpose, subject matter, or other relevant facts of a dataset; however, controlled domain terminology can be difficult for cross-domain researchers to interpret and leverage. It is also a challenge for integration portals to successfully provide cross-domain search capabilities over data holdings described using many different controlled vocabularies.

Mappings between controlled vocabularies can be challenging to produce because communities frequently develop specialized terminologies and have highly specific and contextual usages of common words. Despite this specificity, it is highly desirable to produce cross-domain mappings to support data integration.

In this contribution, we evaluate the applicability of several data analytic techniques for generating mappings between hierarchies of controlled science terms. We hope our efforts initiate more discussion on the topic and encourage future mapping efforts.