IN13A-1827
Curating and Preserving the Big Canopy Database System: an Active Curation Approach using SEAD
Monday, 14 December 2015
Poster Hall (Moscone South)
Jim Myers (1), Judith B Cushing (2), Peter Lynn (2), Noah Weiner (2), Anna Ovchinnikova (1), Nalini Nadkarni (3) and Anne McIntosh (4); (1) University of Michigan Ann Arbor, Ann Arbor, MI, United States, (2) Evergreen State College, Olympia, WA, United States, (3) University of Utah, Biology, Salt Lake City, UT, United States, (4) University of Alberta, Edmonton, AB, Canada
Abstract:
Modern research is increasingly dependent upon highly heterogeneous data and on the associated cyberinfrastructure developed to organize, analyze, and visualize that data. However, because such combined data-software systems are complex and custom-built, it can be very challenging to curate and preserve them for the long term at reasonable cost and in a way that retains their scientific value. In this presentation, we describe how this challenge was met in preserving the Big Canopy Database (CanopyDB) system using an agile approach and leveraging the hosted data services of the Sustainable Environment – Actionable Data (SEAD) DataNet project. The CanopyDB system was developed over more than a decade at Evergreen State College to address the needs of forest canopy researchers. It is an early yet sophisticated exemplar of the type of system that has become common in biological research and in science in general: it includes multiple relational databases for different experiments, a custom database generation tool used to create them, an image repository, and desktop and web tools to access, analyze, and visualize the data. SEAD provides secure project spaces with a semantic content abstraction (typed content with arbitrary RDF metadata statements and relationships to other content), combined with a standards-based curation and publication pipeline that produces packaged research objects with Digital Object Identifiers. Using SEAD, our cross-project team was able to incrementally ingest CanopyDB components (images, datasets, software source code, documentation, executables, and virtualized services) and to iteratively define and extend the metadata and relationships needed to document them. We believe that both the process and the richness of the resultant standards-based (OAI-ORE) preservation object hold lessons for the development of best-practice solutions for preserving scientific data in association with the tools and services needed to derive value from it.
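As an illustration only, the idea of typed content with metadata statements and relationships, packaged as an OAI-ORE aggregation, might be sketched as follows. This is not SEAD's API or the actual CanopyDB preservation object: the function `build_resource_map`, the `example.org` identifiers, the type URIs, and the component names are all hypothetical. Only the OAI-ORE terms (`ResourceMap`, `Aggregation`, `describes`, `aggregates`) and the Dublin Core terms (`title`, `requires`) come from the published vocabularies.

```python
import json

# Published vocabulary namespaces (OAI-ORE terms and Dublin Core terms).
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical namespace for content types; not a real SEAD vocabulary.
TYPES = "http://example.org/types/"


def build_resource_map(base, title, components):
    """Build a JSON-LD-style dict for an ORE resource map.

    `components` is a list of dicts with keys "path", "type", "title",
    and optionally "requires" (paths of other components it depends on).
    """
    aggregated = []
    for comp in components:
        node = {
            "@id": base + comp["path"],
            "@type": TYPES + comp["type"],
            DCTERMS + "title": comp["title"],
        }
        # Relationships to other content are expressed as links by identifier.
        if comp.get("requires"):
            node[DCTERMS + "requires"] = [
                {"@id": base + path} for path in comp["requires"]
            ]
        aggregated.append(node)

    return {
        "@id": base + "resource-map",
        "@type": ORE + "ResourceMap",
        ORE + "describes": {
            "@id": base + "aggregation",
            "@type": ORE + "Aggregation",
            DCTERMS + "title": title,
            ORE + "aggregates": aggregated,
        },
    }


# Hypothetical components loosely modeled on the kinds listed in the abstract.
components = [
    {"path": "db/study1.sql", "type": "Dataset", "title": "Study database dump"},
    {"path": "src/generator.zip", "type": "SourceCode",
     "title": "Database generation tool source"},
    {"path": "vm/webtools.ova", "type": "VirtualMachine",
     "title": "Virtualized web tools",
     "requires": ["db/study1.sql", "src/generator.zip"]},
]

resource_map = build_resource_map(
    "http://example.org/canopydb/", "Example preservation object", components
)

if __name__ == "__main__":
    print(json.dumps(resource_map, indent=2))
```

Because each component carries its own type and links, new components or additional metadata statements can be added in later passes without restructuring the package, which loosely mirrors the incremental, iterative curation described above.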