IEDA: Making Small Data BIG Through Interdisciplinary Partnerships Among Long-tail Domains

Monday, 15 December 2014: 4:00 PM
Kerstin A Lehnert (1), Suzanne M Carbotte (2), Robert A Arko (3), Vicki Lynn Ferrini (1), Leslie Hsu (2), Lulin Song (3), Mark S Ghiorso (4) and Douglas J. Walker (5); (1) Lamont-Doherty Earth Observatory, Palisades, NY, United States, (2) Lamont-Doherty Earth Obs, Palisades, NY, United States, (3) Lamont-Doherty Earth Observatory, Palisades, NY, United States, (4) OFM Research, Redmond, CA, United States, (5) University of Kansas, Lawrence, KS, United States
The Big Data world in the Earth Sciences so far exists primarily for disciplines that generate massive volumes of observational or computed data using large-scale, shared instrumentation such as global sensor networks, satellites, or high-performance computing facilities. These data are typically managed and curated by well-supported community data facilities that also provide the tools for exploring the data through visualization or statistical analysis. In many other domains, especially those where data are primarily acquired by individual investigators or small teams (known as ‘long-tail data’), data are poorly shared and integrated, and these domains lack a community-based data infrastructure that ensures persistent access, quality control, standardization, and integration of data, as well as appropriate tools to fully explore and mine the data within the context of broader Earth Science datasets.

IEDA (Integrated Earth Data Applications) is a data facility funded by the US NSF to develop and operate data services that support data stewardship throughout the full life cycle of observational data in the solid earth sciences, with a focus on the data management needs of individual researchers. IEDA builds on a strong foundation of mature disciplinary data systems for marine geology and geophysics, geochemistry, and geochronology. These systems have dramatically advanced data resources in those long-tail Earth science domains. IEDA has strengthened these resources by establishing a consolidated, enterprise-grade infrastructure that is shared by the domain-specific data systems, and by implementing joint data curation and data publication services that follow community standards. In recent years, other domain-specific data efforts have partnered with IEDA to take advantage of this infrastructure and improve data services to their respective communities with formal data publication, long-term preservation of data holdings, and better sustainability. IEDA hopes to foster such partnerships with streamlined data services, including user-friendly, single-point interfaces for data submission, discovery, and access across the partner systems to support interdisciplinary science.