PA53A-2232
Challenges for Data Archival Centers in Evolving Environmental Sciences

Friday, 18 December 2015
Poster Hall (Moscone South)
Yaxing Wei, Oak Ridge National Laboratory, Oak Ridge, TN, United States
Abstract:
Environmental science has entered into a big data era as enormous data about the Earth environment are continuously collected through field and airborne missions, remote sensing observations, model simulations, sensor networks, etc. An open-access and open-management data infrastructure for data-intensive science is a major grand challenge in global environmental research (BERAC, 2010). Such an infrastructure, as exemplified in EOSDIS, GEOSS, and NSF EarthCube, will provide a complete lifecycle of environmental data and ensures that data will smoothly flow among different phases of collection, preservation, integration, and analysis. Data archival centers, as the data integration units closest to data providers, serve as the source power to compile and integrate heterogeneous environmental data into this global infrastructure. This presentation discusses the interoperability challenges and practices of geosciences from the aspect of data archival centers, based on the operational experiences of the NASA-sponsored Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) and related environmental data management activities. Specifically, we will discuss the challenges to 1) encourage and help scientists to more actively share data with the broader scientific community, so that valuable environmental data, especially those dark data collected by individual scientists in small independent projects, can be shared and integrated into the infrastructure to tackle big science questions; 2) curate heterogeneous multi-disciplinary data, focusing on the key aspects of identification, format, metadata, data quality, and semantics to make them ready to be plugged into a global data infrastructure. We will highlight data curation practices at the ORNL DAAC for global campaigns such as BOREAS, LBA, SAFARI 2000; and 3) enhance the capabilities to more effectively and efficiently expose and deliver “big” environmental data to broad range of users and systems. Experiences and challenges with integrating large data sets via the ORNL DAAC’s data discovery and delivery Web services will be discussed.