B12C-06
SEEDS for Big Dark Data Collection and Sharing in Biogeochemistry: LeafWeb as an Example

Monday, 14 December 2015: 11:35
2004 (Moscone West)
Lianhong Gu, Oak Ridge National Laboratory, Oak Ridge, TN, United States
Abstract:
Dark data are any data of value to society that are not easily found or readily used by potential users. They are typically gathered by individual scientists in small independent projects, in contrast to visible data, which are generally collected by large coordinated teams in organized projects with well-executed data management and sharing plans. Analogous to dark matter versus visible matter in the universe, dark data may be more voluminous than visible data, as the majority of the world's scientists work in small independent projects. Dark data can easily be lost to science. The scientific community has a collective responsibility to curb this wasteful loss of valuable research resources. We believe this responsibility can be carried out without adding an undue administrative burden to data contributors if an innovative information system implementing the SErvices in Exchange for Data Sharing (SEEDS) principle is available. Making data ready for long-term archiving and for sharing with the broader scientific community is a tedious and time-consuming process. At present, this process occurs after individual scientists have already spent a considerable amount of time preparing the data for rigorous mathematical and statistical analyses to satisfy their own research objectives, work that generally culminates in the publication of scientific papers. Productive scientists are busy. Even if they are willing to share their data, they may not have the time needed to take cumbersome extra steps to make the data understandable and usable to others. Lack of motivation may also become an issue after the scientific findings of interest to them have been extracted and published; for them, this is a time to look for an exciting new research frontier to attack, not a time to dwell on past research.
SEEDS integrates data analysis, sharing, and management into a single information system (IS) service process, which reduces data contributors’ own research workload and also eliminates the time cost of data sharing. In this presentation, I will use LeafWeb (leafweb.ornl.gov) to demonstrate how the SEEDS principle can be implemented in the context of Earth system science and biogeochemistry.