IN13C-1845
Past, Current, and Future Challenges in Linking Data to Publications

Monday, 14 December 2015
Poster Hall (Moscone South)
Brooks Hanson, American Geophysical Union, Washington, DC, United States
Abstract:
Data are the currency of science and assure the integrity of published research. As the ability to collect, analyze, and visualize data grew beyond what could be included in a publication, and as the value of the data became clearer (or the lack of available data was criticized), publishers and the scientific community developed several solutions to enhance access to underlying data. Most leading journals now require authors to agree, as a condition of submission, that underlying data will be included or made available; indeed, publication is the key leverage point in exposing much scholarly data. Most journals allow PDF or other supplements and links to data sets hosted by authors or labs or, better, by data repositories such as Dryad, and some have banned “data not shown” or any reference to unpublished work. Many of these solutions have proven problematic, and recent studies have found that a large fraction of data are undiscoverable even a few years after publication. The best solution has been dedicated domain repositories that are collectively supported by publishers, funders, and the scientific community and in which deposition is required before or at the time of publication. These repositories provide quality control and curation and facilitate reuse. However, expanding this model beyond a few key repositories, and developing standardized workflows and functionality among repositories and between repositories and publishers, have been problematic. Addressing these and other data challenges requires collaborative efforts among funders, publishers, repositories, societies, and researchers. One example is the Coalition on Publishing Data in the Earth and Space Sciences, in which most major publishers and repositories have signed a joint statement of commitment (COPDESS.org) and are starting work to direct and link published data to domain repositories. Much work remains to be done.
Major challenges include integrating data curation practices into the workflow of science, from data collection through peer review; developing efficient linking of data among repositories and publishers; establishing common practices regarding minimal and functional requirements for data, code, and sample identification and preservation in disparate disciplines; and sustaining the repositories.