Community-Supported Data Repositories in Paleobiology: A ‘Middle Tail’ Between the Geoscientific and Informatics Communities

Tuesday, 15 December 2015
Poster Hall (Moscone South)
John W Williams1, Allan C Ashworth2, Julio L Betancourt3, Brian Bills4, Jessica Blois5, Robert Booth6, Philip Buckland7, Donald Charles8, Ben Brandon Curry9, Simon J Goring1, Edward Davis10, Eric C Grimm11, Russell W Graham4, Alison J Smith12 and Neotoma Paleoecology Database Team, (1)University of Wisconsin Madison, Madison, WI, United States, (2)North Dakota State University Main Campus, Geosciences, Fargo, ND, United States, (3)U.S. Geological Survey, Reston, VA, United States, (4)Penn State, University Park, PA, United States, (5)University of California Merced, School of Natural Sciences, Merced, CA, United States, (6)Lehigh University, Earth and Environmental Science, Bethlehem, PA, United States, (7)Umea University, Department of Historical, Philosophical and Religious Studies, Umea, Sweden, (8)Drexel University, Department of Biodiversity, Earth and Environmental Science, Philadelphia, PA, United States, (9)Illinois State Geological Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, United States, (10)University of Oregon, Department of Geosciences, Eugene, OR, United States, (11)University of Illinois at Urbana Champaign, Plant Biology, Urbana, IL, United States, (12)Kent State University Kent Campus, Department of Geology, Kent, OH, United States
Community-supported data repositories (CSDRs) in paleoecology and paleoclimatology have a decades-long tradition and serve multiple critical scientific needs. CSDRs enable large-scale synthetic research by providing open-access, curated data that follow community-supported metadata and data standards. CSDRs serve as a ‘middle tail’, or boundary organization, between information scientists and the long-tail community of individual geoscientists who collect and analyze paleoecological data. Over the past few decades, a distributed network of CSDRs has emerged, each serving a particular suite of data types and research communities: for example, the Neotoma Paleoecology Database, the Paleobiology Database, the International Tree-Ring Data Bank, NOAA NCEI Paleoclimatology, MorphoBank, iDigPaleo, and the Interdisciplinary Earth Data Alliance. Recently, these groups have organized into a common Paleobiology Data Consortium dedicated to improving interoperability and sharing best practices and protocols.

The Neotoma Paleoecology Database is one example of an active and growing CSDR, designed to support research into ecological and evolutionary dynamics during the global changes of the recent past. Neotoma combines a centralized database structure with distributed scientific governance through multiple virtual constituent data working groups. The Neotoma data model is flexible enough to accommodate a wide variety of paleoecological proxies from many depositional contexts. Data are entered into Neotoma by trained Data Stewards drawn from the constituent research communities. Users can search, view, and download Neotoma data through multiple interfaces, including the interactive Neotoma Explorer map interface, RESTful Application Programming Interfaces (APIs), the neotoma R package, and the Tilia stratigraphic software. Neotoma is governed by geoscientists and engages its community through training workshops for data contributors, stewards, and users. Neotoma participates in the Paleobiology Data Consortium and in other efforts to improve interoperability among cyberinfrastructure resources in the paleogeosciences.