IN13C-1856
Key Lessons in Building “Data Commons”: The Open Science Data Cloud Ecosystem

Monday, 14 December 2015
Poster Hall (Moscone South)
Maria Patterson, University of Chicago, Chicago, IL, United States
Abstract:
Cloud computing technology has shifted how researchers work with data by allowing them to push computation to the data rather than pull data to an individual researcher’s computer. Consequently, cloud-based resources offer unique opportunities to capture the computing environments used both to access raw data in its original form and to create the analysis products that may be the source of data for tables and figures presented in research publications.

Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte- and petabyte-scale) scientific datasets. The OSDC has provided compute and storage services to over 750 researchers in a wide variety of data-intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with nearly a petabyte of public scientific datasets from a variety of fields, which are also available to the public for external download.

In our experience operating these resources, researchers are well served by “data commons,” meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets.


As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, “What does it mean to publish data in a data commons?” Here we present the OSDC perspective and discuss several services that are key to architecting data commons, including digital identifier services.