IN21C-1702
Bridging the knowledge gap between Big Data producers and consumers
Abstract:
Most weather data is produced, disseminated and consumed by expert users in large national operational centers or laboratories. Data ‘ages’ off their systems in days or weeks. While archives exist, would-be users often lack the credentials necessary to obtain an account to access or search their contents. Moreover, operational centers and many national archives lack the mandate and the resources to serve non-expert users.
The National Center for Atmospheric Research (NCAR) Research Data Archive (RDA), rda.ucar.edu, was created over 40 years ago to collect data for NCAR’s internal Big Science projects such as the NCEP/NCAR Reanalysis Project. Over time, the data holdings have grown to 1.8+ Petabytes spanning 600+ datasets. The user base has also grown; in 2014, we served 1.1 Petabytes of data to over 11,000 unique users.
The RDA works with national centers, such as NCEP, ECMWF and JMA, to make their data available to worldwide audiences and mutually support data access at the production source. We have become not just an open-access data center, but also a data education center.
Each dataset archived at the RDA is assigned to a data specialist (DS) who curates the data. If a user has a question not answered in the dataset information web pages prepared by the DS, they can call or email a skilled DS for further clarification. The RDA’s diverse staff—with academic training in meteorology, oceanography, engineering (electrical, civil, ocean and database), mathematics, physics, chemistry and information science—means we likely have someone who “speaks your language.”
Erroneous data assumptions are the Achilles heel of Big Data. It doesn’t matter how much data you crunch if the data is not what you think it is. Data discovery is another difficult Big Data problem; one can only solve problems with data if one can find the right data. Metadata, both machine- and human-generated, underpin the RDA data search tools.
The RDA has stepped in to fill the gap between data producers and users.