IN54A-05
Can Data be Organized for Science and Reuse?

Friday, 18 December 2015: 17:00
2020 (Moscone West)
Ted Habermann, HDF Group, Champaign, IL, United States
Abstract:
The Data Life Cycle is an important general concept for thinking about data collection, management, and preservation practices across the geophysical scientific data community. The cycle generally spans the scientific process from ideation through experimental design, observation collection, data analysis and visualization, publication, archiving, distribution, and eventual reuse. During the cycle, the data may change through new analyses, presentations, and responsible parties, but, historically, the format and organization of the data have generally remained the same. Data collected as a time series at a point remain a time series, and data collected or calculated as grids remain grids. BIP (band interleaved by pixel) is BIP and BSQ (band sequential) is BSQ. In fact, in many large data centers the native format remains sacrosanct, and in the scientific community reformatting is avoided for fear of losing information or introducing data quality problems and irreproducible results.

This traditional approach has worked well in areas where data are collected and used for a single purpose throughout the life cycle, and in domains where comparisons across different data sets are rare and problematic because of conflicting data organizational structures or incomplete documentation. This is not the world we live in today. Reuse for unexpected purposes and data (and model) comparisons are becoming increasingly common (e.g., climate model / observation comparisons). Data sets are preserved for future global investigators who may be unaware of the original project or purpose of the data. It is also becoming more common for data to be restructured and reformatted for a particular problem or to support a flexible web service. Unfortunately, many of these efforts do not preserve the metadata that, hopefully, accompanies the data in the original format.

In this presentation, we will discuss alternative approaches to data management that facilitate data reuse across teams and disciplines.