Finding Data Only Gets You So Far: The Role of Data Differentiation in Data Discovery
Abstract: The Earth science data management community has dedicated significant energy and technology towards improving the discoverability of the ever-growing wealth of scientific data. As we make strides towards this goal and more data sets are exposed in online catalogs, a new challenge emerges: how do users determine which data to use? Met with a wave of ever larger and more complex data, users must be able to differentiate between data sets and determine which are most useful and relevant to their applications. In response, the NASA National Snow and Ice Data Center Distributed Active Archive Center (NSIDC DAAC) has been developing a data set search tool with researchers' needs and preferences in mind. Web metrics and usability studies have been used to answer the following questions:
· How do users search for data?
· Which data attributes are of primary importance when selecting a data set?
· When similar data sets exist, how is one chosen over another?
NSIDC DAAC has found that data set differentiation is driven by both what you present and how you present it. The “what” can range from relatively straightforward information, such as the parameter measured or the spatial coverage, to the more complex and nuanced, such as data usage suggestions or data usage metrics. The “how” ranges from making data set information scannable and comparable to tailoring relevance ranking algorithms to match user expectations. In this presentation, we will discuss the challenge of data differentiation, the strategies the NSIDC DAAC has employed with its search tool, and the opportunities available for assisting researchers in their data selection process.