Applying data stream processing to high-volume metadata management for ingest and discovery software systems in on-premises, cloud, or hybrid deployment environments

David Neufeld (1), Evan McQuinn (2), Arianna Jakositz (2), Zeb Delk (2), Elliott Richerson (2), Chris Esterlein (3) and Semere Ghebrechristos (3); (1) CIRES, National Centers for Environmental Information (NCEI), Boulder, CO, United States; (2) Cooperative Institute for Research in Environmental Sciences, Boulder, CO, United States; (3) Cooperative Institute for Research in Environmental Sciences, Boulder, United States
Abstract:
Workflows and metadata supporting ocean science data discovery are evolving along with cloud technologies. This talk highlights an approach under development at the National Oceanic and Atmospheric Administration’s National Centers for Environmental Information (NCEI) that uses Apache Kafka, Elasticsearch, and Kubernetes to support high-volume metadata generation, storage, and discovery. Developers from the Cooperative Institute for Research in Environmental Sciences, in collaboration with NESDIS and NCEI partners, have developed and deployed a cloud-native stack using an event sourcing approach. The stack integrates with data ingest systems running on premises in the case of NCEI and in the cloud in the case of NESDIS cloud pilot activities. This cloud-native, vendor-neutral approach has allowed the team to support strategic changes in direction that are accelerating toward a cloud-based deployment environment, while maintaining a “bridge” implementation that runs on premises during the transition to the cloud. Finally, we will discuss the architecture that allows us to take advantage of vendor-specific technologies when appropriate.
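
The event sourcing idea mentioned above can be illustrated with a minimal Python sketch. It is not the NCEI implementation: the event shape, the `EventLog` class, and the `build_search_view` function are hypothetical names chosen for illustration. An in-memory list stands in for an append-only stream such as a Kafka topic, and a plain dictionary stands in for a search index such as Elasticsearch. The point is only that the current metadata state is derived by replaying an immutable log of changes, so the same log can feed a view whether it runs on premises or in the cloud.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class MetadataEvent:
    """One immutable change to a metadata record (hypothetical shape)."""
    record_id: str
    kind: str  # "upsert" or "delete"
    fields: Dict[str, Any] = field(default_factory=dict)


class EventLog:
    """In-memory stand-in for an append-only stream (assumption: not a real Kafka client)."""

    def __init__(self) -> None:
        self._events: List[MetadataEvent] = []

    def append(self, event: MetadataEvent) -> int:
        """Append an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int = 0) -> List[MetadataEvent]:
        """Return all events at or after the given offset, in order."""
        return self._events[offset:]


def build_search_view(events: Iterable[MetadataEvent]) -> Dict[str, Dict[str, Any]]:
    """Replay events in order to derive the current state of each record."""
    view: Dict[str, Dict[str, Any]] = {}
    for event in events:
        if event.kind == "upsert":
            # Merge new fields over whatever is already known for this record.
            merged = dict(view.get(event.record_id, {}))
            merged.update(event.fields)
            view[event.record_id] = merged
        elif event.kind == "delete":
            # Deleting an unknown record is a no-op.
            view.pop(event.record_id, None)
    return view


# Example usage with made-up record identifiers and fields.
log = EventLog()
log.append(MetadataEvent("granule-1", "upsert", {"title": "SST", "format": "NetCDF"}))
log.append(MetadataEvent("granule-1", "upsert", {"title": "Sea Surface Temperature"}))
log.append(MetadataEvent("granule-2", "upsert", {"title": "Bathymetry"}))
log.append(MetadataEvent("granule-2", "delete"))
view = build_search_view(log.read_from(0))
```

Because the view is rebuilt purely from the log, a second consumer (for example, one deployed in a different environment) could replay the same events and arrive at the same state, which is one reason this pattern suits a hybrid or transitional deployment.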