Achieving Sub-Second Search in the CMR

Friday, 19 December 2014
Jason Gilman (1), Kathleen Baynes (2), Daniel Pilone (3), Andrew E Mitchell (1) and Kevin J Murphy (1). (1) NASA Goddard Space Flight Center, Greenbelt, MD, United States; (2) Raytheon Company Riverdale, Riverdale, MD, United States; (3) Organization Not Listed, Washington, DC, United States
The Common Metadata Repository (CMR) is the next-generation Earth Science metadata catalog for NASA's Earth observing data. It combines the holdings of the EOS Clearing House (ECHO) and the Global Change Master Directory (GCMD) into a unified, authoritative source for EOSDIS metadata. The CMR accepts metadata in many different formats while providing consistent search behavior and retrieval in any supported format. Performance is critical to the CMR because it drives better data discovery and more responsive client interaction. The CMR delivers sub-second search performance for all common query conditions (including spatial queries) across hundreds of millions of granule metadata records. It also supports adding new metadata concepts, such as visualizations, parameter metadata, and documentation.
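The idea of ingesting many formats while retrieving in any supported format can be illustrated with a minimal sketch built around a shared domain model. It assumes two hypothetical input formats (`format-a` and `format-b`) with made-up field names; these are not the real ECHO10 or DIF schemas, and this is not the CMR's Clojure implementation. Each format needs only one parser into the shared model and one renderer out of it, so supporting N formats takes 2N functions rather than a translator for every pair of formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)  # frozen: model instances are immutable once parsed
class Collection:
    """Hypothetical format-independent domain model for a collection record."""
    short_name: str
    version: str
    summary: str


# Hypothetical parsers: each maps one input format's fields onto the shared model.
# Field names are illustrative only, not taken from any real metadata schema.
def parse_format_a(record: dict) -> Collection:
    return Collection(record["ShortName"], record["VersionId"], record["Description"])


def parse_format_b(record: dict) -> Collection:
    return Collection(record["Entry_ID"], record["Version"], record["Summary"])


# Hypothetical renderers: each writes the shared model back out in one format.
def render_format_a(c: Collection) -> dict:
    return {"ShortName": c.short_name, "VersionId": c.version, "Description": c.summary}


def render_format_b(c: Collection) -> dict:
    return {"Entry_ID": c.short_name, "Version": c.version, "Summary": c.summary}


PARSERS: Dict[str, Callable[[dict], Collection]] = {
    "format-a": parse_format_a,
    "format-b": parse_format_b,
}
RENDERERS: Dict[str, Callable[[Collection], dict]] = {
    "format-a": render_format_a,
    "format-b": render_format_b,
}


def translate(record: dict, source: str, target: str) -> dict:
    """Parse a record from `source` into the shared model, then render it as `target`."""
    return RENDERERS[target](PARSERS[source](record))
```

For example, `translate({"ShortName": "X", "VersionId": "1", "Description": "d"}, "format-a", "format-b")` yields `{"Entry_ID": "X", "Version": "1", "Summary": "d"}`. Because search would also run against the shared model's fields, query behavior stays the same no matter which format a record was ingested in.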

The CMR's goals presented many challenges. This talk will describe the CMR architecture and design, and the innovations made to achieve those goals. These include:

* Architectural features like immutability and backpressure.
* Data management techniques, such as caching and parallel loading, that yield large performance gains.
* Open-source and commercial off-the-shelf (COTS) tools such as the Elasticsearch search engine.
* Adoption of Clojure, a functional programming language for the Java Virtual Machine.
* Development of a custom spatial search plugin for Elasticsearch and why it was necessary.
* Introduction of a unified model for metadata that maps every supported metadata format to a consistent domain model.
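Backpressure combined with parallel loading can be sketched with a bounded queue. This is a minimal Python sketch, not the CMR's implementation: `parallel_load` and its parameters are hypothetical, and `index_batch` stands in for whatever bulk-indexing call (for example, a request to Elasticsearch) a real loader would make. When the indexing workers fall behind, `put()` blocks the reader instead of letting unindexed batches pile up in memory.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel telling a worker thread to stop


def parallel_load(records: Iterable[dict],
                  index_batch: Callable[[List[dict]], None],
                  num_workers: int = 4,
                  batch_size: int = 100,
                  max_pending_batches: int = 8) -> None:
    """Index records in parallel; a bounded queue applies backpressure to the reader."""
    pending: "queue.Queue[object]" = queue.Queue(maxsize=max_pending_batches)

    def worker() -> None:
        while True:
            batch = pending.get()
            if batch is _DONE:
                return
            index_batch(batch)  # hypothetical stand-in for a bulk index request

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            # put() blocks while the queue is full: this is the backpressure.
            pending.put(batch)
            batch = []
    if batch:
        pending.put(batch)  # flush the final partial batch

    for _ in threads:
        pending.put(_DONE)  # one sentinel per worker
    for t in threads:
        t.join()
```

The sketch omits error handling: if `index_batch` raises, that worker exits, and the reader can eventually block on a full queue. A production loader would need to propagate failures and cancel the remaining work.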