IN21A-3700:
Big Data Solution for CTBT Monitoring Using Global Cross Correlation

Tuesday, 16 December 2014
Pierre Gaillard (1), Dmitry Bobrov (2), Aurelien Dupont (1), Agnès Grenouille (1), Ivan O Kitov (3) and Mikhail Rozhkov (2); (1) CEA Commissariat à l'Energie Atomique DAM, Arpajon Cedex, France, (2) CTBTO Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization, Vienna, Austria, (3) Institute of Geosphere Dynamics RAS, Moscow, Russia
Abstract:
Because data volumes are outgrowing the performance of the Information Technology infrastructure used in seismic data centers, it is becoming increasingly difficult to process all the data with traditional applications in a reasonable time. To fulfill their missions, the International Data Centre of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO/IDC) and the Département Analyse Surveillance Environnement of the Commissariat à l’Energie atomique et aux énergies alternatives (CEA/DASE) collect, process and produce complex data sets whose volume is growing exponentially. In the medium term, computer architectures, data management systems and application algorithms will require fundamental changes to meet these needs. This problem is well known and is identified as a “Big Data” challenge.

To tackle this major task, the CEA/DASE is taking part in the two-year “DataScale” project. Started in September 2013, DataScale brings together a large set of partners (research laboratories, SMEs and large companies). The common objective is to design efficient solutions that exploit the synergy between Big Data technologies and High Performance Computing (HPC). The project will evaluate the relevance of these technological solutions by implementing a demonstrator for seismic event detection based on massive waveform cross correlation.

The IDC has developed expertise in such techniques, leading to an algorithm called “Master Event”, and provides a high-quality dataset for an extensive cross correlation study. The objective of the project is to enhance the Master Event algorithm and to reanalyze 10 years of waveform data from the International Monitoring System (IMS) network using a dedicated HPC infrastructure operated by the "Centre de Calcul Recherche et Technologie" at the CEA site in Bruyères-le-Châtel. The dataset used for the demonstrator includes more than 300,000 seismic events, tens of millions of raw detections and more than 30 terabytes of continuous seismic data from the primary IMS stations.
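
As a rough illustration of what detection by waveform cross correlation involves, the sketch below slides a master-event template along a continuous trace and flags the samples where the normalized correlation coefficient exceeds a threshold. It is a generic single-channel template-matching example, not the IDC's Master Event implementation: the function names, the synthetic data and the 0.7 threshold are all assumptions made for illustration.

```python
import numpy as np


def normalized_cross_correlation(template, trace):
    """Correlation coefficient between the template and every window of the trace."""
    template = np.asarray(template, dtype=float)
    trace = np.asarray(trace, dtype=float)
    n = template.size

    # Remove the mean of the template and of each sliding window.
    t = template - template.mean()
    windows = np.lib.stride_tricks.sliding_window_view(trace, n)
    w = windows - windows.mean(axis=1, keepdims=True)

    # Normalize by the energies so that the result lies in [-1, 1].
    denom = np.sqrt(np.sum(t * t)) * np.sqrt(np.sum(w * w, axis=1))
    cc = np.zeros(windows.shape[0])
    valid = denom > 0
    cc[valid] = (w[valid] @ t) / denom[valid]
    return cc


def detect(template, trace, threshold=0.7):
    """Return the sample offsets where the correlation reaches the threshold.

    The threshold value is an illustrative assumption, not an operational one.
    """
    cc = normalized_cross_correlation(template, trace)
    return np.flatnonzero(cc >= threshold), cc


# Synthetic demonstration (hypothetical data): a broadband "master" waveform
# is buried in noise at sample 500 of a longer trace.
rng = np.random.default_rng(0)
template = rng.standard_normal(100)
trace = 0.2 * rng.standard_normal(2000)
trace[500:600] += template

detections, cc = detect(template, trace)
```

In a study of the size described above, a computation of this kind would be repeated for every master event and every station, which is why the workload lends itself to HPC and Big Data infrastructure.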

In this talk, we will present the Master Event algorithm and the associated workflow, give an overview of the technical solutions designed (from the building blocks to the global infrastructure), and show preliminary results at a regional scale.