Seismic Data Archive Quality Assurance -- Analytics Adding Value at Scale

Tuesday, 15 December 2015
Poster Hall (Moscone South)
Robert E. Casey, Timothy Keith Ahern, Gillian Sharer, Mary E. Templeton, Bruce Weertman and Laura Keyson, IRIS Data Services, IRIS DMC, Seattle, WA, United States
Since the emergence of real-time delivery of seismic data over the last two decades, data producers and data stewards have developed solutions for near-real-time quality analysis and station monitoring. These provide nearly constant awareness of the quality of incoming data and of the general health of the instrumentation around the time of data capture. Modern quality assurance systems are evolving to offer ready access to a large variety of metrics, a rich and self-correcting history of measurements, and, most importantly, the ability to retrieve these quality measurements en masse through a programmatic interface.

The MUSTANG project at the IRIS Data Management Center is working to achieve ‘total archival data quality’, in which a large number of standardized metrics, some computationally expensive, are generated and stored for all data from decades past to the near present. Performing this on a 300 TB archive of compressed time series requires considerable network I/O, disk storage, and CPU capacity to achieve scalability, as well as the technical expertise to develop and maintain the system. In addition, staff scientists are needed to develop the metrics and employ them to produce comprehensive and timely data quality reports that assist seismic network operators in maintaining their instrumentation. All of these metrics must be available to scientists 24/7.
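To illustrate the kind of standardized metric computed archive-wide, consider a windowed RMS amplitude, one of the simpler measurements a system like this might store per channel per day. This is a hypothetical Python sketch for illustration only; the function name and the choice of metric are assumptions, and MUSTANG's production metrics code is implemented in R.

```python
import math

def sample_rms(samples):
    """Root-mean-square amplitude of a window of samples.

    Hypothetical sketch of one simple archive-wide quality metric;
    the production MUSTANG metrics are implemented in R.
    """
    if not samples:
        return None  # no data in the window, no metric value
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# A short synthetic window of instrument counts
print(sample_rms([3.0, -4.0, 3.0, -4.0]))  # → 3.5355339059327378
```

Even a metric this cheap becomes a substantial workload when it must be computed for every channel-day across a 300 TB archive, which is where the I/O, storage, and CPU demands described above come from.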

We will present an overview of the MUSTANG architecture, including the development of its standardized metrics code in R. We will show examples of the metric values that we make publicly available to scientists and educators, and how we are sharing the algorithms used. We will also discuss the development of a capability that will let researchers specify data quality constraints on their requests for data, so that they receive only the data best suited to their area of study.
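The programmatic interface mentioned above can be exercised by building a query URL against an IRIS-style web service. The sketch below constructs such a URL without performing any network request; the base URL, the `sample_rms` metric name, and the parameter names (`net`, `sta`, `cha`, `timewindow`, `format`) are assumptions modeled on the general style of IRIS web services, not a verified copy of the MUSTANG API.

```python
from urllib.parse import urlencode

# Assumed endpoint, modeled on the IRIS web-service URL style.
BASE = "http://service.iris.edu/mustang/measurements/1/query"

def measurements_url(metric, net, sta, cha, start, end, fmt="text"):
    """Build a hypothetical MUSTANG measurements query URL.

    All parameter names here are illustrative assumptions; consult the
    service's own documentation for the authoritative interface.
    """
    params = {
        "metric": metric,          # e.g. a daily RMS-amplitude metric
        "net": net,                # FDSN network code
        "sta": sta,                # station code
        "cha": cha,                # channel code
        "timewindow": f"{start},{end}",
        "format": fmt,
    }
    return BASE + "?" + urlencode(params)

url = measurements_url("sample_rms", "IU", "ANMO", "BHZ",
                       "2015-01-01", "2015-01-02")
print(url)
```

A quality-constrained data request, as proposed, would layer a similar vocabulary of metric thresholds onto an ordinary data query, so that the service returns only time series meeting the researcher's stated criteria.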