V52C-02
Quantifying comparison of large detrital geochronology data sets

Friday, 18 December 2015: 10:35
306 (Moscone South)
Joel Edward Saylor and Kurt E Sundell II, University of Houston, Department of Earth and Atmospheric Sciences, Houston, TX, United States
Abstract:
The increasing size of detrital geochronological data sets challenges existing approaches to data visualization and comparison, highlighting the need for quantitative techniques able to compare multiple large data sets. Using the DZstats software package, we applied five metrics to twenty large synthetic data sets and one large empirical data set. The metrics included the Kolmogorov-Smirnov (K-S) and Kuiper tests as well as Cross-correlation, Likeness, and Similarity coefficients of probability density plots (PDPs), kernel density estimates (KDEs), and locally adaptive, variable-bandwidth KDEs (LA-KDEs). We evaluated the metrics’ utility based on three criteria: 1) samples from the same population should become systematically more similar with increasing sample size; 2) the metrics should maximize the range of possible coefficients; and 3) the metrics should minimize artifacts resulting from sample-specific complexity. K-S and Kuiper test p-values and all KDE and LA-KDE coefficients passed at most one criterion. Likeness and Similarity coefficients of PDPs, as well as K-S and Kuiper test D- and V-values, passed two of the criteria. Cross-correlation of PDPs passed all three.
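
As an illustration of the three PDP-based coefficients named above, the following Python sketch computes them for two small made-up samples. The formulas are assumptions based on common usage in the detrital geochronology literature, not definitions taken from this abstract or from the DZstats source: Cross-correlation as the R² of a cross-plot of the two PDPs, Likeness as one minus half the summed absolute difference, and Similarity as the summed geometric mean of the two curves. All function names, the grid spacing, and the ages and uncertainties are hypothetical.

```python
import math

def pdp(ages, sigmas, grid):
    """Sum one Gaussian per grain over the age grid, then normalize to sum to 1."""
    dens = [0.0] * len(grid)
    for age, sigma in zip(ages, sigmas):
        norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        for i, t in enumerate(grid):
            dens[i] += norm * math.exp(-0.5 * ((t - age) / sigma) ** 2)
    total = sum(dens)
    return [d / total for d in dens]

def cross_correlation(p, q):
    """Assumed definition: R^2 of a cross-plot of the two curves."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    return cov * cov / (var_p * var_q)

def likeness(p, q):
    """Assumed definition: 1 minus half the summed absolute difference."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def similarity(p, q):
    """Assumed definition: sum of the pointwise geometric mean."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Hypothetical ages and 1-sigma uncertainties in Ma, on a 5 Myr grid.
grid = list(range(0, 3001, 5))
sample_a = pdp([500, 510, 1100], [10, 12, 25], grid)
sample_b = pdp([2500, 2520], [30, 30], grid)

print(cross_correlation(sample_a, sample_b))
print(likeness(sample_a, sample_b))
print(similarity(sample_a, sample_b))
```

Under these assumed definitions, all three coefficients equal 1 for identical curves and approach 0 for curves with no overlapping age modes, which is the range that criterion 2 asks a metric to make full use of.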

When used as hypothesis tests, individual K-S and Kuiper p-values too frequently reject the null hypothesis that samples derive from a common source. However, mean p-values calculated by bootstrap subsampling and comparison of sample data sets yield a binary discrimination of identical versus different source populations. Cross-correlation and Likeness of PDPs, and Cross-correlation of KDEs, yield the widest divergence in coefficients and thus a consistent discrimination between identical and different source populations, with Cross-correlation of PDPs requiring the smallest sample size. In light of this, we recommend standard acquisition of large (n > 300) detrital geochronology data sets and repeated subsampling for robust quantitative comparison using Likeness, Cross-correlation, or K-S p-values.
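
The repeated-subsampling idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DZstats procedure: subsamples are drawn without replacement, the K-S p-value uses the standard asymptotic series approximation, and the function names, subsample size, trial count, and synthetic populations are all hypothetical. The helper also returns the Kuiper V statistic (the sum of the largest positive and negative differences between the two empirical CDFs) alongside the K-S D statistic.

```python
import bisect
import math
import random

def ks_kuiper(x, y):
    """Return (D, V): K-S max CDF difference and Kuiper sum of max +/- differences."""
    xs, ys = sorted(x), sorted(y)
    d_plus = d_minus = 0.0
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)  # empirical CDF of x at t
        fy = bisect.bisect_right(ys, t) / len(ys)  # empirical CDF of y at t
        d_plus = max(d_plus, fx - fy)
        d_minus = max(d_minus, fy - fx)
    return max(d_plus, d_minus), d_plus + d_minus

def ks_pvalue(d, n, m):
    """Asymptotic two-sample K-S p-value (series approximation; an assumption here)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series is unreliable for tiny lambda, where p is ~1
    series = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                 for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))

def mean_ks_pvalue(x, y, n_sub, trials, rng):
    """Average the K-S p-value over repeated random subsamples of both data sets."""
    total = 0.0
    for _ in range(trials):
        d, _v = ks_kuiper(rng.sample(x, n_sub), rng.sample(y, n_sub))
        total += ks_pvalue(d, n_sub, n_sub)
    return total / trials

def draw(rng, modes, n):
    """Draw n ages from an equal-weight mixture of Gaussian age modes (mean, sd)."""
    return [rng.gauss(*rng.choice(modes)) for _ in range(n)]

rng = random.Random(0)
pop_1 = [(500, 30), (1500, 50)]   # hypothetical source population, ages in Ma
pop_2 = [(500, 30), (2500, 50)]   # hypothetical second source population
a1, a2 = draw(rng, pop_1, 400), draw(rng, pop_1, 400)
b = draw(rng, pop_2, 400)

p_same = mean_ks_pvalue(a1, a2, n_sub=100, trials=200, rng=rng)
p_diff = mean_ks_pvalue(a1, b, n_sub=100, trials=200, rng=rng)
print(p_same, p_diff)
```

In this sketch the mean p-value for the two samples drawn from the same synthetic population stays well above that for the pair drawn from different populations, which is the kind of binary discrimination described above. The same loop could average Likeness or Cross-correlation coefficients instead of p-values.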