H41C-0818:
Evaluation of Outlier Detection and Modification Methods Used in Flood Frequency Analysis
Abstract:
Accurate and reliable information on flood frequency and magnitude is vital for the design of hydraulic structures and for floodplain mapping. The time-series data used for flood frequency analysis are obtained from both gauged systematic records and historical floods. These data sets, however, may include outliers, defined as values that are much larger or smaller than most of the values in a given data set. By definition, outliers therefore include both very high and very low values; this study focuses only on high outliers (hereafter referred to simply as outliers). The availability of a wide array of tests and techniques makes outlier detection and modification anything but straightforward, and the set of tests and techniques selected to detect and modify outliers can have a direct impact on flood frequency and magnitude estimates.
Many different approaches, based on the principles of hypothesis testing, are available for the detection of outliers. Several such methods are examined in detail in this study using data from Southeast Europe. Outliers detected using these approaches can be eliminated, modified, or disregarded (i.e., treated as non-outliers) before the data are used in flood frequency analysis. Eliminating or disregarding outliers typically requires additional gauge- or event-specific information. In the absence of such information, using modified outliers in flood frequency analysis provides the best compromise. Nevertheless, the effects of all three options on flood frequency and magnitude estimates are investigated to provide a more meaningful comparison. Flood magnitude-frequency estimates based on the log-Pearson type 3 distribution are used in this study to evaluate the effect of outlier modification.
The study findings have important implications for estimation of flood frequency and magnitude with time-series data sets that have one or more outliers.
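As an illustration of the hypothesis-testing approach to high-outlier detection described above, the following minimal sketch applies a Grubbs-Beck-style test to log-transformed annual peaks, using the commonly cited Bulletin 17B approximation for the one-sided 10% critical value K_N. The abstract does not name the specific tests examined, so the choice of test, the function names, and the sample flows are all assumptions made for illustration. The "modify" option is shown as capping at the detection threshold, which is likewise a hypothetical choice rather than the procedure used in the study.

```python
import math
from statistics import mean, stdev


def grubbs_beck_kn(n):
    """Approximate one-sided 10% critical value K_N for sample size n
    (approximation commonly cited from Bulletin 17B)."""
    log_n = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n


def high_outlier_threshold(peaks):
    """Flow above which an annual peak is flagged as a high outlier."""
    logs = [math.log10(q) for q in peaks]
    kn = grubbs_beck_kn(len(logs))
    # Threshold in log space is mean + K_N * sample standard deviation.
    return 10 ** (mean(logs) + kn * stdev(logs))


def flag_high_outliers(peaks):
    """Return the peaks that exceed the high-outlier threshold."""
    threshold = high_outlier_threshold(peaks)
    return [q for q in peaks if q > threshold]


def eliminate_high_outliers(peaks):
    """Option 1: drop the flagged values from the record."""
    threshold = high_outlier_threshold(peaks)
    return [q for q in peaks if q <= threshold]


def modify_high_outliers(peaks):
    """Option 2 (assumed form): cap flagged values at the threshold."""
    threshold = high_outlier_threshold(peaks)
    return [min(q, threshold) for q in peaks]


# Hypothetical annual peak flows (m^3/s); not data from the study.
sample_peaks = [120, 135, 150, 160, 170, 180, 190, 200, 210, 5000]

if __name__ == "__main__":
    print("Flagged:", flag_high_outliers(sample_peaks))
    print("Eliminated:", eliminate_high_outliers(sample_peaks))
    print("Modified:", modify_high_outliers(sample_peaks))
```

The third option, disregarding the outliers, corresponds to passing the record through unchanged. Each of the three resulting series could then be fitted with a log-Pearson type 3 distribution to compare the estimated flood quantiles.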