High-resolution climate data over conterminous US using random forest algorithm

Friday, 19 December 2014
Hirofumi Hashimoto, California State University Monterey Bay, Seaside, CA, United States, Ramakrishna R Nemani, NASA Ames Research Center, Moffett Field, CA, United States and Weile Wang, CSUMB & NASA/AMES, Seaside, CA, United States
We developed a new methodology to create high-resolution precipitation data using the random forest algorithm. To date, we have relied on two approaches: physical downscaling of GCM data with a regional climate model, and interpolation of ground observation data. Physical downscaling can be applied only to small regions because it is computationally expensive and complex to deploy. Interpolation schemes based on ground observations, on the other hand, do not consider physical processes. In this study, we used the random forest algorithm to integrate atmospheric reanalysis data, satellite data, topography data, and ground observation data. We first considered situations in which precipitation is the same across the domain, largely dominated by storm-like systems. We then selected several points to train the random forest algorithm. The algorithm estimates out-of-bag errors spatially and reports the relative importance of each input variable.
This methodology has the following advantages. (1) The methodology can ingest any spatial dataset to improve downscaling. Even non-precipitation datasets can be ingested, such as satellite cloud cover data, radar reflectivity images, or modeled convective available potential energy. (2) The methodology is purely statistical, so no physical assumptions are required. By contrast, most interpolation schemes assume an empirical relationship between precipitation and elevation to represent orographic precipitation. (3) Low-quality values in the ingested data do not cause critical bias in the results because of the ensemble nature of random forest. Users therefore need to pay less attention to quality control of the input data than with other interpolation methodologies. (4) The same methodology can be applied to produce other high-resolution climate datasets, such as wind and cloud cover, which are usually hard to interpolate with conventional algorithms. In conclusion, the proposed methodology can produce reasonable high-resolution data and has the potential to improve the results further by ingesting other datasets, including new satellite or reanalysis data.
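The training step described in the abstract, with its out-of-bag (OOB) errors and variable importance, can be illustrated with a minimal pure-Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the synthetic "station" data, the three predictor names, the tree depth, the random-threshold simplification, and the use of permutation importance as the importance measure. It is a toy random forest, not the authors' implementation.

```python
import random
import statistics

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)

def grow_tree(X, y, rng, n_try=2, depth=0, max_depth=5, min_leaf=3):
    """Grow one regression tree; each split tries n_try randomly chosen features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return statistics.fmean(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        values = [row[f] for row in X]
        # Simplification: test a few sampled thresholds, not every split point.
        for thr in rng.sample(values, min(10, len(values))):
            left = [i for i, v in enumerate(values) if v < thr]
            right = [i for i, v in enumerate(values) if v >= thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return statistics.fmean(y)
    _, f, thr, left, right = best
    kids = [grow_tree([X[i] for i in part], [y[i] for i in part], rng,
                      n_try, depth + 1, max_depth, min_leaf)
            for part in (left, right)]
    return (f, thr, kids[0], kids[1])

def predict(tree, row):
    """Walk from the root to a leaf; leaves are plain floats."""
    while isinstance(tree, tuple):
        f, thr, lo, hi = tree
        tree = lo if row[f] < thr else hi
    return tree

def fit_forest(X, y, n_trees=30, seed=0):
    """Train each tree on a bootstrap sample and remember its out-of-bag rows."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(bag))
        forest.append((grow_tree([X[i] for i in bag], [y[i] for i in bag], rng), oob))
    return forest

def oob_errors(forest, X, y):
    """Per-point OOB error: mean prediction of trees that never saw the point, minus truth."""
    preds = {}
    for tree, oob in forest:
        for i in oob:
            preds.setdefault(i, []).append(predict(tree, X[i]))
    return {i: statistics.fmean(p) - y[i] for i, p in preds.items()}

def permutation_importance(forest, X, y, seed=1):
    """Mean increase in OOB squared error when one predictor is shuffled."""
    rng = random.Random(seed)
    gain = [0.0] * len(X[0])
    for tree, oob in forest:
        if not oob:
            continue
        base = statistics.fmean((predict(tree, X[i]) - y[i]) ** 2 for i in oob)
        for f in range(len(gain)):
            shuffled = [X[i][f] for i in oob]
            rng.shuffle(shuffled)
            mse = statistics.fmean(
                (predict(tree, X[i][:f] + [v] + X[i][f + 1:]) - y[i]) ** 2
                for i, v in zip(oob, shuffled))
            gain[f] += (mse - base) / len(forest)
    return gain

# Synthetic training points (assumed, not real observations).
names = ["elevation_km", "reanalysis_precip", "unrelated_field"]
data_rng = random.Random(42)
X, y = [], []
for _ in range(150):
    elevation = data_rng.uniform(0.0, 3.0)
    reanalysis = data_rng.uniform(0.0, 20.0)
    unrelated = data_rng.uniform(0.0, 1.0)   # carries no signal by construction
    X.append([elevation, reanalysis, unrelated])
    y.append(reanalysis + 2.0 * elevation + data_rng.gauss(0.0, 0.5))

forest = fit_forest(X, y)
errors = oob_errors(forest, X, y)             # one error per training point
rmse = statistics.fmean(e ** 2 for e in errors.values()) ** 0.5
importance = permutation_importance(forest, X, y)
for name, score in zip(names, importance):
    print(f"{name}: {score:.2f}")
print(f"OOB RMSE: {rmse:.2f}")
```

Because each entry of `errors` belongs to one training point, the values could in principle be mapped back to station coordinates to view the OOB error spatially. Ingesting a further dataset, as in advantage (1), amounts here to appending one more column to each row of `X`.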