Enabling Web-Based Analysis of CUAHSI HIS Hydrologic Data Using R and Web Processing Services

Thursday, 17 December 2015
Poster Hall (Moscone South)
Jiri Kadlec, Brigham Young University, Provo, UT, United States
The CUAHSI Hydrologic Information System (CUAHSI HIS) provides open access to a large number of hydrological time series observation and modeled data from many parts of the world. Several software tools have been designed to simplify searching and access to the CUAHSI HIS datasets. These software tools include: Desktop client software (HydroDesktop, HydroExcel), developer libraries (WaterML R Package, OWSLib, ulmo), and the new interactive search website, http://data.cuahsi.org. An issue with using the time series data from CUAHSI HIS for further analysis by hydrologists (for example for verification of hydrological and snowpack models) is the large heterogeneity of the time series data. The time series may be regular or irregular, contain missing data, have different time support, and be recorded in different units. R is a widely used computational environment for statistical analysis of time series and spatio-temporal data that can be used to assess fitness and perform scientific analyses on observation data. R includes the ability to record a data analysis in the form of a reusable script. The R script together with the input time series dataset can be shared with other users, making the analysis more reproducible. The major goal of this study is to examine the use of R as a Web Processing Service for transforming time series data from the CUAHSI HIS and sharing the results on the Internet within HydroShare. HydroShare is an online data repository and social network for sharing large hydrological data sets such as time series, raster datasets, and multi-dimensional data. It can be used as a permanent cloud storage space for saving the time series analysis results. We examine the issues associated with running R scripts online: including code validation, saving of outputs, reporting progress, and provenance management. An explicit goal is that the script which is run locally should produce exactly the same results as the script run on the Internet. Our design can be used as a model for other studies that need to run R scripts on the web.