A suite of R packages for web-enabled modeling and analysis of surface waters

Thursday, 18 December 2014: 4:15 PM
Jordan Stuart Read1, Luke A. Winslow2, Daniel Nüst3, Laura De Cicco1 and Jordan I Walker1, (1)USGS Center for Integrated Data Analytics, Middleton, WI, United States, (2)University of Wisconsin Madison, Madison, WI, United States, (3)52°North Initiative for Geospatial Open Source Software GmbH, Muenster, Germany
Researchers often create redundant methods for downloading, manipulating, and analyzing data from online resources. Moreover, the reproducibility of science can be hampered by complicated and voluminous data, lack of time for documentation and long-term maintenance of software, and fear of exposing programming skills. The combination of these factors can encourage unshared one-off programmatic solutions instead of openly provided reusable methods. Federal and academic researchers in the water resources and informatics domains have collaborated to address these issues. The result of this collaboration is a suite of modular R packages that can be used independently or as elements in reproducible analytical workflows. These documented and freely available R packages were designed to fill basic needs for the effective use of water data: the retrieval of time-series and spatial data from web resources (dataRetrieval, geoknife), performing quality assurance and quality control checks of these data with robust statistical methods (sensorQC), the creation of useful data derivatives (including physically- and biologically-relevant indices; GDopp, LakeMetabolizer), and the execution and evaluation of models (glmtools, rLakeAnalyzer). Here, we share details and recommendations for the collaborative coding process, and highlight the benefits of an open-source tool development pattern with a popular programming language in the water resources discipline (such as R). We provide examples of reproducible science driven by large volumes of web-available data using these tools, explore benefits of accessing packages as standardized web processing services (WPS) and present a working platform that allows domain experts to publish scientific algorithms in a service-oriented architecture (WPS4R). We assert that in the era of open data, tools that leverage these data should also be freely shared, transparent, and developed in an open innovation environment.