IN53B-1839
Data Processing Workflows to Support Reproducible Data-driven Research in Hydrology

Friday, 18 December 2015
Poster Hall (Moscone South)
Jonathan L Goodall (1), Bakinam Essawy (1), Hao Xu (2), Arcot Rajasekar (3) and Reagan Wentworth Moore (3); (1) University of Virginia Main Campus, Charlottesville, VA, United States; (2) University of North Carolina at Chapel Hill, Data Intensive CyberEnvironments Center, Chapel Hill, NC, United States; (3) University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Abstract:
Geoscience analyses often require existing data sets that are large, heterogeneous, and maintained by different organizations. A particular challenge in making analyses based on these data sets reproducible is automating the workflows that transform raw data sets into model-specific input files and, ultimately, into publication-ready visualizations. Data grids such as the Integrated Rule-Oriented Data System (iRODS) allow scientists to access and share large data sets that are geographically distributed across the Internet but appear to the scientist as a single file management system. The DataNet Federation Consortium (DFC) project is built on iRODS and aims to demonstrate data and computational interoperability across scientific communities. This paper uses iRODS and the DFC to show how hydrologic modeling workflows can be encapsulated using the iRODS concept of Workflow Structured Objects (WSOs). An example use case, automating hydrologic model post-processing routines, demonstrates how WSOs can be created and used within the DFC to generate data visualizations from large collections of model output. By co-locating the workflow that creates the visualization with the data collection, the use case shows how data grid technology supports the reuse, reproducibility, and sharing of workflows within scientific communities.
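The core idea of co-locating a workflow with the data collection it processes can be illustrated with a small sketch. This is not the iRODS API and not the authors' WSO implementation: it assumes a plain directory stands in for a data collection, and the file names (`workflow.json`, `model_output.txt`), the step names, and the `run_workflow` function are all hypothetical. Running the stored workflow twice yields the same output checksum, which is the reproducibility property the abstract describes.

```python
"""Minimal sketch of a workflow definition stored next to the data it processes.

Assumption: this only mimics the idea of an iRODS Workflow Structured Object
(a workflow kept inside the data collection) using local files; it does not
use the iRODS API, and every name here is hypothetical.
"""
import hashlib
import json
import statistics
import tempfile
from pathlib import Path

# Hypothetical post-processing steps: each maps a list of floats to a new list.
STEPS = {
    "scale": lambda values, factor: [v * factor for v in values],
    "clip_negative": lambda values: [max(v, 0.0) for v in values],
    "mean": lambda values: [statistics.fmean(values)],
}


def run_workflow(collection: Path) -> str:
    """Run the workflow.json stored in `collection`; return the output's SHA-256."""
    spec = json.loads((collection / "workflow.json").read_text())
    values = [float(x) for x in (collection / spec["input"]).read_text().split()]
    for step in spec["steps"]:
        values = STEPS[step["name"]](values, *step.get("args", []))
    out_text = "\n".join(f"{v:.6f}" for v in values) + "\n"
    # The derived product is written back into the same collection.
    (collection / spec["output"]).write_text(out_text)
    return hashlib.sha256(out_text.encode()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        coll = Path(tmp)
        (coll / "model_output.txt").write_text("1.0 -2.0 3.0 4.0\n")
        (coll / "workflow.json").write_text(json.dumps({
            "input": "model_output.txt",
            "output": "summary.txt",
            "steps": [
                {"name": "clip_negative"},
                {"name": "scale", "args": [0.0283168]},  # ft^3/s -> m^3/s
                {"name": "mean"},
            ],
        }))
        # Re-running the co-located workflow reproduces the same output.
        print(run_workflow(coll) == run_workflow(coll))
```

Keeping the workflow definition as a data file inside the collection, rather than in a separate script repository, means anyone who can access the collection can also re-run the exact steps that produced its derived products.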