geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling

Thursday, 17 December 2015
Poster Hall (Moscone South)
Jessica Block1, Daniel Crawl2, John Graham1, Charles Cowart1, Amarnath Gupta1, Mai Nguyen1,2, Raymond de Callafon1, Larry Smarr1, Ilkay Altintas1 and WIFIRE Team, (1)University of California San Diego, La Jolla, CA, United States, (2)San Diego Supercomputer Center, La Jolla, CA, United States
The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform that unifies geoprocessing tools and models for fire modeling and other geospatially dependent applications. It is a product of WIFIRE’s objective to build an end-to-end cyberinfrastructure for real-time, data-driven simulation, prediction, and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, and KML, and for using OGC web services such as WFS. The actors can also call geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, enabling GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler’s ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing highly extensible and computationally scalable.
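To illustrate the kind of format handling such actors wrap, here is a minimal sketch of reading GeoJSON features using only Python's standard library. The function name and sample data are illustrative; an actual geoKepler actor would typically delegate to GDAL/OGR for robust format support:

```python
import json

def read_geojson_features(geojson_text):
    """Parse a GeoJSON FeatureCollection into (geometry, properties) pairs."""
    collection = json.loads(geojson_text)
    return [(f["geometry"], f["properties"]) for f in collection["features"]]

# A tiny FeatureCollection with one point feature, e.g. an ignition location.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-117.0, 32.8]},
        "properties": {"name": "ignition_point"}
    }]
})

features = read_geojson_features(sample)
print(features[0][1]["name"])  # prints "ignition_point"
```

In a workflow, the parsed geometries and attributes would flow downstream as tokens to other actors, rather than being printed.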

The reusable workflows in geoKepler can be triggered automatically by real-time environmental alerts. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use data assimilation to ingest real-time weather data into wildfire simulations, and that apply data mining techniques to gain insight into the environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, along with Kepler’s Distributed Data Parallel (DDP) capability, which provides a framework for scalable processing.
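The DDP pattern behind this scalability is a map/reduce split: partition the data, process partitions independently, then merge partial results. A stand-alone sketch of that pattern follows, using a thread pool as a stand-in for Kepler's DDP actors (which delegate to Hadoop or Spark); the grid data and function names are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def burned_area(cells):
    # Map step: sum the area of burned cells within one grid partition.
    return sum(area for area, burned in cells if burned)

def merge(a, b):
    # Reduce step: combine partial sums from partitions.
    return a + b

# Toy fire-model grid: (cell_area_km2, burned?) tuples, split into partitions.
partitions = [
    [(0.1, True), (0.1, False)],
    [(0.1, True), (0.1, True)],
]

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(burned_area, partitions))
total = reduce(merge, partials)
print(round(total, 1))  # prints 0.3
```

Because the map step is independent per partition, the same workflow scales from a laptop thread pool to a Hadoop or Spark cluster without changing the workflow logic.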

geoKepler workflows can be executed from an IPython notebook hosted on a JupyterHub at UC San Diego, enabling sharing and reporting of the scientific analyses and results from various runs of geoKepler workflows. Communication between IPython and Kepler workflow executions is established through an IPython magic function for Kepler that we have implemented.
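The general shape of such a line magic can be sketched as below. The magic's name, argument format, and return value are hypothetical (the abstract does not specify them), and the workflow launch is stubbed with a plain function so the sketch stays self-contained; the real implementation would invoke the Kepler execution engine and return results into the notebook namespace:

```python
def kepler_magic(line):
    """Hypothetical handler a magic like %kepler <workflow> [param=value ...]
    would dispatch to after IPython strips the magic name from the line."""
    workflow, *params = line.split()
    # Stub: the real magic would launch the named Kepler workflow with these
    # parameters and collect its outputs for display in the notebook.
    return {"workflow": workflow, "params": params}

result = kepler_magic("fire_model.xml fuel=chaparral wind=santa_ana")
print(result["workflow"])  # prints "fire_model.xml"
```

In IPython, such a function would be registered with the `register_line_magic` decorator so the notebook user can invoke it directly as a `%`-prefixed command.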

In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable, and shareable.