A big data geospatial analytics platform - Physical Analytics Integrated Repository and Services (PAIRS)

Friday, 18 December 2015: 14:55
2020 (Moscone West)
Fernando Jimenez Marianno, Levente Klein, Conrad Albrecht, Marcus Freitag, Nigel Hinds and Hendrik Hamann, IBM Yorktown Heights, Yorktown Heights, NY, United States
A big data geospatial analytics platform:

Physical Analytics Information Repository and Services (PAIRS)

Fernando Marianno, Levente Klein, Siyuan Lu, Conrad Albrecht, Marcus Freitag, Nigel Hinds, Hendrik Hamann

IBM TJ Watson Research Center, Yorktown Heights, NY 10598

A major challenge in leveraging big geospatial data sets is the ability to quickly integrate multiple data sources into physical and statistical models and be run these models in real time. A geospatial data platform called Physical Analytics Information and Services (PAIRS) is developed on top of open source hardware and software stack to manage Terabyte of data. A new data interpolation and re gridding is implemented where any geospatial data layers can be associated with a set of global grid where the grid resolutions is doubling for consecutive layers. Each pixel on the PAIRS grid have an index that is a combination of locations and time stamp. The indexing allow quick access to data sets that are part of a global data layers and allowing to retrieve only the data of interest. PAIRS takes advantages of parallel processing framework (Hadoop) in a cloud environment to digest, curate, and analyze the data sets while being very robust and stable. The data is stored on a distributed no-SQL database (Hbase) across multiple server, data upload and retrieval is parallelized where the original analytics task is broken up is smaller areas/volume, analyzed independently, and then reassembled for the original geographical area.

The differentiating aspect of PAIRS is the ability to accelerate model development across large geographical regions and spatial resolution ranging from 0.1 m up to hundreds of kilometer. System performance is benchmarked on real time automated data ingestion and retrieval of Modis and Landsat data layers. The data layers are curated for sensor error, verified for correctness, and analyzed statistically to detect local anomalies. Multi-layer query enable PAIRS to filter different data layers based on specific conditions (e.g analyze flooding risk of a property based on topography, soil ability to hold water, and forecasted precipitation) or retrieve information about locations that share similar weather and vegetation patterns during extreme weather events like heat wave.