Integrating Parallel and Distributed Data Mining Algorithms into the NASA Earth Exchange (NEX)

Friday, 19 December 2014
Nikunj Oza1, Vipin Kumar2, Ramakrishna R Nemani3, Shyam Boriah4, Kamalika Das1,5, Ankush Khandelwal2, Bryan Matthews1,6, Andrew Michaelis7, Varun Mithal2, Guruprasad Nayak2 and Petr Votava8, (1)NASA - Ames Research Center, Mountain View, CA, United States, (2)University of Minnesota Twin Cities, Minneapolis, MN, United States, (3)NASA Ames Research Center, Moffett Field, CA, United States, (4)University of Minnesota, Minneapolis, MN, United States, (5)University of California Santa Cruz, UARC, Santa Cruz, CA, United States, (6)Stinger Ghaffarian Technologies Greenbelt, Greenbelt, MD, United States, (7)University Corporation at Monterey Bay, Seaside, CA, United States, (8)California State University Monterey Bay, Seaside, CA, United States
There is an urgent need in global climate change science for efficient model and/or data analysis algorithms that can be deployed in distributed and parallel environments, driven by the proliferation of large and heterogeneous data sets. Members of our team from NASA Ames Research Center and the University of Minnesota have been developing new distributed data mining algorithms, as well as distributed versions of algorithms originally designed to run on a single machine. We are integrating these algorithms with the Terrestrial Observation and Prediction System (TOPS), an ecological nowcasting and forecasting system, on the NASA Earth Exchange (NEX). We are also developing a framework within which data mining algorithm developers can make their algorithms available to scientists using our system, model developers can set up their models to run within the system and make their results available, and data providers can make their data available, all with as little effort as possible. We illustrate the substantial time savings and new results that this framework makes possible through an improvement to the Burned Area (BA) data product at global scale. This improvement was obtained by developing and implementing on NEX a novel spatiotemporal time series change detection algorithm, which we will also present.