IN11A-3593:
Integrating a Collaborative Infrastructure with a Big Data Technology to Boost Data-analysis Productivity
Monday, 15 December 2014
Kwo-Sen Kuo1,2, Thomas Clune3, Rahul Ramachandran4, John Rushing5, Manil Maskey5, Gyorgy Fekete3,6, Amidu Oloso3,7, Amy Lin3,5 and Khoa Doan8, (1)NASA Goddard SFC, Greenbelt, MD, United States, (2)Bayesics, LLC, Bowie, MD, United States, (3)NASA Goddard Space Flight Center, Greenbelt, MD, United States, (4)NASA Marshall Space Flight Center, Huntsville, AL, United States, (5)University of Alabama in Huntsville, Huntsville, AL, United States, (6)Computer Science Corporation, Lanham, MD, United States, (7)Science Systems and Applications, Inc., Lanham, MD, United States, (8)University of Maryland College Park, College Park, MD, United States
Abstract:
Because the Earth is a “system of systems”, i.e., it is composed of many strongly coupled and complicated subsystems, Earth scientists must frequently collaborate beyond their individual focus areas. As a result, Earth Science data analysis typifies the four V’s of Big Data: 1) we need to analyze a great “variety” of data from all subsystems; 2) we need to analyze ever greater “volumes” of data because, to improve understanding, we are resorting to ever finer resolutions in both observations and model simulations; 3) we need to analyze these data with greater “velocity”, for climate change has imposed urgency and current processes take too long to digest these data; and 4) we need to analyze these data with “veracity”, lest we contribute more to confusion than to solutions. However, current data-analysis practices exhibit numerous inefficiencies and are poorly suited to many interdisciplinary collaborations. Using a realistic science scenario, we demonstrate a prototype system that integrates a collaborative infrastructure with Big Data techniques to address these inefficiencies. This system has great potential to provide a higher return on investment and to significantly boost science data-analysis productivity. We will discuss its implications for the (near) future of scientific data analysis.