IN21B-3704:
Analyzing a 35-Year Hourly Data Record: Why So Difficult?

Tuesday, 16 December 2014
Christopher Lynnes, NASA Goddard Space Flight Center, Greenbelt, MD, United States
Abstract:
At the Goddard Distributed Active Archive Center, we have recently added a 35-year record of output data from the North American Land Data Assimilation System (NLDAS) to the Giovanni web-based analysis and visualization tool. Giovanni (Geospatial Interactive Online Visualization ANd aNalysis Infrastructure) offers users a variety of data summarization and visualization services that run at the data center, so users need not download and read the data themselves for exploratory data analysis. However, the NLDAS data has proven surprisingly resistant to application of the summarization algorithms. Algorithms that were perfectly happy analyzing 15 years of daily satellite data encountered limitations at both the algorithm and system levels when applied to 35 years of hourly data. Failures arose, sometimes unexpectedly, from command line overflows, memory overflows, internal buffer overflows, and time-outs, among others. These failures serve as an early warning of the problems likely to be encountered by the general user community as they try to scale up to “Big Data” analytics. Indeed, it is likely that more users will seek to perform remote web-based analysis precisely to avoid these issues, or the need to reprogram around them. We will discuss approaches to mitigating the limitations and the implications for data systems serving the user communities that try to scale up their current techniques to analyze Big Data.
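
To give a rough sense of the scale-up described above, the following Python sketch counts the time steps in a 15-year daily record and a 35-year hourly record, then shows one generic way to sidestep memory and command line overflows by processing values in fixed-size batches. It is an illustration only, not Giovanni's implementation: the date ranges are hypothetical ones chosen to match the stated durations, and the names `count_steps`, `batches`, and `streaming_mean` are invented for this example.

```python
from datetime import datetime, timedelta
from itertools import islice


def count_steps(start, end, step):
    """Number of whole time steps in the half-open interval [start, end)."""
    return (end - start) // step


# Hypothetical date ranges, chosen only to match the durations in the abstract.
daily_15y = count_steps(datetime(2000, 1, 1), datetime(2015, 1, 1),
                        timedelta(days=1))
hourly_35y = count_steps(datetime(1979, 1, 1), datetime(2014, 1, 1),
                         timedelta(hours=1))


def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def streaming_mean(values, batch_size=1000):
    """Average values batch by batch, never holding the full record in memory."""
    total = 0.0
    count = 0
    for batch in batches(values, batch_size):
        total += sum(batch)
        count += len(batch)
    return total / count if count else float("nan")


if __name__ == "__main__":
    print("daily steps, 15 years: ", daily_15y)
    print("hourly steps, 35 years:", hourly_35y)
    print("ratio:", round(hourly_35y / daily_15y, 1))
```

Under these assumed date ranges the hourly record has roughly 56 times as many time steps as the daily one. If each time step were stored as a separate file, passing every file name to a single command could exceed the operating system's argument-length limit, which is one way a command line overflow can arise; batching the inputs keeps each invocation and each in-memory working set bounded.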