System Learning via Exploratory Data Analysis: Seeing Both the Forest and the Trees

Wednesday, 17 December 2014
Linda Habash Krause, NASA Marshall Space Flght Ctr, Huntsville, AL, United States
As the amount of observational Earth and Space Science data grows, so does the need for learning and employing data analysis techniques that can extract meaningful information from those data. Space-based and ground-based data sources from all over the world are used to inform Earth and Space environment models. However, with such a large amount of data comes a need to organize those data in a way such that trends within the data are easily discernible. This can be tricky due to the interaction between physical processes that lead to partial correlation of variables or multiple interacting sources of causality. With the suite of Exploratory Data Analysis (EDA) data mining codes available at MSFC, we have the capability to analyze large, complex data sets and quantitatively identify fundamentally independent effects from consequential or derived effects. We have used these techniques to examine the accuracy of ionospheric climate models with respect to trends in ionospheric parameters and space weather effects. In particular, these codes have been used to 1) Provide summary "at-a-glance" surveys of large data sets through categorization and/or evolution over time to identify trends, distribution shapes, and outliers, 2) Discern the underlying “latent” variables which share common sources of causality, and 3) Establish a new set of basis vectors by computing Empirical Orthogonal Functions (EOFs) which represent the maximum amount of variance for each principal component. Some of these techniques are easily implemented in the classroom using standard MATLAB functions, some of the more advanced applications require the statistical toolbox, and applications to unique situations require more sophisiticated levels of programming. This paper will present an overview of the range of tools available and how they might be used for a variety of time series Earth and Space Science data sets. Examples of feature recognition from both 1D and 2D (e.g. imagery) time series data sets will be presented.