PA53A-2234
Interoperability Outlook in the Big Data Future
Friday, 18 December 2015
Poster Hall (Moscone South)
Kwo-Sen Kuo, Earth System Science Interdisciplinary Center, College Park, MD, United States and Rahul Ramachandran, NASA Marshall Space Flight Center, Huntsville, AL, United States
Abstract:
The establishment of distributed active archive centers (DAACs) as data warehouses and the standardization of file formats by NASA’s Earth Observing System Data and Information System (EOSDIS) doubtless propelled the interoperability of NASA Earth science data to unprecedented heights in the 1990s. Two decades later, however, interoperability still leaves much to be desired. We believe the inadequacy we experience results from the current practice in which data are first packaged into files before distribution, and only the metadata of these files are cataloged into databases and made searchable. The data themselves therefore cannot be efficiently filtered, and any extensive study requires downloading large volumes of data files to a local system for processing and analysis. The need to download data not only creates duplication and inefficiency but also further impedes interoperability, because the analysis must be performed locally by individual researchers at individual institutions. Each institution or researcher often has its or their own preferred data management practices and programming languages. The analysis results (derived data) so produced are thus subject to the differences among these practices, which later form formidable barriers to interoperability.

A number of Big Data technologies are currently being examined and tested to address Big Earth Data issues. These technologies share one common characteristic: exploiting compute and storage affinity to analyze large volumes and great varieties of data more efficiently. Distributed active “archive” centers are likely to evolve into distributed active “analysis” centers, which not only archive data but also provide analysis services right where the data reside. “Analysis” will become the more visible function of these centers. It is thus reasonable to expect interoperability to improve, because analysis, in addition to data, becomes more centralized.
Within a “distributed active analysis center,” interoperability is almost guaranteed because data, analysis, and results can all be readily shared and reused. Effectively, with the establishment of distributed active analysis centers, interoperation turns from a many-to-many problem into a less complicated few-to-few problem and becomes easier to solve.