IN53C-04
Exploring Relationships in Big Data

Friday, 18 December 2015: 14:25
2020 (Moscone West)
Ashish Mahabal (1), S. George Djorgovski (1), Daniel J. Crichton (2), Luca Cinquini (2), Sean Kelly (2), Maureen A. Colbert (3) and Heather Kincaid (2); (1) California Institute of Technology, Pasadena, CA, United States, (2) NASA Jet Propulsion Laboratory, Pasadena, CA, United States, (3) Geisel School of Medicine, Dartmouth, NH, United States
Abstract:
Big Data are characterized by several ‘V’s: Volume, Veracity, Volatility, Value, and so on. In many datasets, Volume inflated by redundant features makes the data noisier and makes it harder to extract Value. This is especially true when comparing or combining different datasets whose metadata are diverse. We have been exploring ways to exploit such datasets through a variety of statistical methods and visualization. We show how we have applied this approach to time series from large astronomical sky surveys, within the Virtual Observatory framework. More recently, we have been doing similar work in a completely different domain, namely cancer biology. Reusing the methodology there involves applying it to diverse datasets gathered by the various centers associated with the Early Detection Research Network (EDRN) for cancer, an initiative of the National Cancer Institute (NCI). Application to geoscience datasets is a natural extension.
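The abstract does not name the specific statistical methods used, so the following is only a minimal illustration of the general point that redundant features inflate Volume without adding information. It sketches one simple, commonly used technique (a greedy filter on pairwise Pearson correlation), not the authors' method. The function name `drop_redundant_features`, the 0.95 threshold, and the light-curve feature names are hypothetical assumptions chosen for the example.

```python
import numpy as np


def drop_redundant_features(X, names, threshold=0.95):
    """Hypothetical sketch: greedily keep a feature only if its absolute
    Pearson correlation with every already-kept feature is <= threshold.
    Returns the kept feature names and the reduced data matrix."""
    X = np.asarray(X, dtype=float)
    # Constant (zero-variance) columns carry no information and make the
    # correlation undefined, so they are discarded up front.
    informative = [j for j in range(X.shape[1]) if np.std(X[:, j]) > 0]
    if not informative:
        return [], X[:, []]
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, informative], rowvar=False)))
    kept = []  # positions within `informative`
    for i in range(len(informative)):
        if all(corr[i, k] <= threshold for k in kept):
            kept.append(i)
    cols = [informative[i] for i in kept]
    return [names[j] for j in cols], X[:, cols]


# Hypothetical light-curve features: the second column is a rescaled copy of
# the first (redundant), and the last column is constant (uninformative).
rng = np.random.default_rng(0)
n = 200
amplitude = rng.normal(1.0, 0.3, n)
period = rng.uniform(0.2, 5.0, n)
X = np.column_stack([
    amplitude,
    2.5 * amplitude + 0.01 * rng.normal(size=n),
    period,
    np.ones(n),
])
names = ["amplitude", "amplitude_mag", "period", "constant"]

kept_names, X_reduced = drop_redundant_features(X, names)
print(kept_names)  # ['amplitude', 'period']
```

The greedy pass keeps the first member of each highly correlated group, so the result depends on column order; a real pipeline combining heterogeneous datasets would likely also need to reconcile units and metadata before any such filtering.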