Feature extraction from scientific datasets using Apache Spark

Paral, Jan

IN51A-1795
Feature extraction from scientific datasets using Apache Spark

Friday, 18 December 2015

Poster Hall (Moscone South)

Jan Paral, National Center for Atmospheric Research, Boulder, CO, United States and Michael James Wiltberger, National Center for Atmospheric Research, High Altitude Observatory, Boulder, CO, United States

Abstract:

We present an example of feature extraction from scientific datasets, such as the output of global numerical models, using Apache Spark. The algorithm uses a simple penalized linear regression technique and a training dataset to learn a feature and then extract similar features from the rest of the data. Thanks to Apache Spark, the algorithm can scale to a large number of computing nodes.
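
The following is a minimal sketch of the idea, not the authors' implementation. The abstract does not specify the penalty, the labeling scheme, or the Spark calls used, so everything here is an assumption for illustration: an L2 (ridge) penalty, training labels of 1.0 for samples that belong to the feature and 0.0 otherwise, a score threshold of 0.5, and the hypothetical names `partial_gradient`, `combine`, `fit_ridge`, and `extract`. Plain Python lists stand in for distributed partitions. The per-partition gradient sums followed by a merge mirror the map and reduce steps that a cluster framework would spread across nodes.

```python
from functools import reduce


def partial_gradient(partition, w, b):
    """Map step: sum squared-error gradients over one partition of (x, y) pairs."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        for j, xj in enumerate(x):
            gw[j] += err * xj
        gb += err
    return gw, gb, len(partition)


def combine(a, b):
    """Reduce step: merge two partial results (gradient sums and sample counts)."""
    return [p + q for p, q in zip(a[0], b[0])], a[1] + b[1], a[2] + b[2]


def fit_ridge(partitions, n_features, lam=0.01, lr=0.1, steps=2000):
    """Gradient descent on mean squared error plus (lam / 2) * ||w||^2.

    The intercept b is left unpenalized (an assumption of this sketch).
    """
    w, b = [0.0] * n_features, 0.0
    for _ in range(steps):
        gw, gb, n = reduce(combine, (partial_gradient(p, w, b) for p in partitions))
        w = [wi - lr * (g / n + lam * wi) for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def extract(samples, w, b, threshold=0.5):
    """Keep the samples whose regression score reaches the threshold."""
    return [x for x in samples
            if sum(wi * xi for wi, xi in zip(w, x)) + b >= threshold]


# Toy training set split into two partitions: label 1.0 marks the feature.
training_partitions = [
    [([0.0, 0.1], 0.0), ([0.1, 0.0], 0.0)],
    [([1.0, 0.9], 1.0), ([0.9, 1.0], 1.0)],
]
w, b = fit_ridge(training_partitions, n_features=2)

# Apply the learned model to unlabeled data.
unlabeled = [[0.05, 0.05], [0.95, 0.95], [0.2, 0.1], [0.8, 1.0]]
found = extract(unlabeled, w, b)
print(found)
```

Because each iteration only needs gradient sums and a sample count from every partition, the work per node stays proportional to the size of its own partition, which is the property that lets this kind of algorithm scale out.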
