What would a data scientist do with 10 seconds on a supercomputer?

Tuesday, 16 December 2014
Douglas W Nychka, National Center for Atmospheric Research, Boulder, CO, United States
The statistical problems of large climate datasets, the flexibility of
high-level data languages such as R, and the architectures of current
supercomputers have motivated a different paradigm for data analysis
problems that are amenable to parallelization. Part of the
switch in thinking is to harness many cores for a short amount of time
to produce interactive-like exploratory data analysis for the
space-time data sets typically encountered in the geosciences.
As motivation, we consider the near-interactive analysis of
daily observed temperature and rainfall fields for North America over the
past 30 years. For certain kinds of analysis, speedups on the order of
a factor of 1000 are possible, changing the traditional workflows of
statistical modeling and inference for large geophysical datasets.