The Anatomy and Physiology of Data Science

Wednesday, 17 December 2014
Peter Arthur Fox, Rensselaer Polytechnic Inst., Troy, NY, United States
Whether the science (especially geosciences) community at large likes it or not, the co-opting of the term Data Science by the private sector has led to increased hype over data science as a career and as a means to solve challenging data problems, and to a lack of educational innovation in curricula for data science. If the full benefits of a new generation of statistical and analytical software tools that operate on high-performance computational infrastructure are to be attained, adequate attention to the 'science of data science' is needed. In this contribution, we present a science view of data science from both an education and a research perspective. We also introduce a research agenda that explores the key challenges that must be addressed to meet the needs of research driven by large-scale data analytics. We focus on three as-yet-untapped data science topics: understanding scale in systems, sparse systems, and abductive reasoning. We conclude with a specific call to action to make progress on these topics.