IN21D-03:
SQL is Dead; Long-live SQL: Relational Database Technology in Science Contexts
Abstract:
Relational databases are often perceived as a poor fit in science contexts: Rigid schemas, poor support for complex analytics, unpredictable performance, significant maintenance and tuning requirements --- these idiosyncrasies often make databases unattractive in science contexts characterized by heterogeneous data sources, complex analysis tasks, rapidly changing requirements, and limited IT budgets.In this talk, I'll argue that although the value proposition of typical relational database systems are weak in science, the core ideas that power relational databases have become incredibly prolific in open source science software, and are emerging as a universal abstraction for both big data and small data. In addition, I'll talk about two open source systems we are building to "jailbreak" the core technology of relational databases and adapt them for use in science. The first is SQLShare, a Database-as-a-Service system supporting collaborative data analysis and exchange by reducing database use to an Upload-Query-Share workflow with no installation, schema design, or configuration required. The second is Myria, a service that supports much larger scale data, complex analytics, and supports multiple back end systems. Finally, I'll describe some of the ways our collaborators in oceanography, astronomy, biology, fisheries science, and more are using these systems to replace script-based workflows for reasons of performance, flexibility, and convenience.