IN21C-3718:
Archiving Software Systems: Approaches to Preserve Computational Capabilities

Tuesday, 16 December 2014
Todd A King, University of California Los Angeles, EPSS, Los Angeles, CA, United States
Abstract:
A great deal of effort is made to preserve scientific data. Not only because data is knowledge, but it is often costly to acquire and is sometimes collected under unique circumstances. Another part of the science enterprise is the development of software to process and analyze the data. Developed software is also a large investment and worthy of preservation. However, the long term preservation of software presents some challenges. Software often requires a specific technology stack to operate. This can include software, operating systems and hardware dependencies. One past approach to preserve computational capabilities is to maintain ancient hardware long past its typical viability. On an archive horizon of 100 years, this is not feasible. Another approach to preserve computational capabilities is to archive source code. While this can preserve details of the implementation and algorithms, it may not be possible to reproduce the technology stack needed to compile and run the resulting applications. This future forward dilemma has a solution. Technology used to create clouds and process big data can also be used to archive and preserve computational capabilities. We explore how basic hardware, virtual machines, containers and appropriate metadata can be used to preserve computational capabilities and to archive functional software systems. In conjunction with data archives, this provides scientist with both the data and capability to reproduce the processing and analysis used to generate past scientific results.