7 Feb outputs: keynote presentations

To kick off our workshop day, after a brief opening presentation by Matt Shreeve (PDF, 400KB), we had a series of three mini-keynote presentations.

Some of the key points arising from these were:

  • Recommendations for software preservation are not as well known as those for data preservation. There is, however, some advice on using file types for which it is easy to build readers (tiff/txt), and on the need to ensure the representation of data is not lost, so that we can make sense of even the simplest data standards (e.g. dictionaries and character sets for text files).
  • Curators must be careful when preserving software to minimise degradation of functionality or accuracy. This has been a problem even where a systematic approach to data migration was taken – problems with how graphics were rendered often arose from migration approaches (though only with particular aspects of the graphics, rather than the entire thing).
  • Software preservation should help us understand the limitations of the user – rather than the mistakes they have made.
  • Software is not easy to define, let alone preserve. Workflows, for example, will have individual elements of software and processes which are brought together as a batch that is greater than the sum of its parts.
    • For example, within Geographic Information Systems (GIS) a mapping service is actually produced from underlying raw data and is not really an entity in its own right. Is it more important to sustain the software used to create the map, or the data that makes it available? The map is unique and only useful for one point, but it is difficult to separate data from processes.
    • For example, with video and sound, which parts are signal and signal processing and which parts are data? Software has a whole technology stack and never stands alone – you can’t really preserve an operating system or create an emulator when software is a service and many components are disposable.
  • There are different approaches to software preservation listed on the project website – technical preservation, emulation, migration, cultivation, hibernation, deprecation, and procrastination. Procrastination is never really a good option!
  • The issues in software preservation are not just technical (formidable as they may be) and include tricky activities such as managing digital rights, and justifying cost-benefit trade-offs.
  • To help users approach this systematically, it is often useful to identify the significant properties of the software, such as its key functionality.
  • The Software Sustainability Institute is creating a national facility for research software, encouraging the improvement of software design and architecture, and embedding maturity models into software training as a case for good practice in software engineering.
  • There are multiple ways to preserve data. The STFC uses a cultivation strategy to ensure that certain key software tools like ICAT are maintained. They have an active developer community that share the source code and keep it alive and documented.
  • The basic preservation steps for software are: preserve, retrieve, reconstruct and replay. These may sound self-explanatory but are actually quite complex. For retrieval, in addition to knowledge of general software architecture and licensing data, there is a need for explicit information on the software’s functionality. Reconstruction requires an understanding of the dependencies and components, plus details of the programming language and the libraries required to ensure the correct output. Replay will also need sufficient documentation and might be used as a benchmark to assess the success of the preservation method. In order to do this, there will need to be enough test cases to ensure accuracy.
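
As a rough illustration of the last point, the information needed for retrieval, reconstruction and replay could be recorded alongside the preserved software and used to score a reconstruction against recorded test cases. The sketch below is only one possible shape for this: the `PreservationRecord` class, its field names and the `replay_check` function are hypothetical and were not presented at the workshop, and a trivial "upper-caser" stands in for real preserved software.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PreservationRecord:
    """Hypothetical metadata kept alongside a preserved piece of software."""
    name: str
    functionality: str   # explicit description of what it does (retrieval)
    licence: str         # licensing data (retrieval)
    language: str        # programming language (reconstruction)
    dependencies: List[str] = field(default_factory=list)  # libraries (reconstruction)
    # (input, expected output) pairs captured from the original software (replay)
    test_cases: List[Tuple[str, str]] = field(default_factory=list)


def replay_check(record: PreservationRecord, run: Callable[[str], str]) -> float:
    """Return the fraction of recorded test cases that `run` reproduces exactly."""
    if not record.test_cases:
        raise ValueError("no test cases recorded; replay cannot be assessed")
    passed = sum(1 for given, expected in record.test_cases if run(given) == expected)
    return passed / len(record.test_cases)


# Illustrative record for a made-up tool
record = PreservationRecord(
    name="upper-caser",
    functionality="converts text to upper case",
    licence="MIT",
    language="Python 3",
    test_cases=[("abc", "ABC"), ("Map 1", "MAP 1")],
)

# str.upper stands in for the reconstructed software being replayed
score = replay_check(record, str.upper)
print(f"{score:.0%} of recorded test cases reproduced")
```

A score below 100% would flag a loss of functionality or accuracy in the reconstruction; how many test cases count as "enough" remains a judgement for the curator.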

We also had some curator-developer role play, with a couple of key points arising:

  • If you build a house in a jungle and then leave it, it will be overrun with the flora and fauna of the environment and eventually destroyed. However, if you live in that house, you are committed to its continual upkeep and the house is maintained.
  • If software is required and there is a market for it, then it will always be maintained. If there is no market and no users that rely on it, then it is more likely that the software will quickly become unusable. There are some exceptions to this, namely open source software where there is a close community of users. Ironically, receiving many bug reports is a good thing: it means your code is being used, and the more it is used, the more robust it will be in the long term.