Two or more things that I learned at the “Preserving Your Preservation Tools” workshop

These have been two busy days in Den Haag, where Carl Wilson from the OPF tried to show us how to use tools to get clean environments and well-behaved installation procedures that will "always" work.

 
Using Vagrant (connected to an appropriate provider, in our experimental case VirtualBox) lets us start from a pristine box and experiment with installation procedures. Because everything is scripted or, better said, automatically provisioned, we can repeat the attempt until we reach an exactly clean and complete installation. The important point is that, once this goal is attained, sharing it is easy: just publish the steps in a code repository.
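As a minimal sketch of the idea (the box name and package list are my own illustrative assumptions, not the exact ones used at the workshop), a Vagrantfile with an inline shell provisioner might look like this; the script below only writes the file, so that `vagrant up` can build the box wherever Vagrant and VirtualBox are installed:

```shell
#!/bin/sh
# Write a minimal Vagrantfile with an inline shell provisioner.
# The box name and package are illustrative assumptions, not the
# exact ones used at the workshop.
cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"        # start from a genuine base box
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y openjdk-7-jre       # example dependency for the tools
  SHELL
end
EOF
# With Vagrant and VirtualBox installed, `vagrant up` would now create
# and provision the VM; `vagrant destroy` throws it away for a clean retry.
```

Publishing just this one file in a repository is enough for anyone else to rebuild the identical environment.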
 
The second day was devoted to real experiments. We began by looking at how this has been done for jpylyzer, the indispensable tool for validating JPEG2000 files, created by Johan van der Knijff from the National Library of the Netherlands, which hosted the event with the traditional Dutch welcome.
Then we began to look at the old but precious JHOVE tool, from Gary McGath, which recently migrated to GitHub and is actively being transformed to use a Maven build process thanks to the efforts of Andrew Jackson and Will Palmer. A first (not so quick, but dirty) Debian package was obtained at the end of the session, providing an automatic installation of the tool on Linux boxes: it takes care of installing the script that hides the infamous Java idioms and of providing a default configuration file, so that you can launch a simple jhove -v right after installing it, and it works!!!
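The wrapper script such a package installs could be as simple as the following sketch. The install prefix, jar name, and config path are my assumptions, and a real wrapper would `exec` the command instead of printing it:

```shell
#!/bin/sh
# Sketch of the wrapper a jhove .deb might install to hide the raw
# "java -jar ..." invocation. Install prefix, jar name, and config
# path are assumptions, not the package's actual layout.
JHOVE_HOME=${JHOVE_HOME:-/usr/share/jhove}          # assumed install prefix
JHOVE_CONFIG=${JHOVE_CONFIG:-/etc/jhove/jhove.conf} # assumed default config
JHOVE_CMD="java -jar $JHOVE_HOME/bin/JhoveApp.jar -c $JHOVE_CONFIG"
# A real wrapper would end with:  exec $JHOVE_CMD "$@"
# Here we only show what `jhove -v` would expand to:
echo "$JHOVE_CMD -v"
```

The default configuration file shipped alongside it is what makes the tool usable immediately after install, with no manual editing.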
 
One other thing that attracted my attention was the use of Vagrant as a simple way of making sure that every developer works against the same environment, so that there are no misconfigurations. If other tools are needed, an automatic provision can be established and distributed around. Of course, the same process can be applied in production, making deployment as smooth as possible.
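The day-to-day workflow this enables is short enough to fit on a sticky note; the commands below are the standard Vagrant CLI, shown as comments since they need Vagrant and a provider such as VirtualBox installed:

```shell
# Typical shared-environment workflow with Vagrant (commands need
# Vagrant and a provider such as VirtualBox to actually run):
#   vagrant up          # create and provision the VM from the Vagrantfile
#   vagrant ssh         # log into the identical environment everyone shares
#   vagrant provision   # re-run the provisioning scripts after a change
#   vagrant destroy -f  # throw the box away and start clean again
```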
 
So it now appears easy to have a base (or reference) environment plus the exact list of extra dependencies that allow a given program to run. From a preservation perspective, this is quite enlightening and very close to the work done by the PREMIS group on describing environments. We can then think about transforming the provisioning script into a PREMIS environment description, so that we would have not only an operational way of doing emulation but also a standard description of it. The base environments could be collected in a repository and properly described. The extra steps needed to revive a program could then be embedded in the AIP of the program or of the data we are trying to preserve.
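As a very rough sketch of that transformation, one could mechanically derive an environment description from the package list of a provisioning script. The element names below are simplified stand-ins for the PREMIS vocabulary, not a validated PREMIS serialization, and the package list is illustrative:

```shell
#!/bin/sh
# Derive a PREMIS-style environment description from the package list
# of a provisioning script. Element names are simplified stand-ins,
# not a validated PREMIS serialization; the package list is illustrative.
PACKAGES="openjdk-7-jre jhove"
{
  echo '<environment>'
  echo '  <environmentPurpose>render</environmentPurpose>'
  for p in $PACKAGES; do
    echo "  <dependency><dependencyName>$p</dependencyName></dependency>"
  done
  echo '</environment>'
} > environment.xml
```

The resulting XML fragment is exactly the kind of thing that could travel inside the AIP next to the program it revives.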
 
Incidentally, at the same time we were working on these virtual environments, Microsoft announced the release of the source code of MS-DOS (versions 1.1 and 2.0). This makes me wonder whether we could rebuild an MS-DOS box from scratch and use it as a base reference environment for all those "old" (only some thirty years old) programs.
 
All in all, these two days went by so quickly that we only just had time for a Dutch break along the Plein; but they were fruitful in giving us the aim of coming up with easier-to-use and better-documented tools that we can rely on to build great preservation repositories.

 
