As part of our work on test-beds for the SCAPE project we have been investigating the various ways in which a large scale file format migration workflow could be implemented. The underlying technologies chosen for the platform are Hadoop and Taverna. One of the aims of the SCAPE project is to allow the automatic generation and execution of Taverna workflows, which will be executed via Hadoop.
The four methods for implementing a file format migration workflow that we tested were:
I've already written a number of blog posts on format validation of JP2 files. Format validation is only a one aspect of a quality assessment workflow. Digitisation guidelines typically impose various constraints on the technical characteristics of preservation and access images. For example, they may state that a preservation master must be losslessly compressed, and that its progression order must be RPCL. A format profile is a set of such technical constraints.
This will be my shortest blog post ever. Following up on my previous blog post on a prototype JP2 validator and properties extractor (jpylyzer), there is now a comprehensive User Manual of the tool. Just follow the link below and get the PDF file that is listed under 'Download Packages':
A few months ago I wrote a blog post on a simple JP2 file structure checker. This led to some interesting online discussions on JP2 validation. Some people asked me about the feasibility of expanding the tool to a full-fledged JP2 validator. Despite some initial reservations, I eventually decided to dedicate a couple of weeks to writing a rough prototype.
Over the last few weeks I've been working on the design of a workflow that the KB is planning to use for the migration of a collection of (mostly old) TIFF images to JP2. One major risk of such a migration is that hardware failures during the migration process may result in corrupted images. For instance, one could imagine a brief network or power interruption that occurs while an image is being written to disk. In that case data may be missing from the written file.