Related articles

Will the real lazy pig please scale up: quality assured large scale image migration

Authors: Martin Schaller, Sven Schlarb, and Kristin Dill

In the SCAPE Project, the memory institutions are working on practical application scenarios for the tools and solutions developed within the project. One of these application scenarios is the migration of a large image collection from one format to another.

SCAPE Training - Preserving Your Preservation Tools


Learning to Think Like a Package Maintainer

Lots of great digital preservation applications and services exist; however, very few are actively maintained and thus preserved! This is a big problem! By introducing the steps needed to develop these tools and engage the support of the community, this training course looks at what can be done to improve the situation. Specifically, it looks at how to prepare packages for submission into the very heart of many digital environments: the operating system and directly associated “app-stores”. Attendees will be given hands-on experience with developing and maintaining packages rather than software, and key differences will be discussed and evaluated. Better preservation of preservation tools means better preservation of our digital history.

Learning Outcomes (by the end of the training event the attendees will be able to):

  1. Understand the complexities of package management and distinguish between the different practices relating to both package objectives and chosen programming language. 
  2. Carry out advanced package management operations in order to critically appraise current packages and propose changes.
  3. Understand the importance of clearly defined versioning and licenses and the role of clear documentation and examples. 
  4. Apply best practice techniques in order to create a simple package suitable for long term maintenance. 
  5. Evaluate a number of options for managing package configuration and behavior relating to package installation, removal, upgrade and re-installation. 
  6. Analyse opportunities for automating package management and releases, maintaining a clear focus on the user and not the developer. 
  7. Critically evaluate opportunities to generalise package management to allow the easy building and maintenance of packages on multiple platforms.
  8. Assess the potential to apply package management techniques in your own environment. 

Delegates will receive a certificate of attendance for the training course.

The agenda can be seen here:

Registration is now open!

26 March 2014 to 27 March 2014

SCAPE/OPF Continuous Integration update

As previously blogged about by Carl, we now have virtually all SCAPE and OPF projects in Continuous Integration, building and unit testing in both Travis CI and Jenkins:

  • Travis compiles the projects and executes unit tests whenever a new commit is pushed to GitHub, or when a pull request is submitted to the project.
  • Jenkins builds are generally scheduled once per day. After a build, the software has its code quality analysed by Sonar.

SCAPE Training Event - Future Formats First: Building Application Infrastructures for Action Services

This workshop is the second event in the SCAPE project training programme. It will focus on using tools and workflows to carry out digital preservation actions at scale. 

It will begin with an introduction to scalability and will present techniques for using a scalable platform with common preservation tools.

By building on a real use case from the British Library, delegates will gain hands-on experience in migrating a large volume of image files to the JPEG 2000 format, verifying each migration against the original file using tools including ImageMagick, jpylyzer and Matchbox.

Delegates will learn about building workflows to invoke multiple operations, and how to share and discover other workflows. By building a scalable environment using Hadoop and Taverna, delegates will then be able to execute their workflow at scale, performing multiple simultaneous migrations and verifications.


Learning Outcomes (by the end of the training event the attendees will be able to):

  1. Understand scalable platforms and evaluate the situations in which such environments are required.
  2. Apply knowledge of existing tools to solve migration and quality control problems.
  3. Combine and modify tool chains in order to create automated workflows for migration and quality control.
  4. Implement best practice for discovering and sharing workflows for use and re-use.
  5. Make use of a scalable environment and apply a number of workflows to automatically perform migration and quality assurance checks on a large number of objects.
  6. Identify a number of potential problems when working in a scalable environment and propose solutions.
  7. Understand the potential to use scalable platforms in digital preservation and synthesise new opportunities within your own environments.

Delegates will receive a certificate of attendance for the training course.

The draft agenda is available here: SCAPE Future Formats First Agenda

The event will be conducted in English.

Who should attend?
Practitioners (digital librarians and archivists, digital curators, repository managers, or anyone responsible for managing digital collections) with an interest in building digital preservation workflows using a variety of preservation tools, and then executing them at scale. To get the most out of this training course you will ideally have some knowledge or experience of digital preservation.

Developers who are interested in learning about digital preservation at scale.

Registration is now open at:

The cost for the two days is £90. Morning and afternoon coffee breaks and lunch will be provided and are included in the registration fee.

*Please ensure you bring your laptop with you so you can participate in the practical exercises.*

Registration will close on Friday 6 September.

Further information
Please visit the event wiki page for details about how to get to the venue, where to stay and how to prepare for the event.

To find out more about the SCAPE project visit:


Photograph © The British Library Board

16 September 2013 to 17 September 2013

ICC profiles and resolution in JP2: update on 2011 D-Lib paper

It's been more than two years now since I wrote my D-Lib paper JPEG 2000 for Long-term Preservation: JP2 as a Preservation Format. From time to time people ask me about the status of the issues that are mentioned in that paper, so here's a long overdue update.

Mixing Hadoop and Taverna

As part of our work on test-beds for the SCAPE project we have been investigating the various ways in which a large-scale file format migration workflow could be implemented. The underlying technologies chosen for the platform are Hadoop and Taverna. One of the aims of the SCAPE project is to allow the automatic generation and execution of Taverna workflows, which will be executed via Hadoop.

The four methods for implementing a file format migration workflow that we tested were:

  1. Batch execution of a shell script (no parallelisation)
  2. A workflow written in/controlled from Java, run on Hadoop
  3. A workflow written in/controlled from Taverna, run on Hadoop
  4. A workflow written in Taverna, calling an XML defined unit of execution in Hadoop
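
To make the contrast between the first method and the parallel ones concrete, here is a minimal Python sketch. It is not taken from the post and does not use Hadoop or Taverna: `migrate_one` is a hypothetical stand-in for a single migration step, and the local thread pool only illustrates the idea of running many such steps at once.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath


def migrate_one(source: str) -> str:
    """Hypothetical stand-in for one migration step.

    A real step would call an image conversion tool and verify the result;
    here we only derive the target file name so the sketch stays runnable.
    """
    return str(PurePosixPath(source).with_suffix(".jp2"))


def run_batch(sources: list[str]) -> list[str]:
    """Method 1: process the files one after another (no parallelisation)."""
    return [migrate_one(source) for source in sources]


def run_parallel(sources: list[str], workers: int = 4) -> list[str]:
    """Same per-file step, spread over a local thread pool.

    The pool only illustrates parallel execution; it is an assumption of
    this sketch and not how Hadoop distributes work across a cluster.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with sources.
        return list(pool.map(migrate_one, sources))


if __name__ == "__main__":
    files = ["scans/page_001.tif", "scans/page_002.tif"]
    print(run_batch(files))
    print(run_parallel(files))
```

Because the per-file step is the same in both functions, the only thing that changes between the methods is who schedules the work, which is the question the four approaches above explore.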

Automated assessment of JP2 against a technical profile

I've already written a number of blog posts on format validation of JP2 files. Format validation is only one aspect of a quality assessment workflow. Digitisation guidelines typically impose various constraints on the technical characteristics of preservation and access images. For example, they may state that a preservation master must be losslessly compressed, and that its progression order must be RPCL. A format profile is a set of such technical constraints.
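
As a rough illustration of the idea, a profile can be treated as a mapping from property names to required values, and an image's extracted properties can be checked against it. The Python sketch below is an assumption, not the method used in the post: the property names and values are hypothetical, the properties are assumed to have been extracted already by some characterisation tool, and lossless compression is approximated by requiring the reversible 5-3 wavelet transform.

```python
# Hypothetical profile: property names and values are illustrative only.
PRESERVATION_MASTER_PROFILE = {
    "transformation": "5-3 reversible",  # stands in for "losslessly compressed"
    "order": "RPCL",                     # required progression order
}


def check_profile(properties: dict[str, str], profile: dict[str, str]) -> list[str]:
    """Return one message per violated constraint (empty list means pass)."""
    failures = []
    for name, expected in profile.items():
        # A missing property yields None and is reported as a failure.
        actual = properties.get(name)
        if actual != expected:
            failures.append(f"{name}: expected {expected!r}, found {actual!r}")
    return failures


if __name__ == "__main__":
    # Properties as they might be reported for a lossy JP2 (assumed values).
    image = {"transformation": "9-7 irreversible", "order": "RPCL"}
    for message in check_profile(image, PRESERVATION_MASTER_PROFILE):
        print(message)
```

Keeping the profile as data rather than code means a separate profile for access images can be added without touching the checking logic.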