The Open Planets Foundation (OPF) addresses core digital preservation challenges by engaging with its members and the community to develop practical and sustainable tools and services to ensure long-term access to digital content.
The DROID software tool is developed by The National Archives (UK) to perform automated batch identification of file formats by assigning PRONOM Unique Identifiers (PUIDs) to files. The tool uses so-called signature files, which are curated and regularly published by The National Archives (UK) and contain information from the PRONOM technical registry. Here I present some considerations for using the tool on the Hadoop platform, together with a performance evaluation of the job execution on a Hadoop cluster using the publicly available Govdocs1 corpus as the data set.
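To illustrate the idea of signature-based identification described above, here is a minimal sketch in Python. It is not DROID's implementation or its signature file format: the `SIGNATURES` table and the `identify` function are hypothetical names introduced for this example, only a fixed-offset byte match is shown, and the PUID values are given for illustration and should be checked against the PRONOM registry.

```python
from typing import Optional

# Toy signature table (an assumption for illustration, not a DROID signature file):
# each entry is (byte offset, magic bytes, PUID).
SIGNATURES = [
    (0, b"%PDF-1.4", "fmt/18"),       # PDF 1.4 header
    (0, b"PK\x03\x04", "x-fmt/263"),  # ZIP local file header
]


def identify(data: bytes) -> Optional[str]:
    """Return the PUID of the first signature that matches, or None."""
    for offset, magic, puid in SIGNATURES:
        # Compare the bytes at the given offset with the expected magic bytes.
        if data[offset:offset + len(magic)] == magic:
            return puid
    return None
```

Real signature files describe far richer patterns (variable offsets, end-of-file sequences, container formats), so this only conveys the basic lookup from byte pattern to PUID.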
Last year (2012) the KB released a report on the suitability of the EPUB format for archival preservation. A substantial number of EPUB-related developments have happened since then, and as a result some of the report's findings and conclusions have become outdated.
Before Easter we planned to do a correctness benchmark for Audio Migration QA, specifically targeting the new tool xcorrSound waveform-compare, see http://openplanetsfoundation.org/blogs/2012-07-09-xcorrsound-waveform-compare-new-audio-quality-assurance-tool.
We have been evaluating the use of the latest Fedora Commons, version 3.6.2, as a test repository. Having followed the straightforward installation process, we were left with a repository with one preconfigured user – fedoraAdmin.
There are two APIs – API-A for access and API-M for management. For our test instance, API-A was configured on installation to require a login, although it can be configured not to require one. It appeared that while the REST API for API-A was restricted, the SOAP API for API-A was not; this was corrected by using the example policy below. Investigations of how to configure multiple users are also detailed.