Jay Gattuso’s blog

Question: Who is/isn’t retaining technical provenance notes?

If you are, what are you retaining and why?

If not, why not?

There is more to come from us on this topic – but for now I’d love to hear any opinions / thoughts.

And what do I mean by technical provenance?

Good question. I mean any filename sanitisation, any QA changes to (meta)data, any file structure moves, any normalisation data, or details of any other technical process that has touched the original bitstream as it was found (at rest, if applicable) on its source medium.

Exploring the impact of Flipped Bits.

Following a few interesting conversations recently, I got interested in the idea of ‘bit flip’ – an occasion where a single binary bit changes state from a 0 to a 1 or from a 1 to a 0 inside a file.

I wrote a very inefficient script that sequentially flipped every bit in a jpeg file, saved each new bitstream as a jpeg, attempted to render it in the [im] python library and, if rendering succeeded, calculated an RMSE (root-mean-square error) value for the new file.
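The bit-flipping loop described above can be sketched roughly as follows. This is not the original script: the function names (`flip_bit`, `iter_flipped`, `rmse`) are hypothetical, the bit order (most significant bit first) is an assumption, and the rendering step is left out, so `rmse` simply compares two equal-length sequences of pixel values however they were obtained.

```python
import math
from typing import Iterator, Sequence


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted.

    bit_index counts from the start of the stream, most significant
    bit of each byte first (an assumption made for this sketch).
    """
    byte_pos, bit_pos = divmod(bit_index, 8)
    out = bytearray(data)
    out[byte_pos] ^= 0x80 >> bit_pos
    return bytes(out)


def iter_flipped(data: bytes) -> Iterator[bytes]:
    """Yield one variant of data per bit, each with a single bit flipped."""
    for bit_index in range(len(data) * 8):
        yield flip_bit(data, bit_index)


def rmse(original: Sequence[float], damaged: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length pixel sequences."""
    if len(original) != len(damaged) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(original, damaged))
    return math.sqrt(squared / len(original))
```

Each variant from `iter_flipped` would then be handed to an image library; variants that fail to decode are counted as unrenderable, and the rest are scored with `rmse` against the pixels of the untouched file. A file of n bytes produces 8n variants, which is why this approach is slow.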

Investigating PRONOM EOF patterns and DROID ‘Fast’ Scanning

Following on from the interesting discussion in my last post about the jpeg signatures, I undertook some quick testing on the impact of using / not using the EOF sections of a DROID signature file.

I previously posted this signature file here: http://dl.dropbox.com/u/59534857/DROID_SignatureFile_V55%20-%20no%20EOF….

Can we talk about fmt/42, fmt/43 and fmt/44?

In a relatively recent signature update, the fmt/44 signature was changed to allow some data after the stated EOF marker (ff d9).

In the case that started this off, a number of fmt/44 jpg files were found that had a couple of bytes after what DROID looks for as an absolute EOF.

I had a look into the specs for jpg, trying to unravel this story. Were the extra bytes useful to someone? Were we missing something by ignoring these bytes?
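One rough way to see what sits after the marker is to look for the last `ff d9` in the file and return whatever follows it. This is a naive sketch and not how DROID evaluates its signatures; the function name `trailing_bytes_after_eoi` is hypothetical, and the approach assumes the last `ff d9` in the stream really is the end-of-image marker, which may not hold if the trailing data itself happens to contain those two bytes.

```python
from typing import Optional

EOI_MARKER = b"\xff\xd9"  # the jpeg end-of-image marker discussed above


def trailing_bytes_after_eoi(data: bytes) -> Optional[bytes]:
    """Return the bytes after the last ff d9, or None if no marker exists.

    An empty result means the file ends exactly on the marker.
    """
    position = data.rfind(EOI_MARKER)
    if position == -1:
        return None
    return data[position + len(EOI_MARKER):]
```

Running this over a set of files and tallying the distinct trailing values would show whether the extra bytes are padding (for example runs of zeros) or something more structured.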

FMT 7,8,9,10

I wonder if many people have had a chance to dig around in the latest signature release for DROID?

There is an interesting format change that I think warrants some discussion.

As I understand it, TIFF has historically been a problem for DROID-based identification. PRONOM has records for TIFF versions 3, 4, 5, and 6, but the signatures all matched the same hex strings, meaning that the best DROID can do is return a multiple match against the corresponding PUIDs.
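To illustrate why a header-only check cannot separate the versions, here is a minimal sketch that tests only the standard TIFF header: a two-byte byte-order mark (`II` or `MM`) followed by the number 42 in that byte order. It is not the PRONOM signature itself, and the function name `tiff_byte_order` is hypothetical. Any TIFF, whichever revision it was written against, passes the same test.

```python
from typing import Optional


def tiff_byte_order(data: bytes) -> Optional[str]:
    """Return 'little' or 'big' if data starts with a TIFF header, else None.

    The header carries a byte-order mark and the constant 42; neither
    value says which TIFF revision the file follows.
    """
    if len(data) < 4:
        return None
    if data[:2] == b"II" and int.from_bytes(data[2:4], "little") == 42:
        return "little"
    if data[:2] == b"MM" and int.from_bytes(data[2:4], "big") == 42:
        return "big"
    return None
```

Telling the versions apart would need evidence from deeper in the file, such as which tags appear in the image file directory, rather than from these first four bytes.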