Preservation of databases

The National Archives of the Netherlands have been looking at approaches to preservation of databases for some time, going back at least as far as the Digital Preservation Testbed project (2000-2003). 

More recently, in May 2010, we prepared a case study on database preservation as part of the PLANETS project.  Prompted by discussions of the topic and by our participation in the recent Preservation of Complex Objects Symposium in London, we decided to review that case study, update it with our latest thinking, and consider how best to make further progress.

Our updated version is attached to this post: it's a copy of the PLANETS case study with a commentary inserted and a couple of new sections.

Our conclusion is that while we have a reasonable range of tools and techniques available, we still don't have a good enough understanding of how the use of databases in government relates to archival records.  (It's those archival records that will be transferred to the archive for long-term preservation.)

And if we don't have a clear picture of how the databases relate to the business processes and records management in the ministries, then we can't decide which preservation approach is the best one.  Nor can we decide whether we need to look after the database contents, the reports generated from the databases, or also the applications used to access them.

So our next step in this work is to go and talk to our contacts in the ministries of the Dutch government and between us come up with some answers.

We'd be interested to hear of other experiences in this area: post a comment here or send us an email.

Attachment: Database archiving review.pdf (98.8 KB)

Comments

Andy Jackson

I've been trying to share some basic information and experiences with different tools in the OPF Knowledgebase wiki, e.g. Database Archiving & Migration Tools. I would personally really like it if documents like this one of yours provided summaries, while the gory details lived on the wiki. Do you think that might be a good idea? Or does that make little sense, e.g. because it's impossible to separate the tool information from the context you are evaluating it in?

Bill Roberts

Hi Andy

Seems a good idea to me to compile information on experience with tools.  In this particular example, the document was written last year towards the end of PLANETS and it's more of an overview document than a set of specific tests on particular tools, so in this case perhaps linking from your wiki page back to this document is the best approach.

However, next time we have some info on tool evaluation to publish, then the OPF wiki seems a good place to do it.  For example we've got some experiments on email coming up - I'll suggest to my colleagues that we post our evaluation on the OPF Knowledgebase.

Andy Jackson

It's just a proposal, but it seemed to work okay for AQuA, the results from which I'd seek to fold into the knowledgebase area if folks like the idea. Any feedback on this, or preferred ways of aggregating tool experience over time, would be more than welcome.

Paul Wheatley

Further to Andy's comment, the AQuA page may also be of interest: it links Collections, the preservation Issues with those Collections, the institutional Contexts of the Issues, and the Solutions developed to meet the Issues. I'm doing some work to develop this structure further for use in the SCAPE Project, and this is something we'd like to open up more widely at some point in the future.

Euan Cochrane

Hi Bill,

Databases are a fascinating preservation challenge. Just yesterday I had to help query a database from our holdings to fulfil an access request and it got me thinking about the problem again.

The appraisal challenge is a particularly interesting component of this issue. As you highlight, knowing what to preserve from a database is a significant problem. I know of (at least) one agency that has decided to automatically export records of transactions in its business-system database into text files, rather than attempting to preserve the live data within the database. It was decided that this was sufficient for recordkeeping purposes in this case, but in others it may be an inappropriate approach.
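To make that export approach concrete, here is a minimal sketch in Python of writing transaction records out as tab-separated text. The `transactions` table, its columns and the sample rows are all hypothetical, and SQLite stands in for whatever business system is actually involved; this is not a description of how that agency does it.

```python
import csv
import io
import sqlite3


def export_transactions(conn, out):
    """Write every row of the (hypothetical) transactions table as tab-separated text."""
    cursor = conn.execute(
        "SELECT id, occurred_on, description, amount_cents "
        "FROM transactions ORDER BY id"
    )
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Header row taken from the query's column names, so the text file
    # stays readable without the database.
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Stand-in for the live business-system database (assumed schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, occurred_on TEXT, description TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, "2011-03-01", "Permit fee", 12000), (2, "2011-03-02", "Refund", -2000)],
)

buffer = io.StringIO()
exported = export_transactions(conn, buffer)
text_export = buffer.getvalue()
```

The text file keeps the transaction records but drops the schema, constraints and relationships between tables, which is why the approach may be sufficient in some cases and not in others.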

I wrote a report on Public Sector Dataset Preservation in New Zealand in 2009 that might be of interest. It is available here

Regards, 

Euan Cochrane

Bill Roberts

Hi Euan

Thanks for that dataset report, it's very interesting.  We haven't carried out a proper survey of current practices and attitudes here, but my general impression of the situation in the Netherlands is very similar to the one you describe, i.e. there are lots of datasets and databases that may well turn out to be important archival records, but there is little awareness of the problem.  Datasets in current active use are probably very well managed.  Those that are not in active use are probably not well managed, but we don't really know!

I've been discussing this issue with colleagues here who have close connections in government departments, to try to gather some information on the scale of this issue and to start to raise awareness.  

Have there been any significant developments around datasets at Archives NZ since you wrote the report?

Bill

Jan Dalsten Sørensen

 

Preservation of databases has been an important task for the Danish National Archives since 1973.  All of the approximately 3,600 AIPs in our collection are files exported from databases. Contents from both business systems and records management systems are transferred as relational databases. One AIP represents one transfer from one database.

We have three main types of transfers:

a) where the content is from a specific period of time. E.g. data (and, if applicable, documents) from a records management system, where the usual archival period is 5 years. It could also be data from a business system from e.g. the tax authorities, where we get one transfer per year, after that year’s tax assessment is completed.

The transfer will include all information from the specified period.

b) a snapshot where the content of the database at a specific point in time is transferred. Snapshots are usually made every 5 years as long as the system is in operation. The transfer will include all information at that particular point. However, unless there is a log of database history, changes between transfers may not be preserved.

c) a final transfer of the content of a database once the agency in question no longer updates the database.
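The difference between a period transfer (a) and a snapshot (b) can be sketched roughly as follows in Python. The `cases` table, its columns, the dates and the helper names are hypothetical and only for illustration; they are not taken from the Danish executive order or any real system.

```python
import sqlite3


def period_transfer(conn, start, end):
    """Type (a): only rows whose closing date falls inside the period."""
    return conn.execute(
        "SELECT id, status, closed_on FROM cases "
        "WHERE closed_on >= ? AND closed_on < ? ORDER BY id",
        (start, end),
    ).fetchall()


def snapshot_transfer(conn):
    """Type (b): everything in the table at the moment of the call."""
    return conn.execute(
        "SELECT id, status, closed_on FROM cases ORDER BY id"
    ).fetchall()


# Assumed example data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, status TEXT, closed_on TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [(1, "closed", "2008-06-30"), (2, "closed", "2011-02-15"), (3, "open", None)],
)

period_rows = period_transfer(conn, "2006-01-01", "2011-01-01")
first_snapshot = snapshot_transfer(conn)

# Two changes happen between snapshots; only the last one is still visible.
conn.execute("UPDATE cases SET status = 'suspended' WHERE id = 3")
conn.execute("UPDATE cases SET status = 'closed', closed_on = '2012-04-01' WHERE id = 3")
second_snapshot = snapshot_transfer(conn)
```

In this sketch the intermediate status `suspended` appears in neither snapshot, which illustrates the point in (b) about changes between transfers when no log of database history exists. A final transfer (c) would amount to one last snapshot taken once updates have stopped.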

I would estimate that we preserve the content of approximately 20-25% of all the databases used by the government that we have identified. “Record-ness” of the content is not the main criterion for appraisal. Instead, we try to assess whether the content will serve as a useful source of information for future historians and other researchers.

The present format for databases has been in effect since September 2010 and is based on a modified version of the SIARD format. Together with the content of the database in SIARD-XML and, if applicable, all documents converted to preservation format (TIFF, JPEG-2000, MP3, MPEG-2 or MPEG-4), the transfer must include archival information (about the records’ creator and specific metadata about the content of the SIP) and context documentation. We have defined various categories of context documentation that cover both technical and administrative documents, as well as documents about the process of transfer and test.

The format is specified in an executive order which has been translated into English. It is available here:

http://www.sa.dk/media(3367,1033)/Executive_Order_on_Submission_Information_Packages.pdf

A history of digital preservation of databases at the National Archives in Denmark, already slightly dated, can be found in the publication from a symposium about the transfer, preservation of and access to digital records, held by the National Archives in 2008:

http://www.sa.dk/media(3009,1033)/Symposium_about_the_Transfer%2C_Preservation_of_and_Access_to_Digital_Records.pdf