Do we need an architecture for digital preservation? If so, what might it cover? Perhaps content types (e.g. data, teaching materials), management types (e.g. repositories), institutions, infrastructure (networks, storage) and services. It would be something more concrete than OAIS (references below), for example, which is after all a reference model, not an implementation.
Working on the Preserv projects and now KeepIt, whenever it has seemed possible to map a preservation architecture for institutional repositories, things have suddenly changed. There’s one advantage already of a reference model over an implementation. For example, we had a powerful new Sun Honeycomb storage server for Preserv, and not long afterwards Sun announced it was being withdrawn from the market. Think again. Now we have cloud storage. How long until the fatal flaw in that strategy emerges, if it hasn’t already? That said, colleagues are working on a promising way of applying the Honeycomb principles in an institutionally controlled cloud (see these slides, but note this blog comment on the live presentation: “extraordinary presentation … The slides don’t give the flavour; you just had to be there”).
A preservation architecture is a nice idea in principle, for example being able to say: here is a repository responsible for content (a), which it can predict and plan for; it can assess the risk and manage this content using tools (b), outsource some technical functions to service (c), and keep copies of the content at locations (d) and (e) for access and archiving. Such a prescription, if it were possible, might appeal to repository managers, and might act as a template for repository preservation. In turn, it would become easier to plan and develop the sort of services on which such an architecture depends.
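As a rough illustration only, the (a) to (e) template above could be written down as a simple record with one slot per element. Everything here is hypothetical: the class name `PreservationTemplate`, its field names and the example values are assumptions made for the sketch, not part of OAIS or of any Preserv or KeepIt tool.

```python
from dataclasses import dataclass, fields


@dataclass
class PreservationTemplate:
    """Hypothetical record with one slot per element (a)-(e) of the template."""

    repository: str = ""            # the repository responsible for the content
    content_types: tuple = ()       # (a) content it can predict and plan for
    tools: tuple = ()               # (b) tools for assessing risk and managing content
    outsourced_service: str = ""    # (c) service taking on some technical functions
    access_location: str = ""       # (d) where the access copy is kept
    archive_location: str = ""      # (e) where the archive copy is kept

    def unfilled(self):
        """Return the names of slots that have not been decided yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example values are invented purely for illustration.
plan = PreservationTemplate(
    repository="Example IR",
    content_types=("research data", "teaching materials"),
    tools=("format risk checker",),
    access_location="local server",
)
print(plan.unfilled())
```

A record like this only captures the template at one moment. If a storage product is withdrawn or two institutions pool their infrastructure, the affected slots have to be revisited, which is exactly the difficulty with fixing an architecture in advance.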
Admittedly, it all sounds a bit Soviet-style and rests on the assumption that once framed nothing will change this architecture, the antithesis of the digital environment where everything changes all the time. Would we want it any other way?
I was prompted to revisit these architectural thoughts by a curious story about a joint project – on Collection Development, Acquisitions, Preservation – between the university libraries of Cornell and Columbia. Actually I wasn’t that curious until this part: “Cornell and Columbia assert that the project—not a merger—could be …” Hold it there. It hadn’t seemed remotely like a merger story, but now you mention it, there is an interesting idea here. Are digital libraries enhanced by mergers? Certainly there is a scale problem with digital preservation that might be helped by a joint approach.
I don’t know to what extent digital academic libraries will be transformed by the open Web, that is, to become sources of content from the institution rather than acquirers of content from elsewhere. That’s my institutional repository viewpoint again. In the IR scenario, of the titular functions from the story above, preservation remains a concern.
You see, what they have done in this project, by collaborating and suggesting a merger, is alter another of the factors in our nascent architecture: this time it is the institutions and the infrastructure that have changed, rather than the storage service.
How would OAIS handle this? OAIS allows us to model a preservation architecture and build an implementation. If we are rigorous, we are encouraged to become ‘OAIS-compliant’, that is, we don’t simply treat OAIS as helpful advice, but we fulfil all the necessary requirements. This is important because one day we may want to be seen as a ‘trusted repository’, and that is likely to be measured against OAIS-related criteria. If we take library 1 and library 2, both of which are OAIS-compliant, and put them together as at Cornell and Columbia, is the result OAIS-compliant, or are we breaking compliance? Probably the latter, at least initially, until a full analysis has been performed on the new organisational framework. If we add in services, what needs to be OAIS-compliant? Without practice and experience, we don’t know. When it comes to institutional repositories we don’t have this experience, and I suspect not many others have it either, the likely exceptions being the established preservation centres such as national libraries, and their experience may not be directly applicable.
There are certain places where digital preservation should begin, but these should not be the same for everyone. A manager of an institutional repository, for example, need not be a preservation specialist. At the moment, however, we seem to be expecting everyone to begin with OAIS (formal, intermediate (tutorial) or less formal), formats, etc. Instead there should be tools, services, interfaces, and perhaps a preservation architecture, that embed specialist knowledge and practice, and provide a better starting point for non-specialists. We have to be bold to move forward from the abstract model.
Steve, I’m having a little trouble understanding the distinction between architecture and reference model here. But I have to remind myself that OAIS is a model and not a design, so presumably the architecture is in some sense a design, informed by OAIS but also by local circumstances: existing repositories, services, hardware facilities, operating system expertise, etc. So is an architecture effectively a local issue?
Chris, I’m struggling with the same distinction, as you spotted. ‘Architecture’ may be the wrong term here. If it suggests the resolution of a local issue then perhaps it is wrong. What I’m vaguely seeking is a generic DP starting point, e.g. for institutional repositories, both to inform the IRs and potential service providers, and to differentiate this starting point from that for other types of repository or service, those which might specialise in DP, say. Is that described as an implementation, an architecture or a model? If we are not careful we will likely end up careering between all three. At the moment, for DP newcomers, I fear we are stuck at the model, whatever type of repository we are considering.