

More on data curation for repositories

Previously we have considered the case for data curation in KeepIt, and here is more grist for that mill. Dorothea Salo has posted her recent presentation at the Access 2009 Canadian library technology conference. You have a selection of formats:

  1. Video (with slides)
  2. Slide show (with commentary)

[slideshare id=2061191&doc=accessdatamanage-key-090924120942-phpapp01]

Note. This embedded slide show is the pre-conference version. Follow the link above for the full conference slides. The full version will be embedded here when we can get it working.

Unless you have an hour to spare I recommend option 2, while dipping into option 1 for flavour and colour. Yet you may want to spend some time on this, because the range of issues is large, and Salo makes it feel like time well spent because of the quality of the narrative and the slides. It’s an excellent if provocative presentation, even if you don’t agree with all of it (which I don’t).

To whet your appetite, there are lots of data examples, a good teaching example, a positive reference to Kultur, and of course strong opinions on the place of IRs in all this.

The presentation effectively has a section on the problems and challenges in adapting IRs for data curation (my comments in brackets):

  • on making repositories less institutional (I think the opposite is the case)
  • on managing many different content types using IR software
  • on static and final content (on the content angle: for OA content, IRs are secondary to publications, and that is proving problematic for IR content collection)
  • on the inadequacy of manual and ‘one file at a time’ deposit
  • on an Archive It! button
  • on creating more and better repository interfaces, improving the look-and-feel
  • on the limitations of key-value pair metadata; “use XML or RDF”
  • on APIs for data and data interactions, data relations
  • on content modelling, and sharing (and reusing) content models
  • on Fedora

There are some technical issues in achieving these requirements. Perhaps the demo examples in the EPrints 10 year review will answer some of these points.

Salo makes the case that all of these need to be resolved if IRs are to become platforms for data curation.

I think you will see from this presentation that Salo believes libraries and digital libraries are to be transformed. The future of IRs is left open, but the bar is set high for IRs to have a future. In a follow-up blog post answering what libraries should do about the ‘laundry-list of data-curation challenges presented’, Salo includes:

“Do you have an institutional repository? Are you getting value out of it? Really? If not, have the courage to migrate the content and shut it down, re-assigning its manager to something more useful—data services, perchance.”

So why more of this in KeepIt? This approach places the onus of taking an institutional perspective onto the repository. Fundamentally, there is not much point in thinking about preservation of repositories unless those repositories have a view on what they will look like, and what role they will play within the institution, in that same future. Preservation is about planning for that future, for the whole repository, and not just the content it has now. Given all this, should that future include data curation?

None of this prescribes any solutions. None of this says that what the KeepIt exemplar repositories are doing now is wrong. Just that the wider picture has, at least, to be considered. If Salo is correct, the future for institutional repositories is up for grabs.


One Response


  1. Dorothea Salo says

    Thank you, Steve; this is very flattering.

    Re: IRs and institutionality… I don’t have a problem with IRs focusing on the institution; I never have. It’s when the institutional focus becomes a straitjacket that I start to object, especially considering the multi-institutional nature of a good deal of research these days.

    I have enough trouble recruiting IR content. I should turn some down because I can’t prove an institutional tie? I don’t think so!

    There’s a lot more to be said about the constrained market position of IRs in the minds of library administrators, too. I honestly believe that cutting the strings on quite a few IRs would free up quite a few cycles for data work.
