

Data deletion: it happens

Data deletion happens within institutions, but the institutional repository can help prevent it.

The previous post considered the prospect of single-click deletion of data, i.e. inadvertent or unexpected sudden loss of data. From a formal digital preservation perspective this is a rather simplistic example, but as Brian Kelly recently noted in “Why you never should leave it to the University”, it happens.

The case concerns a business researcher in Sweden who lost 10 years' worth of data, apparently when an institutional Web site was redesigned. Brian asks if this could happen elsewhere, say in the UK, and of course it could. It is likely that we come close to similar data losses all the time.

Personal data collections managed on institutional sites are at risk in this way because users fail to recognise a crucial distinction: between a site that has systems management and one that has data management. There are many more people supporting systems and IT infrastructure in institutions than there are managing data. Securing data against this kind of loss requires designation: a data management approach backed by policy. Ed Pinsent provides an institutional Web archiving example.

A systems manager seeks to build a framework to enable users to create and look after data as efficiently as possible. What happens to that data over time is typically the responsibility of the creator. In maintaining the efficiency and security of the system, the systems manager’s priority is the system, and sometimes changes will be necessary that could endanger data.

In my experience, working within a computer science department that likes to stay ahead of the field, there are regular upgrades to the systems infrastructure. When changes affect data or require data to be moved, the data creator is given the chance to manage the transition, to avoid any unintended loss. This might not always be straightforward. I have authorised data movement to new infrastructure machines, but I have also had to take data offline where a proprietary server application could no longer be supported cost-effectively.

In such circumstances, deciding what to do with the data and providing for it is not always painless for data creators with little time or experience in data management, so the easiest decision is often deferral. Such inaction will usually be followed by a deadline and a warning from the systems manager that, if the creator does not act, the system change will go ahead even at the cost of data loss.

This is where the institutional repository has a vital role to play. It should be designated not as a system but as a managed data environment, where data is the priority and expert data managers can work with creators to support data throughout its lifecycle. IRs are not the only sites that can perform this role within institutions, as the archiving example above shows, but this is an opportunity for IRs to forge a needed role and an identity that users can understand.

On a site that is not designated for data management, data can be at risk of loss, as the researcher in Sweden discovered.



3 Responses


  1. andrew gray says

    i suppose data management site sounds better than institutional repository!

    but as you say its getting users to recognise distinctions and still for a lot of people, words/terminology such as data, systems etc just sounds like something that IT will deal with. I think one of the challenges is reaching our users with a language that engages them

  2. Vicky McCargar says

    I can’t resist posting one of the all-time great data deletions, the “Functional Requirements for Evidence in Recordkeeping” at the University of Pittsburgh, an early digital records management and preservation project that was accidentally destroyed in 2001. Thanks to the Internet Archive, the site was resurrected some years later, but it wasn’t what you’d call intentional preservation.

Continuing the Discussion

  1. Data repositories: the next new wave – Diary of a Repository Preservation Project linked to this post on September 23, 2009

    […] are already many people supporting systems and IT infrastructure in institutions; there are fewer people designated to manage data and support data creators. We can already see in our exemplar repositories the types of data that […]
