NECTAR and the KeepIt project – reflections

“From a starting point of little knowledge and still less understanding, the KeepIt project has demonstrated that digital preservation need not be scary. The secret is to take small steps; to set preservation planning within the bigger institutional context; and to engage stakeholders.”

Near the start of the KeepIt project, Steve Hitchcock, our project manager, asked the exemplar repository managers to come up with preservation objectives for each of their repositories. NECTAR’s objectives were four-fold:

  • Objective 1: to define the preservation needs of all file types and formats held in NECTAR (now and in the foreseeable future).
  • Objective 2: to have procedures and tools to support the preservation needs identified in objective 1.
  • Objective 3: to have documentation to inform and support NECTAR stakeholders.
  • Objective 4: for repository staff and others with collection management responsibility for the IR to receive training and ongoing support.

Reflecting now on what we have done in the KeepIt project –  what we have learned and the benefits we have derived from the project – it is clear that as far as NECTAR is concerned, we have gone a long way toward achieving our goals.

Early discussions between Steve and the project team identified several areas of common interest. This list, combined with a liberal helping of Steve’s expertise, led to the development of the KeepIt course. Others have written extensively about the course in this blog, so I’ll restrict myself here to the impact it has had on me, on NECTAR and on The University of Northampton.

Module 1, on organisational issues, introduced us to the Data Asset Framework (DAF) and the AIDA toolkit. Both appealed as tools for auditing different aspects of the institution’s state of readiness for preservation. At Northampton we decided to have a go with the DAF tool, and I have already blogged about this. In the NECTAR team we have always prided ourselves on supplying what is most wanted by, and of most benefit to, our research community, and we felt that the DAF methodology would not only provide the evidence we needed to produce an appropriate preservation plan but also highlight gaps in current service provision and opportunities for improving the university’s research environment. This proved to be the case.

“A major report from our DAF project is shortly to be presented to the university’s Research Committee and nine recommendations will be made.”

Module 2, which dealt with the costs of preservation, covered a further two models: Keeping Research Data Safe (KRDS2) and LIFE3. With Neil Beagrie we looked at the potential benefits of keeping research data in a repository (both direct and indirect; near term and long term; private and public). In theory, these are indisputable: time saved; transparency; opportunities for re-use and re-purposing of data; increased visibility and citation… and so on. But in practice there is great resistance among researchers to the idea of putting data into a repository. Our DAF project at Northampton was clear on this. Research datasets contain sensitive information; they represent a lot of effort on the part of researchers; they aren’t designed for communal use – all of these deter researchers from providing open access to their research data.

My colleague, Philip Thornborow, our Collections and Learning Resources Manager, was particularly interested in the LIFE3 model. In Philip’s words (just after attending module 2):

We have a lecturer who has located some content in another library that he believes it would be beneficial to digitise. Neither the lecturer nor my university have much experience in estimating the full economic cost of such a project. During the demonstration of LIFE at the [KeepIt] training programme, it became immediately obvious to me that this tool was just what we need. The experience of the British Library and the other case studies they have used provides real data which can be used in the model. As we worked through the model it also became obvious that all the assumptions are clearly labelled, so we could vary the values in line with our local situation and gain an idea of whether our proposed bid was viable.

We are looking forward to testing LIFE, and in particular are interested in testing Brian Hole’s (LIFE3 project manager) comment that it can be used in reverse, so to speak. In other words, if JISC gave us £x what would we be able to achieve. I am severely averse to wheel reinvention, so any tool that can allow us to stand on the shoulders of giants, as the man in the BL courtyard said, gets my vote.

The LIFE3 tool wasn’t at that stage quite ready for public use, so Philip wasn’t able to take advantage of it then. Events have now moved on and his immediate need for the tool has passed. However, should he be required to provide costs for a future digitisation project, he remains interested in using LIFE3.

With a change of venue (from Southampton to London) came a change of emphasis in Module 3 of the KeepIt course. We started to look at file formats and to consider the significant characteristics of digital objects. Based on the PRONOM list of format risks (ubiquity, support, stability, lossiness, etc.), Steve asked us to evaluate pairs of file formats (for example, MS Word vs PDF) and to score a point for the winner in each risk category. We then had to come up with a reason why we might not choose the higher scoring file format for the repository. The subsequent discussion was fascinating since it clearly demonstrated that not all risks are equal, and that there is no consensus, even among the better informed, on the best file formats to preserve. So perhaps the repository manager can be forgiven for sometimes being uncertain and a little confused?
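
To make the shape of that pairwise exercise concrete, here is a minimal sketch in Python. The risk categories follow the PRONOM-style list above, but the scores are invented for illustration; the point is simply that each format earns a point in every category where it beats its rival.

```python
# Illustrative only: invented scores, not real PRONOM data.
RISK_CATEGORIES = ["ubiquity", "support", "stability", "lossiness"]

# Hypothetical judgements for one pairing (higher = less risky in that category).
scores = {
    "MS Word": {"ubiquity": 9, "support": 8, "stability": 6, "lossiness": 7},
    "PDF":     {"ubiquity": 8, "support": 9, "stability": 8, "lossiness": 8},
}

def pairwise_tally(a, b):
    """Award one point per risk category to whichever format scores higher."""
    points = {a: 0, b: 0}
    for category in RISK_CATEGORIES:
        if scores[a][category] > scores[b][category]:
            points[a] += 1
        elif scores[b][category] > scores[a][category]:
            points[b] += 1
    return points

print(pairwise_tally("MS Word", "PDF"))   # {'MS Word': 1, 'PDF': 3}
```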

For the rest of the day we looked at significant properties with Stephen Grace and at preservation metadata and provenance with Steve Hitchcock. In the exercises that accompanied these sessions we were once again forced to think hard about the real issues in preservation and how the theory could be put into practice.

“The combination of EPrints and PLATO tools for the first time puts real preservation action within the grasp of repository managers.”

Module 4, the only two-day module, promised to be the one where we got down to the nitty-gritty of practising preservation. Covering the new EPrints storage and preservation apps, and their interaction with the PLATO preservation planning tool, we had two days to learn how to develop a specific preservation plan for our own repositories. The combination of EPrints and PLATO tools for the first time puts real preservation action within the grasp of inexpert repository managers. In a test environment we were able to conduct risk analysis within the repository itself, create and test preservation plans based on our own requirements, and import those plans back into EPrints.

Since module 4, the good folk from EPrints Services have upgraded NECTAR to version 3.2.4 of EPrints and Dave Tarrant has installed the preservation plugins. We now know that nearly all of our files are versions of PDF and MS Word formats, with a sprinkling of XML, plain text and others. None are currently deemed to be at risk. We will now need to monitor this, to be sure that NECTAR content does not fall into a high risk category. At that stage we will find out whether the training provided in module 4 translates into effective action in the repository.

“We now have the knowledge and confidence to implement other preservation tools as needed.”

The fifth and final module considered aspects of trust in data management. We looked at tools for assessing trustworthy repositories, specifically TRAC and DRAMBORA. A couple of the course attendees (LSE and UAL) have since adopted DRAMBORA to good effect, and it certainly seems to do the business with regard to identifying, assessing, managing and mitigating risks to the repository. I can see that the use of DRAMBORA would provide more than just a risk assessment for the repository; it would also produce considerable evidence to inform other institutional policies and procedures. The drawback is that we’re told it takes approximately 40 hours of work to complete the tool. Perhaps, like the two institutions above, here at Northampton we can find a way to use the tool more selectively.

Of course the KeepIt course hasn’t provided the only learning opportunity in the project. The regular meetings between Steve, Dave and the other exemplars have given us the chance to share experiences and to learn from each other. The repository community as a whole has always been mutually supportive – a culture fostered by the huge amount of funding and support from the JISC – so the good relationships between project members and course participants come as no surprise.

“There is a risk that new knowledge is vested in one person. By involving colleagues I believe that we have broadened both interest in digital preservation and the impact of the project.”

So to return to our progress against the NECTAR objectives…

  • Objective 1: to define the preservation needs of all file types and formats held in NECTAR (now and in the foreseeable future).
    We have identified all file formats in NECTAR and established that none are currently at risk. We have undertaken a university-wide research data project using the DAF methodology. This has not only highlighted the data management practices of researchers, but also supplied us with valuable information about new file types that might be deposited in NECTAR in the future.
  • Objective 2: to have procedures and tools to support the preservation needs identified in objective 1.
    NECTAR has been upgraded to version 3.2.4 and the EPrints preservation plugins have been installed. We now have the knowledge and confidence to implement other preservation tools as needed.
  • Objective 3: to have documentation to inform and support NECTAR stakeholders.
    A major report from the DAF project is shortly to be presented to the university’s Research Committee and nine recommendations will be made. These include the creation of a data management policy for the university, clarification of the university’s position on data ownership, implementation of a programme of training on data management, and provision of online advice and guidance. It is hoped that this will raise the profile of records management in the institution and therefore have a benefit well beyond the relatively narrow field of research data.
  • Objective 4: for repository staff and others with collection management responsibility for the IR to receive training and ongoing support.
    The University of Northampton was fortunate in that not only was I, as repository manager, able to attend all five modules of the KeepIt course, but also I was able to bring along colleagues with a professional interest in each area. Thus our Collections and Learning Resources manager attended modules 1 and 2; our metadata specialist participated in module 3 and our NECTAR technical specialist attended module 4. There is a risk in a project such as this that new knowledge is vested in the one person with commitment to the project. By involving colleagues in the course I believe that we have broadened both interest in digital preservation and the impact of the project.



KeepIt course: revision, conclusion and evaluation

KeepIt course module 5, Northampton, 30 March 2010
Tags Find out more about the full KeepIt course
Presentations and tutorial exercises course 5 (source files)

Using copies of selected slides from the course, the following revision session summarises which tools were used and what we did with them, including a reminder of the practical exercises. Each module and tool is covered using this same basic structure.

[slideshare id=3665042&doc=keepit-course5-revision-100408064656-phpapp02]

Our final presentation from the KeepIt course sets out the criteria and context for evaluating the course. Participants attending this last course module were handed evaluation forms after this presentation and allowed quiet time before we enjoyed end-of-course food and drink. The results of this evaluation were summarised in a presentation at the European Conference on Digital Archiving (April 2010).

[slideshare id=3665193&doc=keepit-course5-conclusion-100408071350-phpapp01]

As the course ends, what can we say about the status of preservation for digital repositories? First we have to recognise that this might depend on which type of repository, or which type of software, is being considered. By Googling a series of questions for each type of repository, but not including the term ‘preservation’ in the query, we looked for highly ranked statements by repositories that made reference to preservation.

Based on this simple analysis, we end the course with the following thought: preservation is important for repositories, but needs to be connected with the other roles and activities of a repository if it is to find clear articulation and obtain proper resources.

Many thanks to all our course presenters: Sarah Jones, Harry Gibbs, Ed Pinsent, Neil Beagrie, Brian Hole, Stephen Grace, Gareth Knight, Andreas Rauber, Hannes Kulovits, Dave Tarrant, Adam Field, and Martin Donnelly.

Thanks too to all the participants who joined the course and stuck with it, and those who have followed the course on this blog. You deserve a reward – treat yourself.

That’s it; the end.



KeepIt Course 5: DRAMBORA: Risk and Trust and Data Management

KeepIt course module 5, Northampton, 30 March 2010
Tools this module: TRAC, DRAMBORA
Tags Find out more about: this module KeepIt course 5, the full KeepIt course
Presentations and tutorial exercises course 5 (source files)

“The Digital Repository Audit Method Based On Risk Assessment (DRAMBORA) was developed by the Digital Curation Centre (DCC) and DigitalPreservationEurope (DPE) to assist repository management and staff to identify, assess, manage, and mitigate risks.”

Our expert leader for this session was Martin Donnelly from DCC, who joined us at just a couple of days’ notice, flying from Edinburgh and hopping into a hire car to reach Northampton before opening the presentation, all in a morning.

Here Martin shows how to apply the DRAMBORA methodology with a risk management exercise, and how to use the online interactive version. The presentation concludes with a view of how DRAMBORA complements DAF, effectively bringing the KeepIt course full circle, because DAF was the first tool we encountered back in course 1. First, we consider the relation between risk and trust.

[slideshare id=3664701&doc=keepit5-dramboranorthampton-given-100408053617-phpapp01]

I first came across DRAMBORA following its launch announcement and I registered for a guest login to try it. My initial impression was of a complex and awkward tool, and that wasn’t just the login process. It seemed to be engineered for perfect archiving scenarios, and expectations of comprehensive use seemed unlikely to be realised by typical repositories, given the estimated 28-40 hours to complete a full audit. It probably didn’t help that I wasn’t anticipating or trying to fulfil a real task when exploring the new tool.

I next encountered DRAMBORA in one of the DCC’s 101 lite courses (like this one). It wasn’t referred to as DRAMBORA, but I recognised the fingerprints. We were given a small group exercise to assess a single risk. Suddenly it became clearer. It is possible to apply the elements of DRAMBORA and build the bigger picture as needed. In other words, fit the tool to the task, not the other way around. DRAMBORA is well structured for this, and the benefits of this approach have been realised for the institutional repository of the LSE.

What these risk scenarios and exercises reveal is the degree of team building necessary to manage risk associated with digital content produced across large institutions.

DRAMBORA method

The DRAMBORA method is based on discrete phases of (self-)assessment (see illustration). In comparison with TRAC, we saw that self-assessment is a key feature that differentiates DRAMBORA. The method requires institutional and contextual profiles, and a detailed understanding of a repository’s activities and ‘assets’, an approach that recalls the work done earlier in the course with AIDA. At this point it becomes possible to identify the risks that might impact the repository. These risks then need to be itemised and assessed individually. The intended outcome is a strategy for managing the risks.

So, what is risk, and how is it handled by DRAMBORA?

  • Definition: risks describe challenges or threats that impede the achievement of repository objectives, obstruct activities, and prejudice the continued availability of essential assets.
  • In DRAMBORA, risks have several attributes: probability, impact, severity (a derived value, p*i), owner(s), and management strategies – combined as in the sketch below.
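
As a quick illustration of how those attributes combine, here is a minimal Python sketch: severity is simply the product of probability and impact, which makes it easy to rank an itemised risk register. The risks, scales and scores below are invented for the example, not taken from the DRAMBORA sample set.

```python
from dataclasses import dataclass, field

@dataclass
class Risk:
    """A DRAMBORA-style risk: severity is derived as probability * impact."""
    description: str
    probability: int                 # e.g. scored 1 (rare) to 6 (frequent)
    impact: int                      # e.g. scored 1 (negligible) to 6 (catastrophic)
    owners: list = field(default_factory=list)
    management: str = ""

    @property
    def severity(self) -> int:
        return self.probability * self.impact

# Hypothetical register entries for a small institutional repository.
register = [
    Risk("Loss of key technical staff", probability=3, impact=5, owners=["Repository manager"]),
    Risk("Obsolescence of a deposited file format", probability=2, impact=4, owners=["Library", "IT"]),
    Risk("Server room flood", probability=1, impact=6, owners=["Estates", "IT services"]),
]

# Rank the register by derived severity, highest first.
for risk in sorted(register, key=lambda r: r.severity, reverse=True):
    print(f"{risk.severity:>2}  {risk.description}")
```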

What does a risk look like, and how can we elaborate one? A large set of sample risks is provided in the Appendix to the original paper-based DRAMBORA methodology (registration required).

Slides 42-43 tabulate the anatomy of a risk. To try this out we have an exercise (slides 45-47), designed to take 1h. First, identify one risk (based on your own experiences wherever possible), and complete the DRAMBORA worksheet (.rtf format). In part 2, you will identify what steps your repository might take to manage and mitigate the identified risk over time. Or you can learn from one user’s experience of trying this exercise.

As an alternative to the paper-based process, try DRAMBORA Interactive, which guides users smoothly through the audit process, with built-in reporting features. The presentation provides a step-by-step guide to DRAMBORA Interactive (slides 52-90), or treat this as revision for the first half of the presentation.

One of the more intriguing features of the Interactive version is snapshot view (slides 88-89). This records the state of a repository at a given moment in time, and can be compared with other audit snapshots to track changes.

DRAMBORA has interesting connections with DAF, the Data Asset Framework. Both were developed at the DCC and are “self-management tools to assess the effectiveness of approach to data management or preservation”. Where DAF emphasises the researcher-data axis, DRAMBORA is more repository-process focussed (slide 94).

Coming soon is the data management tool, a combination of DAF, AIDA, LIFE and DRAMBORA. The KeepIt course is indebted to them all for framing the course from start to finish, and for providing a rich set of tools for repository managers to tackle the challenge of digital preservation.

DRAMBORA is the final tool to be covered in this KeepIt course. All that remains to complete the course is revision and evaluation.



KeepIt course 5: Tools for Assessing Trustworthy Repositories

KeepIt course module 5, Northampton, 30 March 2010
Tools this module: TRAC, DRAMBORA
Tags Find out more about: this module KeepIt course 5, the full KeepIt course
Presentations and tutorial exercises course 5 (source files)

Our primary tools in KeepIt course 5 are TRAC and DRAMBORA. We’ve seen passing references to these tools in earlier modules: in course 1 (slides 2-3), course 2 (or maybe not, slide 5; yes we did, slide 6), and course 4 (slide 7). This gives a sense of how integral these tools are to a structured approach to digital preservation. In this module we’ll find out why, beginning here with TRAC, Trustworthy Repositories Audit and Certification.

[slideshare id=3664564&doc=keepit-course5-trust-100408050615-phpapp02]

TRAC was designed by committee (slide 8). Its approach is not unique, and there are similar tools such as the nestor Criteria Catalogue. Since these tools have not been fully aligned, none yet stands as a global standard for measuring the trustworthiness of digital repositories.

As the TRAC name suggests, there are two parts to the process: audit, that is, assessing conformance against a set of criteria; and certification, demonstrating conformance to an independent agent.

As with the PREMIS preservation metadata dictionary covered in course 3, TRAC exhibits a clear and logical structure that quickly becomes apparent on inspection, both in the structure of the 84 individual entities (slide 13) and the structure of the full checklist (slide 14), which is designed to assess three primary areas within TRAC:

  1. Organizational Infrastructure
  2. Digital Object Management
  3. Technologies, Technical Infrastructure, Security

To get a feel for this structure and apply the approach to some TRAC entries, we devised a short exercise that anyone can try. Using the procedure shown in slide 15, decide whether you can certify your repository, or not, for some given entries. To avoid self-selecting entries designed to show your repository in a glowing light, download and use the randomisable spreadsheet which lists all the TRAC entries in order. Notice that column A consists of randomly generated numbers, so if you sort the spreadsheet on this column it will randomise the entries. Before you do this, decide how many entries you wish to apply in the exercise and pre-select this many numbers, from 1-84. These will be the line numbers corresponding to the randomised entries you will use in the exercise.
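
If you prefer scripting to spreadsheets, the same non-self-selecting draw can be done in a few lines of Python. The entry labels below are placeholders standing in for the 84 TRAC checklist entries, which you would load from the spreadsheet in practice.

```python
import random

# Placeholder labels standing in for the 84 TRAC checklist entries.
trac_entries = [f"TRAC entry {n}" for n in range(1, 85)]

def sample_entries(entries, how_many, seed=None):
    """Draw a random, non-self-selected subset of entries to audit against."""
    rng = random.Random(seed)
    return rng.sample(entries, how_many)

# Decide in advance how many entries to assess, then draw them at random.
for entry in sample_entries(trac_entries, how_many=5, seed=2010):
    print(entry, "- can we certify this? yes / no / partly")
```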

Not all repositories will need to be formally certified for trustworthiness, which can be an intense and detailed procedure, as illustrated by one of the few published case studies, in this case a major e-journal archive, Portico (slides 18-21). Credit to Portico for revealing this example, which shows that an exhaustive process such as TRAC will raise issues even in the best-prepared and best-managed archives today.

Closer to home, one of our KeepIt exemplar repositories, eCrystals, has applied TRAC and reported its initial recommendations:

  • TRAC is open-ended and exploratory, and therefore more suited to repositories with an established long-term archival and preservation mandate.
  • At the current stage of development of the eCrystals data repository we recommend self-assessment using the DRAMBORA toolkit as an instrument.
  • The audit process in many ways is more important than actual certification, since it allows repositories to analyse and respond to their archives’ strengths and weaknesses in a systematic fashion.

We can test these recommendations immediately, because the next session will involve an extensive exercise with DRAMBORA.



KeepIt course 5: Trust

KeepIt course module 5, Northampton, 30 March 2010
Tools this module: TRAC, DRAMBORA
Tags Find out more about: this module KeepIt course 5, the full KeepIt course
Presentations and tutorial exercises course 5 (source files)

“Say what you do. Do what you say. Show that you do what you say.

“All of this leads to trust.”

Colati and Shreeves, Digital Repository Management Uncovered, Webwise 2010

We have reached the final lap of our recap of the KeepIt course on Digital Preservation Tools for Repository Managers. The aim of this recap is to pull together the different resources employed for the course to make it suitable for independent study and use.

As usual, this module introduces readily-available tools through presentations and practicals, and our primary presenter is one of the developers of the main tool we shall be covering.

[slideshare id=3664501&doc=keepit-course5-intro-100408044938-phpapp01]

The topic for this session is trust, and in the context of the course this is last by design, but certainly not least. Colati and Shreeves reveal very succinctly why trust matters.

Most people want to be trusted, but trust takes time to establish. The same applies for repositories, and there is a cost attached. For new repositories, therefore, trust can be an over-emphasised virtue, at least initially. But with maturity comes responsibility, and trust becomes a feature a repository will wish to measure and demonstrate, especially in the context of digital preservation. Any repository manager who has completed and applied this course will want to demonstrate the results and benefits – they will want to be seen to be trusted.

Trust is also a two-way issue. A repository may want to be trusted by its users; it also wants to be able to trust the tools and services it uses.

KeepIt course 5 covers two tools to manage and measure trust and its counterpart, risk. TRAC, Trustworthy Repositories Audit and Certification, is a checklist of criteria for assessing the degree of trust attained by a repository, and is introduced with a short group exercise. Our main tool in this module, DRAMBORA, is covered more extensively, again with practical work to the fore.

This module, and the course, completes with a full-course revision session.



EdShare – Lessons from KeepIt

“Participation in the KeepIt Project has provided us with an excellent practical grasp of the realities of preservation and brought an apparently enormous and daunting area into the realms of approachable and practical possibilities.”

We have developed an approach which will enable others to make a start and to develop their own routes through preservation in their own ways.

EdShare, as the institutional learning and teaching repository of the University of Southampton, has grown from an engagement process across the institution, intended both to support educational excellence and to foster a cultural change in approach within the institution. Our developmental work in the early stages of the JISC-funded EdSpace Project had its roots in early work for elearning led by senior academics and change agents across the University.

At the beginning of the EdSpace Project, we were already in a position to draw on processes similar both to the Data Asset Framework (DAF) approach (an outcome of the School exemplars we had identified and engaged with for our HEAcademy elearning benchmarking work in 2007/08) and to the philosophy of Assessing Institutional Digital Assets (AIDA), itself similar in approach to the eMM toolkit we had used for the whole elearning benchmarking activity.

This work provided additional support for working with both of these tools in module 1 of the innovative and highly successful KeepIt Course on Digital Preservation Tools for Repository Managers. This course has constituted a really significant aspect of the work of this project.

Module 2 covered the Keeping Research Data Safe (KRDS) method, and I blogged about this back in February, providing an educational resources perspective.

Module 3 was a Primer on Preservation Workflow, Formats and Characterisation – the most technical of the training sessions we had covered so far. Steve Hitchcock’s introductory session made the day accessible, engaging and interesting. In this technical area I could begin fully to appreciate both the complexity and scale of the task ahead, as well as the benefit of identifying strong technical specialists available in a large, research-intensive university. Indeed, one of the great advantages of working in such a large institution is the relatively easy access we have here to advice (at the very least, and sometimes a great deal more!) from technical experts at the forefront of their fields.

“One solution is the provision of integrated services and tools linked to specific repository systems. When EdShare upgrades to version 3.2 of EPrints we will be in a position to take advantage of the newly-developed EPrints preservation apps”

Developing a good awareness and understanding of the scope of the file formats, characterisation and associated preservation issues is a key aspect of the repository manager’s role, but the most cost-effective and sustainable support for these areas must lie with the technical experts and software developers. Where possible, managers should develop good relationships with these people in their own organisations. For smaller institutions, or communities of subject-based educational repositories, an alternative solution may lie in collaborating on sourcing technical advice and support as well as benefiting from the future provision of integrated services and tools linked to specific repository systems. Thus, when EdShare upgrades to version 3.2 of EPrints later in September 2010, we will be in a position to take advantage of the newly-developed EPrints preservation apps to support our ongoing work in this area. In this way, good practice will be accessible to more people and a wider group in the community will be able to benefit.

“multiple format packages providing visual, audio and written presentations – these are not necessarily the everyday, simple learning resource filetypes we had originally anticipated hosting”

The significant characteristics work for module 3 was also highly relevant to EdShare: the motivation for many educators in using a specific software application or package when creating learning resources – Camtasia, Adobe Captivate or Adobe Presenter, for example – is precisely to take advantage of the functionality offered by that application. Such applications produce sophisticated, multiple-format packages specifically to provide clear visual, audio and written presentations for a specific audience – they are not necessarily the everyday, simple learning resource filetypes we had originally anticipated hosting when we launched EdShare.

EdShare handles a wide array of complex file format types, even within single items

Nor are these resources straightforward filetype subjects for preservation actions. In terms of offering distinctive approaches to preservation, EdShare’s case was that, because of the diversity, richness and variability of the filetypes involved, educational repositories (still very much the “overlooked” form of repository compared with research repositories) provided an excellent exemplar for KeepIt.

In addition, we were interested to explore other aspects of preservation of concern to everyday educators – are they aware of preservation as an issue at all? If so, at what stage in the cycle of repository content, ingest and curation should these issues be raised? Are there concerns for preservation in other areas of the repository landscape: are institutional managers/legal services/senior policy makers concerned about preservation for other reasons?

“The modules neither required nor assumed sophisticated technical knowledge or experience. It would be quite unrealistic, particularly in the present climate of financial constraint and staff reductions, to add to the preservation responsibilities the repository manager already carries”

Module 4 looked at an aspect of these questions by offering an opportunity to develop a preservation plan for the repository. This, in turn, led to the identification of two additional pieces of work for EdShare:

  1. To investigate the typical or atypical nature of the filetype content in EdShare itself. (I report on this work and its findings in my next blog post.)
  2. To support the upgrade of EdShare’s underlying software to version 3.2 of EPrints, thereby enabling the integration of the new EPrints preservation tools.

As an early example of an institutional educational repository, developed according to the principles of co-design and collaboration with the local user community, EdShare would also have been interested in exploring two further areas of work:

  1. Policy development for preservation, particularly for educational repositories
  2. Identification of the specific role of the repository manager, educational developers, teachers and other stakeholders (established and emergent) in the preservation process.

“By the end of the KeepIt course I had achieved a much higher level of confidence with respect to preservation”

Without a doubt, the KeepIt course modules provide an excellent awareness-raising and priming opportunity for the educational repository manager. The modules neither required nor assumed sophisticated technical knowledge or experience. Considerable emphasis was given to the advantages of adopting a team approach, whereby local or host technical expertise is drawn upon to provide the level of technical support required for reliable preservation processes. In practical terms, this was enormously welcome: providing support for the educational repository is only one of an increasing range of roles I have as a librarian in the University. It would be quite unrealistic, particularly in the present climate of financial constraint and staff reductions, to add a more technical (or any other) role to the preservation responsibilities the repository manager already carries. There are sufficient aspects in which managers can contribute to preservation work in repositories without having to emphasise the technical element of this work.

By the end of the KeepIt course I had achieved a much higher level of confidence with respect to preservation – reassured that the technical capabilities of our software would be enormously enhanced with a software upgrade; confident that the affordances of technology could then support the additional aspirations we were developing in terms of improving awareness of preservation on the part of teachers and policy makers across the University; and aware that much of what we were already doing as a repository promoted preservation practices and respected the expectations of our content producers that their resources would be cared for and available over the long term for them to access, share and re-mix.

A JISC-sponsored report by Emmerson – Retention of Learning Materials: A Survey of Institutional Policies and Practice – had been an early resource we drew on in the EdSpace Project, reflecting both the low levels of awareness across UK HE of the need to develop and enforce policies and practices, and the potential wastefulness and vulnerability to which institutions expose themselves by continuing in this way. This has given rise to a stronger focus on one of the additional areas of work for EdShare identified above: to explore and understand the specific institutional concerns of the University of Southampton in the preservation of resources for learning and teaching. Indeed, this work will align very well with significant ongoing work to develop the “Southampton Learning Environment” – a framework for supporting, delivering and enhancing learning and teaching across the whole University community. We will need to address policy and practice, as well as the technical and storage implications, as we develop more robust institutional arrangements for this important area of work.

To summarise the impact of KeepIt on EdShare:

What we did

  1. Worked as an exemplar among a community of repository managers and explored common areas of interest;
  2. Worked as part of the JISC/HE Academy OER Project Programme – EdShare is developing instantiations to host the OER Collections of HumBox and SWAPBox

What we learned

  1. Digital preservation can be simplified by integrating tools and services with existing digital repository interfaces;
  2. Preservation is the responsibility of content creators, teachers and others contributing to the educational process and not just the repository manager;
  3. All the small actions contribute to the process, not just the large-scale, ambitious and expensive ones.

What others can learn

  1. Having in place reporting processes for identification of filetypes in a repository is a good starting point for preservation activities. It seems that many existing educational repositories do not have these processes in place.
  2. Maintaining an awareness of the rapid developments in this field will enable others (not necessarily EPrints users nor experts in preservation) to take advantage of the developments made in the KeepIt project and by the emerging community of practice around preservation work.

What we are doing next

  1. Upgrading EdShare to version 3.2 of the EPrints software puts us in a good position to take advantage of the work done by technical experts to provide apps which support automated preservation activities.
  2. From this work, we will implement policies and processes to support decisions on format types for preservation actions.
  3. Preservation of learning resources in an institution is only one aspect of the work required for the delivery of retention policies and practices required in a 21st Century University. We will work to understand more about this area and to put our own institution in a good position to support both its aspirations and obligations.



Connecting Plato with digital repository interfaces: #ipres2010 twitterstream

This is a copy of the live Twitter record (edited to remove duplicates and retweets) for the presentation:

Dave Tarrant, Connecting preservation planning and Plato with digital repository interfaces, iPres 2010, Vienna, Tuesday, September 21, 2010 (Day 2) Session 5a Preservation Planning and Evaluation (10:30-11:50)

See also the EPrints record for this presentation including the full paper on which this presentation is based and the slides.

“impressive piece of work – excellent talk too”

timgollins #Ipres2010 #5a – @davetaz about to open the session with “Connecting preservation planning and Plato with digital repository interfaces”

mariekeguy #ipres2010 #5a @davetaz (David Tarrant) talking about Connecting preservation planning and Plato with digital repository interfaces

cardcc @timgollins well you won’t have time for tweeting if @davetaz sticks with his usual style! #Ipres2010 #5a

euanc #ipres2010: Session 5a- Preservation planning and evaluation starts. no tweets from @davetaz for the next while

timgollins #Ipres2010 #5a – @davetaz – Biggest Challenge – using and controlling the plethora of tools 2 Retweets

mariekeguy #ipres2010 #5a Tarrant: Lots of repository tools out there but repository managers unsure how to use then, need workflow

timgollins #Ipres2010 #5a – @davetaz Bit preservation is the start, then Identification, Characterisation, …. Point is useability 2 Retweets

timgollins #Ipres2010 #5a – @davetaz – gives tweet quote but I cat type fast enough !!

euanc #ipres2010 #5a @davetaz is telling us to tweet things 140 characters long and then not giving time to type them out, typical!

mariekeguy #ipres2010 #5a Tarrant: Need to lower the barriers to use of repository tools – make it fit 140 characters 😉

timgollins #Ipres2010 #5a – @davetaz Talking about e-prints – already at Torrent speed and speeding up 🙂

timgollins #Ipres2010 #5a – @davetaz Droid profile of a publication repository – thanks for the Plug Dave 🙂

mariekeguy Talks faster than me! RT @timgollins: #Ipres2010 #5a – @davetaz Talking about e-prints – already at Torrent speed and speeding up 🙂 1 Retweet

euanc #ipres2010 #5a @davetaz speaking too fast for my slow brain this morning. impressed by @mariekeguy & @timgollins tweet speet

timgollins #Ipres2010 #5a – @davetaz E-prints – already missed the point about Characterization using JHOVE through Plato

“Plato action plan XML read by Eprints for ‘action’. V.cool.”

euanc #ipres2010 @davetaz : only get factual information in registries, not institutionally subjective information

timgollins #Ipres2010 #5a – @davetaz Now on Risk assessment – risk is local – do the risk analysis – central registries nly have global risk factors

euanc #ipres2010 we need a tool that identifies intellectual entities that are made up of multiple computer files via indicators to be defined

timgollins #Ipres2010 #5a – @davetaz e-prints talking about sampling for characterization and then using Plato for Risk assessment 2 Retweets

euanc how do you test results of bulk migration? how do you confirm you have preserved whatever you were trying to preserve? #ipres2010

timgollins #Ipres2010 #5a – @davetaz – talking about executable preservation action plans that use linked data registries to identify tools 1 Retweet

timgollins #Ipres2010 #5a – @davetaz Plato – executable action plan contains decisions taken in creating the plan – preserve the plan in the repository 2 Retweets

timgollins #Ipres2010 #5a – @davetaz Trust, authenticity – declare the plan used to migrate the objects – transparency/provenance delivers trust ! 2 Retweets

“Good to see KeepIt project enabling Repo managers to really use tools”

euanc #ipres2010 #5a @davetaz – quotes from KeepIT exemplars “Much more time-consuming and complicated” “less confident but now knw what it means” 1 Retweet

mariekeguy #ipres2010 #5a Tarrant: KeepIt examples – repository managers realise dig pres not just for the techies – also part of their role 1 Retweet

neilgrindley @davetaz talking about integrating preservation tools. Good to see KeepIt project enabling Repo managers to really use tools.#ipres2010 1 Retweet

mariekeguy #ipres2010 #5a KeepIt project http://bit.ly/92Qhf1 1 Retweet

timgollins #Ipres2010 #5a – @davetaz Presenting the e-prints users response – impressive piece of work – excellent talk too 🙂 2 Retweets

pjvangarderen #ipres2010 Tarrant: Plato action plan XML read by Eprints for ‘action’. V.cool. Need to see if this will work for Archivematica. 2 Retweets

jisckeepit Thanks @timgollins @euanc @mariekeguy for excellent commentary and summary on @davetaz Plato and repositories #ipres2010 #5a

Questions?

The short question and answer session that followed the presentation is summarised by the presenter, Dave Tarrant:

Q1: How did you organise the training courses in KeepIt and what materials did you use?
A1: We utilised connections with projects to invite key people to give presentations focused on repository managers; we collected the materials and these are available online (URL tweeted: http://www.ecs.soton.ac.uk/research/projects/640, source materials). Ed.: or try the course blogs and course presentations (Slideshare).

Q2: Is there any format you can’t handle?
A2: In EPrints, no, in the scope of the tools, yes, as the tools do not cover all formats. However in the case of eCrystals they have started this process by providing the identification of their files to PRONOM-DROID. You can then follow the workflow presented to look at extracting characteristics and feeding these to a tool or developing a new characterisation tool.

Q3: I notice in the paper you give the plan an ID (ID/41). Is this a persistent ID?
A3: Yes! An eprints persistent ID is assigned to almost everything in the system. I simply removed the first part of the URI to fit it in the 2-column format. EPrints provenance (which uses the Open Provenance Model, OPM) wouldn’t work without it.

Summary

So what can we divine from this? The presentation was fast and furious, in true Tarrant style, but thanks to the tweeters we can see that the main points were, for the most part, effectively conveyed and that the story was well received and complete. An amazing response.

lescarr Following #ipres2010 where @davetaz presents the EPrints/DROID/Plato integration so quickly that tweets are too long & cumbersome to keep up



KeepIt course 4: putting a preservation plan into EPrints

KeepIt course module 4, Southampton, 18-19 March 2010
Tools this module: Plato, EPrints preservation apps
Tags Find out more about: this module KeepIt course 4, the full KeepIt course
Presentations and tutorial exercises course 4 (source files)

So far in the practical work in KeepIt course 4 we have managed some storage services from our EPrints repository, then deposited some image files in the repository and performed preliminary format identification and risk analysis. We exported those files and used Plato to produce a preservation plan for that format.

These tools were only trivially connected by the export and use of common files. In other words, you could have done these courses on EPrints and Plato independently. Further, since the stated aim of this course module was to put preservation in the repository interface, you might ask how this is achieved when all we have done so far is copy content out of the repository and use Plato instead.

Here is the clever part …

and it’s simple compared with the previous session. Recall the repository interface when we exported our GIF files. There are no presentation slides for this final session in course 4, just instruction sheets for our remaining practical work, so an illustration of this interface is reproduced here.

Screenshot: the EPrints format risks interface, showing high-risk objects

Under the preservation actions button we used to download and export our GIF files to Plato is another button, to upload a preservation plan. This is the plan we have just produced and saved from Plato, as an XML file. So select that file and with one click we are back in the repository, with a preservation plan. In this case our simple plan chooses to migrate GIF images, considered (hypothetically) high risk, to PNG, our choice of (hypothetically) low-risk image format.

We can now follow the instruction sheets for this session to complete our work for this module.

  • Actions: find the original GIFs, upload the plan, view the result.

Each image record now contains a migrated version, the PNG, as well as the original GIF.
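
For readers without a test repository to hand, the migration action itself amounts to a format conversion of the kind sketched below. This is a standalone illustration using the Pillow imaging library, not the EPrints/Plato machinery, which performs the conversion and records it against the repository record automatically; the file paths are hypothetical.

```python
from pathlib import Path
from PIL import Image  # Pillow

def migrate_gif_to_png(source: Path, target_dir: Path) -> Path:
    """Convert one GIF to PNG, leaving the original file untouched."""
    target = target_dir / (source.stem + ".png")
    with Image.open(source) as image:
        image.convert("RGBA").save(target, format="PNG")
    return target

# Hypothetical local folders; in the repository both versions stay attached to the same record.
target_dir = Path("migrated")
target_dir.mkdir(exist_ok=True)
for gif in Path("high_risk_gifs").glob("*.gif"):
    print("migrated", gif.name, "->", migrate_gif_to_png(gif, target_dir).name)
```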

We can now update our earlier format risk screen, which shows the new PNGs as low risk objects and includes our GIFs in the same category, because they have migrated versions, but with a red bar to indicate they were originally considered to be high risk. Below this screen is shown the list of preservation plans relating to these objects.

A further short exercise demonstrates that when we deposit new objects in this format, we don’t have to reload the preservation plan. The appropriate plan is recognised and an Enact Plan button appears for our high-risk objects.

The exercise completes with a short demonstration of provenance. As we learned in KeepIt course 3, the provenance of an object is a verified record of its past history. Since our image files have been converted we have a provenance record for them. In EPrints this relational information is stored as Linked Data, and in this exercise we learn how to view these relations.
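
Purely as an illustration of what such a relation can look like when expressed as Linked Data, here is a small sketch using rdflib. The identifiers and the namespace URI are hypothetical stand-ins (not the URIs or exact vocabulary EPrints emits), but the OPM-style terms wasDerivedFrom and wasGeneratedBy convey the idea: the PNG was derived from the GIF by enacting a preservation plan.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

# Hypothetical namespaces and identifiers, for illustration only.
EX = Namespace("http://repository.example.org/id/")
OPM = Namespace("http://example.org/opm-terms#")  # stand-in, not the official OPM namespace

g = Graph()
g.bind("ex", EX)
g.bind("opm", OPM)

# "The PNG was derived from the original GIF by enacting preservation plan 41."
g.add((EX["document/2/image.png"], OPM.wasDerivedFrom, EX["document/1/image.gif"]))
g.add((EX["document/2/image.png"], OPM.wasGeneratedBy, EX["plan/41"]))
g.add((EX["plan/41"], RDFS.label, Literal("Plato preservation plan: GIF to PNG migration")))

print(g.serialize(format="turtle"))
```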

EPrints-Plato KeepIt course 4 summary

We said at the outset of KeepIt course 4 that we would put preservation in the repository interface, and we have done this for storage, risk analysis and preservation planning. We believe this is the first time this range of preservation support has been available in a major repository platform.

We also said that while you could do courses on EPrints and Plato separately, this is the only course to combine the two. We have shown here how the value multiplies by combining the different tools.

Versions of this EPrints-Plato course have been given in Corfu (ECDL, Sept. 2009) and Madrid (OR, July 2010), as well as in Southampton as part of the KeepIt course, and the course was presented for the final time in Vienna (iPres, Sept. 2010).

It remains for me to thank our outstanding presenters for these courses: Andreas Rauber and Hannes Kulovits from PLANETS and Vienna University of Technology, and Dave Tarrant and Adam Field from the EPrints team at the University of Southampton.



KeepIt course 4: Preservation planning with Plato

KeepIt course module 4, Southampton, 18-19 March 2010
Tools this module: Plato, EPrints preservation apps
Tags Find out more about: this module KeepIt course 4, the full KeepIt course
Presentations and tutorial exercises course 4 (source files)

Preservation planning provides a workflow leading to a preservation plan. One of the problems with some approaches to digital preservation is that they are too quick to act, whether proactively or reactively, where file formats are concerned. For example, take a file format, decide the consensus is against it (such as Microsoft Office formats), and migrate. There is very little formally to justify this process, just that it is possible and not very difficult. The longer-term consequences of this action are unknown. What was done with good intent may turn out to be detrimental.

Preservation planning seeks to give a more formal basis to such decisions, and in the process will help to automate and record the consequent actions.

For this session in KeepIt course 4 we welcome back Andreas Rauber and Hannes Kulovits from the Vienna University of Technology to provide an extensive presentation on preservation planning using a tool, Plato, which they developed as part of the PLANETS project. This will be followed by further practical work.

[slideshare id=3561816&doc=plato-preservation-planning-keepit-100326064543-phpapp01]

As a formal process, preservation planning requires, unsurprisingly but perhaps disconcertingly, a lot of preparation as it takes account of preservation policies, legal obligations, organisational and technical constraints, and user requirements (slide 8). Fortunately, participants in this course are well positioned to do this, since we have covered many of these issues already in KeepIt courses 1-3.

The preservation planning approach supported by Plato can be overlaid on the OAIS reference model (slide 10), and is shown in more detail in slide 12. Preservation planning with Plato involves four stages:

  • Define requirements (slides 12-33)
  • Evaluate alternatives (slides 34-45)
  • Consider results (slides 46-58)
  • Build preservation plan (an exercise)

The reader can explore the Plato workflow using the slides. Here we will highlight some of the critical stages.

In the KeepIt course, and indeed throughout the KeepIt project, we have begun our preservation approach with format identification, but now we go further. We have to relate our identification and other information about our digital objects to our requirements for those objects. This is where our understanding of the significant characteristics of digital objects from KeepIt course 3 becomes useful. Slide 25 shows a mindmap of the sort familiar, again, from course 3, and in slide 26 we encounter the Plato interface, the tree editor, for the first time. Flipping between mindmap and Plato, the following slides show how the requirements are elaborated and values added, illustrating how this information might be mapped to the Plato editor.

Once we have described our objects we want to know what we might do to preserve them. There is typically more than one choice, not just in terms of a preservation action (e.g. format migration) but also in how that action might be performed, and these alternatives need to be evaluated to serve our requirements. If you recall, at the end of the previous session in this course module, we deposited some GIF files in a test repository and then downloaded those files in readiness for this session. In slide 37 we see these files appearing in Plato for the first time. Plato now shows us what alternative actions are available for these files.

Now there is a decision to be made: go/no-go/deferred-go. To make an informed decision we need to run some experiments, that is, to run the alternatives and compare the results before we commit to any plan. Plato helps us to run and evaluate the experiments, in this case on our image files.

Having begun to get some results we might think the hard work is done, but there is still a tricky stage to negotiate. Before we can analyse the results we have to transform and weight the measured values from the experiments, that is, to normalise the values so that different experiments are measuring the same thing (slides 48-49), and to set the level of importance for each of the factors in our requirements tree (slides 51-52).
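
The arithmetic behind that step is essentially a weighted utility calculation: each measured value is mapped onto a common scale, then aggregated using the weights from the requirements tree. The sketch below uses invented measurements, a simple linear transform and a weighted sum; Plato offers richer transformation and aggregation options, so treat this as the general idea rather than the tool’s exact method.

```python
def transform(value, best, worst):
    """Linearly map a raw measurement onto a 0-5 utility scale (5 = best)."""
    if best == worst:
        return 5.0
    score = 5.0 * (value - worst) / (best - worst)
    return max(0.0, min(5.0, score))

# One alternative (say, "migrate GIF to PNG with tool X") measured against three
# leaf requirements; each tuple is (measured value, best case, worst case, weight).
measurements = {
    "image quality retained (%)":     (99.0, 100.0, 90.0, 0.5),
    "file size increase (%)":         (12.0, 0.0, 50.0, 0.2),
    "processing time per object (s)": (0.4, 0.1, 5.0, 0.3),
}

utility = sum(weight * transform(value, best, worst)
              for value, best, worst, weight in measurements.values())
print(f"weighted utility: {utility:.2f} out of 5")   # roughly 4.4 here
```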

Finally, Plato presents our results (slides 55, 57) and we can see the benefit of using this tool.

Summary and exercise

Before we start the exercise, slides 60-64 summarise the presentation so far. If there is one conclusion that I would highlight above the others, it is that

  • preservation planning is a basis for well-informed, accountable decisions (slide 64)

It is no longer necessary or acceptable to make ad hoc preservation action decisions. This has been a detailed and involved process, but the benefit for a large repository is that the resulting plans can be used across all content in the analysed formats, now and in the future.

Two exercises are set out in slides 66-70. Again, we use our imported GIF files. My impression from observing participants on the course was that this may have been the hardest exercise in the whole KeepIt course, especially exercise 1 which confronts the requirements, and the first encounter with the tools, including the Freemind mindmap tool. There is nothing like a steep learning curve to get the best out of people, and by the end of this session you could hear the sound of pennies dropping.

We now have a preservation plan, and in the next session we will put that plan to work in a repository.



KeepIt course 4: Physical Preservation with EPrints

KeepIt course module 4, Southampton, 18-19 March 2010
Tools this module: Plato, EPrints preservation apps
Tags Find out more about: this module KeepIt course 4, the full KeepIt course
Presentations and tutorial exercises course 4 (source files)

In this session of KeepIt course 4 we will use EPrints preservation apps to manage large-scale storage and display file format risks. In the course participants were provided with test repositories. Other users will need to download the apps (plugins) and install them to run with an EPrints 3.2 repository.

Our presenters for this session are EPrints experts (don’t omit the space or everyone sniggers) Adam Field from EPrints Services and Dave Tarrant from the KeepIt project, both members of the EPrints core code developer team.

EPrints is software designed to build digital repositories, in particular, repositories of research papers produced within institutions such as universities. Since that original goal the capability of the software has expanded to enable different types of digital content to be managed within a repository. In essence, it provides a series of interfaces for content management tasks for users and administrators, for example, depositing content, managing content, and finding content.

An important development stage was reached in 2007 with the release of EPrints version 3.0, which enabled applications created independently of EPrints to be used with it, without needing the authorisation of the code manager. Among the applications developed in this way are some intended to manage the preservation workflow, applications that have evolved from the JISC Preserv and KeepIt projects. Finally, with the release this year and ongoing upgrades of existing repositories to EPrints v3.2, these preservation tools can be distributed and used more widely, and with the next iteration (v3.3 in 2011) they will be available with simpler one-click installation from the EPrints Bazaar.

[slideshare id=3561319&doc=eprints-storage-100326052847-phpapp02]

Managing storage in the ‘cloud’

First we consider storage with Adam Field. The range of services providing digital storage is changing and expanding. Disc storage attached to machines and local network storage have been staples, but for large content volumes they are increasingly being supplanted by distributed storage on the Internet, or ‘cloud’ storage as it is often called. There are advantages and disadvantages to each (slides 7-10).

Another approach is to combine all storage services optimally, or ‘hybrid’ storage (slide 11). To support this EPrints has introduced a storage controller. In this way different types of content or different versions of content can be stored in different places depending on cost, value, how critical the content is, etc. Storage policies can be written to manage storage automatically based on the selected criteria (slide 14). For example, a document is to be stored locally if it is a volatile version, or in a cloud service if not (slide 15).
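
The decision logic behind such a policy is simple enough to sketch. The following is pseudocode-flavoured Python illustrating the kind of criteria described above (volatile versions kept locally, settled content sent to a cloud store); it is not the EPrints storage controller configuration, which is expressed in XML, and the store names and thresholds are invented.

```python
# Illustrative routing logic for 'hybrid' storage.
LOCAL_STORE = "local_disc"
CLOUD_STORE = "cloud_store"

def choose_store(document: dict) -> str:
    """Pick a storage target based on simple, invented policy criteria."""
    if document.get("volatile"):           # still being revised: keep it close
        return LOCAL_STORE
    if document.get("size_mb", 0) > 500:   # large, settled content: cheap bulk storage
        return CLOUD_STORE
    return CLOUD_STORE if document.get("public") else LOCAL_STORE

print(choose_store({"volatile": True}))               # local_disc
print(choose_store({"public": True, "size_mb": 2}))   # cloud_store
```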

The storage controller provides an interface to manage and move content from the locations selected initially (slide 17).

In our first exercise in this session we use the storage interfaces and learn how to modify storage policies. First, within our repository (or test repository; must be EPrints v3.2 or later) we need to find the familiar EPrints admin screen, and then open the tab for Config. Tools. Here we will find a button for the Storage Manager. This displays the different storage options available to the repository manager, and indicates the number and volume of files stored in each location, with buttons to move and delete content simply from each location.

Next we will modify the storage policy. This involves some simple code editing. To access the file we return to the Config. Tools screen and open the View Configuration button to reveal an XML file (storage/default.xml). There follow three very short exercises designed to modify this XML code and change the storage policy. After each exercise you can return to the storage manager screen to review the changes you have made. If there are any problems an example solution is provided on the final page of the exercise sheets.

File Formats and Risk Analysis

Next Dave Tarrant begins to explore support for managing file format risks using the EPrints preservation apps, and it will become apparent why we spent time introducing the preservation workflow in KeepIt course 3 (and recapped in this course module).

[slideshare id=3561575&doc=eprints-formats-risk-100326060440-phpapp01]

Immediately we can see elements of the preservation workflow in this presentation. The key feature here, however, is to show how a format risks management interface is built into EPrints. First we have a format classification screen (slide 4) but without any risk scores. By slide 6 we have classified these formats, hypothetically, into three broad risk categories: high, medium and low risk objects.

Actual risk scores are problematic at this stage. Although we can identify risk factors, as we saw in KeepIt course 3, we don’t yet have databases that are sufficiently complete to quantify the risks. Although a community-based linked data way forward has been proposed, such an approach still has to be adopted and developed further.

Nevertheless, we can see in principle how our interface might be used were real scores available. Slide 7 identifies some ‘medium’ risk objects in our small test repository. Should we wish to migrate these objects to a lower-risk format – and note at this stage we have not decided whether this is really necessary; we will come to that in our next session on preservation planning with Plato – the right-hand side of our screen evaluates some migration options in terms of the tools and target formats available.

For our second exercise using EPrints preservation apps we will import some test files, then identify and classify the file formats and apply risk scores.

Once again we begin with the EPrints admin screen in our (test) repository. This time we go to the Editorial Tools tab, where we find a button marked Formats/Risks. But there are no files to classify, so we import a test set (of 20 PDF files, in this case). Returning to the Formats/Risks button, we now have 20 files classified as high risk because they have not yet been identified. After using the Classify Object button as shown in the handout sheets, we can confirm the files are PDFs, but we still don’t have a risk score. Next, for variety, we deposit some GIF images, provided in a zip file, in our repository.

Now we want to add some risk scores. Again we have to access and edit code in an EPrints Config. file, this time to reset a file that mimics a format risk analysis tool, PRONOM, from the UK National Archives. We have worked with TNA for many years and this is a way of enhancing its publicly available tool in anticipation of new services such as risk scoring. We can see the effect of this code editing by reloading our Formats/Risks screen.

Note, again, these risk scores are hypothetical and are used here to illustrate the process. To emphasise this point, the next part of the exercise shows how to alter the boundaries of the risk classification. In other words, just by altering this file we can affect the risk classification.
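
To make the effect of those boundaries concrete, here is a small sketch of threshold-based classification. The scores and cut-off values are invented, and in the exercise the boundaries live in an EPrints configuration file rather than in code like this; the point is that moving the boundaries reclassifies the same objects without touching the scores.

```python
# Hypothetical risk scores per format (lower score = higher risk here).
risk_scores = {"image/gif": 1.2, "application/pdf": 3.4, "image/png": 4.6}

def classify(score, high_below=2.0, medium_below=4.0):
    """Bucket a numeric risk score into high/medium/low risk."""
    if score < high_below:
        return "high risk"
    if score < medium_below:
        return "medium risk"
    return "low risk"

for fmt, score in risk_scores.items():
    print(f"{fmt}: {classify(score)}")

# Widening the 'high risk' band reclassifies the PDF score without changing it.
print(classify(3.4, high_below=3.5))   # now 'high risk'
```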

By this stage we should have deposited six GIF images from the original zip file, now reclassified as high risk files. We can examine the metadata for these files, and on the same screen we have a form to select a number of these files for download ready for the next session on preservation planning with Plato.
