DP people to watch: David Rosenthal

Digital preservation has long been more art than science, but that has been changing. Although you will still find much hand waving on the topic, someone who transcends this with a fundamentally quantitative approach is David Rosenthal, one of the instigators of LOCKSS.

Rosenthal’s plenary at the Coalition for Networked Information (CNI) Spring Task Force meeting (April 2009, Minneapolis) – How Are We Ensuring the Longevity of Digital Documents? – described by Cliff Lynch as “absolutely extraordinary”, is a good place to start finding out about his approach to digital preservation.

Summary: it’s not about preserving individual digital objects but about networks of connected online objects, and the problems are scale, costs and rights.

The presentation appears on Rosenthal’s blog, which is also worth exploring for its occasional gems, and you’ll discover links to those as you progress through the talk.

In the KeepIt project we are looking for those people who can extend the reach of digital preservation practice by bringing new insights that can make a difference. I’ve certainly been recommending Rosenthal to project colleagues as just such a person to follow.

Posted in Uncategorized.



Arts repositories in the spotlight

For those with an interest in emerging arts repositories, the Repositories Support Project held a timely meeting earlier this week, Open Access and Repositories in the Arts. Only two actual repository examples were presented, the Kultur repository at University of the Arts London, a KeepIt project partner, and PRIMO (Practice as Research in Music Online), which wants to be a music journal within a repository framework. Both gave interesting if somewhat conventional talks without capturing the distinctive and novel character of the repositories.

The meeting reminded me that the JISC Start-up and Enhancement Projects Training (SUETr) had covered similar ground, in the guise of Multimedia Deposits, in March this year. As well as PRIMO, that meeting included presentations on managing moving and static images in repositories. Slideshare versions of these presentations are embedded in the page linked above.

Another noteworthy SUETr event, because it included presentations from KeepIt project partners, covered repository policy. Andrew Gray from UAL, again, and Miggie Pickton from Northampton were among the six presenters on a difficult and often-avoided topic, but one that is an essential precursor to effective digital preservation planning.



KeepIt in half a minute

Presented as part of the half-minute madness session at the JISC Information Environment and VRE programme meeting (Twitter #inf11): “prepare your 30 second introduction to your project. No powerpoint slides permitted, should be aimed at a general audience.”

Steve Hitchcock, KeepIt project

Last week David Hockney was on TV painting Yorkshire landscapes in watercolour, a new medium to him.

Others might use photography, or even words, to picture such scenes.*

Our canvas is digital and presents many new media, and some old.

So our repositories are at once gallery, laboratory and academy all in one.

Just as for Hockney, people would expect no less than to be able to see the works today and tomorrow.

Deposit it, store it, use it, keep it. KeepIt project: keeping stuff safe in digital repositories for the arts, science and teaching.

* Line omitted at last minute to fit severe time limit. For other unintended deviations from script see forthcoming video recording.



Eating our own dogfood? Yes we can!

Having been questioned about whether or how we might preserve our project outputs in the form of blogs, slideshows, videos, some colleagues in the EPrints developer team have revealed how they are rising to the challenge and creating tools to support inclusion of new online content forms in repositories.

Chris Gutteridge, who first raised the question, offers his initial thoughts and suggestions:

“Twitter and WordPress are not very compatible with the repository model. I’m not sure what the solution is there, beyond keeping a local record of the data even if the primary source is on 3rd party servers. It creates the interesting idea of “growing” eprint records, where a record could be “every blog post from XXX” or the twitter user “Y”

“Youtube and slideshare, on the other hand present a different challenge. They provide a damn useful service, which is easy streaming slides/video and easy to embed in your own site. I think that we can address this from two directions:

  1. Improve the functionality of repositories to lure the data back. Streaming FLV (Adobe Flash) video for EPrints (not too hard to do if you limit the formats accepted), and HTML slideshowy goodness from PPT files (cue Les (Carr) and his scary OSX scripts, perhaps?) These would be cool features anyway.
  2. Encourage the use of something the equal-opposite of the official_url field (option on related URL?) to indicate the social media/web 2.0 (?) URL for the item. That way it can be included in the metadata for all time and if youtube goes away, people referencing the youtube URL could still be resolved to another location.”
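Chris's second suggestion can be pictured as a small lookup: each record remembers the third-party URLs under which its content also lives, so that someone arriving with an old YouTube or Slideshare link can still be pointed at the repository copy. What follows is only a minimal sketch of that idea. The `EprintRecord` class, its `social_urls` field and the example URLs are all hypothetical, and none of it reflects the real EPrints schema or API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class EprintRecord:
    """Hypothetical, simplified record; not the real EPrints schema."""
    eprint_id: int
    repository_url: str  # location of the copy held by the repository
    social_urls: list[str] = field(default_factory=list)  # e.g. YouTube, Slideshare


def normalise(url):
    """Reduce a URL to (host, path, query), ignoring scheme, 'www.' and a trailing slash."""
    parts = urlparse(url)
    host = parts.netloc.lower().removeprefix("www.")
    return (host, parts.path.rstrip("/"), parts.query)


def build_index(records):
    """Map every known third-party URL to the record that lists it."""
    index = {}
    for record in records:
        for url in record.social_urls:
            index[normalise(url)] = record
    return index


def resolve(url, index):
    """Return the repository location for a third-party URL, or None if unknown."""
    record = index.get(normalise(url))
    return record.repository_url if record else None
```

In this sketch the index is keyed on a normalised form of the URL, so small variations in how people cite the same video still resolve to the same record.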

On video and slide functionality, two repositories are making progress: EdShare and Language Box (“where students and teachers of languages can publish and share their learning materials, resources and links on the web”).

EPrints project developer Patrick McSweeney explains: “edshare and language box have been pitching for some time that the reason people use these services is because they offer something the repository doesnt. We’ve had a go at adding these things (to language box):

“the server does all the conversions itself using eprints convert plugins written by me and seb (Sebastien Francois). the one for videos uses is a bit of hack to make a job queue for 3.1.x . we use Mencoder (open source) to do the video conversion and im yet to find a video file it cant handle. The powerpoint one uses open office to do the conversions just like a normal convert plugin. It also works for docs. The reason the one in the link looks a bit grainy is because flash isnt displaying the image at its native resolution.

“currently displays in coverflow (so if you have multiple files it does the left and right stuff) but i personally think it looks poo. If you are at all interested I would be very up for making a modified version of these plugins so that they dont use coverflow but allow you to do a preview inline still with a bit of Javascript.” (contact: pm5 AT ecs.soton.ac.uk)
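To make the job queue idea a little more concrete, here is a minimal sketch of how uploaded files might be queued and handed to a converter by type. It is an assumption-laden illustration, not the Language Box plugins: the `ConversionQueue` class, the extension table and the stand-in converters are hypothetical, and a real converter would shell out to a tool such as Mencoder or OpenOffice rather than just renaming the file.

```python
from collections import deque
from pathlib import Path

# Hypothetical mapping from file extension to the kind of conversion needed.
KINDS = {".avi": "video", ".mov": "video", ".ppt": "slides", ".doc": "slides"}


class ConversionQueue:
    """First-in, first-out queue of files waiting for conversion."""

    def __init__(self, converters):
        # converters: dict of kind -> callable(Path) -> Path (the converted file)
        self.converters = converters
        self.pending = deque()
        self.failed = []

    def submit(self, path):
        """Queue a file if its extension is recognised; report whether it was accepted."""
        path = Path(path)
        kind = KINDS.get(path.suffix.lower())
        if kind is None:
            return False
        self.pending.append((kind, path))
        return True

    def run(self):
        """Convert everything queued so far, recording failures instead of stopping."""
        results = []
        while self.pending:
            kind, path = self.pending.popleft()
            try:
                results.append(self.converters[kind](path))
            except Exception as exc:
                self.failed.append((path, str(exc)))
        return results


# Stand-in converters for illustration only: they compute an output name
# and do no real conversion work.
stand_ins = {
    "video": lambda p: p.with_suffix(".flv"),
    "slides": lambda p: p.with_suffix(".pdf"),
}
```

Keeping the converters as plain callables means the queue itself does not need to know which external tool does the work.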

Sebastien adds: “If anything, we’ve only struggled to present a wide variety of formats (slides, flash video player) on a single interface (on the summary page). Those web 2.0 sites have good interfaces but they only support one format type.

“On EdShare, you can “share a link”, what we do is wget-ting the content and we cache it. We then offer the visitors to view either versions, like google images does: cf. http://www.edshare.soton.ac.uk/1233/, click on ‘View’ below any “Internet Links”.”
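The fetch-and-cache approach Sebastien describes might look roughly like the sketch below: when a link is shared, a copy of the content is stored locally, and visitors are then offered both the live and the cached version. This is only an illustration under stated assumptions. The `LinkCache` class and its method names are hypothetical, EdShare itself uses wget rather than Python, and the file naming by SHA-256 digest is a choice made here for simplicity.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_with_urllib(url):
    """Download the content at a URL using the standard library."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class LinkCache:
    """Keep a local copy of shared links and offer live and cached views."""

    def __init__(self, cache_dir, fetch=fetch_with_urllib):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # callable(url) -> bytes; replaceable for testing

    def _path_for(self, url):
        # Name the cached file after a digest of the URL so any URL is a safe filename.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / digest

    def share(self, url):
        """Fetch the linked content now and store a local copy."""
        path = self._path_for(url)
        path.write_bytes(self.fetch(url))
        return path

    def views(self, url):
        """Return both ways of viewing a link; 'cached' is None if no copy exists."""
        path = self._path_for(url)
        return {"live": url, "cached": path if path.exists() else None}
```

If the original site later disappears, the cached entry is still there to serve, which is the point of taking the copy at share time.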

Chris Gutteridge, who started the debate, sums up: “For both these examples, embedding code + some reassurance that the embedded media would continue to work for several years would be nice!”



Eating our own dogfood?

KeepIt is a preservation project, about preserving digital repositories. But not preserving anything else, it seems. So we stand accused by Chris Gutteridge of not eating our own dogfood. The evidence is here, in this blog. What are we doing to preserve the content of this blog, the embedded content (Slideshare, YouTube) and the twitters?

Chris says: “these sites, while amazingly cool and useful, have no contract or duty of preservation. The universities involved could always keep their own information, but the “primary” URL for each item is likely to be the one on the above site. That means if the plug is pulled on youtube (is it making money? can it?) then all those URLs could just go away.”

The obvious answer is we are distinguishing the project and communications about the project from the object of the project, the repositories. Should we be preserving the project and its outputs as well? Yes. And that’s a point that goes to the heart of the project’s approach. You cannot preserve content effectively unless you know what it is you want to preserve, i.e. you need a plan. When it comes to the day-to-day activity of the project as reflected here – rather than the boiled-down reports and papers that are presented, added to repositories, published, and thus more actively ‘preserved’ (or managed) – we don’t know what is worth preserving or what should be. We don’t yet have a plan. What follows instead are some thoughts on preserving the formal vs the informal, and trying to identify where these might meet in the new online continuum.

WordPress blog. Most heavily used service so far in the project. We are using a blog service hosted in our university school (ecs.soton) rather than a public service, so there is a chance to do something about preserving that, linking to the repository perhaps. Maybe there could be a closer association between repositories and blogs. Having said that, while there are some students using the ECS blog service, I’m not sure many academics are, and there could be a message there.

(Note. We will be setting up a project wiki soon, and will again use the in-house hosted wiki.)

YouTube/Slideshare. We are using these principally for the embed function, to display in the blog. Of the four items embedded in the blog to date (three slideshows and one video), one is also in a repository, and the others we must assume were not considered formal enough by the authors for repository deposit. Those are the two angles on these types of material and repositories: display functionality vs scope for deposit.

Twitter. I’ve only been using it for a few days, so I can’t comment yet on its instant ephemerality vs long-term value. I wouldn’t rule out the latter in terms of realising some academic value, but my immediate impression is it’s not there yet and would have to be heavily filtered. The practice is not there yet, nor the filter mechanism.

So the project is focussed on preservation of repository content. To what extent are we seeking to preserve what is in repositories, to shape content creation practices for better preservation, or to shape repository policy to accept and therefore preserve a wider range of content types such as considered here? This is an open question, and one that we need to try and answer in the remaining year and months of the project.

The general fact is that practice in digital preservation is always trying to keep up with content creation practices. McLuhan said the content of a new medium is an old medium. Hence PDF. Blogs, Twitter (to come, Google Wave), etc., are the leading-edge content forms for the new online medium. If it seems obvious and inevitable to state that digital preservation is always reacting, never leading, Chris is saying this doesn’t have to be the case. Content creation, repository support tools, repository management and preservation are all part of the same continuum. We all face the same problems. It’s good to be reminded of that.



90mph preservation

Yesterday Dave Tarrant from KeepIt was speaking at the latest Sun PASIG meeting in Malta (#pasig on Twitter). Chris Rusbridge, head of DCC, was tweeting from the event all day and had this to say about Dave’s presentation:

  • Tarrant at 90 mph getting ePrints 3.2 data directly into a couple of clouds (Amazon & Sun). Also SWORD-based deposit from OOXML.
  • P2-registry: make registries linked data, includes PRONOM-wrapper but includes others like DBPEDIA. So PDF1.3 goes from 4 to 50 tools
  • Dave Tarrant: integrating preservation into ePrints.org; spinoffs from PRESERV2 & new Keepit project. Risk data re file types dist’d

The presentation has over 30 slides and 3 screenshot videos (although the videos are not included here). All for a 20 min presentation! Two points to note:

  1. Dave claims the speed was because the meeting was running late. He later added: “it should have been 2 presentations not one!”
  2. Chris Rusbridge pretty well summed it all up in three tweets.

More tweets on Tarrant:

  • hochstenbach Dave Tarrant (U. Southhampton) provides fancy demo’s uploading data via EPrint into the Amazon Cloud
  • babkot47 Dave Tarrant talking (fast) about using hybrid storage for eprints
  • dkeats EPrints is the most flexible platform for building high quality, high value repositories. Looks inpressive, installing now.

There is a clear theme here – speed changes everything.



Preserving arts repositories: exceedingly good slides

It’s an exciting possibility that arts repositories will not be the same as conventional institutional repositories. That will also bring new challenges in terms of managing data today and tomorrow, and it’s why Kultur is an important exemplar for the KeepIt project.

I discovered some of those challenges when I first met Jess Crilly and Andrew Gray, who manage the UAL Kultur repository. It was something of a fortunate coincidence that on the same day self-styled (and now ex-) repository rat Dorothea Salo offered a slide show on IRs for digital arts and humanities, from a data curation summer school.

[slideshare id=1466324&doc=digcurinst-key-090520141920-phpapp01]

I fully recommend Dorothea’s slideshow. It’s not just about arts and humanities, but many of the issues raised by Jess and Andrew chime with Dorothea’s points.

It’s also a good antidote to the digital preservation propaganda (scary, expensive) I warned about in my recent presentation at the project meeting. Dorothea is much more reasoned and practical.

So that makes the project easy, because now we know all about repository preservation. Actually, it’s a good starting point. At least we know what the problems are, but what do we do next? If the KeepIt repositories are to become true exemplars they have to show what needs to be done going forward. Ultimately all repositories will have to deal with and overcome slide 47. As Dorothea frequently says, good luck with that!

Don’t forget to see the comments on EPrints added to the original Slideshare version. EPrints is not as limited on formats and user interface as the presentation suggests, but most EPrints users will know that already.



First project meeting: presentations

The first KeepIt project meeting for partners took place on 2 June and led with three presentations: to prompt discussion on why we should be wary of the big issues in digital preservation, on new tools to support repository preservation, and on the experience of one exemplar repository that has already investigated digital preservation.

First, project manager Steve Hitchcock on how to spot typical digital preservation propaganda.

[slideshare id=1598183&doc=hitchcock-firstpartnersmeetingv2-090617111943-phpapp01]

Dave Tarrant, the project developer, reprised a presentation on the project from the recent Open Repositories OR09 international conference in Atlanta. He describes how the KeepIt project is providing training, development and deployment in three key areas of digital preservation for repositories: storage, risk analysis and preservation action.

[slideshare id=1607722&doc=keepitor09-090619041223-phpapp02]

Not all repositories are new to digital preservation. Chemist Simon Coles has been working on research data repositories for many years, and describes the experience and challenges of data repository preservation, including planning, metadata and other issues affecting dissemination.

[slideshare id=1608282&doc=coles-keepit-kickoff-090619063135-phpapp02]



Preserving URLs

There are many dimensions to digital preservation. Typically we think about the individual objects, the digital bits that form the object and our ability to render, present or perform the object for a human to view or for another machine to use.

Another aspect is locating and retrieving the object. In file and archival systems a file address system is maintained. On the Web, URLs are the equivalent addresses. Here is a problem: within a managed file system it is possible to maintain consistency of addresses through changes and upgrades. The Web is a big open system where URLs are published but not always maintained, as we know through broken links. In such an open system those managing the object cannot manage the proliferation or the publication of its URL. What they can do is manage the connection to the original URL whatever changes may take place at the server end.
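One way to picture managing the connection to the original URL is a table of moves kept at the server end: every time an object changes address, the old address is recorded as pointing to the new one, and any published URL can then be followed through to wherever the object lives now. The sketch below is a hypothetical illustration of that idea only. The `UrlRegistry` class and its methods are invented for this example and are not how EPrints or any particular web server implements redirects.

```python
class UrlRegistry:
    """Record address changes so that published URLs keep resolving."""

    def __init__(self):
        self._moved = {}  # old URL -> the URL that replaced it

    def _chain(self, url):
        """Follow recorded moves from a URL, returning every address visited."""
        seen = [url]
        while seen[-1] in self._moved:
            nxt = self._moved[seen[-1]]
            if nxt in seen:
                raise ValueError(f"redirect loop at {nxt}")
            seen.append(nxt)
        return seen

    def move(self, old_url, new_url):
        """Record that the object at old_url now lives at new_url."""
        # Refuse a move that would send the new address back round to the old one.
        if old_url in self._chain(new_url):
            raise ValueError("move would create a redirect loop")
        self._moved[old_url] = new_url

    def resolve(self, url):
        """Return the current address for any URL, published long ago or not."""
        return self._chain(url)[-1]
```

A URL that has never moved simply resolves to itself, so the same lookup works for old and new links alike.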

One of our colleagues on the KeepIt project, Chris Gutteridge, has strong views on maintaining, or preserving, URLs. Chris is lead developer of the EPrints repository software, so his views translate to practical application and affect systems running EPrints. In his inimitable style, Chris has recorded a video (6 May 2009) on the importance of preserving URLs.



EdShare: the wider angle

EdShare is one of the repository exemplars in the KeepIt project, and was recently profiled on this blog. For a broader view of this repository Debra Morris, manager of EdShare, has written an article in Ariadne (confusingly labelled as the April 2009 issue, but it has only just been announced):

Encouraging More Open Educational Resources with Southampton’s EdShare

It’s part history of recent elearning technologies, VLEs and MLEs, at the University of Southampton and part repository description.

The following section is a helpful supplement to our blogged profile of EdShare:

The benefits can be described as follows:

  • Content providers would be able simply and speedily to deposit any kind of file (or collection of files) and describe them to their chosen level of detail using metadata or free-form text;
  • Content providers would be able to control the access levels to their files (typically ‘institution-only’ or ‘open access’);
  • EdShare would allocate a unique and permanent URL which could be referenced by other systems, such as the institutional VLE;
  • The free-form text and metadata descriptions of all files lodged in the repository would be found and indexed by search engines;
  • Users would be able to browse all items in the repository and download those lodged for open access without the need to be a registered user of the repository;
  • Users browsing the repository would see Web 2.0-like tagging and annotation features.
