Asked whether and how we might preserve project outputs such as blogs, slideshows and videos, some colleagues in the EPrints developer team have described how they are rising to the challenge by creating tools that support the inclusion of new forms of online content in repositories.
Chris Gutteridge, who first raised the question, offers his initial thoughts and suggestions:
“Twitter and WordPress are not very compatible with the repository model. I’m not sure what the solution is there, beyond keeping a local record of the data even if the primary source is on third-party servers. It creates the interesting idea of “growing” eprint records, where a record could be “every blog post from XXX” or the Twitter user “Y”.
“YouTube and SlideShare, on the other hand, present a different challenge. They provide a damn useful service: easy streaming of slides and video that is easy to embed in your own site. I think that we can address this from two directions:
- Improve the functionality of repositories to lure the data back. Streaming FLV (Adobe Flash) video for EPrints (not too hard to do if you limit the formats accepted), and HTML slideshowy goodness from PPT files (cue Les (Carr) and his scary OSX scripts, perhaps?) These would be cool features anyway.
- Encourage the use of something the equal-opposite of the official_url field (option on related URL?) to indicate the social media/web 2.0 (?) URL for the item. That way it can be included in the metadata for all time and if youtube goes away, people referencing the youtube URL could still be resolved to another location.”
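Chris's second suggestion, recording the third-party URL in the item's metadata so that references to it can still be resolved later, could be sketched roughly as follows. This is only an illustration in Python, not EPrints code: the `Record` class, the `social_urls` field and the example.org addresses are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import urlsplit


@dataclass
class Record:
    """Hypothetical repository record; field names are assumptions."""
    eprint_id: int
    repository_url: str
    social_urls: List[str] = field(default_factory=list)


def normalise(url: str) -> str:
    """Reduce a URL to host + path + query so trivial variants match."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def build_index(records: List[Record]) -> Dict[str, str]:
    """Map every recorded third-party URL to the repository copy."""
    index: Dict[str, str] = {}
    for record in records:
        for url in record.social_urls:
            index[normalise(url)] = record.repository_url
    return index


def resolve(index: Dict[str, str], url: str) -> Optional[str]:
    """Return the repository location for a third-party URL, if known."""
    return index.get(normalise(url))


# Illustrative data only: neither URL refers to a real item.
records = [
    Record(42, "http://repository.example.org/42/",
           ["http://www.youtube.com/watch?v=abc123"]),
]
index = build_index(records)
```

With an index like this, a repository could answer "where else does this YouTube item live?" even after the original service has gone away, which is the point of keeping the URL in the metadata "for all time".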
On video and slide functionality, two repositories are making progress: EdShare and Language Box (“where students and teachers of languages can publish and share their learning materials, resources and links on the web”).
EPrints project developer Patrick McSweeney explains: “EdShare and Language Box have been pitching for some time that the reason people use these services is because they offer something the repository doesn’t. We’ve had a go at adding these things (to Language Box):
- powerpoint in slideshare style
- video in youtube style
“The server does all the conversions itself using EPrints convert plugins written by me and Seb (Sebastien Francois). The one for videos uses a bit of a hack to make a job queue for 3.1.x. We use MEncoder (open source) to do the video conversion and I’m yet to find a video file it can’t handle. The PowerPoint one uses OpenOffice to do the conversions just like a normal convert plugin. It also works for docs. The reason the one in the link looks a bit grainy is because Flash isn’t displaying the image at its native resolution.
“It currently displays in coverflow (so if you have multiple files it does the left and right stuff) but I personally think it looks poo. If you are at all interested I would be very up for making a modified version of these plugins so that they don’t use coverflow but still allow you to do a preview inline with a bit of JavaScript.” (contact: pm5 AT ecs.soton.ac.uk)
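To give a flavour of the video conversion Patrick describes, here is a minimal sketch of a job queue that hands files to MEncoder for conversion to FLV. It is not the EPrints plugin itself (which is not shown in his message): the `ConversionQueue` class and function names are hypothetical, and the bitrate values are illustrative assumptions rather than the settings Language Box uses.

```python
import subprocess
from collections import deque
from pathlib import Path


def build_flv_command(source, target):
    """Build a MEncoder command line that writes Flash video (FLV).

    The bitrates below are illustrative assumptions, not known settings.
    """
    return [
        "mencoder", str(source),
        "-o", str(target),
        "-of", "lavf",                    # use the libavformat muxer
        "-ovc", "lavc",                   # encode video with libavcodec
        "-lavcopts", "vcodec=flv:vbitrate=500",
        "-oac", "mp3lame",                # FLV players expect MP3 audio
        "-lameopts", "abr:br=56",
        "-srate", "22050",                # a sample rate Flash accepts
    ]


class ConversionQueue:
    """Hypothetical first-in, first-out queue of pending conversions."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, source):
        """Queue a file and return the path the FLV will be written to."""
        source = Path(source)
        target = source.with_suffix(".flv")
        self._jobs.append((source, target))
        return target

    def run_next(self, runner=subprocess.run):
        """Convert the oldest queued file; return its target, or None."""
        if not self._jobs:
            return None
        source, target = self._jobs.popleft()
        # check=True raises if MEncoder exits with an error status.
        runner(build_flv_command(source, target), check=True)
        return target
```

Passing the `runner` in as a parameter means the queue logic can be exercised without MEncoder installed; in real use the default `subprocess.run` would launch the encoder.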
Sebastien adds: “If anything, we’ve only struggled to present a wide variety of formats (slides, flash video player…) on a single interface (on the summary page). Those web 2.0 sites have good interfaces but they only support one format type.
“On EdShare, you can “share a link”: we fetch the content with wget and cache it. We then let visitors view either version, as Google Images does: see http://www.edshare.soton.ac.uk/1233/ and click on ‘View’ below any of the “Internet Links”.”
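The fetch-and-cache approach Sebastien describes might look something like the sketch below. EdShare uses wget; this illustration swaps in Python's standard `urllib` instead, and the function names and cache layout (one file per URL, named by a hash of the URL) are assumptions made for the example.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def cache_path(cache_dir, url):
    """Derive a stable file name for a URL from its SHA-256 digest."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / digest


def fetch(url):
    """Download the raw bytes behind a URL (stands in for wget here)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def cache_link(cache_dir, url, fetcher=fetch):
    """Store a local copy of a shared link unless one already exists.

    Returns both locations so a page can offer the visitor either the
    original link or the cached version.
    """
    path = cache_path(cache_dir, url)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetcher(url))
    return {"original": url, "cached": path}
```

Keeping the original URL alongside the cached path is what lets the summary page present both versions, in the way Sebastien compares to Google Images.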
Chris Gutteridge, who started the debate, sums up: “For both these examples, embedding code + some reassurance that the embedded media would continue to work for several years would be nice!”
On the topic of preserving blogs, I should note a companion JISC project, ArchivePress:
“ArchivePress is a blog-archiving project being undertaken by the University of London Computer Centre and the British Library Digital Preservation department.
“The project will explore practical issues around the archiving of weblog content, focusing on blogs as records of institutional activity and corporate memory. As an alternative to the web crawling/harvesting approach of the Internet Archive and the UK Web Archive, ArchivePress will test the viability of using RSS feeds and blog APIs to harvest blog content (including comments, embedded content and metadata). The archived content will be stored and managed using instances of WordPress, thereby maintaining the blogs’ native data structures, formats and relationships.”
http://archivepress.ulcc.ac.uk/
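As a rough idea of what harvesting via RSS rather than crawling involves, here is a minimal sketch that parses an RSS 2.0 feed and adds any unseen posts to a local archive. It is not ArchivePress code (which stores content in WordPress): the function names, the dictionary-based archive and the sample feed are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the widely used RSS "content" module (content:encoded).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def parse_feed(xml_text):
    """Extract the posts from an RSS 2.0 document as dictionaries."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        posts.append({
            "guid": item.findtext("guid") or link,
            "title": item.findtext("title", default=""),
            "link": link,
            "published": item.findtext("pubDate", default=""),
            # Prefer the full text when the feed carries it.
            "body": item.findtext(CONTENT_NS + "encoded")
                    or item.findtext("description", default=""),
        })
    return posts


def merge_into_archive(archive, posts):
    """Add posts not yet archived, keyed by guid; return how many."""
    added = 0
    for post in posts:
        if post["guid"] not in archive:
            archive[post["guid"]] = post
            added += 1
    return added


# Made-up feed for illustration; blog.example.org is not a real blog.
SAMPLE_FEED = """<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel><title>Example blog</title>
<item><title>First post</title>
<link>http://blog.example.org/1</link>
<guid>http://blog.example.org/?p=1</guid>
<pubDate>Mon, 06 Jul 2009 10:00:00 +0000</pubDate>
<description>Summary</description>
<content:encoded><![CDATA[<p>Full text</p>]]></content:encoded>
</item></channel></rss>"""

archive = {}
new_posts = merge_into_archive(archive, parse_feed(SAMPLE_FEED))
```

Running the merge repeatedly against the same feed adds each post only once, so the archive simply grows as new posts appear, much like the “growing” records Chris imagines above.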
Let’s hope this is a ‘yes we can!’ project as well.