Repository deposit turns to CRUD

There’s no more elegant way of putting it really. What’s at the heart of DepositMO? CRUD.

Create, Retrieve (or Read), Update and Delete (CRUD), the four basic functions of persistent storage. This is what differentiates the project from current capabilities for remote deposit of content to repository services. I’m using CRUD to write this blog post in WordPress. As I write I have two action buttons to the right of the content pane: Save Draft, and Publish. Both allow me to update the content to the storage server, the difference being whether the post is made public or not.
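As a rough illustration of the four functions, here is a minimal in-memory sketch. The `DraftStore` class and its method names are hypothetical, invented for this example; they are not WordPress's or any repository's actual API. The sketch treats Save Draft and Publish as two updates to the same stored item that differ only in a `public` flag.

```python
import itertools


class DraftStore:
    """In-memory stand-in for a storage server (hypothetical, for illustration)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def create(self, content, public=False):
        # Create: store a new item and hand back its identifier.
        item_id = next(self._ids)
        self._items[item_id] = {"content": content, "public": public}
        return item_id

    def read(self, item_id):
        # Retrieve (Read): return a copy so callers cannot edit the store directly.
        return dict(self._items[item_id])

    def update(self, item_id, content=None, public=None):
        # Update: change only the fields that were supplied.
        item = self._items[item_id]
        if content is not None:
            item["content"] = content
        if public is not None:
            item["public"] = public

    def delete(self, item_id):
        # Delete: remove the item; a later read raises KeyError.
        del self._items[item_id]


store = DraftStore()
post_id = store.create("First rough notes")                  # first Save Draft
store.update(post_id, content="Revised text of the post")    # a later Save Draft
store.update(post_id, public=True)                           # Publish
print(store.read(post_id))
```

The point of the sketch is that the item exists on the server from the first save onwards, so every later save is a small update rather than a fresh deposit.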

So there is nothing new about CRUD, except that it is not yet directly applicable to many of today’s digital repositories. These tend to support deposit of a single publishable item, with subsequent versioning if changes are needed or updated versions are produced. In other words, there is no concept of a repository workspace – or a connected workspace – that allows the simple incremental updates widely supported by other computer authoring services and described by CRUD, or applications that go beyond this.

It became clear that we should do more to emphasise the role of CRUD in this project following a short, branched exchange on the American Scientist Open Access Forum mail list. A recurrent theme on the list had returned – the apparent tension between deposit in central, subject-based repository collections such as arXiv and distributed institutional repositories (IRs). The question is where to deposit; the aim to maximise the volume of open access content.

Currently the answer probably depends on the subject area of the research to be deposited: authors in the physical or biomedical sciences will most likely deposit centrally, due to the strength of the repositories serving these areas. Authors in other disciplines will deposit institutionally, but on a much lower scale. That is the crux of the open access problem.

The answer proposed, notably by Stevan Harnad on the AmSci list, is for institutions to mandate deposit of published research papers in the local repository. That is, all papers, not just those not already deposited in a subject-based open access repository elsewhere.

For authors it is often suggested that content might be deposited once by filling data in a Web form, but that too much effort is involved for the process to be repeated for another repository. One approach to reduce the perceived workload, given that all these repositories are open and allow open harvesting of data using OAI-PMH, is to deposit once and then harvest the content to other repositories as required.
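To make the harvesting option concrete, the sketch below builds an OAI-PMH `ListRecords` request URL, the kind of request a harvesting repository would send to the repository where the item was first deposited. The verb and the `metadataPrefix`, `from` and `set` arguments come from the OAI-PMH protocol; the base URL and the function name `list_records_url` are assumptions made up for this example, and the sketch stops short of sending the request or parsing the XML response.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (helper name is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # only records changed since this date
    if set_spec:
        params["set"] = set_spec     # restrict the harvest to one set
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; a real repository publishes its own OAI-PMH base URL.
url = list_records_url("https://repository.example.org/oai", from_date="2010-10-01")
print(url)
```

Note that this route keeps the author out of the second deposit entirely: the copy elsewhere is made by another agent, after the fact.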

Another approach might be multiple simultaneous deposit. To save authors effort, data for deposit is entered into a form once, and then copied to the designated repository destinations. One tentative suggestion to emerge in the latest round of list discussion was that deposit to an IR be accompanied with login details for a central subject repository for subsequent deposit. This is fraught with security problems, as we pointed out to the list.
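The "enter once, copy to each destination" idea can be sketched as a simple fan-out. Everything here is an assumption for illustration: `Repository` is an in-memory stand-in, not a SWORD client, and `deposit_everywhere` is a made-up helper. In this sketch the author's tool talks to each repository directly, so no repository is ever handed login details for another.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """In-memory stand-in for a deposit destination (hypothetical)."""
    name: str
    records: list = field(default_factory=list)

    def deposit(self, metadata):
        # Each repository keeps its own copy of the submitted metadata.
        self.records.append(dict(metadata))
        return len(self.records) - 1  # local identifier, invented for this sketch


def deposit_everywhere(metadata, repositories):
    """Send one form's worth of metadata to every designated repository."""
    return {repo.name: repo.deposit(metadata) for repo in repositories}


form_data = {"title": "An example paper", "authors": ["A. Author"]}
destinations = [Repository("institutional"), Repository("subject")]
receipts = deposit_everywhere(form_data, destinations)
print(receipts)
```

A real implementation would also have to handle one destination accepting the item while another rejects it, which this sketch ignores.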

Enter SWORD, for it was suggested that this be the mechanism for sharing deposit and logins in this case. It turns out that the organisation developing SWORD has a case study that looks quite like that proposed.

Separately, this is what arXiv says about using SWORD for deposit:

“This interface is primarily intended for use by conference organizers, proceedings and journal editors, etc. for programmatic bulk upload of pre-vetted material to arXiv for long term archival and dissemination. It is assumed that this is done with the (implied or explicit) approval of the authors of individual contributions or on their behalf.

“Individual authors may prefer arXiv’s interactive web upload for personal use, because it provides better feedback mechanisms, but in principle the deposit API can be used for one-at-a-time deposit to arXiv by individual authors, too. We envision integration of the deposit process into authoring tools for efficient upload from the desktop.”

So third-party deposit is just about acceptable, perhaps, without being wholly endorsed. The last sentence points indirectly towards the work of DepositMO, and Simeon Warner of arXiv was a co-author of the project’s short debut paper at the Open Repositories 2010 international conference (OR10).

As this paper shows, multiple simultaneous deposit under the control of the author is better than depositing once and leaving subsequent deposit elsewhere to another agent. It turns out that SWORD has this covered as well.

In fact, there are quite a few SWORD implementations connecting different applications (sources) and repositories (destinations). If you look closely, one of the implementations listed is the Microsoft Article Authoring Add-in for Word 2007/2010, which allows repository deposit direct from Word. Within DepositMO we have made some claims about enabling repository deposit from popular applications such as MS Office, and in the project we shall be working with Microsoft to enhance this tool.

Among this welter of deposit applications, you are probably asking: what exactly will DepositMO’s unique contributions be? No. Well, I was. At least, I was beginning to wonder if we had made our USP clear in the documentation to date. It’s not SWORD, or deposit or even multiple deposit, or deposit from specified applications. The answer, as we have already indicated, begins with CRUD.

The project proposal talked of ‘an effective culture change mechanism’. That’s a wider issue for another time. On more technical issues the proposal describes the aim to ‘extend the capabilities of repositories to exploit desktop and authoring environments’. More specifically it refers to components for the Microsoft Office authoring environments and enhanced SWORD interaction.

No reference to CRUD-like features here. Nor in the OR10 paper – at least, not using these terms – but the direction is clearer. The paper starts by specifying the motivations for multiple deposit.

Today the use case for repository deposit is to write the content with a typical computer desktop application and save it somewhere, but not in the repository yet – the equivalent of the blog Save Draft button. When the work is complete it can be packaged and delivered to the repository using SWORD, the same as the Publish function in the blog. The OR10 paper puts it like this:

“Currently SWORD is a one-way protocol, meaning that a repository can either accept a record, or reject it; there is no middle ground. Adding a lightweight mechanism to desktop applications to enable negotiation on what is sent in a SWORD package would go some way to bridging this gap.”

This facility should become available in SWORD v2.0, and developers from the project are contributing directly to this activity since there is a vested interest in the outcome.

It would open new deposit possibilities. An admittedly complex and possibly unusual, but nevertheless feasible, case is suggested in the OR10 paper where an author of a research paper pulls information from other linked sources, such as a contacts list and a citation manager:

“At the point the document is submitted all this valuable information (such as author identities disambiguated by email address and structured citation listing) is lost.”

All this is more sophisticated than CRUD and points the way forward, but first implementing CRUD features using SWORD as a mediator between applications and multiple repositories would represent serious progress.

What might follow from this is ‘culture change’ or, more immediately, dialogue between author and repository. The OR10 paper puts it more prosaically:

“Our proposal is to enable a simple yet powerful set of negotiations to occur between the desktop application and multiple repositories such that a single familiar submission workflow (in the style of the author’s application) can be presented to the user.”

So as a starting point the aim in DepositMO is to activate the repository as a storage service for the iterative Save Draft action in an authoring application.

In the next post we will consider the practical implications of this approach and look at an early sketch of a possible interface design.

Just as long as we all understand and can share in what is new and where we are heading. Otherwise we may just find ourselves talking about a more familiar form of crud.
