DepositMO and the future of SWORD

This post explores the relationship between the SWORDv2 project and the DepositMO project, and how they have influenced each other.

SWORDv2 officially began in late 2010, and DepositMO started at around the same time, alongside a number of other JISC deposit-related projects including DURA and RePosit. When SWORDv2 set up the Technical Advisory Panel in early 2011, representatives from these projects were invited to join to share their deposit scenarios and technical expertise. Of the three projects, DepositMO was by far the most closely aligned to the goals of SWORDv2 and also the most technical.

The SWORDv2 mission was to support the full range of CRUD (Create, Retrieve (Read), Update, Delete) operations for scholarly systems, and to maintain the use cases which had driven SWORDv1, including mediated deposit and, most importantly of all, packaging. DepositMO, meanwhile, was focussing on desktop-to-repository deposit and two-way synchronisation.

This meant that DepositMO would need to take full advantage of the CRUD protocol operations offered by SWORD, although neither mediated deposit nor packaging was especially relevant to it. The project would therefore be a valuable resource for a number of critical aspects of SWORDv2:

  • A sounding board for core profile developments: DepositMO had a vested interest in the CRUD protocol operations and explicitly no interest in the packaging aspects. This meant that the protocol operations would go through thorough review, while the necessity of every packaging-related concept would be questioned. As such, the SWORDv2 profile received extensive and sustained review throughout the project, which has hardened it against many counter-arguments.
  • A testing base for software development: DepositMO is being implemented against DSpace and EPrints, as is SWORDv2. Since it was clear from the outset that DepositMO would be providing extensions to SWORDv2, the majority of the codebase for both systems is the same, with the project developing its SWORDv2 extensions for both repositories. This means that not only has the core software been through some important testing, but its capacity to be extended has also been examined.
  • Representation of a critical use case: the desktop-to-repository use case is one of the most under-developed in our community, so it was very interesting to have a project focussing on it to represent that use case throughout discussions. With services such as Dropbox and Ubuntu One becoming common, the deposit applications developed by DepositMO will no doubt be an important demonstrator of the way academics will interact with repositories in the future.

In addition to the base operations provided by SWORDv2, the DepositMO project has also specified the following extensions:

  • That Collections be able to list their content items which belong to the authenticated user.
  • That individual files behave RESTfully and in line with the rest of the SWORDv2 specification. This means that replace and delete operations can be carried out on the individual files in an object.
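As a rough illustration of the second extension, the sketch below builds (without sending) the HTTP requests that replacing or deleting a single file might involve. The file URI, the function names and the content type are assumptions made for this example; they are not values defined by SWORDv2 or DepositMO.

```python
from urllib.request import Request

# Hypothetical URI of one file inside a deposited object (an assumption,
# not a URL pattern defined by SWORDv2 or DepositMO).
FILE_URI = "https://repository.example.org/sword2/object/42/file/paper.docx"


def build_replace_request(file_uri: str, content: bytes, content_type: str) -> Request:
    """Build (but do not send) a PUT that would replace the file at file_uri."""
    return Request(
        file_uri,
        data=content,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def build_delete_request(file_uri: str) -> Request:
    """Build (but do not send) a DELETE that would remove the file at file_uri."""
    return Request(file_uri, method="DELETE")


replace_req = build_replace_request(FILE_URI, b"new version", "application/octet-stream")
delete_req = build_delete_request(FILE_URI)

# Show which HTTP method each request would use against the same file URI.
print(replace_req.get_method(), replace_req.full_url)
print(delete_req.get_method(), delete_req.full_url)
```

The point of the sketch is only that each file has its own URI and that the ordinary HTTP verbs apply to it directly; authentication and any SWORD-specific headers are left out.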

The SWORDv2 project therefore considers it to be a realistic possibility that a SWORD 2.1 specification may be produced in the future incorporating the DepositMO extensions in addition to any suitable extensions from other projects using the protocol.

Posted in Uncategorized.



Will repository deposit interfaces support collaborative authoring?

Support for collaborative authoring would appear to be a natural consequence of work done in DepositMO, but is not something that has been addressed directly so far. In other words, we can talk about principles rather than practice, and we don’t have test data to back up any assertions we might wish to make on this point. So in this post we will consider expectations for collaborative authoring by looking at services that already support it.

The question was prompted recently when Stuart Lewis, SWORD Community Manager, invited us to contribute to a series of deposit use cases being coordinated by SWORD together with SONEX. We submitted a short use case on desktop-to-repository deposit, but Stuart posed another intriguing but harder-to-answer case on support for collaborative authoring.

One place to look might be Dropbox, since we have already profiled that service and compared it with DepositMO, noting again the distinction that this and other cloud services provide facility for storage and content management rather than, as repositories do, for publication and access. As we found then, this distinction has consequences for the ability to support certain services, and that might also apply to collaborative authoring.

Coincidentally, ReadWriteWeb just highlighted Chatbox, an application that supports collaboration in Dropbox. Chatbox adds the ability to leave comments on, or chat in real-time about, particular files and folders, and everything is stored in Dropbox.

That RWW article also refers to other document collaboration services, Box and Huddle, as well as, in this respect, the venerable Google Docs, the exemplar that Stuart mentioned in his original enquiry. Another service often cited in this and many other contexts is Microsoft SharePoint, but this supports sharing within an organisation and is harder to set up for use outside an organisation’s computing firewalls than many cloud services.

Looking at these services there appear to be two primary requirements for collaboration: recording and presenting conversations about the object of the collaboration, and tying that conversation to the object. According to RWW again, Google has identified the common document collaboration problem: how to manage comments and conversations around a document.

Chatbox aside, it seems that efforts to tie comments to the document can lose the chronology and immediacy of the conversation, while solutions to that problem often end up detaching the conversation from the document. Google’s approach is to make its new comments system work “like a conversation thread on a Facebook, complete with @replies. When someone is tagged in a conversation, they will receive an e-mail notification. The user can then either click-through to the document, or simply respond to the e-mail. All the conversation is captured and stored in Google Docs with the document.”

RWW notes that Box does something similar with its discussion tools. On its site Box describes its collaboration feature like this: Create an online workspace where you can share project files, manage files with version history, add comments, assign tasks, or create new content.

Huddle’s promotion of its collaboration feature emphasises more conventional concerns of security and access control: “Using Huddle, relevant people, inside and outside of your organization, can access all content relating to a project or campaign in a secure online environment. Apply granular permissions and control who can and cannot view specific information.”

I have used Google Docs to coauthor documents in real time, but without the comment features now mooted. Otherwise my approach to collaborating on documents where I am lead author has typically been to circulate the document, usually by email, wait for comments and update the document accordingly. It seems this approach is somewhat behind the curve in terms of modern document coauthoring, and behind the expectations of users of leading cloud services. If these requirements are shared by repository users, then repositories have some catching up to do. DepositMO is further away from this form of support for collaborative authoring than I had thought when first responding to Stuart’s prompt.

Salvation may lie with SWORD v2 again, as it has with our efforts to increase the functionality of repository deposit. Colleagues have been putting the case to apply the Google Docs model in the next version of SWORD, and their arguments have been heard, at least as far as testing is concerned. This suggests the possibility, at least, of mimicking the Google Docs approach to collaborative authoring.

SWORD is simply a technical specification for a Web service, however. As always, user-facing services and interfaces, whether for deposit or collaborative authoring, need to follow if we are to exploit changes to the specification.



Stepping back from the edge: rescuing the project plan

DepositMO hasn’t fallen off a cliff, although it might look like it from the recent lack of blog posts.

Actually, one partner in the project has fallen. Our original partners from Edinburgh University have left the project. That was their choice. It would be glib to say these things happen, but this has not happened on any other project I have managed, so I am sorry it happened here.

After some sensitive discussions with Balviar Notay, our programme manager at JISC, I am pleased to report that we will be taking the work due to be done by Edinburgh forward, and we have recruited a leading expert to help with that. There will be a short extension to the project, to the end of September, to allow this part of the work to be completed. In other words, nothing will be lost from the original project plan as a result of this local problem.

It has to be said that Balviar’s approach throughout these difficulties has been to enable all the project work to be completed positively, and she has shown great goodwill and flexibility to facilitate this.

We have a full project meeting scheduled for next week to assess the ramifications of these developments for the whole project team, and to ensure that we have covered all angles in our revised project plan. It would be premature to blog more details ahead of that meeting.

Beyond the project management issues, there has been progress on the technical front. We have been running preliminary tests on two deposit tools: a pop-up interface for Word 2010 to enable direct deposit in a selected repository for the work being prepared in the application; and a more general Dropbox-like drag-and-drop tool that works with desktop file management systems such as Windows Explorer or Mac Finder. The conclusion of those tests is that we can move on to more formal and substantial user tests.

Again, it would be premature to reveal too much about these tools ahead of the tests, since we want all users to start from the same point: no prior experience. (Not that our target users, all recommended by our repository and disciplinary content partners, are likely to be reading this blog.)

Perhaps a bigger problem is how to present these tools here. At a recent JISC Repository Deposit programme meeting (Birmingham, 1 March, see this report on the meeting) my presentation consisted of a live demonstration of one tool and a short video of the other (had time permitted), all held together loosely by a few slides. Before we overload this post, I’ll promise we will give these tools plenty of coverage in future posts, trying all reasonable representations, so that more people can understand them and get to try them for themselves.



Dropbox: lessons for repository deposit

@CameronNeylon Dropbox solves problem scientists know they have. Repositories don’t 4 most part. Ideal would be eg SWORD app against dropbox API (24 Jan 2011)

Dropbox is a data storage and retrieval service that is typical of the new breed of network or ‘cloud’ services. What’s more interesting is the apparent popularity and wide use of this service among research scientists, at least according to those on our project user panel and the correspondent above. This post will look at Dropbox, to see what we can learn from its deposit method and interface, and to consider how it compares with the repository deposit model.

Dropbox immediately affirms its credentials as a Web 2.0 service on its front page, with its simple call to find out (video) or use (download). More unusually, there is no catchy slogan to sum up the service, unless you think the name does that already.

The video animation never stresses the technical aspects, but seeks to contextualise the service as a ‘magic pocket’ within a traveller scenario for synchronised cross-device access and sharing with others. We are asked to believe that our storybook user treats Dropbox as a “home for all of his stuff” and urges “you can stop worrying about managing files and backups and get on with your next adventure”. Less enticingly perhaps, but just as importantly, we are reassured the service provider “takes the security and privacy of your files very seriously”. In terms of cost, the service is free up to 2 GB of storage, with prices to expand this up to 100 GB.

There we have it, a pretty broad characterisation of the service, but not without the obligatory “You’re done.”

Alternatively, Dropbox is slow to upload and size-limited if you have a number of x-large files to upload (e.g. 30 GB), and expensive relative to a local institutional repository, say – I expected a withering response from the project developers to our user input, but not that this would be their target.

@depositMO Dropbox vs repository? Since reminded of distinction between research data cycle and publication cycle, but these now harder to separate (24 Jan 2011)

So a repository, especially an institutional repository of research papers, has a different purpose: access and dissemination centred on formal publication, often in conjunction with publication elsewhere, such as in a journal (the “authoritative” source). The first thing to note is that, for such content, the repository is not the only source, or even the primary source, but the free and open source. The content targeted by Dropbox, by contrast, would otherwise simply reside on the author’s hard drive. This must have an effect on author motivations to deposit in a repository. The repository’s purpose also has consequences for author versus administrator/repository control of the content to be deposited, where Dropbox instead can leave the user in control.

Where repositories target content that is less likely to have alternative means of dissemination or publication, such as arts materials or teaching and learning content, then this has produced encouraging content growth, after careful consideration and redesign of the default repository deposit interfaces.

It might be expected that authors will write papers on a personal computer or over a number of such machines, and the text, especially in sciences, will be shared with co-authors working on other similar machines. What’s less clear is to what extent this process is mediated by other types of, e.g. mobile, devices. So the sharing and collaboration feature exhibited by Dropbox, if less so the cross-device support, might have repository application. Such support does not exist in repositories now, but that is what we might be seeking to add in the DepositMO project, leading to quite lively and heated discussion centred on support for workspace, sharing and deposit/publication. The need to maintain the distinction between a private, if shared, workspace and public space is something that repositories, but not Dropbox, have to give special attention.

By exhorting us to our next ‘adventure’ Dropbox seeks to get the service out of the user’s way, to minimise its impact on the user as far as possible. Depositing content is a necessary chore to be tolerated in lieu of more fun activities, it suggests. Repositories, in contrast, excepting the examples such as those above, rather than minimising the deposit process have tended to expand it in the quest for enhanced metadata, resulting in the widespread reversion from author deposit to mediated deposit by repository administrators. This was never the intention in the original conception of institutional repositories.

To what extent Dropbox achieves its claimed functionality in practice rather than in marketing spin requires experience of real use, but some initial feedback suggests that the drag-and-drop method of placing saved files in the Dropbox ‘magic’ folder in the user’s file management tool (such as Windows Explorer) is simple and effective. This has encouraged us to ramp up our efforts to offer a similar approach in DepositMO, and an initial demo has been shown ahead of further testing.

Repositories and Dropbox (and no doubt other cloud storage services) appear to have similar and overlapping features but are not the same thing and are offering different services for different purposes. While recognising the differences, repositories may be able to learn something about reaching out to users in terms that seek to address their problems.



Specifying user requirements: workflow to deposit use cases

What came first, the chicken or the egg? Or in the case of a technical project such as this one, who came first, the developer or the user? In my experience, working in a computer science group that produces an unending stream of talented developers, it’s invariably the former. In a twist on this theme, here we seek to inform the design and requirements collection for enhancing repository deposit by working with users, content creators such as scientists, and content curators such as repository managers.

Developers work on instinct and an unflagging belief that what they do will work (after a bit of debugging). They turn complex problems into solutions in ways that can amaze the rest of us. Towards the end of the work, however, often there can be a lack of time for full user testing, or little interest in the results.

Does what it says on the tin, photo by chris5aw

If it does what it says on the proverbial tin, who specified the problem, and whose requirements are being addressed? Many developers will argue they know the intended use case and the users (for it is they). This can work well for original, highly innovative products or for projects sans users. In other cases, where users are key, or which are refinements and extensions of known products, the outputs will have to be measurable against clearly specified requirements, ideally with reference to real users. This project falls squarely into the latter category. Repository deposit interfaces are hardly new – how could they be – so what we seek to add has to be measured against what is already there.

It is important this is addressed before devising a plan for test and evaluation of the product, otherwise you may end up testing against a limited set of (developer) requirements – it works exactly as the developer said it would – or it doesn’t work with reference to a wider set of (user) requirements, because that’s not what the developer was building.

DepositMO has been a developer-driven project, with a team of developers working for it. As you can tell from entries to this blog, unless we missed something big we have some concepts so far, but not yet implementations for users to try and for us to report. We have requirements, set out in the project proposal, but technical frameworks can be a moving target. For example, we are working with the technical advisory group to produce the next version of SWORD, v2, which we expect will be a core component of what the project produces. That has yet to be finalised.

So at risk of a withering response from the developers – “you can’t do that”, or “that won’t work” – we convened our user panel, sans developers (well, one crept in, but with a different hat on), to establish a preliminary plan for user testing and evaluation of the products and services from the DepositMO project, and to sketch out some use case scenarios based on identified scientist/author/repository workflows. It’s the use case scenarios we will concentrate on here (Table 1).

Represented on this panel, ultimately, are scientists at the University of Southampton from a number of disciplinary areas (archaeology, chemistry and materials science), as well as representatives of different institutional repositories, notably EdShare, which presents teaching and learning materials, and ePrints Soton, the university’s main institutional repository, which focuses on research papers and presentations. Table 1 shows feedback from a meeting of some members of this panel held on 21 January.

Table 1. Use case scenarios leading to deposit for sciences and repositories

Chemistry
  • Content: papers and theses, includes data-graphs-images (embedded data uses bespoke software)
  • Apps: typically produced with Word
  • Author workflow: often multiple authors, shared by email
  • Update: work list and find papers by author, use ResearcherID

Materials
  • Content: data types inc. text files, 3D data
  • Formats, apps: XML, LaTeX, Word
  • Workflow: share using Dropbox

EdShare repository
  • Content: e.g. complex interacting files, video
  • Repository workflow adds: automated metadata, add metadata using document properties; Creative Commons (although issues over institutional ownership)
  • Usage requirements: reuse content, controlled distribution

ePrints Soton repository
  • Content: journal articles, conference presentations, book chapters, theses, data types inc. images
  • Repository workflow inc.: copyright concerns

What does Table 1 reveal? It confirms that scientists are creating a variety of digital content and formats, not just text, using different applications but noting that Word remains in wide use, which is good for our project approach. Prior to publication this content is shared informally between team members using a range of methods. In terms of sharing, Dropbox emerged as the joker in the pack. It was suggested that this service is popular among the colleagues of our scientists. We shall investigate this service further in a later blog post, but one feature to be commented on is the use of Windows Explorer as a file deposit tool for use with Dropbox. A file management interface is on the agenda of our project developers.

Table 1 also shows that the repository requirements are not the same as those of the authors, and that, by virtue of their different collections, the requirements of the two repositories also differ from each other. These all have implications for the design and functionality of the proposed deposit interfaces, which connect authors with their chosen repositories. One requirement not recorded here is for single-click deposit to multiple repositories. That is already a design requirement and one of the original promises of SWORD, but perhaps not a straightforward task given even the brief repository findings here.

The need to find and list works by an author for later update has already been identified as a developer requirement, but mention of ResearcherID introduces a specific service providing author identification.

From Table 1 we can expand a possible use case: etheses. One of the problems, we are told, in making etheses available open access via repositories is the need to embargo sensitive or commercial content. It is usually the case, however, that not all the text need be embargoed, just certain sections, but because the thesis typically exists as a single document the embargoed material cannot be separated from the rest. It was suggested that a deposit approach might encourage a thesis to be deposited in sections, or ‘chunks’, allowing an embargo to be set for affected sections but with access to the rest.
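One way to picture the ‘chunked’ thesis is as a list of sections, each with an optional embargo date. This is a minimal sketch only: the class, the field names, the section titles and the dates are all hypothetical, and do not reflect the data model of EPrints, DSpace or any other repository.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ThesisSection:
    """One separately deposited 'chunk' of a thesis (hypothetical model)."""
    title: str
    embargo_until: Optional[date] = None  # None means open access immediately


def open_sections(sections: List[ThesisSection], today: date) -> List[str]:
    """Return the titles of sections that are not under embargo on `today`."""
    return [
        section.title
        for section in sections
        if section.embargo_until is None or section.embargo_until <= today
    ]


# Illustrative thesis: only the commercially sensitive chapter is embargoed.
thesis = [
    ThesisSection("Introduction"),
    ThesisSection("Commercial results", embargo_until=date(2013, 1, 1)),
    ThesisSection("Conclusions"),
]

print(open_sections(thesis, date(2011, 6, 1)))
```

With a single-document thesis the embargo would have to cover everything; here the embargo is a property of the section, so the rest stays accessible.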

BTW, in view of our emerging table of requirements an additional case study might be drawn from a recent paper by colleagues at Southampton (and elsewhere; 16 authors!) which describes scientists’ workflow, reuse, and publication in virtual research environments (VREs) such as myExperiment (rather than repositories): Why Linked Data is Not Enough for Scientists.

We shall return to the task of building a test plan at the next meeting of the user panel. Watch this space.

I am grateful to Kate Walker, Mark Scott, Debra Morris and Simon Coles for their contributions to this work.



Visualising extended repository deposit interfaces

SWORD, or Simple Web service Offering Repository Deposit, has won praise for its innovation, but users may find gaps in their knowledge and experience of SWORD. That might be because, as SWORD Community Manager Stuart Lewis says: “to date there has not been a great deal of use of SWORD. One of the reasons is a lack of SWORD clients that can deposit items into repositories.” Stuart noted this by way of introduction to EasyDeposit, his SWORD deposit tool creator. This will help but it is aimed at developers who first have to use it to create new deposit interfaces.

DepositMO has some SWORD-compliant deposit interfaces in development but yet to arrive in front of users. So I’ve been looking around to find some example SWORD interfaces to give the project’s user testing panel an illustration of what to expect. Here are a couple of examples that use SWORD to embed a deposit process in popular applications, Facebook and Word, and a short video showing, in a more technical interface for now, how this might be extended.

First, Stuart Lewis (again) works through a series of selected screenshots of the Facebook SWORD client, allied to code snippets that reveal what is actually happening: “The Facebook client is one of the most complete demonstration clients that there is, and as such ‘hides’ a lot of the work that goes on behind the scenes.” Exactly what users are looking for, no doubt.

Next we have a short annotated screenshot video, put together by the project’s lead developer David Tarrant, showing how the Microsoft Article Authoring Add-in for Word can be used to deposit an article written in Word, just in case you haven’t seen or used this before; presumably most haven’t unless they are preparing articles for submission to journals or repositories (PubMed Central) that require the NLM’s prescribed XML format.

Video: submitting from MS Word to a repository via SWORD using the Article Add-in beta 3.

Please note, this functionality is provided by Microsoft as an additional feature of Word, and is not something we have produced in the project, although we hope to develop this functionality further. If you wish to try it for yourself you can download the Add-in, but you will need to be using Word 2007 or Word 2010 (but not on a Mac, it seems). If you want to know more there is an extensive user guide, although only a tiny part of it is concerned with SWORD deposit (see the Publishing button, p 34).

This version of the Add-in supports what is called a 'fire and forget' approach to repository deposit, which is currently the default mode for almost all repository deposit. As the term implies, the interface will let you deposit your document but won't let you do much else subsequently. You will have to use other methods for that. Or wait for the outputs of DepositMO and the next version (v2.0) of SWORD. Then, we hope, deposit interfaces will support a series of iterative actions with feedback.

The critical engineering requirement here is completing a feedback loop in SWORD. As an example of how this will work, David has produced another short screenshot video showing this process in a code-based screen, but nevertheless it demonstrates the principle in practice. In this sense it will inform the ongoing technical discussion around the next version of SWORD.

Video: completing the feedback loop in SWORD.

This example shows depositing an item, getting back the URI and then securely requesting the application/atom+xml representation of this item to find its status. It uses a command-line SWORD client, so it looks more technical and less transparent; in the next stage the developers have to present this process in a more user-friendly interface. BTW, REST, seen in the opening annotation screen, stands for Representational State Transfer, "a simple way to organize interactions between independent systems", such as author applications and repositories in our case, and is most commonly used in conjunction with HTTP, the protocol by which machines communicate over the Web.
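For readers who prefer code to video, the second half of that loop might be sketched roughly as follows. This is not the command-line client used in the video. It assumes that ‘securely’ means HTTP Basic authentication over HTTPS, and the item URI, username and password are made up for the example. The request is built but not sent, and the status is not parsed, since this post does not say how the status is encoded in the Atom document.

```python
import base64
from urllib.parse import urlparse
from urllib.request import Request


def build_status_request(item_uri: str, username: str, password: str) -> Request:
    """Build (but do not send) a GET for the Atom representation of an item.

    Assumption: 'securely' is taken to mean HTTP Basic auth over HTTPS.
    """
    if urlparse(item_uri).scheme != "https":
        raise ValueError("refusing to send credentials over a non-HTTPS URI")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        item_uri,
        method="GET",
        headers={
            "Accept": "application/atom+xml",  # ask for the Atom serialisation
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical URI, as it might be returned in a deposit receipt.
ITEM_URI = "https://repository.example.org/id/eprint/123"
status_req = build_status_request(ITEM_URI, "alice", "secret")
print(status_req.get_method(), status_req.full_url)
print(status_req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual retrieval against a real repository; that step is left out so the sketch runs without a server.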



Repository deposit from authoring apps: conformance and interfaces

Sketch of Word author panel: 1 Choose a repository or open from repository

Sketch of Word author panel: 1, select repository for deposit

Conformance, level 0, level 3, process interaction, addin, plugin, CRUD. These are not all terms we typically associate with repository deposit, but they are key to understanding the DepositMO project.

My mother’s preference for flat-pack kitchens and furniture, admittedly financially motivated, gave me a lifelong aversion to self-assembly instructions. I prefer the articles finished by craftsmen.

So it was with some trepidation that I came to be reminded of this as I – until recently, the errant project manager for DepositMO, a project seeking to connect repository deposit with popular authoring applications – reassessed the documented work to date of the project and came across this array of terms.

One of the principal architects of this project is Dave Tarrant. I have worked with Dave for some years, so I am well aware that his approach – part vision, part inspiration, part technical ability, part technical audacity – does not always immediately or easily lend itself to simple interpretation. Thus we have to try and construct a view of the piece for ourselves using the information available, only to find later that the craftsman version was there all along.

For now, with DepositMO we want to understand the story so far. Here is an attempt at telling that story, noting that it is an early-stage work-in-progress.

Embedding repository processes into the authoring environment

The essential idea of the project is to motivate more content to be deposited in open access institutional repositories by creating a deposit interface for the tools that authors use to create their content. By some margin these tend to be office applications – word processors and spreadsheets, for example – notably the suite of such applications produced by Microsoft.

One simple approach is for the ubiquitous ‘save’ and ‘save as’ functions in these applications to connect with repositories in the same way they currently save content to a local hard disc, on the user’s personal computer. This is CRUD – Create, Retrieve (or Read), Update and Delete – the approach described in an earlier blog post. Although CRUD is not new, it represents a new approach to repository deposit.

In that post we promised more on the practical implications of this approach, so here it is.

Typically, to deposit content in a repository an author will find the repository in which they wish to deposit, where they have a login, access a deposit Web form presented by the repository, fill in the necessary metadata and instruct the repository to upload the content, thereby creating a record linked to the content.

Is this process flexible enough for authors, given the multiple repositories and types of repository available now where they may wish, or be required, to deposit?

Is this process too difficult for authors? More specifically, does it take too long? While it has been shown that the deposit time can reduce to a few minutes for regular repository depositors, slow deposit seems to be a perennial problem, to the extent that most repository deposit now seems to be mediated by administrators. While this can work for relatively low levels of deposit, it is not clear it will scale with anticipated repository growth, certainly not for data and other types of resources now being collected in institutional repositories.

There is also the problem of a disconnect. This can be seen to begin with the disconnect between applications and repository, and is extended by deposit mediation. Thus the real disconnect is between the author and the repository.

Four layer conformance model

From this point it becomes more technical. We have various author tools and we want to connect content created by authors using these tools with at least two types of repository, in this case DSpace and EPrints. Development will involve working with the author tools, the author’s computing desktop and operating system as well as the repositories, so to coordinate this we have a conformance model, currently with four layers.

DepositMO four-layer conformance model

At the lowest level 0 we start with SWORD. This seems reasonable; it’s in the name Simple Web-service Offering Repository Deposit, it’s established and becoming widely used. We noted previously that further work on SWORD is needed, probably leading towards v2.0, to support full CRUD. For now we can use the current version 1.3 with one proviso, that the identifier <atom:id> in the return receipt MUST BE a persistent URI by which the object can be re-located. Thus Level 0 covers the current deposit procedure as supported by SWORD 1.3 with the added specification on what the receipt must contain.
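
As a rough illustration of the level 0 proviso, the Python sketch below reads the <atom:id> from a deposit receipt and checks that it at least looks like an absolute HTTP(S) URI. The sample receipt, the example.org URI and the function name are invented for illustration; none of them come from SWORD 1.3 or from the project's code. A syntactic check like this cannot prove that the URI is persistent, only that it could be dereferenced later.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical deposit receipt: the URI and title are invented for illustration.
SAMPLE_RECEIPT = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://repository.example.org/id/eprint/123</id>
  <title>My draft paper</title>
</entry>"""


def receipt_identifier(receipt_xml: str) -> str:
    """Return the <atom:id> of a receipt if it is an absolute HTTP(S) URI."""
    entry = ET.fromstring(receipt_xml)
    atom_id = entry.findtext(f"{{{ATOM_NS}}}id")
    if atom_id is None or not atom_id.strip():
        raise ValueError("receipt has no <atom:id>")
    atom_id = atom_id.strip()
    parsed = urlparse(atom_id)
    # Level 0 asks for a URI by which the object can be re-located, so an
    # opaque identifier such as urn:uuid:... is rejected in this sketch.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"<atom:id> cannot be dereferenced: {atom_id!r}")
    return atom_id


if __name__ == "__main__":
    print(receipt_identifier(SAMPLE_RECEIPT))
```

A client would keep the returned URI alongside the local document, since every later level 1 operation is addressed to it.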

If level 0 is the C in CRUD, level 1 covers the RUD. Having deposited an item we want to Retrieve it for further work, or retrieve it in some other form, or retrieve information about the item, all from the URI specified in level 0. These different forms of the object might be called ‘serialisations’; for example, the RSS and Atom feeds used for current awareness services are serialisations. We can envisage other types of serialisation containing many varied items of metadata pertaining to that item, such as current status (in the publishing lifecycle), number of views, downloads, citations, etc. Similarly, Update can now be supported by PUT/POST to the URI from level 0, and Delete by an HTTP DELETE request to the same URI.
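
To make the level 0 and level 1 split concrete, here is a minimal Python sketch that maps each CRUD operation onto an HTTP request. It assumes the conventional AtomPub-style mapping (POST to create, GET to retrieve, PUT to update, DELETE to delete), which is our reading rather than something the conformance model fixes. The function name and the example.org URIs are hypothetical. The requests are only built, never sent.

```python
from typing import Optional
from urllib.request import Request

# Assumed mapping of CRUD operations to HTTP methods (AtomPub convention).
CRUD_METHODS = {
    "create": "POST",     # level 0: sent to the repository's deposit URI
    "retrieve": "GET",    # level 1: sent to the item URI from the receipt
    "update": "PUT",      # level 1
    "delete": "DELETE",   # level 1
}


def build_request(operation: str, uri: str, body: Optional[bytes] = None,
                  content_type: str = "application/octet-stream") -> Request:
    """Build (but do not send) the HTTP request for one CRUD operation."""
    if operation not in CRUD_METHODS:
        raise ValueError(f"unknown CRUD operation: {operation!r}")
    headers = {"Content-Type": content_type} if body is not None else {}
    return Request(uri, data=body, headers=headers,
                   method=CRUD_METHODS[operation])


if __name__ == "__main__":
    item = "http://repository.example.org/id/eprint/123"  # hypothetical URI
    for op in ("retrieve", "update", "delete"):
        payload = b"revised draft" if op == "update" else None
        req = build_request(op, item, payload)
        print(req.get_method(), req.full_url)
```

The point of the sketch is that only Create needs to know the repository's deposit endpoint; the other three operations need nothing but the URI returned in the receipt.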

To support improved serialisations in one of the DepositMO repository platforms, the EPrints Bazaar, an app store following the model of Apple, is helping to abstract many of the current serialisation formats from the core EPrints code, so that these can be updated more easily and new ones added.

One of the reasons for slow repository deposit might be the degree of metadata required to describe the object being deposited. Ultimately this is down to the design of the deposit interface by individual repositories. To an extent this can be ameliorated by SWORD and some automated capture of data from the object, but in the deposit process there still has to be scope to collect additional data required by the repository or to improve metadata. This is the focus of level 2 in the conformance model.

Not only does this require a forms-based process interacting with the repository, it gets more complex when more than one repository is involved, such as multiple repository deposit. As we have seen, supporting multiple deposit is one of the key goals of DepositMO. Typically multi-repository deposit requires a depositor to interact individually with each repository, filling in the same metadata for each manually before being able to deposit, thus taking even more time with duplicate data entry. Clearly this is not ideal. Level 2 conformance is thus not only about allowing single click multi-deposit/update/delete, but about exposing the repository metadata requirements in the client application (using the client’s familiar interface) and hiding duplicate data entry. If the same metadata item (e.g. Title, Authors, Abstract) is requested by both repositories then it is only asked for once.
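
The "only asked for once" idea can be sketched in a few lines of Python. This is an illustration under assumptions rather than the project's implementation: the repository names, field names and function names are invented, and real repositories would expose their requirements in a richer form than a flat list.

```python
from typing import Dict, List


def fields_to_ask(requirements: Dict[str, List[str]]) -> List[str]:
    """Merge every repository's required fields, keeping first-seen order."""
    merged: Dict[str, None] = {}
    for fields in requirements.values():
        for field in fields:
            merged.setdefault(field, None)  # a repeated field is ignored
    return list(merged)


def per_repository_metadata(requirements: Dict[str, List[str]],
                            answers: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Copy the author's single set of answers out to each repository."""
    return {
        repo: {field: answers[field] for field in fields}
        for repo, fields in requirements.items()
    }


if __name__ == "__main__":
    # Hypothetical requirements for two repositories.
    requirements = {
        "eprints": ["title", "authors", "abstract"],
        "dspace": ["title", "authors", "subject"],
    }
    print(fields_to_ask(requirements))
```

With the two hypothetical repositories above, the author would be prompted for four fields rather than six, and each repository would still receive the three fields it asked for.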

Where level 2 involves interaction between repository and user, level 3 enables the repository to interact with client applications. Typical operations might include format conversion or feature extraction. There could be many potential applications. This is the more ambitious end of the proposed implementation, and it may not be possible to implement this fully within the project timescale.

Currently this model is being fleshed out in technical documentation aimed at the respective developers. Videos showing early versions of some of these processes in action have been produced, and we will present those in another post.

Based on this conformance model, it ought to be possible for new platforms to adopt this approach to repository deposit, starting at level 0 and working upwards.

Emerging user interface

So that’s the functional description of the features we believe are required to support repository deposit directly from authoring applications. How do we present this to authors? There are services in Microsoft Office applications that could be used for this, but for flexibility and scope we anticipate that initially this might be better served by a separate interface, the author panel. The functionality of this panel has been sketched progressively to reflect the layered model, starting with the basic version, panel 1, that appears at the head of this blog post, through to the more complete version below.

Thus we can see in panel 1 the facility for the author first to select the required repository for deposit. Alternatively, if work is to be done on an existing document, that can be opened from the repository. By panel 4 we can begin to see support for multi-repository deposit.

Sketch of Word author panel: 4, Full functions, with example of multi-repository deposit

Clearly these are sketches, and many issues remain to be resolved. One example is synchronisation between repositories where editorial buffers moderate deposit, in which case the individual repository Update buttons will take precedence over the Update All button. From an interface perspective, how to present lists of an author’s past papers for updating is another challenge.

It’s exciting designing new features for services that you really believe in. Might these prove a tipping point in spurring wider usage? Could do, but we don’t know yet. We do know that much remains to be done to make all this work optimally for the typical user.


Repository deposit turns to CRUD

There’s no more elegant way of putting it really. What’s at the heart of DepositMO? CRUD.

Create, Retrieve (or Read), Update and Delete (CRUD), the four basic functions of persistent storage. This is what differentiates the project from current capabilities for remote deposit of content to repository services. I’m using CRUD to write this blog post in WordPress. As I write I have two action buttons to the right of the content pane: Save Draft, and Publish. Both allow me to update the content to the storage server, the difference being whether the post is made public or not.

So there is nothing new about CRUD, except that it is not yet directly applicable to many of today’s digital repositories, which tend to support single publishable item deposit with subsequent versioning should changes be needed or if updated versions are produced. In other words, there is no concept of a repository workspace – or a connected workspace – that allows the simple incremental updates widely supported by other computer authoring services and described by CRUD, or applications that go beyond this.

“For authors it is often suggested that content might be deposited once by filling data in a Web form, but too much effort is involved for the process to be repeated for another repository. Better is multiple simultaneous deposit under the control of the author.”

It became clear that we should do more to emphasise the role of CRUD in this project following a short, branched exchange on the American Scientist Open Access Forum mail list. A recurrent theme on the list had returned – the apparent tension between deposit in central, subject-based repository collections such as arXiv and distributed institutional repositories (IRs). The question is where to deposit; the aim to maximise the volume of open access content.

Currently the answer probably depends on which subject area the research to be deposited covers, e.g. authors in the physical or biomedical sciences will most likely deposit centrally, due to the strength of the repositories serving these areas. Authors in other disciplines will deposit institutionally, but on a much lower scale. There is the crux of the open access problem.

The answer proposed, notably by Stevan Harnad on the AmSci list, is for institutions to mandate deposit of published research papers in the local repository. That is, all papers, not just those not already deposited in a subject-based open access repository elsewhere.

For authors it is often suggested that content might be deposited once by filling data in a Web form, but that too much effort is involved for the process to be repeated for another repository. One approach to reduce the perceived workload, given that all these repositories are open and allow open harvesting of data using OAI-PMH, is to deposit once and then harvest the content to other repositories as required.
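
For readers unfamiliar with OAI-PMH, a harvest request is an ordinary HTTP GET with a few query parameters. The Python sketch below builds a ListRecords request URL. The base URL and function name are hypothetical (each repository publishes its own base URL), while oai_dc is the metadata format every OAI-PMH repository is required to support. The code only builds the URL and contacts no server. Note that OAI-PMH delivers metadata records, so fetching the deposited files themselves would be a separate step.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only.
    print(list_records_url("http://repository.example.org/oai",
                           from_date="2010-09-01"))
```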

Another approach might be multiple simultaneous deposit. To save authors effort, data for deposit is entered into a form once, and then copied to the designated repository destinations. One tentative suggestion to emerge in the latest round of list discussion was that deposit to an IR be accompanied with login details for a central subject repository for subsequent deposit. This is fraught with security problems, as we pointed out to the list.

Enter SWORD, for it was suggested that this be the mechanism for sharing deposit and logins in this case. It turns out that the organisation developing SWORD has a case study that looks quite like that proposed.

Separately, this is what arXiv says about using SWORD for deposit:

“This interface is primarily intended for use by conference organizers, proceedings and journal editors, etc. for programmatic bulk upload of pre-vetted material to arXiv for long term archival and dissemination. It is assumed that this is done with the (implied or explicit) approval of the authors of individual contributions or on their behalf.

“Individual authors may prefer arXiv’s interactive web upload for personal use, because it provides better feedback mechanisms, but in principle the deposit API can be used for one-at-a-time deposit to arXiv by individual authors, too. We envision integration of the deposit process into authoring tools for efficient upload from the desktop.”

So third-party deposit is just about acceptable, perhaps, without being wholly endorsed. The last sentence points indirectly towards the work of DepositMO, and Simeon Warner of arXiv was a co-author of the project’s short debut paper at the Open Repositories 2010 international conference (OR10).

As this paper shows, better than deposit-once and subsequent deposit elsewhere by another agent is multiple simultaneous deposit under the control of the author. It turns out that SWORD has this covered as well.

In fact, there are quite a few SWORD implementations connecting different applications (sources) and repositories (destinations). If you look closely, one of those implementations listed is Microsoft Article Authoring Add-in for Word 2007/2010 – allows repository deposit direct from Word. Within DepositMO we have made some claims about enabling repository deposit from popular applications such as MS Office, and in the project we shall be working with Microsoft to enhance this tool.

Among this welter of deposit applications, you are probably asking what exactly will be DepositMO’s unique contributions? No. Well I was. At least, I was beginning to wonder if we had made our USP clear in the documentation to date. It’s not SWORD, or deposit or even multiple deposit, or deposit from specified applications. The answer, as we have already indicated, begins with CRUD.

The project proposal talked of ‘an effective culture change mechanism’. That’s a wider issue for another time. On more technical issues the proposal describes the aim to ‘extend the capabilities of repositories to exploit desktop and authoring environments’. More specifically it refers to components for the Microsoft Office authoring environments and enhanced SWORD interaction.

No reference to CRUD-like features here. Nor in the OR10 paper – at least, not using these terms – but the direction is clearer. The paper starts by specifying the motivations for multiple deposit.

Today the use case for repository deposit is to write the content with a typical computer desktop application and save it somewhere, but not in the repository yet – the equivalent of the blog Save Draft button. When the work is complete it can be packaged and delivered to the repository using SWORD, the same as the Publish function in the blog. The OR10 paper puts it like this:

“Currently SWORD is a one-way protocol, meaning that a repository can either accept a record, or reject it; there is no middle ground. Adding a lightweight mechanism to desktop applications to enable negotiation on what is sent in a SWORD package would go some way to bridging this gap.”

This facility should become available in SWORD v2.0, and developers from the project are contributing directly to this activity since there is a vested interest in the outcome.

It would open new deposit possibilities. An admittedly complex and possibly unusual, but nevertheless feasible, case is suggested in the OR10 paper where an author of a research paper pulls information from other linked sources, such as a contacts list and a citation manager:

“At the point the document is submitted all this valuable information (such as author identities disambiguated by email address and structured citation listing) is lost.”

All this is more sophisticated than CRUD and points the way forward, but first implementing CRUD features using SWORD as a mediator between applications and multiple repositories would represent serious progress.

What might follow from this is ‘culture change’ or, more immediately, dialogue between author and repository. The OR10 paper puts it more prosaically:

“Our proposal is to enable a simple yet powerful set of negotiations to occur between the desktop application and multiple repositories such that a single familiar submission workflow (in the style of the author’s application) can be presented to the user.”

So as a starting point the aim in DepositMO is to activate the repository as a storage service for the iterative Save Draft action in an authoring application.

In the next post we will consider the practical implications of this approach and look at an early sketch of a possible interface design.

Just as long as we all understand and can share in what is new and where we are heading. Otherwise we may just find ourselves talking about a more familiar form of crud.

Crud. Bin


Repository effectiveness: how DepositMO can help

Repository interfaces do not make it obvious enough, attractive enough or easy enough to perform their primary function, to enable authors to deposit content. This was the suggestion that began a recent discussion on the American Scientist Open Access Forum, an email list forum for discussing open access and repositories, and prompted this response from Les Carr of the DepositMO project. The full context for this discussion can be found in the list archives.

On 18 Sep 2010, at 21:59, Velterop wrote:

  • Make a repository easy to find (a Google search for “University of X repository” more often seems to produce a link to an article or press release about the repository than a link to the repository itself, at least on the first few pages of the search results – repositories often have names or acronyms that make them difficult to find if you don’t know the name)
  • Draw attention, unambiguously and very clearly, on the repository home page, to the possibility of submitting a paper/manuscript (e.g. a brightly coloured “submit now!” button)
  • Make the deposit procedure very, very easy and intuitive. Involve UX experts where possible.
  • Make deposit the *prime* focus of the repository. Repositories and their contents can be searched in a variety of ways and via many routes, but submission of articles can only take place via the repository’s own web site.

On 19 Sep 2010, at 09:45, Leslie Carr wrote:

I’d like to take this opportunity to mention the new JISC DepositMO project whose aim is to increase the ease of deposit into repositories chiefly by allowing direct deposit from word processors, office programs and the computer desktop (“save as…” and “send to…” directly into EPrints or DSpace). Although the repository’s web interface should be a useful and advantageous environment for the author as well as the reader, the fact is that depositing is An Extra Thing to add to the author’s workflow, and it might help to woo some recalcitrant professors if it appeared to be the same thing as “saving a new copy” and it could be achieved in the familiar interface of Microsoft Word.

I don’t think that technology changes alone will stimulate more Self Archiving (improve the repository! make it more friendly! make it faster! make it more useful!) There has to be a combination of social, management and technological advances all pressing in the same direction. Make Open Access policies mandatory, make open access practices a key part of your institutional business activities and make open access technology as useful as possible.


Training Objectives

The training objectives of the DepositMO project are to:

  • Develop training materials for one-to-one training
  • Develop a Virtual Deskside Coaching Kit to train the trainers
  • Develop a sustainable programme of evaluation to supplement the existing e-prints deskside coaching service

Southampton University’s E-Prints service currently offers deskside coaching, which provides an on-demand, one-to-one training service to users.  The DepositMO training programme will be embedded into this deskside coaching service.

During the DepositMO project, we will create training materials to support one-to-one training, and develop a Virtual Deskside Coaching Kit, which will initially be used to train the trainers.  This kit will be tested on trainers in an iterative process, updating and improving the material based on feedback and on developments in the software.  Ultimately the Virtual Deskside Coaching Kit could also be used as an alternative or supplement to face-to-face coaching.

Southampton University currently runs two deskside coaching services, one for e-prints and one for subject-based queries.  The subject-based deskside coaching service is centrally co-ordinated; requests are forwarded to appropriate staff, and are then followed up for evaluation once the training has been completed.  We will use the evaluation model of the subject-based service to assess the impact of the training programme on users of the repository.  There is also the potential to transfer the administration of the e-prints deskside coaching service to the subject-based deskside training, in order to benefit from a more formalised administrative approach while retaining the specialist knowledge of the e-prints deskside team.
