Sep 28

Now that the season of mists and mellow fruitfulness is upon us, most of our colleagues are busy preparing for the new intake of undergraduates. We’ve also been busy on dotAC, with a number of parallel threads coming together nicely, and some interesting discussions with our counterparts on related projects.

The frontend for the coreference service now resembles something that we’re happy to inflict on our trial users, thanks to Marcus’s sterling work. In the screenshot below, you can see a number of bundles (groups of possibly-related resources) displayed as connected graph components. Each node represents a resource with a unique URI, and the edges between resources are a representation of the equivalence statements that allow us to stitch together the scattered parts of the Web of Linked Data.

In this example, we’ve retrieved a number of bundles that relate to people with the surname Williams. The bundles you see are the result of automated processes that have identified likely coreferent resources; unfortunately, bitter experience has taught us that such automated techniques are defeasible, so it’s frequently necessary for people to check over the data for inconsistencies. In the centre of the screen is a bundle of resources that we think represent Sandra Williams. The resource labelled “11” is used as the canonical resource for Sandra, and is shown as larger than the rest of the resources. However, this bundle contains a ringer: the resource labelled “8” refers to someone other than Sandra, so we’re disconnecting it from the rest of the bundle.

Coreference Service UI

When we’re happy with the state of the bundles in the editor, we can synchronise our session back with the CRS, for other applications (including the explorer interface that we’re building on top of RKBExplorer) to use.
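As a rough illustration of the bundle model described above, here is a minimal Python sketch that treats equivalence statements as undirected edges and computes bundles as connected components. It is not the coreference service's actual implementation: the `ex:` identifiers, the `bundles` and `disconnect` helpers, and the sample links are all made up for the example.

```python
from collections import defaultdict

def bundles(equivalences):
    """Group resources into bundles, i.e. connected components of the
    undirected graph whose edges are the equivalence statements."""
    neighbours = defaultdict(set)
    for a, b in equivalences:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result

def disconnect(equivalences, resource):
    """Drop every equivalence statement that mentions `resource`."""
    return [(a, b) for a, b in equivalences if resource not in (a, b)]

# Hypothetical equivalence statements (not real dotAC data).
links = [("ex:11", "ex:3"), ("ex:11", "ex:8"),
         ("ex:3", "ex:5"), ("ex:20", "ex:21")]

before = bundles(links)
# Removing the ringer leaves "ex:8" with no edges, so it no longer
# appears in any bundle built from the remaining statements.
after = bundles(disconnect(links, "ex:8"))
```

Choosing a canonical resource for each bundle (such as “11” in the example above) would be a separate step layered on top of this grouping.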

Last week, Nick travelled up to Bristol for a workshop at ILRT organised by Nikki Rogers of the ResearchRevealed project. Also present were members of BRII and Readiness4REF – the link between these projects is that we’re all looking at using CRIS data in CERIF format.

The experiences of the other projects seem rather to mirror our own; plenty of people are talking about CERIF, but very few seem to be using it, partly due to the lack of documentation, and partly due to perceived weaknesses in the underlying model (as implemented in the EuroCRIS-supplied database schema). The likelihood that CERIF or similar ends up being used for the REF now looks increasingly remote; there probably isn’t sufficient time for both implementation and the necessary shakedown period after the HEFCE guidance is issued in early 2010.

This said, the workshop was very helpful. Despite its flaws, CERIF is addressing the right domain, and there was interest in mapping CERIF’s model onto other formats, particularly those based on RDF. Ben O’Steen of BRII gave a presentation that mirrors our design decisions for this mapping; rather than invent yet another ontology, best practice (at least as far as the Linked Data Web is concerned) is to reuse fragments of whatever widely-used ontologies seem to fit. Ben’s name for this – Frankenstein ontologies – is apt enough that we’ve been using it ourselves.

Sep 03

Hello from the JISC Rapid Innovation in Development workshop in Manchester (or, to be more precise, the City of Manchester Stadium in all its sky blue glory).

Progress on dotAC so far has been good, but we’re aware that we’re still laying a number of foundations in parallel for the final service.

We’ve joined EuroCRIS, the organisation that develops the CERIF data format (we’ll be posting a commentary and critique of CERIF in the near future), and have developed a mapping from CERIF to a selection of common Semantic Web vocabularies (FOAF, BIBO and Dublin Core for a start). We’re now working on a translator that will take the XML serialisation of CERIF and produce an equivalent RDF description. It isn’t clear how many funding bodies will be exporting data in CERIF within the lifetime of the project (it’s being used within the EC in places, NERC have provided us with some data, and EPSRC have said that they’re evaluating it), but we intend to be ready for them when they start.
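To give a flavour of what such a translator involves, here is a minimal Python sketch that reads a small XML record and emits N-Triples using FOAF, BIBO and Dublin Core terms. It is only an illustration under loud assumptions: the XML element names are a simplified stand-in and not the real CERIF XML schema, the `http://example.org/id/` base URI is hypothetical, and the choice of properties is not the project's actual mapping.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/id/"  # hypothetical base URI for minted resources

# Simplified, made-up input; real CERIF XML uses different element names.
SAMPLE = """<records>
  <person id="p1"><familyName>Williams</familyName><firstName>Sandra</firstName></person>
  <publication id="pub1"><title>An Example Paper</title><author ref="p1"/></publication>
</records>"""

def literal(text):
    """Quote a string as a plain N-Triples literal (basic escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def triple(subject, predicate, obj):
    """Format one triple; `obj` is already a <uri> or a quoted literal."""
    return f"<{subject}> <{predicate}> {obj} ."

def translate(xml_text):
    """Map the simplified XML records onto FOAF, BIBO and Dublin Core terms."""
    root = ET.fromstring(xml_text)
    out = []
    for person in root.iter("person"):
        uri = BASE + "person/" + person.get("id")
        out.append(triple(uri, RDF_TYPE, f"<{FOAF}Person>"))
        out.append(triple(uri, FOAF + "familyName",
                          literal(person.findtext("familyName", ""))))
        out.append(triple(uri, FOAF + "givenName",
                          literal(person.findtext("firstName", ""))))
    for pub in root.iter("publication"):
        uri = BASE + "publication/" + pub.get("id")
        out.append(triple(uri, RDF_TYPE, f"<{BIBO}Document>"))
        out.append(triple(uri, DCT + "title", literal(pub.findtext("title", ""))))
        for author in pub.iter("author"):
            out.append(triple(uri, DCT + "creator",
                              f"<{BASE}person/{author.get('ref')}>"))
    return out

lines = translate(SAMPLE)
print("\n".join(lines))
```

Reusing existing vocabularies in this way keeps the output readable by any tool that already understands FOAF or Dublin Core, at the cost of losing whatever CERIF detail has no obvious counterpart in them.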

On the coreference side, we’re working on a graphical frontend for our coreference service that should make it easier for repository managers to identify the scattered instances of a given researcher, with the results then feeding back into the repository. Our preliminary discussions with librarians at Southampton suggest that this alone would be a useful outcome for dotAC.

On the repository side, we’re finalising our RDF export from EPrints 3 (predominantly to the same ontologies used for CERIF).

We’ll be publishing further details of these deliverables both here and on the main project website as they’re completed – it’s getting to the point where these separate threads are naturally converging.