Local or Canonical URIs?

Here’s a little question; when should you ‘mint’ a URI for something and when should you just use a canonical one?

A canonical URI is the one defined by the agent with authority for that resource. That is very obvious for organisations, people, journals and events. Less obvious for places. Really not obvious at all for income tax or rice pudding.

The University of Southampton does not yet have a formal URI for itself, but let's make one up for this example;

This question applies to any RDF dataset with resources with resolvable URIs, but I’ll use EPrints as the example as I know it best.

In an EPrints repository, a passing reference to an organisation (event, person, place...) has a URI generated by hashing the data about it. For example, in this RDF record about a thesis, it has generated such a URI for the University of Southampton, the organisation the thesis was submitted to. This URI is:

… which if resolved will tell you every fact the EPrints server knows about the University of Southampton.
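As a rough illustration of minting a URI by hashing, here is a minimal Python sketch. The choice of MD5, the normalisation step, the fields that get hashed and the base URI are all assumptions made for the example; this is not a description of what EPrints actually does internally.

```python
import hashlib

# Hypothetical base URI for a repository's identifiers (not a real EPrints setting).
BASE = "http://eprints.example.org/id/"

def mint_ext_uri(kind, name):
    """Mint a local 'ext-' URI from the data held about a passing reference."""
    # Normalise case and whitespace so trivially different spellings
    # hash to the same URI (an assumption, chosen for the example).
    key = " ".join(name.lower().split())
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{BASE}{kind}/ext-{digest}"

uri = mint_ext_uri("org", "University of Southampton")
print(uri)
```

The point of hashing is that the same description always yields the same URI, so two records that mention the same organisation in the same way end up pointing at one resource.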

So here’s my dilemma

Assume that I have a reliable way (possibly a human process) to know that the URI EPrints generated and the university's own URI identify the same thing.

Option One
We add an owl:sameAs predicate to the EPrints RDF to point at the canonical university URI. As we control the university URI too, we can make it describe itself as sameAs the EPrints URI.

This is currently what we do for people in ECS as we can reliably identify our own staff. Have a look again at that RDF record and you’ll see the sameAs link right at the bottom.

Pros: URI on the eprints server is resolvable to get all information it holds about that resource.
Cons: Creates a new URI for a resource with a known existing URI. Requires library staff to map onto other URIs inside (and maybe outside) their university.
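To make Option One concrete, here is a small Python sketch that writes out the sameAs link in both directions as N-Triples. Both URIs are placeholders invented for the example, and the function name is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_triples(local_uri, canonical_uri):
    """Return N-Triples lines asserting owl:sameAs in both directions."""
    return [
        f"<{local_uri}> <{OWL_SAME_AS}> <{canonical_uri}> .",
        f"<{canonical_uri}> <{OWL_SAME_AS}> <{local_uri}> .",
    ]

# Placeholder URIs: a locally minted one and a made-up canonical one.
triples = same_as_triples(
    "http://eprints.example.org/id/org/ext-0123",
    "http://id.example.ac.uk/",
)
print("\n".join(triples))
```

The first triple belongs in the repository's RDF; the second can only be published by whoever controls the canonical URI, which is why this option works best inside a single institution.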

Option Two
Rewrite the URIs in EPrints so that certain ext- style URIs are replaced with their known canonical versions, so that the RDF always refers to a resource by its preferred URI. The ext- style URI is still resolvable, and any time a preferred URI is referenced in an RDF document a seeAlso link to the ext- URI will be included, which will redirect to more information on the resource.

In this way an RDF record would look something like this:

        dct:creator <> ;
	dct:issuer <>;
	dct:title "Social Niche Construction: Evolutionary Explanations for Cooperative Group Formation"^^xsd:string .
	dct:hasPart epid:org/ext-fc4fc6f35c57c793ab9034d542fa9406;
	foaf:name "University of Southampton"^^xsd:string;
	rdf:type foaf:Organization ;
        rdfs:seeAlso epid:org/ext-6aadb868f6af7a581b985086216d8d28 .
	foaf:name "Simon T. Powers"^^xsd:string;
	rdf:type foaf:Person ;
        rdfs:seeAlso epid:person/ext-24320 .

Pros: Reduces the proliferation of URIs, and makes the data more truly and usefully Linked.

Cons: Not all resources will have known preferred URIs, so they will probably still use the ext- style URIs. This would be inconsistent, but could be mitigated by still including the rdfs:seeAlso predicate on themselves so that every record had the same structure.
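A minimal Python sketch of the Option Two rewrite, including the mitigation for resources with no preferred URI. The lookup table, the URIs in it and the function name are all hypothetical; a real repository would need its own way of maintaining the mapping.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical, hand-maintained mapping from local ext- URIs to preferred URIs.
PREFERRED = {
    "http://eprints.example.org/id/org/ext-0123": "http://id.example.ac.uk/",
}

def describe(local_uri):
    """Return (subject_uri, see_also_triple) for a resource.

    The subject is the preferred URI when one is known, otherwise the local
    ext- URI. Either way a seeAlso triple points back at the local URI, so
    every record has the same structure.
    """
    subject = PREFERRED.get(local_uri, local_uri)
    see_also = f"<{subject}> <{RDFS_SEE_ALSO}> <{local_uri}> ."
    return subject, see_also

known = describe("http://eprints.example.org/id/org/ext-0123")
unknown = describe("http://eprints.example.org/id/org/ext-9999")
print(known[1])
print(unknown[1])
```

For the unknown resource the seeAlso triple points from the ext- URI to itself, which is redundant but keeps the shape of every record identical.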

Option Three
Use an external service to fight your semantic battles for you. To every entity, add a seeAlso link to an external service, e.g.

	foaf:name "University of Southampton"^^xsd:string;
	rdf:type foaf:Organization ;
        rdfs:seeAlso <
uri=> .

	foaf:name "Simon T. Powers"^^xsd:string;
	rdf:type foaf:Person ;
        rdfs:seeAlso <
uri=> ;
	owl:sameAs <> .

The hypothetical magicbibservice would return a list of URIs it believed were the same as the one you passed in the link to it.

Pros: Can be added to the code once and it’ll just get on with it. Can be used in conjunction with options one or two.

Cons: Doesn’t stop proliferation of URIs. Depends on 3rd party for quality and reliability. No such service currently exists.
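Building the seeAlso link for Option Three might look something like the Python sketch below. The service endpoint is a placeholder for the made-up magicbibservice, the entity URI is invented, and only the uri= query parameter is taken from the example above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint standing in for the made-up magicbibservice.
SERVICE = "http://magicbibservice.example/lookup"

def see_also_url(entity_uri):
    """Build the URL to use as the rdfs:seeAlso target for an entity."""
    # urlencode percent-escapes the entity URI so it survives as a query value.
    return f"{SERVICE}?{urlencode({'uri': entity_uri})}"

link = see_also_url("http://eprints.example.org/id/person/ext-24320")
print(link)
```

Because the link is derived mechanically from the entity's own URI, it can be added to the export code once, with no per-record human effort.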


What EPrints is doing out of the box is OK but the only way to make it ‘linked’ out of the box is option 3, which relies on a service I’ve made up, so not really an option, yet.

I think we should recommend option 2 to be used where possible. It would be pretty easy to configure for a resource type (people, places etc…). You would just override the URI generator and add a seeAlso triple.

Anyone got any alternate ideas or comments on how this would improve/harm the usefulness of the data?

Posted in Best Practice, RDF, Repositories.

4 Responses

  1. Dave Dupplaw says

    Option 1 has to be a no-no. It relies too heavily on people making the links back and forth and you know they just won’t. Option 2 and option 3 combined sounds the best to me. I think if there is an obvious canonical URI for an entity then it should be used. (I would say Geonames was that reference for places, btw). Option 3 is only good if there are such services, but at least it could be added in the future should a reliable one arise.

  2. Christopher Gutteridge says

    One thing I’m sure of, no system which relies on humans entering URIs will ever scale.

    The way to identify organisations is almost certainly going to be their homepage. While a bNode would be more elegant modelling, I’m considering the idea of predicates of the form ‘publishedByOrganisationWithHomepage’ or ‘createdByThePrimaryTopicOf’.

  3. Hugh Glaser says

    Just bumped into this Chris 🙂
    In fact, what you describe as Option 3 is exactly what does (and was doing at the time you wrote this!)
    So you can replace your magic with and it just works:
    And you could have a server restricted to any domain or site.

Continuing the Discussion

  1. Rant About URL Shorteners… « OUseful.Info, the blog… linked to this post on October 26, 2010

    […] sort of loosely related, ish, err, maybe 😉 Local or Canonical URIs?. Chris (@cgutteridge) also made the point that “It’s vital that any twitter (or […]
