

What If a URL wasn’t always a URI?

What if a Uniform Resource Locator (URL) wasn’t automatically assumed to also be a Uniform Resource Identifier (URI)?

In the current URI/URL system, if you resolve <http://example.org/xyz> and get 200 OK and an English HTML document, you assign that document the URI <http://example.org/xyz>. Where it gets weird is that if you use content negotiation and get back an XML file in German, that XML file also has the URI <http://example.org/xyz>. WHAT THE HELL?
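To make the oddity concrete, here is a toy model of content negotiation in Python. It is a sketch under made-up assumptions: two hard-coded representations, naive exact matching on the Accept and Accept-Language values, and no q-values or wildcards. Whichever representation comes back, the client files it under the one URL it asked for.

```python
# Toy content negotiation (illustrative only; the representations are made up).
REPRESENTATIONS = {
    ("text/html", "en"): "<html lang='en'>...English HTML...</html>",
    ("application/xml", "de"): "<doc xml:lang='de'>...deutsches XML...</doc>",
}

def negotiate(url: str, accept: str, accept_language: str) -> tuple[str, str]:
    """Return (URL the client would record, body) for a GET on `url`.

    Deliberately naive: takes the first exact (media type, language) match
    and ignores q-values and wildcards.
    """
    types = [t.split(";")[0].strip() for t in accept.split(",")]
    langs = [lang.split(";")[0].strip() for lang in accept_language.split(",")]
    for media_type in types:
        for lang in langs:
            body = REPRESENTATIONS.get((media_type, lang))
            if body is not None:
                # Current convention: whatever came back is named by the
                # URL that was requested.
                return url, body
    raise LookupError("406 Not Acceptable")

html = negotiate("http://example.org/xyz", "text/html", "en")
xml = negotiate("http://example.org/xyz", "application/xml", "de")
```

Both `html` and `xml` carry the identifier `http://example.org/xyz`, even though their bodies differ completely.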

I’ve been following (well, attempting to follow) the discussion on the W3C Technical Architecture Group (TAG) mailing list, www-tag, and I’m not sure whether this idea is exactly what they are discussing, but it’s got me quite excited.

The web is vague, and reading meaning into it is dangerous. RDFa users tend to use the current document’s URI as an identifier for what the document is about. This decoupling would make that legitimate, but it means that <> no longer means “the current document” but rather “the thing identified by the URI used to retrieve this document”… I’m sure there’s a better way to phrase that… maybe “this thing”.
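Mechanically, <> is an empty relative reference, and a parser resolves it against the document’s base URI, which (absent an explicit base) is the URL the document was fetched from. A minimal Python sketch using the standard library’s urljoin; note that RFC 3986 would also strip any fragment from the base, which urljoin does not, so the example uses a fragment-free URL.

```python
from urllib.parse import urljoin

def resolve_empty_reference(retrieval_url: str) -> str:
    """Resolve the relative reference <> (the empty string) against the URL
    the document was retrieved from, assuming no other base is declared."""
    return urljoin(retrieval_url, "")

print(resolve_empty_reference("http://example.org/xyz"))
# → http://example.org/xyz
```

So <> ends up naming whatever was used to fetch the document, which is exactly why it reads naturally as “this thing” rather than “this particular file”.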

To assign an explicit URI to a document being returned, you would use the HTTP headers. Without such a header, all you can safely state is that the document was once returned by resolving that URI as a URL. It might not be there in five minutes’ time…

Such HTTP Link headers could also list a URL at which to discover the description of the current thing. HTML <link> elements already do this, which is great, but they only work in HTML documents; an HTTP header works for any content type.
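A client would need to pull those pointers out of the Link header. Below is a simplified Python parser, a sketch rather than a full RFC 8288 implementation: it assumes link targets contain no “>” and quoted parameter values contain no escaped quotes. It uses the registered rel="describedby" relation to find description documents.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>\s*((?:;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s]+)\s*)*)'
)
PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')

def parse_link_header(value: str) -> list[tuple[str, dict[str, str]]]:
    """Split a Link header into (target URL, {param: value}) pairs."""
    links = []
    for match in LINK_VALUE.finditer(value):
        target, params_text = match.group(1), match.group(2)
        params = {}
        for p in PARAM.finditer(params_text):
            # Quoted value in group 2, bare token in group 3.
            params[p.group(1).lower()] = p.group(2) if p.group(2) is not None else p.group(3)
        links.append((target, params))
    return links

def described_by(value: str) -> list[str]:
    """Targets whose rel list (space-separated, case-insensitive) has describedby."""
    return [
        target
        for target, params in parse_link_header(value)
        if "describedby" in params.get("rel", "").lower().split()
    ]

header = ('<http://example.org/xyz.rdf>; rel="describedby"; '
          'type="application/rdf+xml", <http://example.org/other>; rel=alternate')
print(described_by(header))
```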

Example 1.

<http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.jpg> returns a JPEG of a photograph of the Mona Lisa. If the author wanted to add some metadata, he would say in the HTTP headers that the document returned does indeed have the URI <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.jpg> and is described by <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.rdf> or <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.json>.
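The post doesn’t pin down which headers would carry this, so here is one possible encoding, sketched in Python under stated assumptions: Content-Location states the document’s own URI, Link with rel="describedby" points at the descriptions, and the .rdf file is assumed to be RDF/XML. The function name metadata_headers is hypothetical.

```python
def metadata_headers(doc_uri: str, content_type: str,
                     descriptions: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: headers that name the returned document and
    point at the documents describing it.

    Assumes Content-Location for the document's own URI and
    rel="describedby" for each description (URL -> media type).
    """
    links = ", ".join(
        f'<{url}>; rel="describedby"; type="{media_type}"'
        for url, media_type in descriptions.items()
    )
    return {
        "Content-Type": content_type,
        "Content-Location": doc_uri,
        "Link": links,
    }

BASE = "http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa"
headers = metadata_headers(
    BASE + ".jpg",
    "image/jpeg",
    {BASE + ".rdf": "application/rdf+xml", BASE + ".json": "application/json"},
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

A Link header was chosen because it works identically for a JPEG, an RDF file or anything else, unlike HTML markup.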

Example 2.

<http://users.ecs.soton.ac.uk/cjg/> returns an HTML document about me, but I decide to use it as my URI. In the HTTP Link header (or just in an HTML <link> element) I just need to say that the current URI identifies a Person called Chris, which is what RDFa documents tend to do anyway. Chris is identified by <http://users.ecs.soton.ac.uk/cjg/> and described by the document located by <http://users.ecs.soton.ac.uk/cjg/>. Content negotiation now makes sense, as I will be described by any document located by that URL.
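What that statement might look like as data, sketched in Python emitting N-Triples. The choice of the FOAF vocabulary (foaf:Person, foaf:name) is an assumption; the post only says “a Person called Chris”. person_triples and escape_literal are hypothetical helper names.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"  # assumption: FOAF as the vocabulary

def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples "..." literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def person_triples(uri: str, name: str) -> str:
    """Two N-Triples lines: <uri> is a foaf:Person whose foaf:name is `name`."""
    return (
        f"<{uri}> <{RDF_TYPE}> <{FOAF}Person> .\n"
        f'<{uri}> <{FOAF}name> "{escape_literal(name)}" .\n'
    )

print(person_triples("http://users.ecs.soton.ac.uk/cjg/", "Chris"), end="")
```

The subject here is the same URL that locates the HTML page, which is only coherent under the “URL isn’t automatically the document’s URI” reading.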

It falls apart if you save the document somewhere else, of course, as then its location on your local hard drive, file:///home/cjg/Documents/cjg.html, also becomes an identifier for me for as long as the file stays there. But file:// URIs are not Cool URIs.
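The failure is easy to reproduce: parse the same bytes from two places and <> resolves to two different identifiers. A minimal Python sketch; file_url_for is a hypothetical helper that handles only absolute POSIX paths.

```python
from urllib.parse import quote, urljoin

def file_url_for(posix_path: str) -> str:
    """Hypothetical helper: file:// URL for an absolute POSIX path,
    percent-encoding characters as needed."""
    if not posix_path.startswith("/"):
        raise ValueError("expected an absolute path")
    return "file://" + quote(posix_path)

web_copy = "http://users.ecs.soton.ac.uk/cjg/"
local_copy = file_url_for("/home/cjg/Documents/cjg.html")

# What <> resolves to in each copy of the same document:
subjects = {urljoin(web_copy, ""), urljoin(local_copy, "")}
print(sorted(subjects))
```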

If this became the way of the web, the other problem is that you could no longer safely assign a URI to a document you’ve downloaded, so all the triple stores would get sad when they did

LOAD <http://example.org/foo>

because, without a handy Link: header assigning the document a URI, they wouldn’t know what URI to give the graph. But that is already broken: a URI should not really identify two things, and a graph is not the same as the thing it describes. So the simple solution for SPARQL is to name the GRAPH after the URL the document was retrieved from.
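In SPARQL 1.1 Update that just means always spelling out the target graph with LOAD … INTO GRAPH and reusing the source URL. A small Python sketch that builds such an update string; load_into_source_graph is a hypothetical name, and the check rejects only characters that cannot appear inside a SPARQL IRIREF.

```python
IRIREF_FORBIDDEN = set('<>"{}|^`\\')

def load_into_source_graph(source_url: str, silent: bool = False) -> str:
    """Build a SPARQL 1.1 Update LOAD that names the graph after the URL the
    data was fetched from, not after whatever the data describes."""
    if any(c in IRIREF_FORBIDDEN or ord(c) <= 0x20 for c in source_url):
        raise ValueError(f"cannot be written as an IRIREF: {source_url!r}")
    keyword = "LOAD SILENT" if silent else "LOAD"
    return f"{keyword} <{source_url}> INTO GRAPH <{source_url}>"

print(load_into_source_graph("http://example.org/foo"))
```

The graph name then records provenance (“where I got this”), leaving the URIs inside the data free to identify the things described.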

Anyhow, this is all speculation. I’m going to wade back into the mailing list discussion and see if I can get a better grasp of what they’re talking about. Wish me luck…

Posted in HTTP, RDF, Triplestore.


One Response


  1. Neil Crookes says

    I always ensure my documents have unique URIs, so my URL scheme for any site, app or piece of content, whether it’s static or dynamically generated, takes the form

    protocol://hostname/path, where path includes locale information if I’m serving multi-locale content and, if the content is available in different formats, an extension. E.g.

    /en-GB/posts
    /en-GB/posts.rss

    I think it’s a good approach that you can apply to all documents and applications, and much safer than relying on content negotiation via request headers such as Accept and Accept-Language.


