

Nobody needs a 303

Ian Davis (@iand) has just written this rather challenging blog post about the future of the 303 redirect. He’s onto something, but the idea needs work, and I have a few suggestions…

Background

A key point of the semantic web is that you can use URIs (which look exactly like URLs) to represent concepts that cannot be resolved into a sequence of 1s and 0s. A URI represents a single thing, be it an HTML document about Rice Pudding, an RDF document about Rice Pudding, or the concept “Rice Pudding” itself.

Concept: http://dbpedia.org/resource/Rice_Pudding
RDF: http://dbpedia.org/data/Rice_Pudding.xml
HTML: http://dbpedia.org/page/Rice_Pudding

If you use an HTTP request to ask for the concept, the server can’t serve you rice pudding over a TCP/IP stream. So rather than telling you “200 OK” (and giving you pudding) or “404 Not Found”, it tells you “303 See Other” and redirects you to the RDF document. If it’s being extra clever, it looks at which format your client prefers (your HTTP client expresses a preference in the Accept header when it makes a request) and redirects you to a URL with the most palatable data format for you.

(Side note: in many ways, HTTP response 418 might make more sense in this case if there were no document available.)
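To make the negotiation step concrete, here is a minimal Python sketch of what a server might do when asked for the concept URI. It is not how DBpedia actually implements it. The REPRESENTATIONS table (reusing the URLs above) and the see_other function are hypothetical, and the Accept parsing is simplified: it takes the highest matching q value rather than the most specific matching range.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table of the documents available for the concept
# http://dbpedia.org/resource/Rice_Pudding (URLs as listed above).
# Insertion order matters: on a tie (e.g. "Accept: */*") the first entry wins.
REPRESENTATIONS: Dict[str, str] = {
    "text/html": "http://dbpedia.org/page/Rice_Pudding",
    "application/rdf+xml": "http://dbpedia.org/data/Rice_Pudding.xml",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q value) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue  # skip empty entries such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((fields[0].lower(), q))
    return prefs


def matches(media_range: str, media_type: str) -> bool:
    """Does a range like 'text/*' or '*/*' cover a concrete type like 'text/html'?"""
    if media_range == "*/*":
        return True
    range_major, _, range_minor = media_range.partition("/")
    major, _, minor = media_type.partition("/")
    return range_major == major and range_minor in ("*", minor)


def see_other(accept: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Answer a GET on the concept URI: 303 to the best document, else 406."""
    prefs = parse_accept(accept or "*/*")
    best_url, best_q = None, 0.0
    for media_type, url in REPRESENTATIONS.items():
        # Simplification: use the highest q of any matching range, rather than
        # letting the most specific range win as the HTTP spec describes.
        q = max((weight for rng, weight in prefs if matches(rng, media_type)),
                default=0.0)
        if q > best_q:
            best_url, best_q = url, q
    if best_url is None:
        return 406, {}
    return 303, {"Location": best_url}
```

So a client sending "Accept: */*" lands on the HTML page, while one asking for application/rdf+xml is sent to the RDF. Returning 406 when nothing matches is one design choice; a server could equally fall back to a default document.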

To watch this in action, try the following on a Linux command line (or in Terminal on OS X):

curl -v http://dbpedia.org/resource/Rice_Pudding

then

curl -v -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Rice_Pudding

The problem is that this is a pain to configure on a web server and makes things complicated in general. Also, when you ask a person “what’s your URI?” they stare at you blankly. It’s non-trivial to get URIs out of linked data experts; if we want Linked Data to take off, it must be achievable by people who don’t really care.

Enter Ian Davis

Ian has just written this blog post: http://iand.posterous.com/is-303-really-necessary. I really want to disagree, just to make the Fatboy Slim reference, but I think he’s onto something. If I’ve understood correctly, he is suggesting that when you resolve a URI you should expect to get a “200 OK” and a document about that subject. This does make things simpler, but it means that the URI for a document is now different from the URL of that document.

It’s going in the right direction, and it really helps solve the problem of how to ask a layman for a URI, but I have some ideas about how to make it work; either or both could be used.

Making it clear what’s going on in the HTTP response

Add a new HTTP return code, “208 Metadata”, which indicates that what you are getting is data about the resource you requested rather than the resource itself. This could also be achieved by putting specific triples in the returned document, but a status code feels much cleaner. However, it still has the issue of requiring special server configuration.

The thing Ian is going for (correct me if I’m wrong) is to allow someone to just place an RDF document in a directory, have it served over the web, and be done. That’s OK-ish, but to make Apache serve it with the correct MIME type it will need a ‘.rdf’ suffix, which means making a URI for rice pudding like http://data.totl.net/puddings/ricepudding.rdf, which feels wrong to me. You can make it work pretty easily with PHP:

curl -v http://graphite.ecs.soton.ac.uk/experiments/208.php
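The PHP behind that URL isn’t shown here, so as a rough equivalent, here is a minimal sketch using Python’s standard http.server. It answers requests for suffixless paths with the proposed 208 status and a text/n3 body. The DOCS table, the example.org URI and the metadata_response helper are hypothetical. Note also that IANA already lists 208 as WebDAV’s “Already Reported” (RFC 5842), so in practice the proposal would probably need a different number.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple

# Hypothetical directory of suffixless N3 documents, keyed by request path.
DOCS: Dict[str, str] = {
    "/puddings/ricePudding":
        '<http://example.org/puddings/ricePudding> '
        '<http://www.w3.org/2000/01/rdf-schema#label> "Rice pudding" .\n',
}


def metadata_response(path: str) -> Tuple[int, str, Dict[str, str], bytes]:
    """Return (status, reason, headers, body) for a GET on `path`."""
    text = DOCS.get(path)
    if text is None:
        body = b"Not found\n"
        return 404, "Not Found", {"Content-Type": "text/plain",
                                  "Content-Length": str(len(body))}, body
    body = text.encode("utf-8")
    # Proposed "208 Metadata": the body is *about* the requested URI,
    # not a representation of the thing itself.
    return 208, "Metadata", {"Content-Type": "text/n3",
                             "Content-Length": str(len(body))}, body


class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, reason, headers, body = metadata_response(self.path)
        self.send_response(status, reason)  # custom reason phrase
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it, start it with HTTPServer(("127.0.0.1", 8000), MetadataHandler).serve_forever() (HTTPServer also comes from http.server) and run curl -v http://127.0.0.1:8000/puddings/ricePudding; curl should report the 208 Metadata status line.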

… but now I’ve got a .php suffix! I can’t find any handy Apache .htaccess config that lets me set the HTTP response code for a directory; the best I can find quickly is:

ForceType text/n3
Header set Semantic Metadata 

This would at least mean you could create a directory of suffixless files and indicate to a tool that understands the Semantic header that this was not the requested resource, but rather metadata about it.

curl -v http://graphite.ecs.soton.ac.uk/experiments/setmime/ricePudding
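On the client side, a tool that knows about this convention only needs a small check before deciding what the returned document describes. Here is a minimal Python sketch, assuming the response’s status and headers are already in hand. is_metadata_response is a hypothetical name, and it also accepts the proposed 208 code for good measure.

```python
from typing import Mapping


def is_metadata_response(status: int, headers: Mapping[str, str]) -> bool:
    """True if the body describes the requested URI rather than being it."""
    if status == 208:  # the proposed "208 Metadata" status
        return True
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "semantic" and value.strip().lower() == "metadata":
            return True
    return False
```

With http.client, for example, you could call is_metadata_response(resp.status, dict(resp.getheaders())). A client that doesn’t know about the header simply ignores it.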

Extending the URI syntax to indicate “Subject of URI”

I really quite like this one: add something to the way you write URIs. Put a symbol, let’s say “!”, at the start of the URI to indicate that it represents the subject of the document at the given URL. This feels a little like the use of & in C code.

<!http://users.ecs.soton.ac.uk/cjg/> foaf:name "Christopher Gutteridge" .

<!http://users.ecs.soton.ac.uk/cjg/> foaf:homepage <http://users.ecs.soton.ac.uk/cjg/> .

I really think this could work! Most semantic systems just treat URIs as strings, so this in no way breaks their ability to reason about and process data. The only time it would matter is when they come to resolve the URI. Resolving would not work for clients that didn’t understand the syntax, but that’s not a big deal: they’ll be easy to fix, and until then they just won’t get the extra data. Their loss.

I’ve done a couple of experiments to see how the ARC2 parser copes with this.

The result isn’t great. ARC2 accepts it, but it treats the URI as relative to the current document. So we can’t really put anything at the front, but we could put something on the end… What’s a character that is in basic ASCII but explicitly not legal at the end of a URI? We want one that can never conflict with a real URL, so things like “#topic” or “!” are out, as they could appear in legal URLs.
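ARC2 is a PHP library, but the same effect is easy to see with Python’s urllib.parse.urljoin, which resolves relative references against a base URL. This is only an illustration of relative resolution, not ARC2’s code, and the base document URL is hypothetical. (Strictly, RFC 3986 rejects “!http://…” as a relative reference as well, because its first path segment contains a colon, so a stricter parser might refuse it outright.)

```python
from urllib.parse import urljoin, urlsplit

BASE = "http://example.org/data/cjg.ttl"  # hypothetical document being parsed
marked = "!http://users.ecs.soton.ac.uk/cjg/"

# "!" cannot start a URI scheme, so no scheme is recognised and the whole
# string is treated as a relative path under the base document.
resolved = urljoin(BASE, marked)
print(resolved)
print(urlsplit(resolved).netloc)  # "example.org": the base document's host
```

The “subject of” marker ends up buried in a path on the wrong host, which is why putting anything in front of the URI is a dead end.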

% as the suffix?

A bare % is really, really not legal in a URI, as it must be followed by two hex digits. So let’s try using <http://users.ecs.soton.ac.uk/cjg/%> to represent the subject of that page (i.e. me).
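This can’t collide with a real URL because RFC 3986 only allows % as the start of a percent-encoded triplet (% plus two hex digits). Any string ending in a bare % is therefore not a URI. Here is a minimal Python sketch of that check; has_stray_percent and is_subject_uri are hypothetical names for illustration.

```python
import re

# RFC 3986, section 2.1: '%' may only begin a percent-encoded triplet,
# i.e. '%' followed by two hexadecimal digits.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_stray_percent(uri: str) -> bool:
    """True if `uri` contains a '%' that is not a valid escape."""
    return STRAY_PERCENT.search(uri) is not None


def is_subject_uri(uri: str) -> bool:
    """Proposed convention: a valid URL followed by one bare '%'."""
    return uri.endswith("%") and not has_stray_percent(uri[:-1])
```

Because a trailing % always fails the first check, the second one never misreads an ordinary, valid URL.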

That works much better, but I can no longer put it into a namespace definition, as it’s appended rather than prepended. If a naive client tries to resolve it, it will get back a 400 HTTP response; a smart client can know to strip the trailing %. A nice web server might add a plugin that spots requests ending in % and, if there is a valid URL without the %, sends a 303 See Other. That would enable most existing RDF libraries to keep working, unless they were super touchy about the URI being valid before requesting it.
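The smart client and the helpful server plugin are both small. Here is a hedged Python sketch. document_url and subject_redirect are hypothetical, and exists stands in for whatever the server uses to check that a URL is real. It assumes the plugin sees the raw request path before anything else rejects the malformed escape.

```python
from typing import Callable, Dict, Optional, Tuple


def document_url(subject_uri: str) -> str:
    """Smart client: the URL to fetch for a '%'-suffixed subject URI."""
    return subject_uri[:-1] if subject_uri.endswith("%") else subject_uri


def subject_redirect(
    path: str, exists: Callable[[str], bool]
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Hypothetical server plugin for request paths ending in a bare '%'.

    Returns None for ordinary paths, so the normal handler can serve them.
    """
    if not path.endswith("%"):
        return None
    target = path[:-1]
    if exists(target):
        # Point the client at the document that describes the subject.
        return 303, {"Location": target}
    return 404, {}
```

Returning None for ordinary paths means the plugin only changes behaviour for requests that would otherwise have failed.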

One thing that doesn’t work is using the % suffix in predicates in RDF/XML, as that requires you to write <test:foo%>Testing</test:foo%>, which is not valid XML. You also can’t use it in the shortcut for class names, e.g. <test:Bar% rdf:about="…">, but that’s not a problem, as you can describe the type with <rdf:type …> instead.
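A quick way to check both halves of that claim is to hand the fragments to an XML parser. This Python sketch uses the standard library’s ElementTree. The test: namespace URI and the rdf:about value are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TEST_NS = "http://example.org/test/"  # hypothetical namespace for the test: prefix


def wrap(body: str) -> str:
    """Wrap a property element in a minimal RDF/XML document."""
    return (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:test="{TEST_NS}">'
        f'<rdf:Description rdf:about="http://example.org/thing">{body}</rdf:Description>'
        "</rdf:RDF>"
    )


def is_well_formed(xml_text: str) -> bool:
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# '%' is not allowed in an XML element name, so this is rejected...
predicate_form = wrap("<test:foo%>Testing</test:foo%>")
# ...but it is fine inside an attribute value, so rdf:type still works.
type_form = wrap(f'<rdf:type rdf:resource="{TEST_NS}Bar%"/>')

print(is_well_formed(predicate_form), is_well_formed(type_form))
```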

In this scheme, the URI for the Web Science Trust is just <http://www.webscience.org/%>.

Using ‘#’ as a suffix?

One other option would be to say that a URI ending in # refers to the subject of the document at the URL without the ‘#’. This has some big advantages over using ‘%’: it is still a legal URI, and, like any # URI, it will resolve to the source document.

I know that foo#bar indicates a sub-part of foo; in XML/HTML it’s the element with id="bar". However, what does foo# mean? The element with id=""? Or can we safely set a standard that this is the _subject_ of the URL without the #?

You still won’t be able to use it in RDF/XML predicates, as <foo:bar#> isn’t legal XML.

Using this idea, the URI for the Web Science Trust is <http://www.webscience.org/#> which is rather elegant.
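A client following this convention can get from a subject URI to the document it should fetch using standard fragment handling. Here is a minimal Python sketch using urllib.parse.urldefrag; split_subject_uri is a hypothetical name.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_subject_uri(uri: str) -> Tuple[str, bool]:
    """Return (document URL, whether `uri` names the subject of that document).

    Under the proposed convention, a URI ending in a bare '#' (empty fragment)
    names the subject of the document at the URL before the '#'.
    """
    document, fragment = urldefrag(uri)
    return document, uri.endswith("#") and fragment == ""
```

The catch is that this check cannot tell a namespace URI ending in #, such as the RDF one (http://www.w3.org/1999/02/22-rdf-syntax-ns#), from a subject URI.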

UPDATE: Dammit, Steve Harris has pointed out that URIs of the format <http://example.com/foo#> are used to identify namespaces. Dang.

Posted in RDF.


5 Responses


  1. Ian Davis says

    Apache can deal with the file extension ugliness just fine. Add Options +MultiViews to your .htaccess and Apache will respond to /foo as well as /foo.rdf.

    I still don’t get your point about documents having URIs different from their URLs. The document URL is its URI; they are synonymous terms. URL as a term is actually deprecated in favour of URI, so I find what you are saying rather confusing.

    • Christopher Gutteridge says

      OK. I’ll see if I can explain myself better. If I’ve understood you correctly, your approach means a web server will say OK and return a document, but that document will *not* be the concept identified by the URL.

      So if I have some RDF which I’ve just resolved from
      http://example.com/id/toucan,
      how do I make statements about it, i.e. who wrote it and what its copyright is?

      I think maybe all we need to do is break the link between a URL and the data you get when you resolve it. Basically, what you’re saying is that we should drop the assumption that, if you resolve URI X, the document you get back with 200 OK is identified by URI X; it isn’t necessarily.

  2. Damian says

    I’ve suggested content-location before. So you could resolve http://example.com/id/toucan and get the content-location http://example.com/id/toucan.rdf (or whatever). Make the statements about that URI.

  3. Ian Millard says

    I don’t really want to get involved with a 303 or not to 303 argument, other than to point out that a huge amount of effort went into agreeing the principles of linked data so that it fits entirely within all specifications and standards.

    No special client-side behaviour is expected, and nothing breaks.

    A single URI should not be used to identify or refer to more than one thing.

    With non-information and information resources you can clearly distinguish between metadata concerning the thing/object/concept, and the document(s) which describe that thing/object/concept.

    I believe using suffix-less (for the non-info resource) and format-based suffixes (for one or more information resource formats) is the easiest and clearest way of explaining this to novice users.

    Trying to infer special behaviours by munging in extra special-case characters is a bad idea IMHO; URIs are supposed to be opaque.

    Proposing things that produce invalid URIs is totally out of the question, as is anything that requires a browser/client to behave differently.

    Note that it is not uncommon to use # to distinguish between an NIR and an IR, most notably in FOAF profiles (i.e. #me is the non-information resource, and the URI without the fragment is the information resource). You don’t need any redirects, but again I think this is a bit of a hack, as it is difficult to have more than one information resource.
