Ian Davis (@iand) has just written this rather challenging blog post about the future of the 303 redirect. He’s onto something, but the idea needs work and I have an idea…
Background
A key point of the semantic web is that you can use URIs (which look exactly like URLs) to represent concepts which are not resolvable into a sequence of 1’s & 0’s. A URI represents a single thing, be it an HTML document about Rice Pudding, an RDF document about Rice Pudding or the concept “Rice Pudding” itself.
Concept: http://dbpedia.org/resource/Rice_Pudding
RDF: http://dbpedia.org/data/Rice_Pudding.xml
HTML: http://dbpedia.org/page/Rice_Pudding
If you use an HTTP request to ask for the concept, the server can’t send you rice pudding over a TCP/IP stream, so rather than tell you “200 OK” (and give you pudding) or “404 Not Found” it tells you “303 See Other” and gives a redirection to the RDF document. If it’s being extra clever, it listens to what format your client prefers (your HTTP client expresses a preference when it makes a request) and redirects you to a URL with the most palatable data format for you.
(Side note, in many ways HTTP response 418 might make more sense in this case if there was no document available).
To watch this in action try (on a Linux command line, or Terminal on OSX):
curl -v http://dbpedia.org/resource/Rice_Pudding
then
curl -v -H 'Accept: application/rdf+xml' http://dbpedia.org/resource/Rice_Pudding
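To make the redirect logic concrete, here is a minimal Python sketch of what a server might do when asked for the concept URI. The `DOCUMENTS` table, the function names and the HTML default are my own assumptions for illustration; they reuse the three DBpedia URLs listed above but are not DBpedia’s actual implementation.

```python
# Hypothetical table: media type -> document URL pattern (from the URLs above).
DOCUMENTS = {
    "application/rdf+xml": "http://dbpedia.org/data/{name}.xml",
    "text/html": "http://dbpedia.org/page/{name}",
}


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((q, media_type))
    # sorted() is stable, so equal weights keep their header order.
    return [m for q, m in sorted(prefs, key=lambda p: -p[0]) if q > 0]


def see_other(name, accept="*/*", default="text/html"):
    """Return (status, Location) for a request for the concept URI."""
    for media_type in parse_accept(accept):
        if media_type in DOCUMENTS:
            return 303, DOCUMENTS[media_type].format(name=name)
    # Nothing matched (or the client sent */*): fall back to the assumed default.
    return 303, DOCUMENTS[default].format(name=name)


print(see_other("Rice_Pudding", "application/rdf+xml"))
```

The point is only that the concept URI never answers “200 OK” itself; every branch ends in a 303 to some document about the concept.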
The problem is that this is a pain to configure on a webserver, and makes things complicated in general. Also, when you ask a person “what’s your URI?” they stare at you blankly. It’s non-trivial to get URIs even out of linked data experts. If we want Linked Data to take off, it must be achievable by people who don’t really care.
Enter Ian Davis
Ian has just written this blog post: http://iand.posterous.com/is-303-really-necessary. I really want to disagree, just to make the Fatboy Slim reference, but I think he’s onto something. He is, if I’ve understood correctly, suggesting that when you resolve a URI you should expect to get a “200 OK” and a document about that subject. This does make things simpler, but means that the URI for a document is now different from the URL of that document.
It’s going in the right direction, and really helps solve the problem of how to ask a layman for a URI, but I’ve some ideas for how to make it work. Either or both could be used.
Making it clear what’s going on in the HTTP response
Add a new HTTP return code “208 Metadata” which indicates that what you are getting is data about the resource you requested rather than the resource itself. This could also be achieved by putting specific triples in the returned document, but a status code feels much cleaner. However, it still has the issue of requiring special server configuration.
The thing Ian is going for (correct me if I’m wrong) is to allow someone to just place an RDF document in a directory and have it served over the web and you’re done. That’s OKish, but to make Apache serve it with the correct mimetype it will need a ‘.rdf’ suffix, which means making a URI for ricepudding like http://data.totl.net/puddings/ricepudding.rdf which feels wrong to me. You can make it work pretty easily with PHP:
curl -v http://graphite.ecs.soton.ac.uk/experiments/208.php
… but now I’ve got a .php suffix! I can’t find any handy apache .htaccess config that lets me set the HTTP response code for a directory, the best I can find quickly is
ForceType text/n3
Header set Semantic Metadata
Which would at least mean you could create a directory of suffixless files, and indicate to a tool that is aware of the Semantic header that this was not the requested resource, but rather metadata about it.
curl -v http://graphite.ecs.soton.ac.uk/experiments/setmime/ricePudding
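On the client side, either signal is cheap to check. Here is a rough Python sketch, assuming the “208 Metadata” code and the “Semantic: Metadata” header exactly as proposed in this post (neither is a standard, and the function name is made up):

```python
def describes_subject(status, headers):
    """True if the body is metadata about the requested URI, not the thing itself.

    Both signals are proposals from this post, not registered HTTP semantics:
    a "208 Metadata" status, or "200 OK" plus a "Semantic: Metadata" header.
    """
    if status == 208:
        return True
    # Header names are case-insensitive, so normalise before looking one up.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    return status == 200 and normalized.get("semantic") == "metadata"


print(describes_subject(200, {"Content-Type": "text/n3", "Semantic": "Metadata"}))
```

A client that doesn’t know about either signal just sees an ordinary document, which is the fallback behaviour you’d want.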
Extending the URI syntax to indicate “Subject of URI”
I really quite like this one; which is to add something to the way you write the URIs. Put a symbol, let’s say “!”, at the start of the URI to indicate it represents the subject of the document at the given URL. This feels a little like the use of & in C code.
<!http://users.ecs.soton.ac.uk/cjg/> foaf:name "Christopher Gutteridge" .
<!http://users.ecs.soton.ac.uk/cjg/> foaf:homepage <http://users.ecs.soton.ac.uk/cjg/> .
I really think this could work! Most semantic systems just treat URIs as strings so this in no way breaks their ability to reason and process data. The only time it would matter is when they come to resolve the URI. Resolving the URI would not work for clients that didn’t understand the syntax, but that’s not a big deal, they’ll be easy to fix and just won’t get extra data — their loss.
I’ve done a couple of experiments to see how the ARC2 parser copes with this;
The result isn’t great. It is valid, but it’s treated the URI as relative to the current document. So we can’t really put anything at the front, but we could put something on the end… What’s a character which is in basic ASCII but explicitly not legal at the end of a URI? (we want one which can never conflict with a real URL, so things like “#topic” or “!” are out as they could be legal URLs)…
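As a rough stand-in for that experiment (this is Python’s standard library, not ARC2, and the base URL is a made-up example), you can see the same effect with any resolver that follows the usual URI rules: because “!” can’t start a scheme, the whole string is taken as a relative reference.

```python
from urllib.parse import urljoin, urlsplit

BASE = "http://example.org/doc.ttl"  # hypothetical location of the RDF document
marked = "!http://users.ecs.soton.ac.uk/cjg/"

# "!" is not a legal first character of a URI scheme, so none is recognised...
scheme = urlsplit(marked).scheme

# ...and the marked URI gets resolved relative to the document it appears in.
resolved = urljoin(BASE, marked)

print(repr(scheme))
print(resolved)
```

The resolved URI ends up under example.org, which is the “relative to the current document” behaviour described above.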
% as the suffix?
That’s really really not legal to put by itself in a valid URI as it must be followed by two hex digits. So let’s try using <http://users.ecs.soton.ac.uk/cjg/%> to represent the subject of that page (ie. me)
That works much better, but I can no longer put it into a namespace definition as it’s appended rather than prepended. If a naive client tries to resolve it, it will get back a 400 HTTP response, a smart client can understand to strip the trailing %. A nice webserver might add a plugin which spots requests ending in % and if there is a valid URL without the % send a 303 See Other, so that would enable most existing RDF libraries to keep working, unless they were super touchy about the URI being valid before requesting it.
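Both halves of that are tiny. A minimal Python sketch, with function names and the `known_paths` set invented for illustration (a real plugin would check the filesystem or its own routing table):

```python
def is_subject_uri(uri):
    """A bare trailing % can't occur in a valid URI, so it is a safe marker."""
    return uri.endswith("%")


def document_url(uri):
    """Smart client: strip the trailing % to get the URL that can be fetched."""
    return uri[:-1] if is_subject_uri(uri) else uri


def redirect_for(path, known_paths):
    """Nice server: 303 to the document if the path minus its % exists.

    Returns (status, location), or None to let normal handling continue.
    """
    if path.endswith("%") and path[:-1] in known_paths:
        return 303, path[:-1]
    return None


print(document_url("http://users.ecs.soton.ac.uk/cjg/%"))
print(redirect_for("/cjg/%", {"/cjg/"}))
```

Whether the request even reaches such a plugin depends on the server not rejecting the malformed path first, which is the 400 case mentioned above.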
One thing that doesn’t work is using the % suffix in predicates in RDF+XML as that requires you to write <test:foo%>Testing</test:foo%> which is not valid XML. You also can’t use it in the shortcut for class names, eg. <test:Bar% rdf:about="…"> but that’s not a problem as you just describe it using <rdf:type …>.
In this way, a URI for the Web Science Trust is just <http://www.webscience.org/%>
Using ‘#’ as a suffix?
One other option would be to describe a URI ending in # as referring to the subject of the document without the ‘#’. This has some big advantages over using ‘%’ as it is a legal URI, still, and like any # URI, it will resolve to the source document.
I know that foo#bar indicates a sub part of foo; in XML/HTML it’s the element with id='bar'. However, what does foo# mean? The element with id=''? Or can we safely set a standard that this is the _subject_ of the URL without the #?
You still won’t be able to use it in RDF+XML predicates as <foo:bar#> isn’t legal.
Using this idea, the URI for the Web Science Trust is <http://www.webscience.org/#> which is rather elegant.
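Here’s a small Python sketch of how a client could apply this convention. The function name is made up, and the rule it encodes (trailing # with an empty fragment means “subject of the document”) is only the proposal from this section:

```python
from urllib.parse import urldefrag


def subject_to_document(uri):
    """Return the document URL if uri uses the proposed trailing-# convention.

    Returns None for ordinary URIs, including normal fragments like foo#bar.
    """
    url, fragment = urldefrag(uri)
    if uri.endswith("#") and fragment == "":
        return url
    return None


print(subject_to_document("http://www.webscience.org/#"))
```

Since HTTP clients drop the fragment before making a request anyway, even a client that has never heard of the convention ends up fetching the right document.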
UPDATE: Dammit, Steve Harris has pointed out that URIs of the format <http://example.com/foo#> are used to identify namespaces. Dang.


