So there’s a call for suggestions to fix/replace httpRange-14.
If you’re not familiar with this, the basic situation is that we use URIs to represent real-world things, and if you resolve them they give a “303 See Other” to redirect you to a document of interest, presumably about the subject you asked about.
eg.
I, the person, am identified by URI: http://id.ecs.soton.ac.uk/person/1248
My profile page, an HTML document, is identified (& located) by URL: http://www.ecs.soton.ac.uk/people/cjg
My FOAF Profile, an RDF+XML document containing machine-readable facts about me is identified (& located) by URL: http://rdf.ecs.soton.ac.uk/person/1248
If you resolve my URI in a web browser, it’ll pop up my profile page. You can see how by typing this on the UNIX or OSX terminal:
curl -I http://id.ecs.soton.ac.uk/person/1248
The response will be more or less like this:
HTTP/1.1 303 See Other
Date: Wed, 29 Feb 2012 23:52:16 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Location: http://www.ecs.soton.ac.uk/people/cjg
Connection: close
Content-Type: text/html; charset=utf-8
Which means go look at the URL in the “Location:” bit. The 303 indicates “See Other” rather than the normal “Moved” (301), which implies the document you are sent to might not actually be the same thing as the URI you asked for.
For added complexity, if you tell the web server you prefer to get RDF+XML documents, by typing:
curl -H'accept:application/rdf+xml' -I http://id.ecs.soton.ac.uk/person/1248
You get back
HTTP/1.1 303 See Other
Date: Wed, 29 Feb 2012 23:54:39 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Location: http://rdf.ecs.soton.ac.uk/person/1248
Connection: close
Content-Type: text/html; charset=utf-8
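If it helps to see the server's side of that exchange, here is a rough Python sketch of the 303 dance. It is not the code running on id.ecs.soton.ac.uk (the headers suggest that is PHP); the names pick_document, SeeOtherHandler, redirect_for and demo are made up for illustration, and the Accept check is deliberately naive.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# The two document URLs from the curl responses above.
HTML_DOC = "http://www.ecs.soton.ac.uk/people/cjg"
RDF_DOC = "http://rdf.ecs.soton.ac.uk/person/1248"


def pick_document(accept):
    """Choose the redirect target from the Accept header (hypothetical helper)."""
    # Naive substring check; a real server would parse q-values properly.
    return RDF_DOC if "application/rdf+xml" in accept else HTML_DOC


class SeeOtherHandler(BaseHTTPRequestHandler):
    """Answers every request with a 303 pointing at a document about the person."""

    def do_HEAD(self):
        target = pick_document(self.headers.get("Accept", ""))
        self.send_response(303, "See Other")
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = do_HEAD  # the redirect carries no body either way

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def redirect_for(port, accept=None):
    """Send a HEAD request (like curl -I) and return (status, Location)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    headers = {"Accept": accept} if accept else {}
    conn.request("HEAD", "/person/1248", headers=headers)
    response = conn.getresponse()
    result = (response.status, response.getheader("Location"))
    conn.close()
    return result


def demo():
    """Start a throwaway local server and repeat both curl requests against it."""
    server = HTTPServer(("127.0.0.1", 0), SeeOtherHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return redirect_for(port), redirect_for(port, "application/rdf+xml")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() should hand back one (status, Location) pair per request. http.client does not follow redirects, so what you see is the 303 itself, much like curl -I.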
This is bloody hard for people to get their heads around, and not obvious unless you really grok how the web was designed. However, for me, the real screw-up in all of this is using “http:” to represent something which isn’t a document… It’s not like we weren’t already using http: https: gopher: ftp: urn: mailto: tel: etc. (OK, nobody remembers gopher)
I think it’s daft to use the same protocol to uniquely identify real-world objects AND documents on the web. I have to explain this again and again to each person learning RDF, and it won’t take off unless people can figure it out for themselves, as they could with HTML, JSON, XML etc.
Schema.org
If you’ve not yet seen schema.org, it’s a website which presents a schema for information that is friendly to search engines. It mostly doesn’t identify ‘things’ at all, just defines a structure and literal properties of items in that structure (eg. start time of an event, name of a person). I hear it uses URLs to identify things, which isn’t as crazy as it sounds if you define the relationships correctly. eg.
<http://www.soton.ac.uk> *hasMember* <http://www.ecs.soton.ac.uk/person/cjg> .
That’s an utterly reasonable statement if *hasMember* is defined as meaning “the group or organization which is the primary topic of the first document has a member which is the primary topic of the second document”. It’s ugly, but entirely semantically sane. In slightly more formal terms:
?X *hasMember* ?Y
implies
?X foaf:primaryTopic ?X-topic .
?Y foaf:primaryTopic ?Y-topic .
?X-topic foaf:member ?Y-topic .
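As a rough sketch of that rule in Python: the function below takes the two document URLs and returns the three implied triples as plain tuples. The name expand_has_member and the blank-node labels are invented for illustration; only the FOAF namespace and the two FOAF properties come from the rule above.

```python
from itertools import count

FOAF = "http://xmlns.com/foaf/0.1/"
_blank_ids = count(1)  # source of fresh blank-node labels


def expand_has_member(group_doc, member_doc):
    """Rewrite <group_doc> *hasMember* <member_doc> as three FOAF triples."""
    # Fresh blank nodes stand in for ?X-topic and ?Y-topic, since the
    # rule only says that such topics exist, not what their URIs are.
    group = f"_:topic{next(_blank_ids)}"
    member = f"_:topic{next(_blank_ids)}"
    return [
        (group_doc, FOAF + "primaryTopic", group),
        (member_doc, FOAF + "primaryTopic", member),
        (group, FOAF + "member", member),
    ]


triples = expand_has_member(
    "http://www.soton.ac.uk",
    "http://www.ecs.soton.ac.uk/person/cjg",
)
```

The documents themselves never appear as subject or object of foaf:member; only their topics do, which is what keeps the statement semantically sane.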
My proposal; infra:
UPDATE 3: So it turns out that just like my ‘primaryTopic.net’ namespace idea, this is also an idea that’s been suggested before, in far more careful detail: tools.ietf.org/html/draft-masinter-dated-uri-10
So my analysis stands, but it now applies to the tdb: (thing-described-by) scheme described in the above link.
I admit I’ve not done the 10 years of literature review that some of the community have, but can’t we just do:
infra:http://www.ecs.soton.ac.uk/person/cjg
and specify that http://www.ecs.soton.ac.uk/person/cjg is assumed to be a document about that thing, and it could optionally content-negotiate if it wants.
Effectively, there’s a standing definition that <XYZ> foaf:primaryTopic <infra:XYZ> .
NOTE: My first draft used “resource:” not “infra:” but that was very muddling to type in an RDF+XML document. I don’t really care about the choice of name, just the approach.
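To make the standing definition concrete, here is a minimal Python sketch that scans a set of triples for infra: URIs and produces the primaryTopic triples they imply. It assumes triples are plain tuples of strings and that every term is a URI (a literal that happened to start with “infra:” would be misread); the name implied_triples is made up for this example.

```python
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
PREFIX = "infra:"


def implied_triples(triples):
    """Return the primaryTopic triples implied by any infra: URIs in `triples`."""
    implied = set()
    for triple in triples:
        for term in triple:
            # Peel one prefix at a time, so a nested infra:infra: URI
            # yields one implied triple per level.
            while term.startswith(PREFIX):
                document = term[len(PREFIX):]
                implied.add((document, FOAF_PRIMARY_TOPIC, term))
                term = document
    return implied


data = [
    (
        "infra:http://www.ecs.soton.ac.uk/person/cjg",
        "http://xmlns.com/foaf/0.1/name",
        "http://example.org/placeholder-object",  # hypothetical object URI
    ),
]
extra = implied_triples(data)
```

Returning a set means a URI that appears in many triples only contributes its implied triple once.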
Pros:
- Visible distinction between Document & Non-Information URIs
- Does not invalidate http: URIs, just provides a better method
- Allows URIs to be created from popular websites without formal buy in; eg. infra:http://www.imdb.com/title/tt0133093/ or infra:http://xkcd.com/327/
- Should not break existing software, such as triple stores.
- Allows a bridge to the schema.org approach (refer to things by a URL which describes them)
- You can still use content negotiation on the URL to give back HTML or RDF.
- Provides similar functionality to “&” and “*” operators in C
- Allows existing URLs to be cleanly used as identifiers in a semantically correct way.
- Works with # elements in documents, eg. infra:http://en.wikipedia.org/wiki/University_of_Southampton#Malaysia_Campus
Cons:
- Will require some trivial changes to existing systems to allow them to resolve these URIs into additional data.
- Current URIs may still confuse new users as they start with http://
- It is entirely reasonable to have infra:infra:infra:http://totl.net/ but that’s going to traumatise anybody who didn’t absorb C pointer de-referencing through the skin in their formative years.
- Obviously, my ability to identify cons is limited by proximity.
- People might just slap “infra:” on the front of everything, even standard URIs.
- “it’s not great for sites with high traffic; tends to encourage conflation with REST. Be nice if could message intent.” – from @derivadow
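The “trivial changes” needed to resolve these URIs might look something like the Python sketch below: adding a prefix names the thing a document describes, removing one gets you back to the describing document, and removing all of them gives the URL a client would actually fetch. The function names are invented for illustration and nothing here is part of any existing library.

```python
PREFIX = "infra:"


def thing_described_by(url):
    """Name the thing that the document at `url` is about."""
    return PREFIX + url


def describing_document(uri):
    """Remove one infra: level, giving the URI of the describing document."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not an infra: URI: {uri}")
    return uri[len(PREFIX):]


def fetchable_url(uri):
    """Strip every infra: level to reach the URL a client would request."""
    while uri.startswith(PREFIX):
        uri = uri[len(PREFIX):]
    return uri


nested = "infra:infra:infra:http://totl.net/"
target = fetchable_url(nested)  # the plain http: URL underneath
```

Adding and removing one level are inverses of each other, which is the sense in which this resembles the pair of C operators mentioned in the pros.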
I doubt I’ve got the whole picture, but in that statement lies much of the problem. I’m now definitely an expert, and I still don’t ‘get’ the subtle issues. If the linked-data-web is going to work we’ve got to make it workable by the hacky pragmatists who didn’t make their RSS feeds valid XML, just made sure they worked in a few major readers. They aren’t jerks, they just have different priorities to us university types!
UPDATE 1:
I’ve created an example FOAF profile using this approach. It uses a mixture of ‘traditional’ URIs and normal URIs, and ARC2 and Graphite seem to be fine with it. A stricter test is the W3C validator, and it passes that too! So it won’t break existing software, except for requiring a quick fiddle to make the URIs resolvable, which should be simple enough.
I’ve also edited the protocol name from “resource:” to “infra:”
The code to create the implied triples from infra: URIs is trivial; running the previous FOAF example through a scrap of PHP produces this version with primaryTopic relations injected.
UPDATE 2:
On reflection ‘primary topic’ might be too loaded and a different predicate may be more appropriate. It doesn’t really matter to the basic idea.
So what would happen if you started implementing your preferred solution?
If the solution works pragmatically and conveniently, then others may follow. And if it also happens to be “provable” then it’ll keep the formalists (grudgingly at least) happy too…?
See http://thing-described-by.org/ and tdb in http://tools.ietf.org/html/draft-masinter-dated-uri-08
See also “Converting New URI Schemes or URN Sub-Schemes to HTTP”,
http://dbooth.org/2006/urn2http/
Hmmm. My main issue with the whole thing is that nobody except linked data programmers needs to resolve non-information URIs, and having HTTP as the protocol is bloody confusing. Also, there’s no easy way to just say the URI this page is about, except for using http://t-d-b.org/?http://totl.net/ which is really bloody ugly and makes it hard for people to pick up, which is why we’re still in early adopter stage.