Southampton Web and Data Innovation Team

Ideas and Tips from the Team

Firing Range-14

So there’s a call for suggestions to fix or replace httpRange-14.

If you’re not familiar with this, the basic situation is that we use URIs to represent real-world things, and if you resolve them they give a “303 See Other” response that redirects you to a document of interest, presumably about the subject you asked about.

eg.

I, the person, am identified by URI: http://id.ecs.soton.ac.uk/person/1248

My profile page, an HTML document, is identified (& located) by URL: http://www.ecs.soton.ac.uk/people/cjg

My FOAF Profile, an RDF+XML document containing machine-readable facts about me is identified (& located) by URL: http://rdf.ecs.soton.ac.uk/person/1248

If you resolve my URI in a web browser, it’ll pop up my profile page. You can see how by typing this on the UNIX or OSX terminal:

curl -I http://id.ecs.soton.ac.uk/person/1248

The response will be more or less like this:

HTTP/1.1 303 See Other
Date: Wed, 29 Feb 2012 23:52:16 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Location: http://www.ecs.soton.ac.uk/people/cjg
Connection: close
Content-Type: text/html; charset=utf-8

Which means “go and look at the URL in the Location: header”. The 303 indicates “See Other” rather than the more familiar 301 “Moved Permanently”, which implies that what you get back might not actually be the same thing you asked for.
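The only parts of that response a linked-data client really needs are the status line and the Location header. As a minimal sketch (in Python, with illustrative function names that aren’t from any library), here is how a client might read a header block like the one above and decide whether it has been sent to a document about the thing rather than to the thing itself:

```python
from __future__ import annotations


def parse_head_response(raw: str) -> tuple[int, dict[str, str]]:
    """Split a raw HTTP/1.1 header block into (status code, headers).

    Header names are lower-cased because HTTP treats them case-insensitively.
    """
    lines = [line for line in raw.strip().splitlines() if line.strip()]
    # Status line, e.g. "HTTP/1.1 303 See Other"
    status = int(lines[0].split(" ", 2)[1])
    headers: dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only, so "Location: http://..." keeps its URL intact
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def see_other_target(raw: str) -> str | None:
    """Return the Location of the describing document if the server answered
    303 See Other; otherwise None. (Illustrative helper, not a library call.)"""
    status, headers = parse_head_response(raw)
    if status == 303:
        return headers.get("location")
    return None


sample = """HTTP/1.1 303 See Other
Date: Wed, 29 Feb 2012 23:52:16 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Location: http://www.ecs.soton.ac.uk/people/cjg
Connection: close
Content-Type: text/html; charset=utf-8
"""

print(see_other_target(sample))  # http://www.ecs.soton.ac.uk/people/cjg
```

A real client would get these headers from an HTTP library with redirect-following switched off; the hand parsing here only exists to make the 303 check explicit.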

For added complexity, if you tell the web server you prefer RDF+XML documents by typing:

curl -I -H 'Accept: application/rdf+xml' http://id.ecs.soton.ac.uk/person/1248

you get back:

HTTP/1.1 303 See Other
Date: Wed, 29 Feb 2012 23:54:39 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Location: http://rdf.ecs.soton.ac.uk/person/1248
Connection: close
Content-Type: text/html; charset=utf-8
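On the server side, the behaviour in these two responses boils down to one rule: the identifier never returns a document directly, it always answers 303, and the Accept header only chooses which document the Location points at. Here is a minimal Python sketch of that decision. It is not the ECS code (the X-Powered-By header says that is PHP); the lookup tables are hypothetical and simply hard-code the URLs from the example, and the Accept handling is deliberately crude.

```python
from __future__ import annotations

# Hypothetical lookup tables: the URLs are copied from the example responses,
# but how the real server builds them is not shown in this post.
HTML_PAGES = {"1248": "http://www.ecs.soton.ac.uk/people/cjg"}
RDF_DOCS = {"1248": "http://rdf.ecs.soton.ac.uk/person/1248"}


def wants_rdf(accept: str | None) -> bool:
    """Crude check: does the Accept header list application/rdf+xml with q > 0?
    It does not weigh RDF against other types the client also accepts."""
    if not accept:
        return False
    for item in accept.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        if media_type.lower() != "application/rdf+xml":
            continue
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def resolve_person(person_id: str, accept: str | None) -> tuple[int, dict[str, str]]:
    """Answer a request for http://id.ecs.soton.ac.uk/person/<id>.

    The identifier names a person, so there is never a 200 with a body:
    the answer is always 303 See Other pointing at a document about them.
    """
    if person_id not in HTML_PAGES:
        return 404, {}
    target = RDF_DOCS if wants_rdf(accept) else HTML_PAGES
    return 303, {"Location": target[person_id]}


print(resolve_person("1248", "*/*"))
print(resolve_person("1248", "application/rdf+xml"))
```

Because wants_rdf only asks whether RDF+XML is acceptable at all, a header such as “text/html, application/rdf+xml;q=0.5” still gets the RDF document; proper negotiation compares q-values across every type the server can offer.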

This is bloody hard for people to get their heads around, and not obvious unless you really grok how the web was designed. However, for me, the real screw-up in all of this is using “http:” to represent something which isn’t a document. It’s not as if we weren’t already using http:, https:, gopher:, ftp:, urn:, mailto:, tel: etc. (OK, nobody remembers gopher.)

I think it’s daft to use the same protocol to uniquely identify both real-world objects AND documents on the web. I have to explain this again and again to each person learning RDF, and it won’t take off if people can’t figure it out for themselves, the way they can with HTML, JSON, XML etc.

Schema.org

If you’ve not yet seen schema.org, it’s a website which presents a schema for information that is friendly to search engines. It mostly doesn’t identify ‘things’ at all; it just defines a structure and the literal properties of items in that structure (e.g. the start time of an event, the name of a person). I hear it uses URLs to identify things, which isn’t as crazy as it sounds if you define the relationships correctly, e.g.

<http://www.soton.ac.uk> *hasMember* <http://www.ecs.soton.ac.uk/person/cjg> .

That’s an utterly reasonable statement if *hasMember* is defined as meaning “the group or organization which is the primary topic of the first document has a member which is the primary topic of the second document”. It’s ugly, but entirely semantically sane. In slightly more formal terms:

?X *hasMember* ?Y

implies

?X foaf:primaryTopic ?X-topic .
?Y foaf:primaryTopic ?Y-topic .
?X-topic foaf:member ?Y-topic .
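Read as a forward-chaining rule, this can be applied mechanically. The minimal Python sketch below assumes the foaf:primaryTopic triples for both documents are already known (the rule as written also asserts that such topics exist, which would mean minting blank nodes; that is left out here). Triples are plain (subject, predicate, object) tuples, hasMember is the hypothetical predicate from the example, and the ex: topic names are made up for illustration.

```python
Triple = tuple[str, str, str]

HAS_MEMBER = "hasMember"  # hypothetical document-level predicate from the example
PRIMARY_TOPIC = "foaf:primaryTopic"
MEMBER = "foaf:member"


def expand_has_member(triples: set[Triple]) -> set[Triple]:
    """For every ?X hasMember ?Y, where ?X foaf:primaryTopic ?Xtopic and
    ?Y foaf:primaryTopic ?Ytopic are known, infer ?Xtopic foaf:member ?Ytopic.
    Returns only triples that were not already present."""
    # Index each document's known primary topic(s)
    topics: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == PRIMARY_TOPIC:
            topics.setdefault(s, set()).add(o)

    inferred: set[Triple] = set()
    for x, p, y in triples:
        if p != HAS_MEMBER:
            continue
        for x_topic in topics.get(x, set()):
            for y_topic in topics.get(y, set()):
                inferred.add((x_topic, MEMBER, y_topic))
    return inferred - triples


# Illustrative data: the ex: topic names are invented for this example
graph = {
    ("http://www.soton.ac.uk", HAS_MEMBER, "http://www.ecs.soton.ac.uk/person/cjg"),
    ("http://www.soton.ac.uk", PRIMARY_TOPIC, "ex:UniversityOfSouthampton"),
    ("http://www.ecs.soton.ac.uk/person/cjg", PRIMARY_TOPIC, "ex:cjg"),
}
print(expand_has_member(graph))
# {('ex:UniversityOfSouthampton', 'foaf:member', 'ex:cjg')}
```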

My proposal; infra:

UPDATE 3: So it turns out that just like my ‘primaryTopic.net’ namespace idea, this is also an idea that’s been suggested before, in far more careful detail: tools.ietf.org/html/draft-masinter-dated-uri-10

So my analysis stands, but read it as applying to the tdb: (thing-described-by) scheme described in the link above.

I admit I don’t have the ten years of literature review that some of the community have, but can’t we just do:

infra:http://www.ecs.soton.ac.uk/person/cjg

and specify that http://www.ecs.soton.ac.uk/person/cjg is assumed to be a document about that thing, and it could optionally content-negotiate if it wants.

Effectively, there’s a standing definition that <XYZ> foaf:primaryTopic <infra:XYZ> .
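The standing definition is small enough to state as code. Here is a minimal Python sketch, assuming the scheme is spelled as in this post and using the full FOAF namespace URI for foaf:primaryTopic; the function names are illustrative, not part of any existing library.

```python
INFRA = "infra:"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def document_for(uri: str) -> str:
    """infra:XYZ -> XYZ, the URL a client would actually fetch (and could
    content-negotiate on) to learn about the thing. Illustrative helper."""
    # URI schemes are case-insensitive, so accept INFRA:, Infra: etc.
    if uri[:len(INFRA)].lower() != INFRA:
        raise ValueError(f"not an infra: URI: {uri}")
    return uri[len(INFRA):]


def standing_triple(uri: str) -> str:
    """The implied statement <XYZ> foaf:primaryTopic <infra:XYZ>, as N-Triples."""
    return f"<{document_for(uri)}> <{FOAF_PRIMARY_TOPIC}> <{uri}> ."


print(standing_triple("infra:http://www.ecs.soton.ac.uk/person/cjg"))
# <http://www.ecs.soton.ac.uk/person/cjg> <http://xmlns.com/foaf/0.1/primaryTopic> <infra:http://www.ecs.soton.ac.uk/person/cjg> .
```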

NOTE: My first draft used “resource:” not “infra:” but that was very muddling to type in an RDF+XML document. I don’t really care about the choice of name, just the approach.

Pros:

Visible distinction between Document & Non-Information URIs
Does not invalidate http: URIs, just provides a better method
Allows URIs to be created from popular websites without formal buy in; eg. infra:http://www.imdb.com/title/tt0133093/ or infra:http://xkcd.com/327/
Should not break existing software, such as triple stores.
Allows a bridge to the schema.org approach (refer to things by a URL which describes them)
You can still use content negotiation on the URL to give back HTML or RDF.
Provides similar functionality to “&” and “*” operators in C
Allows existing URLs to be cleanly used as identifiers in a semantically correct way.
Works with # elements in documents, eg. infra:http://en.wikipedia.org/wiki/University_of_Southampton#Malaysia_Campus
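Two of those points, that existing software shouldn’t choke and that # fragments still work, can be checked against a stock URI parser. This Python sketch uses the standard library’s urllib.parse.urlsplit; the generic_view and wrapped_url helpers are hypothetical. A generic parser happily accepts infra: URIs, but notice which URI it thinks the fragment belongs to:

```python
from urllib.parse import urlsplit


def generic_view(uri: str) -> tuple[str, str, str]:
    """How a standard RFC 3986 parser splits the URI: (scheme, path, fragment)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.path, parts.fragment


def wrapped_url(uri: str) -> str:
    """The intended reading (hypothetical helper): everything after 'infra:'
    is a single URL, fragment included."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "infra":
        raise ValueError(f"not an infra: URI: {uri}")
    return rest


campus = "infra:http://en.wikipedia.org/wiki/University_of_Southampton#Malaysia_Campus"
print(generic_view(campus))
# ('infra', 'http://en.wikipedia.org/wiki/University_of_Southampton', 'Malaysia_Campus')
print(wrapped_url(campus))
# http://en.wikipedia.org/wiki/University_of_Southampton#Malaysia_Campus
```

Because a generic parser hands #Malaysia_Campus to the outer infra: URI, a resolver should strip the prefix as text, as wrapped_url does, rather than rebuild the inner URL from the parsed path, or the fragment would be lost.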

Cons:

Will require some trivial changes to existing systems to allow them to resolve these URIs into additional data.
Current URIs may still confuse new users as they start with http://
It is entirely reasonable to have infra:infra:infra:http://totl.net/, but that’s going to traumatise anybody who didn’t absorb C pointer dereferencing through the skin in their formative years.
Obviously, my ability to identify cons is limited by proximity.
People might just slap “infra:” on the front of everything, even standard URIs.
“it’s not great for sites with high traffic; tends to encourage conflation with REST. Be nice if could message intent.” – from @derivadow
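The C-pointer comparison is easiest to see by unwinding a nested URI. In this minimal Python sketch (helper names are illustrative, and it assumes the scheme is written in lower case), each infra: layer is one more dereference, and applying the standing definition <XYZ> foaf:primaryTopic <infra:XYZ> once per layer yields a chain of triples:

```python
PREFIX = "infra:"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def unwrap(uri: str) -> tuple[int, str]:
    """Strip infra: layers, returning (number of layers, innermost URI).
    Assumes a lower-case scheme; illustrative helper."""
    depth = 0
    while uri.startswith(PREFIX):
        uri = uri[len(PREFIX):]
        depth += 1
    return depth, uri


def primary_topic_chain(uri: str) -> list[tuple[str, str, str]]:
    """Apply <XYZ> foaf:primaryTopic <infra:XYZ> once per layer, innermost first."""
    depth, current = unwrap(uri)
    chain = []
    for _ in range(depth):
        wrapped = PREFIX + current
        chain.append((current, FOAF_PRIMARY_TOPIC, wrapped))
        current = wrapped
    return chain


for s, _, o in primary_topic_chain("infra:infra:infra:http://totl.net/"):
    print(f"<{s}> foaf:primaryTopic <{o}>")
# <http://totl.net/> foaf:primaryTopic <infra:http://totl.net/>
# <infra:http://totl.net/> foaf:primaryTopic <infra:infra:http://totl.net/>
# <infra:infra:http://totl.net/> foaf:primaryTopic <infra:infra:infra:http://totl.net/>
```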

I doubt I’ve got the whole picture, but in that statement lies much of the problem. I’m now definitely an expert, and I still don’t ‘get’ the subtle issues. If the linked-data web is going to work, we’ve got to make it workable by the hacky pragmatists who didn’t make their RSS feeds valid XML, but just made sure they worked in a few major readers. They aren’t jerks; they just have different priorities to us university types!

UPDATE 1:

I’ve created an example FOAF profile using this approach. It uses a mixture of ‘traditional’ URIs and infra: URIs. ARC2 and Graphite seem to be fine with it, and a stricter test is the W3C validator, which it passes too! So it won’t break existing software, apart from needing a quick fiddle to make the URIs resolvable, which should be simple enough.

I’ve also changed the protocol name from “resource:” to “infra:”.

The code to create the implied triples from infra: URIs is trivial; running the previous FOAF example through a scrap of PHP produces this version with primaryTopic relations injected.
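The PHP itself isn’t reproduced in this post, so as a rough equivalent, here is a minimal Python sketch of the same idea under stated assumptions: triples are plain (subject, predicate, object) tuples, every subject and object is a URI (a real version must skip literals, which could also begin with “infra:”), and the foaf:knows data is invented for illustration. Nested infra: URIs are unwound so that every layer gets its implied triple.

```python
Triple = tuple[str, str, str]

PREFIX = "infra:"
FOAF = "http://xmlns.com/foaf/0.1/"


def inject_primary_topics(triples: list[Triple]) -> list[Triple]:
    """Return the input triples plus <XYZ> foaf:primaryTopic <infra:XYZ>
    for every infra: URI found in subject or object position.
    Assumes all terms are URIs (no literals); illustrative helper."""
    seen = set(triples)
    out = list(triples)
    for s, _p, o in triples:
        for term in (s, o):
            # Unwind nested infra:infra:... so each layer gets its triple
            while term.startswith(PREFIX):
                inner = term[len(PREFIX):]
                implied = (inner, FOAF + "primaryTopic", term)
                if implied not in seen:
                    seen.add(implied)
                    out.append(implied)
                term = inner
    return out


# Invented example data, not the FOAF profile from this post
foaf_example = [
    ("infra:http://www.ecs.soton.ac.uk/person/cjg", FOAF + "knows", "infra:http://totl.net/"),
]
for triple in inject_primary_topics(foaf_example):
    print(triple)
```

Running it twice changes nothing, because implied triples already present are skipped.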

UPDATE 2:

On reflection ‘primary topic’ might be too loaded and a different predicate may be more appropriate. It doesn’t really matter to the basic idea.

Posted in RDF.

By Christopher Gutteridge – March 1, 2012

4 Responses

Tony Hirst says

So what would happen if you started implementing your preferred solution?

If the solution works pragmatically and conveniently, then others may follow. And if it also happens to be “provable” then it’ll keep the formalists (grudgingly at least) happy too…?

March 1, 2012, 5:59 am
Herbert Van de Sompel says

See http://thing-described-by.org/ and tdb in http://tools.ietf.org/html/draft-masinter-dated-uri-08

March 4, 2012, 5:45 pm
David Booth says

See also “Converting New URI Schemes or URN Sub-Schemes to HTTP”,
http://dbooth.org/2006/urn2http/

March 7, 2012, 3:40 pm
- Christopher Gutteridge says
  
  Hmmm. My main issue with the whole thing is that nobody except linked-data programmers needs to resolve non-information URIs, and having HTTP as the protocol is bloody confusing. Also, there’s no easy way to just say “the thing this page is about”, except by using http://t-d-b.org/?http://totl.net/, which is really bloody ugly and makes it hard for people to pick up, which is why we’re still in the early-adopter stage.
  
  March 7, 2012, 4:15 pm
