Southampton Web and Data Innovation Team

Ideas and Tips from the Team

What If a URL wasn’t always a URI?

What if a Uniform Resource Locator (URL) wasn’t automatically assumed to also be a Uniform Resource Identifier (URI)?

In the current URI/URL system, if you resolve <http://example.org/xyz> and get a 200 OK and an English HTML document, you assign that document the URI <http://example.org/xyz>. Where it gets weird is that if you use content negotiation and get back an XML file in German, that XML file also has the URI <http://example.org/xyz>. WHAT THE HELL?
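To make the weirdness concrete, here is a minimal Python sketch of what a server does when it negotiates content for a single URL. The representation table is hypothetical, and the header parsing is deliberately simplified (exact language tags only, no wildcards); real servers follow the full HTTP rules.

```python
from __future__ import annotations

# Hypothetical representations of the single resource at http://example.org/xyz,
# keyed by (media type, language tag).
REPRESENTATIONS = {
    ("text/html", "en"): "<html lang='en'>...</html>",
    ("application/xml", "de"): "<doc xml:lang='de'>...</doc>",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept-style header into (value, q) pairs, best first.

    Simplified: no wildcards and no parameters other than q.
    """
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        value, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if value and q > 0:  # q=0 means "not acceptable"
            items.append((value, q))
    # sorted() is stable, so equal q values keep the client's order
    return sorted(items, key=lambda item: item[1], reverse=True)


def negotiate(accept: str, accept_language: str) -> tuple[str, str] | None:
    """Return the (media type, language) key of the representation to serve."""
    for media_type, _ in parse_accept(accept):
        for lang, _ in parse_accept(accept_language):
            if (media_type, lang) in REPRESENTATIONS:
                return media_type, lang
    return None  # a real server would answer 406 Not Acceptable
```

`negotiate("text/html", "en")` picks the English HTML and `negotiate("application/xml", "de")` picks the German XML, yet both are served from the same URL, which is exactly why giving each of them that URL as its identifier is odd. This sketch ranks media type ahead of language; a server could reasonably weigh them the other way.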

I’ve been following (well, attempting to follow) the discussion on the W3C Technical Architecture Group (TAG) mailing list, www-tag, and I’m not sure if this idea is exactly what they are discussing, but it’s got me quite excited.

The web is vague, and reading meaning into it is dangerous. RDFa users tend to use the current document’s URI as an identifier for what the document is about. This decoupling would allow that, but it means that <> no longer means “current document” but rather “the thing identified by the URI used to retrieve this document”… I’m sure there’s a better way to phrase that… maybe “this thing”.
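Part of why <> is slippery is that an empty reference (<> in Turtle, about="" in RDFa) is relative: it denotes whatever the base URI is, and by default that base is the URL the document was retrieved from. A small Python sketch of that resolution step, using only the standard library (the function name is ours):

```python
from urllib.parse import urldefrag, urljoin


def resolve_empty_reference(base: str) -> str:
    """Resolve the empty relative reference <> against a base URI.

    <> is relative, so it means whatever the base URI is. RFC 3986 says the
    result is the base with any fragment removed; urljoin() returns the base
    unchanged, so the fragment is stripped explicitly.
    """
    return urldefrag(urljoin(base, "")).url
```

Fetched from `http://users.ecs.soton.ac.uk/cjg/`, <> resolves to that URL; the same bytes opened from `file:///home/cjg/Documents/cjg.html` resolve <> to the file URL instead. Nothing in the document itself fixes what <> names.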

To assign an explicit URI to a document being returned, you would use the HTTP headers. Without them, all you can safely state is that the document was once returned by resolving said URI as a URL. It might not be there in 5 minutes’ time…
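A client following this rule would keep provenance and identity apart. The sketch below is one way to model that in Python; the `asserted_uri` argument stands in for whatever header value a server might use to name the document (no particular header is assumed), and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RetrievalRecord:
    """What a client can safely say about a document it has fetched."""

    retrieved_from: str          # the URL that was resolved
    retrieved_at: datetime       # when it was resolved
    document_uri: Optional[str]  # set only if the server asserted a URI


def record_retrieval(url: str, asserted_uri: Optional[str] = None,
                     when: Optional[datetime] = None) -> RetrievalRecord:
    """Build a record; asserted_uri stands in for a hypothetical header value."""
    return RetrievalRecord(url, when or datetime.now(timezone.utc), asserted_uri)


def safe_statements(rec: RetrievalRecord) -> list[str]:
    """List only the claims the record actually supports."""
    claims = [f"A document was returned by resolving {rec.retrieved_from} "
              f"at {rec.retrieved_at.isoformat()}."]
    if rec.document_uri is not None:
        claims.append(f"That document has the URI {rec.document_uri}.")
    return claims
```

Without an asserted URI, `safe_statements` yields only the “was returned by resolving … at …” claim; the identity claim appears only when the server explicitly supplied one.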

Such HTTP Link headers could also give the URL of a description of the current thing, using the same system as the HTML <link> element. That element is great, but it only works in HTML documents, whereas an HTTP header works for any kind of document.

Example 1.

<http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.jpg> returns a JPG of a photograph of the Mona Lisa. If the author wanted to add some metadata, in the HTTP header he would say that the document returned does indeed have the URI <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.jpg> and is described by <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.rdf> or <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.json>.
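Here is a rough Python sketch of how a client might read such headers for Example 1. The header value is hypothetical; `describedby` is a registered link relation, but using `rel="self"` to carry the document’s own URI is our assumption, not something the post specifies. The parser is simplified and assumes no commas or semicolons inside quoted parameter values.

```python
from __future__ import annotations

import re

# One link-value: <target> followed by zero or more ;-separated parameters.
LINK_VALUE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")


def parse_link_header(header: str) -> list[tuple[str, dict[str, str]]]:
    """Parse an HTTP Link header into (target URL, parameters) pairs."""
    links = []
    for match in LINK_VALUE.finditer(header):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for param in raw_params.split(";"):
            if "=" in param:
                key, value = param.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        links.append((target, params))
    return links


def links_with_rel(header: str, rel: str) -> list[str]:
    """Targets whose rel parameter (a space-separated list) contains rel."""
    return [target for target, params in parse_link_header(header)
            if rel.lower() in params.get("rel", "").lower().split()]


BASE = "http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa"
# Hypothetical Link header for Example 1 (rel="self" is an assumption).
MONA_LISA_LINK = (
    f'<{BASE}.jpg>; rel="self", '
    f'<{BASE}.rdf>; rel="describedby"; type="application/rdf+xml", '
    f'<{BASE}.json>; rel="describedby"; type="application/json"'
)
```

`links_with_rel(MONA_LISA_LINK, "describedby")` returns the `.rdf` and `.json` URLs, and `links_with_rel(MONA_LISA_LINK, "self")` returns the `.jpg` URL. The same header works whether the body is a JPEG, RDF or anything else.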

Example 2.

<http://users.ecs.soton.ac.uk/cjg/> returns an HTML document about me, but I decide to use it as my URI. In the HTTP Link header (or just in an HTML <link> element) I just need to say that the current URI is a Person called Chris, which is what RDFa tends to do anyway. Chris is identified by <http://users.ecs.soton.ac.uk/cjg/> and described by the document located by <http://users.ecs.soton.ac.uk/cjg/>. Content negotiation now makes sense, as I will be described by any document located by that URL.
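Example 2 boils down to two statements whose subject is the page URL itself. A hedged Python sketch that writes them as N-Triples, using the FOAF `Person` class and `name` property (our choice of vocabulary; the post doesn’t name one):

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def nt_literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def person_triples(uri: str, name: str) -> list[str]:
    """N-Triples saying the resource at `uri` is a foaf:Person with a name."""
    return [
        f"<{uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f"<{uri}> <{FOAF}name> {nt_literal(name)} .",
    ]
```

Calling `person_triples("http://users.ecs.soton.ac.uk/cjg/", "Chris")` produces the “this URI is a Person called Chris” statements; under the proposal, any representation returned from that URL is a description of the subject, not the subject itself.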

It falls apart if you save the document somewhere else, of course, as the location on your local hard drive, file:///home/cjg/Documents/cjg.html, then also becomes an identifier for me for as long as the file is in that location. But file:// URIs are not Cool URIs.

If this became the way of the web, the other problem would be that you could no longer safely assign a URI to a document you’ve downloaded, so all the triple stores would get sad when they did

LOAD <http://example.org/foo>

Without a handy Link: header assigning the document a URI, they won’t know what URI to give the graph. But naming the graph that way is already full of broken, as a URI should not really identify two things, and a graph is not the same as the thing it describes. So the simple solution for SPARQL is to make the GRAPH name the URL the document was loaded from.
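Naming the graph after its source URL is easy to express. A sketch in Python, assuming a store that accepts SPARQL 1.1 Update (whose `LOAD <doc> INTO GRAPH <g>` form is standard) or N-Quads; the helper names are hypothetical, and the IRI check is a minimal guard rather than full validation.

```python
from __future__ import annotations

# Characters that may not appear inside an IRI written between < and >.
_FORBIDDEN = set('<>"{}|\\^` ') | {"\n", "\r", "\t"}


def _check_iri(iri: str) -> str:
    if not iri or any(ch in _FORBIDDEN for ch in iri):
        raise ValueError(f"not usable as an IRI: {iri!r}")
    return iri


def load_update(source_url: str) -> str:
    """SPARQL 1.1 Update that names the graph after the URL it was loaded from."""
    url = _check_iri(source_url)
    return f"LOAD <{url}> INTO GRAPH <{url}>"


def to_quads(ntriples: list[str], source_url: str) -> list[str]:
    """Turn N-Triples lines into N-Quads whose graph is the source URL."""
    graph = _check_iri(source_url)
    quads = []
    for line in ntriples:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"not an N-Triples statement: {line!r}")
        quads.append(f"{line[:-1].rstrip()} <{graph}> .")
    return quads
```

Here the graph name only records where the triples came from; it makes no claim that the URL identifies the thing the triples describe.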

Anyhow, this is all speculation. I’m going to wade back into the mailing list discussion and see if I can get a better grasp of what they’re talking about. Wish me luck…

Posted in HTTP, RDF, Triplestore.

By Christopher Gutteridge – March 28, 2012

One Response

Neil Crookes says

I always ensure my documents have unique URIs, so my URL scheme for any site, app or piece of content, whether static or dynamically generated, takes the form

protocol://hostname/path, where the path includes locale information if I’m serving multi-locale content and, if the content is available in different formats, an extension. E.g.

/en-GB/posts
/en-GB/posts.rss

I think it’s a good approach that you can apply to all documents and applications. Much safer than relying on content-type and charset request headers.

March 28, 2012, 7:27 pm
