

Fear, Uncertainty and Doubt

I thought it would be fun to compile a list of all the various reasons we’ve heard for not publishing data…

  1. What if terrorists use the data?
  2. It probably breaks data protection or something
  3. People will find out our data is a bit crap
  4. It’s too complicated for the public to understand
  5. It’s too complicated for you to understand
  6. It’s really big (100,000 rows of data, oh my!)
  7. We might want to sell it later
  8. We think we might not own some of it
  9. We already make the information available via an internal website so why duplicate effort?
  10. We’ll get spam
  11. Our software has no API (because it’s enterprise, unlike that crappy interoperable stuff)…
    1. ..so tough, you can’t have it
    2. ..and the SQL is too difficult to understand, it’s all like “table233.fieldA3”
    3. ..we’ll have to pay the people who make the software a fortune to get the data
  12. We’re just too busy with project X (then project Y…)

And the special new category for facilities & equipment sharing data:

  1. If people know about our stuff, they’ll ring us up. We don’t want people bothering us.
  2. We don’t want other people using our stuff because they’ll break it
  3. We don’t want other people touching our precious things because we think our stuff should be just for us

Some of these are reasonably valid, as in the case of not being confident you hold the copyright; the spam concern is also legitimate, but I find it hard to give it much credit.

Any I should add?

Posted in Uncategorized.



London 2012 Website Restrictions

These are my own views and not necessarily those of my employer. I am not a lawyer.

http://www.london2012.com/terms-of-use/

Yikes, just… yikes.

They start off with the good old “by using this site, you agree to our terms”, which always feels a bit sneaky when you have to read the site to read their terms, ah well.

There are a couple of really nasty bits to follow. The first is in clause 5, “Linking Policy”, where they say you can’t link to their site with an image. There’s a list of other ways you can’t link to them: “false and misleading”… well, that’s sort of reasonable; “derogatory”… er, that’s not; and “otherwise objectionable” is so darn vague that they can claim a link breaks their policy whenever they object to it.

For example, if I were to create the following link: “The London 2012 Olympic Games appear morally bankrupt and the organisers seem incompetent”, then that would certainly break their terms of use. It’s certainly derogatory, but I don’t believe it’s illegal.

That’s just the warm-up to clause 6(v), in which they retain the right to buy any “User Generated Content” you upload. From what I can tell, they tell you how much they think is reasonable, and if you don’t think it’s reasonable you have to prove some other organisation is willing to pay more, at which point they have 30 days to choose to buy it on the same terms.

Basically, they can decide your amazing photo is worth £2 and buy the rights from you. You agreed to this by uploading it to their site.

In summary, if you generally apply a Creative Commons license to things you produce online, you should think twice before uploading ANYTHING to the london2012.com website.

Normally I’d be inclined to give people the benefit of the doubt, and just assume that it’s lawyers being zealous and they won’t use the license against people, but the games have demonstrated impressive proactivity in “protecting their interests”, so we have to assume they’ll have no mercy in this matter either.

**UPDATE**

I’ve given it some thought. Looking at this clause:

(1) you will give us written notice by email setting out the proposed terms of any bona fide third-party offer to acquire any exclusive rights in such UGC, (2) we shall have a 30-day period in which to acquire the same on the same terms

If this bites anybody, I will (on request) make a bona fide third-party offer to you:

I (Christopher Gutteridge, private UK citizen) will offer to acquire exclusive rights to your Olympic UGC (user generated content), for which I will pay *nothing* (£0.00, $0.00), and I will place the work in question into the public domain or license it under a Creative Commons license of your choice (with attribution to the creator of the work, not me). The London2012 website then has 30 days to acquire your content on the same terms. If I acquire your rights, you will be expected to upload the content to a suitable website (Flickr, YouTube, or one of your choice).

Please contact me if this is of use to you.

Posted in Terms and Conditions.


The Institutional Web Managers Workshop 2012

This week I have been at the Institutional Web Managers Workshop. I am no stranger to web management but I am new to institutional web management and this was a good place to come and get orientated. The event brings together a mixture of techies, content creators and web team managers from across the HE sector to share ideas and experiences from the business end of institutions.

This year the words on everyone’s lips seemed to be “Responsive Design”, and the message seems to be “you should do some of that”. Responsive design is about having one website which responds to the device the user is viewing it on. So if you are looking at the webpage on a smartphone you get a smooth experience for a small screen driven by chubby fingers, and when you are on a big screen you get controls designed for mouse-level precision. I think 4 of the sessions I attended were advocating responsive design in some sense.

Points of interest were Bradford’s experience of going responsive, and E.A. Draffan’s point that responsive design improves site accessibility. Bradford noted that going responsive wasn’t too hard to do, the results were pleasing to users, and users did not mind being pushed between responsive and unresponsive parts of the site. This is good news for large web presences, because you don’t have to go responsive all in one go. E.A. pointed out that users are choosing to come to your site on mobile, and not giving them responsive design effectively reduces the quality of their experience. ECS is a very technical place and we have always felt that having a rich and interesting website is completely vital as a way of demonstrating the technical skills in the Faculty to prospective students.

Another session of some note was Phil Barker talking about microdata and schema.org. It seems microdata is Google’s indexing weapon of choice and they do some pretty fun stuff with it. Unlike RDFa, the schema is fixed by schema.org, which means that not everything can be represented, but common themes can be. The top level of the hierarchy is fairly broad. I think these tools are good for helping Google and others index your site, but not a sensible way to transfer data for use elsewhere. For that you need a data document of some sort.

Kevin Ashley talked about the problems of science data and the EPSRC guidelines. This is something I care about a lot, but I have not yet thought of the killer solution, only unimaginative ones. Tony Hirst and Martin Hawksey showed us the power of visualizing simple data, and I expect this is connected to the killer app for data repositories. Ferdinand von Prondzynski had a good rant about how over-complicated all university front pages are. On his hit list of crimes were the scrolling gallery, the news, lists of events, and pretty much everything which wasn’t a link to 8 lower-level websites covering everything in a much more targeted way. The bottom line is that university front pages are trying to serve too many masters and succeeding at serving none of them very well. When I looked at the Southampton front page I realised we were nowhere near as bad as his case-study examples, and I commend both the iSolutions web team and the branding company for keeping our front page fairly clean.

Overall the event was very on topic and I picked up some neat tricks.

Posted in Team, web management.



Research Contest Conditions

It’s becoming more common for companies and other groups to run contests to flush out new ideas in a field, for example, YarcData are offering $100,000 prize money for a contest about doing stuff with big RDF graphs.

This contest is pretty tempting, but the conditions below worry me a little.

7. YarcData’s and Cray’s Ownership of Submissions: Each Submission, including, without limitation, all contents, concepts and ideas embodied therein, becomes the exclusive property of YarcData and Cray, may be used by YarcData and Cray for marketing and other promotional purposes, and will not be returned by YarcData or Cray. By entering a Submission, each Entrant hereby irrevocably assigns to YarcData and Cray all right, title and interest in and to such Submission, including, without limitation, all Rights related to the Submission without expectation of compensation or acknowledgement (other than the prize, if any, that is awarded as set forth in these Official Rules). Entrant waives any and all artistic and moral rights associated with his/her/their Submission.  YarcData and Cray shall have no obligation to make attribution with respect to the Submission, retain any of the Submissions, or maintain any information or ideas contained therein, as confidential or proprietary.

8. Right to Use Name, Likeness, and Other Identifying Information: By entering a Submission, each Entrant hereby irrevocably consents to the unlimited reproduction, distribution, display, performance, and other use by YarcData and Cray and their respective successors and assigns of his/her/their name; image; likeness; voice; Submission; biographical information; statements and quotes; stories and anecdotes provided; all of his/her/their other personal or commercial attributes or identifying features; and any interior or exterior photographs or other depictions of the Entrant’s home or office, which may or may not contain images of the Entrant’s family, friends, or pets (each, a “Likeness”), for any purposes directly or indirectly related to this Competition, and in any format or medium now existing or later developed. Each Entrant hereby waives any and all rights of publicity and rights of privacy associated with YarcData’s and Cray’s use of a Likeness. YarcData and Cray may, in their sole discretion, and without providing notice to or receiving consent from an Entrant, modify, change, adapt, or otherwise alter a Likeness. Entrants shall have no right of approval, no claim to compensation, and no claim (including, without limitation, claims based on invasion of privacy, defamation, or right of publicity) arising out of any use, blurring, alteration, or use in composite form of a Likeness. The rights granted under this paragraph are without compensation or notification to the Entrant of any kind, except as required by law, and shall extend to all Submissions, all other materials submitted by Entrant, and all other materials developed in connection with the Competition, regardless of whether they are developed by the Entrant or another person or entity.

I’m not a lawyer, but the above implies to me that if you enter the contest and win nothing they can still use your ideas and not even credit you. That’s not a good situation to end up in.

You assign them your rights, which sounds like you give up your own title to the submission. Does this mean you couldn’t enter it somewhere else as they now own it?

The reason for my concern is that I work with the students on the Electronics and Computer Science courses at Southampton and they enter and win contests.

They also have ideas at University that make their careers; AudioScrobbler is now part of Last.fm and that began life as a 3rd year project here!

My concern is that we may need to have some basic rules about what contests we are willing to promote to our students.

Any ideas? Off the top of my head:

  • Their work should not be used without attribution
  • Transfer of ownership should not be a criterion of entry (although it could perhaps be a criterion for being awarded the prize)
  • It is acceptable for a contest to require that the work is openly licensed (Creative Commons, open source etc.), as that benefits the entire community and is a clear choice for the student to make or not make.

Can anyone suggest anything better? Should we have a guide for companies of what is or is not OK?

Posted in Best Practice.


Mapping Lat/Long in our SPARQL endpoint into UK Ordnance Survey Easting and Northing

This is a command-line PHP script which creates UK Easting & Northing values from the lat/long values. I plan to run it once a day on data.southampton.ac.uk — I know that the big local taxi firm uses Easting and Northing in their database, so it might be interesting to see if I can generate a report for them which benefits our members by giving the taxi services better data.

The libraries it uses are my sparqllib (LGPL) and phpcoord-2.3 (GPL).

You’ll need the latest GitHub version of sparqllib if you want to include the “cgiParams” line, which tells 4store to return all the results — we’ve got more than 2000 lat/long pairs, and by default we’ve set 4store to a limit of 2000 results per query.

As this script uses a GPL library (not LGPL), I guess that makes it GPL, although it’s just a slip of a script. I’m sure other UK open data services might want to consider something similar.

#!/usr/bin/php
<?php

require_once( "/var/wwwsites/tools/phpcoord/phpcoord-2.3.php" );
require_once( "/var/wwwsites/tools/PHP-SPARQL-Lib/sparqllib.php" );

$sparqlh = sparql_connect( "http://data-dev.ecs.soton.ac.uk:8002/sparql/" );
$sparqlh->cgiParams( "soft-limit=-1" );

# Limit lat and long to the same graph to try to minimise the
# number of 'multiplying out' we get with multiple lat/long
$result = $sparqlh->query( "SELECT DISTINCT * WHERE {
 GRAPH ?g {
 ?thing <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
 ?thing <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
 }
}" );
if( !$result ) { print $sparqlh->errno() . ": " . $sparqlh->error() . "\n"; exit; }

$fields = $result->field_array( $result );

while( $row = $result->fetch_array() )
{
 $ll2w = new LatLng($row["lat"],$row["long"]);
 $ll2w->WGS84ToOSGB36();
 $os2w = $ll2w->toOSRef();

 print "<".$row["thing"]."> <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/easting> "";
 print round($os2w->easting).""^^<http://www.w3.org/2001/XMLSchema#integer> .\n";
 print "<".$row["thing"]."> <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/northing> "";
 print round($os2w->northing).""^^<http://www.w3.org/2001/XMLSchema#integer> .\n";
}


**UPDATE**

It’s been pointed out to me by Yang Yang (a PhD researcher in WAIS) that it’s a bit of an antipattern to publish data derived from other people’s datasets, and there’s no real value in me providing Easting/Northing data for bus stops and the few places we import from Wikipedia. I’ve made a rather elegant solution:

SELECT DISTINCT * WHERE { 
 ?dataset <http://rdfs.org/ns/void#dataDump> ?graph . 
 ?dataset <http://purl.org/dc/terms/publisher> <http://id.southampton.ac.uk/> .
 ?dataset a <http://purl.org/openorg/AuthoritativeDataset> .
 GRAPH ?graph {
   ?thing <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
   ?thing <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .
 }
}

This now only finds lat/long pairs in graphs which are listed as published by the University of Southampton AND Authoritative. That works pretty well.

Posted in 4store, Geo, PHP, SPARQL.



What I learnt from moving old Perl…

One of my tasks this week was moving bookings.ecs.soton.ac.uk to a new home. Bookings is Perl CGI software written in 1996 by Julian Field to manage the booking of computers in our electronics and computing labs. It is completely bespoke, from a time before Perl’s CGI module, let alone a reusable open source solution to the general resource booking problem. It previously lived on one of our internal “DNS and misc” servers called stork, which has been due for decommissioning for a few years now. The problem was that stork is an old Solaris 8 box (a horrible setup) running Perl 5.6, and we were moving to Linux and Perl 5.10. More worryingly, the code was written for Perl 5.003 and several people had tried to move it previously with little success. My Perl is fairly good but my Perl history isn’t great. What follows is a few old dark arts, mainly lost from the annals of the web, which I thought I would share for the benefit of anyone else who has to move old code to pastures new.

  1. My first tip is: stay calm. The Perl community is very proud of how backwards compatible the language is, so very few of your problems are likely to come from changes to the syntax or core libraries. I only had to make a few very simple changes, but the debugging process was a little tricky. Back in the day Perl’s strict mode was not in common use, and the now-accepted good practices had not been laid down, or were not spread widely across the web. That means there will be unfamiliar syntax and a distinct lack of debugging information.
  2. #!/usr/bin/perl: the opening gambit, which foxed me for nearly 20 minutes. On the Solaris system I moved from, Perl lived at /usr/local/gnu/bin/perl (so every script began #!/usr/local/gnu/bin/perl), and the Apache log reported “file not found” and “script terminated prematurely”. The file which was not found was actually the Perl interpreter, not the .pl file, but I mistakenly assumed my Apache virtual host configuration was wrong. The premature termination happens because, when the interpreter isn’t found, the script crashes before printing HTTP headers. All I had to do to fix it was go through each file changing the #! line.
  3. Back in the day there was a $* variable in Perl: when $* = 1, regexes match over multiple lines. It was the only piece of Perl syntax in the whole code base which is no longer supported, and the error log even told me as much. The problem then became working out what $* actually did. It’s not that easy to google and I did not find helpful results. In the end I pulled an old Perl textbook off the shelf and found the explanation straight away. The solution is to remove $* and add the /s modifier to the regex.
    Before:
    $* = 1;
    $foo =~ m/booking room/;
    After:
    $foo =~ m/booking room/s;
  4. opendbm versions. I had never heard of opendbm, but it’s actually quite a neat tool: it lets a Perl program sync a hash with a file, and Bookings was using it instead of a database. opendbm kept reporting “unable to open file”. I made the files chmod 777 but it still didn’t work. After nearly 2 hours of debugging I finally realised that the internal format opendbm was using had changed from the old version, so I couldn’t open my old data. Not a problem; no one cares about past bookings anyway. Simply back up the bookings file and delete it, and the script creates a new one, no problem.

And that was it, simple as pie… pie which took most of a day to cook. But then again, in what other language could you take 16-year-old code, move it across operating systems and interpreter versions, and have it still working by the end of the day? I was talking to some friends in the pub and we concluded “not many”. Java would have made you wish for death, even if you had the source code. Well done Perl, well done Jules, and well done me.

Posted in Uncategorized.


Recovering from broken PHP code in a "PHP Code" block in Drupal 7

I love to live a little dangerously, and we allow our research group webmasters to put custom PHP code in their Drupal sites. This lets them do nice things like have blocks with “latest 3 publications” or “next seminar”. I keep meaning to write standard code for these, but there are millions of things on the TODO list as usual.

Anyhow, the webmaster for a research group put a typo in the PHP code for a block that displayed on every page of his site. This meant he couldn’t get into the site at all, as no pages could render (we just saw a white page with a scrap of text generated before the PHP error).

I couldn’t find a blog post about how to recover from the situation so I figured I should write one.

Solution:

Go into the MySQL database for the site, and issue the following command:

mysql> update block_custom set format='full_html' where format='php_code' ;

Admittedly it’s a bit of a blunt instrument and stops PHP working in all blocks; to be a bit more subtle you could do:

mysql> select bid,info,format from block_custom ;
+-----+-------------------+---------------+
| bid | info              | format        |
+-----+-------------------+---------------+
|   2 | Student Quotation | full_html     |
|   3 | Footer Links      | full_html     |
|   4 | Latest News       | php_code      |
|   5 | Seminars          | php_code      |
|   6 | Demo HVLab Link   | filtered_html |
|   7 | LeftBar           | filtered_html |
|   8 | Lower Left        | php_code      |
|   9 | Featured News     | php_code      |
|  10 | Featured Project  | php_code      |
|  11 | t                 | filtered_html |
+-----+-------------------+---------------+
10 rows in set (0.00 sec)
mysql> update block_custom set format='full_html' where bid=6;

Where bid is the number of the block you want to reset.

Prevention:

Avoiding this situation is quite easy. It only happens if the broken block appears on all pages, so make a page for testing new blocks and limit the PHP block to show only on that page while you’re working on it. If that page breaks, you will still be able to go through the homepage to get to the admin interface.

Posted in Drupal.



First week on the web team

Hello,

It is my first week in my new job as “Web Systems and Data Manager”, or Dave Chalis Mk2 according to the office door. So far most things seem to be HTTP/1.1 200 OK. I have been learning about the systems which Web Team run and maintain. It seems no small part of the reason this academic unit is so effective is the incredible suite of monitoring tools we have available: I can report on loads of stuff very easily, and it is a good feeling to know about problems before the users do. The downside is that there is a reasonably long list of administrative niggles with my name on them. At the moment I am quite enjoying just working my way through the list, but I suspect it will quickly become part of the humdrum of day-to-day work in the office.

From what I can tell from the various people I have spoken to, my new role has three strands.

  1. Keep all the plates spinning. That is mainly what I have been doing the last few days, making sure all of the server info is up to date, everything is running and has an owner. That includes provisioning new websites, wikis, blogs and so on.
  2. Clearing out the clutter. Web Team’s position in the University has changed with the restructuring. As a result, some things are becoming our responsibility and there are things which we are giving up responsibility for. Some things we run simply are not used enough to justify keeping them running. These sorts of things need to be gradually retired to try and keep our workload balanced.
  3. Systems innovation. The bulk of our job is to innovate new systems for use around the University that make it easier for staff to do their jobs effectively. This means complete end-to-end systems: while that often means technical systems, it also means the processes that make up those systems, and cutting back on stuff we are doing unnecessarily. This is the really fun stuff; from what I can tell there are quite a few systems we would like to build, and things like Data Southampton which need “productizing”.

For people who do not know me: my name is Patrick, I enjoy loads of stuff, and we probably have something in common. I cannot stand the telly, or anything else which serves only as a barrier to getting things done. My previous work can mainly be seen on the OneShare blog or in my portfolio. Most people around here still think of me as “the EdShare guy”, but I have done a fair bit of other stuff since then. Hopefully I’ll do more stuff here 🙂

Posted in Team.


Linked Open Data Mission to HESA (Higher Education Statistics Agency)

Executive summary:

HESA are making positive noises about some limited open data and defining URIs to help UK data projects produce linked data. Don’t expect all their data to appear under an open license in the next few days, but they had no objection (in principle) to making the high-level data they already openly release into 5* linked data.

Last week I went up to Cheltenham. I was invited to talk to HESA about Linked Open Data, which is something which makes me very happy. HESA have lots of juicy data, but they also have an infrastructure of identity off which much more data could be linked.

My first impression was that my presentation was very well attended, and by a variety of job types. My second was that this was a friendly crowd: mostly new to the technology, but interested in innovation and practical ideas for doing Good Stuff.

I gave them my usual RDF, URI/URL, Linked Data intro, which I’ve been performing here and there for the last 18 months, then some information on what Southampton has done with it and some other demos. Specifically, we looked at the Ordnance Survey postcode URIs (they asked if it was still worth paying for the data…), we looked up HESA on DBPedia, and a few other neat things.

The most interesting part was learning what data HESA have which they could easily and painlessly create URIs and triples for. As the ECS web team now controls data.ac.uk, this gives us some interesting possibilities for creating long-term URIs for things. Some of the ideas put around included:

organisations.data.ac.uk — HESA have information about publicly-funded HEIs in the UK. With the advent of KIS, they’ll also have data on all the professional organisations which accredit degrees. On a side note, apparently ‘accreditation’ is a bit of an overloaded term, luckily we’ve got this semantic web thing to be explicit about the meaning of our relationships. HESA have some ‘headline’ data about organisations which they already make public in various forms so hopefully we can get this as fully open data, eg. the student body size of each university each year.

Also they have the number of head of cattle per HE institution. Want to guess who has the most cattle? *See end of article.

jacs.data.ac.uk — the JACS codes, which are currently ‘co-owned’ by UCAS & HESA but do not really belong in either organisation’s domain, as they are not integral to either organisation. Using a data.ac.uk URI scheme would protect the URIs against government reorganisation and the like.

One of the things that’s a bit more outside their comfort zone is publishing the deeper data under an open license (although it’s already in the spreadsheets on unistats.org, the license does not permit reuse). What is possible is to make such things available as linked data but not open data. I told them that personally I’m an advocate of fully open data, but I wasn’t going to take them to task about this with my professional hat on. They could still publish the vocabulary, which means other people could choose to use their ways of dividing up cohorts of students (full time/part time, mature/young etc.) and use the same semantic definitions.

One interesting idea is that we should maybe have URIs of the type <http://academic-year.data.ac.uk/2012-2013>. Each university does have its own (more or less) strictly defined dates each year, but there’s also the national concept, which is what matters to HESA, UCAS etc. I was asked how I might relate a University of Southampton academic year to a woolly data.ac.uk one, and off the top of my head using skos:broader/narrower sounds like the right relation (sketched below). I think this is a great idea and will implement it soon if the data-ac-uk mailing list thinks it sounds sane.
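To make that concrete, here is a minimal sketch of the two triples I have in mind, printed as N-Triples from PHP in the same style as the easting/northing script above. Both URIs are hypothetical (neither an id.southampton.ac.uk academic-year scheme nor academic-year.data.ac.uk exists yet), so treat this as an illustration of the skos:broader/narrower idea rather than anything deployed:

#!/usr/bin/php
<?php

# Hypothetical URIs, purely for illustration; neither scheme exists yet.
$local    = "http://id.southampton.ac.uk/academic-year/2012-2013";
$national = "http://academic-year.data.ac.uk/2012-2013";

# The precisely-dated local year is the narrower concept and the woolly
# national year the broader one, so the relations point both ways.
print "<$local> <http://www.w3.org/2004/02/skos/core#broader> <$national> .\n";
print "<$national> <http://www.w3.org/2004/02/skos/core#narrower> <$local> .\n";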

There were other ideas kicked around, but I really appreciated that the HESA staff seem happy to embrace the idea of ‘fail fast’, or maybe a better way to put it in this context is ‘we are going to make mistakes, so let’s get on and make them so we can get past them’. One of the HESA staff commented that what we were doing with data felt like webpages in 1992, which I think is entirely fair. A few brave organisations have data sites and can see that it’s quite probably the future, but none of us can guess what we’ll learn about linked data publication in the next decade to alter and improve what we do.

I’m really impressed how fast people picked up the ideas and ran with them. Don’t bombard them with demands, they’re just starting out, but the clear impression was that they wanted to do what they could to support linked data.

Just to be clear; there’s absolutely no formal plan at this stage, but plenty of enthusiasm.

A good day.

* The HE Institution with the most head of cattle** is… Reading. Who knew?

** is there a predicate for linking an organisation to how many cattle it has? maybe that domesdaybook project has one?***

*** Nope, they’ve just got a JSON API. Ah well.

Posted in RDF.



What If a URL wasn’t always a URI?

What if a Uniform Resource Locator (URL) wasn’t automatically assumed to also be a Uniform Resource Identifier (URI)?

In the current URI/URL system, if you resolve <http://example.org/xyz> and get 200 OK and an English HTML document, you assign that document the URI <http://example.org/xyz>. Where it gets weird is that if you use content negotiation and get back an XML file in German, that XML file also has the URI <http://example.org/xyz>. WHAT THE HELL?
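For anyone who hasn’t played with content negotiation, here’s a minimal PHP sketch of the kind of thing going on behind a URL like that. The file names and the crude Accept-header check are purely illustrative (real servers do this far more carefully, e.g. with Apache’s mod_negotiation), but the point is that one URL can hand back two quite different documents:

<?php

# Crude, illustrative content negotiation for a single URL.
# Real servers parse the Accept header properly; this just sniffs it.
$accept = isset( $_SERVER["HTTP_ACCEPT"] ) ? $_SERVER["HTTP_ACCEPT"] : "";

if( strpos( $accept, "application/xml" ) !== false )
{
    # Hand back the (hypothetical) German XML representation
    header( "Content-Type: application/xml; charset=utf-8" );
    header( "Content-Language: de" );
    readfile( "xyz.de.xml" );
}
else
{
    # Hand back the (hypothetical) English HTML representation
    header( "Content-Type: text/html; charset=utf-8" );
    header( "Content-Language: en" );
    readfile( "xyz.en.html" );
}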

I’ve been following (well, attempting to follow) the discussion over on the WWW-Technical Architecture Group (WWW-TAG) and I’m not sure if this idea is exactly what they are discussing but it’s got me quite excited.

The web is vague, and reading meaning into things is dangerous. RDFa users tend to use the current document URI as an identifier for what the document is about. This decoupling would allow that, but it means that <> no longer means “current document” but rather “the thing identified by the URI used to retrieve this document”… I’m sure there’s a better way to phrase that… maybe “this thing”.

To assign an explicit URI to a document being returned, you would use the HTTP headers. Without them, all you can safely state is that the document was once returned by resolving said URI as a URL. It might not be there in 5 minutes’ time…

Such HTTP Link headers could also list a URL to discover the description of the current thing, using the same system as HTML <link>, which is great, but only in HTML documents.

Example 1.

<http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.jpg> returns a JPG of a photograph of the Mona Lisa. If the author wanted to add some metadata, in the HTTP header he would say that the document returned does indeed have the URI <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.jpg> and is described by <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.rdf> or <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.json>.
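As a sketch of what that could look like if the image were served from a PHP script: the relation names below are my own guesses (“canonical” and “describedby” are both registered link relations, but whether they are the right ones for this job is exactly the sort of thing the TAG discussion would need to settle):

<?php

# Illustrative only: serve the JPEG and use Link headers to assert its URI
# and point at documents which describe it.
header( "Content-Type: image/jpeg" );
header( 'Link: <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.jpg>; rel="canonical"' );
# The second argument 'false' adds an extra header rather than replacing the first
header( 'Link: <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.rdf>; rel="describedby"', false );
header( 'Link: <http://cdn.gunaxin.com/wp-content/uploads/2011/04/mona-lisa.json>; rel="describedby"', false );
readfile( "mona-lisa.jpg" );  # hypothetical local copy of the image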

Example 2.

<http://users.ecs.soton.ac.uk/cjg/> returns an HTML document about me, but I decide to use it as my URI. In the HTTP Link header (or just in the <html> Link) I just need to say that the current URI is a Person called Chris. Which is what RDFa things tend to do anyway. Chris is identified by <http://users.ecs.soton.ac.uk/cjg/> and described by the document located by <http://users.ecs.soton.ac.uk/cjg/>. Content negotiation now makes sense, as I will be described by any document located by that URL.

It falls apart if you save the document somewhere else, of course, as then the location on your local hard drive, file:///home/cjg/Documents/cjg.html, also becomes an identifier for me so long as the file is in that location. But file:// URIs are not Cool URIs.

If this became the way of the web, the other problem would be that you can no longer safely assign a URI to a document you’ve downloaded so all the triple stores would get sad when they did

LOAD <http://example.org/foo> .

As without a handy Link: header assigning the document a URI, they won’t know what URI to assign the graph. But that is already full of broken, as a URI should not really identify two things, and a graph is not the same as the thing it describes. So the simple solution for SPARQL is to make the GRAPH name a URL which is the source of the document.

Anyhow, this is all speculation. I’m going to wade back into the mailing list discussion and see if I can get a bit more of a grasp of what they’re talking about. Wish me luck…

Posted in HTTP, RDF, Triplestore.