HESA are making positive noises about some limited open data and defining URIs to help UK data projects produce linked data. Don’t expect all their data to appear under an open license in the next few days, but they had no objection (in principle) to making the high-level data they already openly release into 5* linked data.
Last week I went up to Cheltenham, where I had been invited to talk to HESA about Linked Open Data, which makes me very happy. HESA have lots of juicy data, but they also have an infrastructure of identity from which much more data could be linked.
My first impression was that my presentation was very well attended, by people from a variety of job types. My second was that this was a friendly crowd: mostly new to the technology, but interested in innovation and practical ideas for doing Good Stuff.
I gave them my usual RDF, URI/URL, Linked Data intro, which I’ve been performing here and there for the last 18 months, then some information on what Southampton has done with it and some other demos. Specifically, we looked at the Ordnance Survey postcode URIs (they asked if it was still worth paying for the data…), we looked up HESA on DBpedia, and we tried a few other neat things.
The most interesting part was learning what data HESA had for which they could easily and painlessly create URIs and triples. As the ECS webteam now controls data.ac.uk, this gives us some interesting possibilities for creating long-term URIs for things. Some of the ideas put around included:
organisations.data.ac.uk: HESA have information about publicly-funded HEIs in the UK. With the advent of KIS, they’ll also have data on all the professional organisations which accredit degrees. On a side note, apparently ‘accreditation’ is a bit of an overloaded term. Luckily we’ve got this semantic web thing to be explicit about the meaning of our relationships. HESA have some ‘headline’ data about organisations which they already make public in various forms, e.g. the student body size of each university each year, so hopefully we can get this as fully open data.
They also have the number of head of cattle per HE institution. Want to guess who has the most cattle? *See end of article.
jacs.data.ac.uk: The JACS codes, which are currently ‘co-owned’ by UCAS & HESA but should not really belong in either organisation’s domain, as they are not integral to either of them. Using a data.ac.uk URI scheme would protect the URIs against government reorganisation and the like.
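To make the URI idea concrete, here is a minimal sketch of how identifiers might be minted under those subdomains. Only the two hostnames come from the discussion. The flat path layout, the `mint_uri` helper and the example identifiers are assumptions of mine, not anything agreed with HESA.

```python
from urllib.parse import quote


def mint_uri(subdomain: str, local_id: str) -> str:
    """Build a data.ac.uk URI for one thing.

    Hypothetical layout: http://<subdomain>.data.ac.uk/<local_id>.
    The local id is percent-encoded so awkward characters cannot
    break the path.
    """
    return f"http://{subdomain}.data.ac.uk/{quote(local_id, safe='')}"


# Illustrative identifiers only; the real schemes are undecided.
print(mint_uri("jacs", "G400"))
print(mint_uri("organisations", "example-institution-id"))
```

The point of the design is that the hostname says what kind of thing is being named, while nothing in the URI mentions who currently looks after the data.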
One of the things that’s a bit more outside their comfort zone is publishing the deeper data under an open license (although it’s already in the spreadsheets on unistats.org, the license does not permit reuse). What is possible is to make such things available as linked data but not open data. I told them that personally I’m an advocate of fully open data, but I wasn’t going to take them to task about this with my professional hat on. They could still publish the vocabulary, which means other people could choose to use their ways of dividing up cohorts of students (full time/part time, mature/young etc.) and use the same semantic definitions.
One interesting idea is that we should maybe have URIs of the type <http://academic-year.data.ac.uk/2012-2013>. Each university does have its own (more or less) strictly defined dates each year, but there’s also the national concept, which is what matters to HESA, UCAS etc. I was asked how I might relate a University of Southampton academic year to a woolly data.ac.uk one, and off the top of my head skos:broader/narrower sounds like the right relation. I think this is a great idea and will implement it soon if the data-ac-uk mailing list thinks it sounds sane.
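As a rough sketch of the skos:broader/narrower idea, the snippet below writes the link between a local academic year and the national one as N-Triples. The national URI follows the pattern suggested above. The Southampton URI and the helper names are hypothetical, made up purely for illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def national_year_uri(start_year: int) -> str:
    """National academic-year URI, e.g. 2012 -> .../2012-2013."""
    return f"http://academic-year.data.ac.uk/{start_year}-{start_year + 1}"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples line where all three terms are URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def link_local_year(local_uri: str, start_year: int) -> list:
    """Relate a university's own academic year to the national concept."""
    national = national_year_uri(start_year)
    return [
        # The local year has the national year as its broader concept.
        triple(local_uri, SKOS + "broader", national),
        # The inverse statement, for consumers that start from the national URI.
        triple(national, SKOS + "narrower", local_uri),
    ]


# Hypothetical local URI; Southampton has not published this.
local = "http://id.southampton.ac.uk/academic-year/2012-2013"
for line in link_local_year(local, 2012):
    print(line)
```

Stating both directions is redundant for anything that does SKOS inference, but it keeps the data easy to follow from either end.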
There were other ideas kicked around, but I really appreciated that the HESA staff seem to be happy to embrace the idea of ‘fail fast’, or maybe a better way to put it in this context is ‘we are going to make mistakes, so let’s get on and make them so we can get past them’. One of the HESA staff commented that what we were doing with data felt like webpages in 1992, which I think is entirely fair. A few brave organisations have data sites and can see that it’s quite probably the future, but none of us can guess what we’ll learn about linked data publication in the next decade to alter and improve what we do.
I’m really impressed by how fast people picked up the ideas and ran with them. Don’t bombard them with demands; they’re just starting out. But the clear impression was that they wanted to do what they could to support linked data.
Just to be clear: there’s absolutely no formal plan at this stage, but plenty of enthusiasm.
A good day.
* The HE Institution with the most head of cattle** is… Reading. Who knew?
** Is there a predicate for linking an organisation to how many cattle it has? Maybe that domesdaybook project has one?***
*** Nope, they’ve just got a JSON API. Ah well.