I was recently in London for a meeting with some folks from UNIT4 about their recent forays into linked data with their Agresso Business World software.
They have the advantage of having a large installed base (>90 local councils, and >250 Further and Higher Education institutions in the UK), so can hopefully provide a mass of data without customers having to set up or install additional systems/infrastructure.
They’ve initially been looking at local council data, with a view to widening this later (Universities are an obvious choice, especially with the growing interest and deployment of institutional open data).
Local Councils
Local councils will have to comply with the Prime Minister’s call to publish financial transactions over £500 from January 2011. Being able to do this simply, with an existing system (which already holds their financial information), makes a lot of sense to the council, while providing the data to the community in an open way.
The Guardian’s Data Blog has a great summary of how this has been done to date: Local council spending over £500: full list of who has published what so far.
The whole list ranges from one to three stars of the Linked Open Data star scheme. Being one of the first councils to move up to four or five stars certainly couldn’t hurt…
UNIT4 are currently running a pilot with the borough of Windsor and Maidenhead, who already make a lot of their data open (1-3* Excel/CSV/PDF mostly). UNIT4’s plan is to take them up to 5* data, with a view to using the same techniques, software and lessons learned to do the same for other councils.
From 3* to 5* Data
They’ve been looking at workflows for converting from existing financial data to RDF using the Payments Ontology, aiming to generalise the process so that the same software and techniques can be applied to non-financial data an organisation might have.
Other ontologies used include VoID and RDF Data Cube.
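To give a feel for what a CSV-to-RDF step might look like, here is a minimal sketch in Python that turns spending rows into N-Triples. It is not UNIT4's workflow. The column names (`id`, `supplier_id`, `date`, `amount`), the `example.org` base URI and the specific property names (`payee`, `date`, `netAmount`) are my own assumptions for illustration, as is the Payments Ontology namespace shown; all of them should be checked against the published vocabulary before use.

```python
import csv
import io

# Assumed namespace for the Payments Ontology; verify against the published vocabulary.
PAY = "http://reference.data.gov.uk/def/payment#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical base URI for a council's own identifiers.
BASE = "http://example.org/council/"


def row_to_ntriples(row):
    """Turn one CSV row (a dict) into a list of N-Triples lines."""
    payment = BASE + "payment/" + row["id"]
    supplier = BASE + "supplier/" + row["supplier_id"]
    date = row["date"]
    amount = row["amount"]
    # Property names below are illustrative, not confirmed ontology terms.
    return [
        f"<{payment}> <{RDF_TYPE}> <{PAY}Payment> .",
        f"<{payment}> <{PAY}payee> <{supplier}> .",
        f'<{payment}> <{PAY}date> "{date}"^^<{XSD}date> .',
        f'<{payment}> <{PAY}netAmount> "{amount}"^^<{XSD}decimal> .',
    ]


def csv_to_ntriples(text):
    """Convert CSV text with a header row into a flat list of N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_ntriples(row))
    return lines
```

Calling `csv_to_ntriples` on a file's contents gives one line per triple, which can be written straight to a `.nt` file. A real workflow would also need to escape literals and handle redacted rows, which this sketch ignores.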
Redaction is obviously an important feature here, which it seems Agresso supports natively. The Payments Ontology also supports redaction (and I think there’s also an extension to it which supports redaction in a more fully featured way). This is something which can’t easily be automated though, and will still require human effort to clean up data before it gets opened.
This is a great way to get a foot in the door – having one successful workflow from CSV/XLS to RDF means that an organisation can easily apply it to others, with the same software and input formats. Though this is an area that I’m guessing a lot of software providers will want to be the center of…
Work done so far for Windsor and Maidenhead can be seen here: Local Government Spend Explorer
The hard parts…
The meeting also raised some familiar concerns/questions about the publishing and maintenance of open linked data:
How do organisations agree on identifiers to use for suppliers?
This is pretty hard without a central registry or lookup service. Companies House data would be a great starting point, but is not open or free.
UNIT4 are going down the route suggested by Tim BL – minting their own URIs for things, then using the owl:sameAs predicate to link them to definitive versions later.
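As a rough illustration of the "mint now, link later" approach, the sketch below mints a local URI from a supplier name and later emits an owl:sameAs triple once a definitive identifier is known. The `example.org` base URI, the slug scheme and both function names are hypothetical; only the owl:sameAs URI itself is standard.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical base URI under the publisher's own control.
BASE = "http://example.org/council/supplier/"


def mint_supplier_uri(name):
    """Mint a local URI by turning the supplier name into a lowercase slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return BASE + slug


def same_as_triple(local_uri, definitive_uri):
    """Return an N-Triples line linking a local URI to a definitive one."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{definitive_uri}> ."
```

For example, `mint_supplier_uri("Acme Widgets Ltd.")` gives a URI ending in `acme-widgets-ltd`, and `same_as_triple` can be run later against whatever registry emerges. Name-based slugs will collide for suppliers sharing a name, so a real scheme would probably key on an internal supplier ID instead.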
How should an individual entering data find out a supplier’s URI given its name?
Auto-completion? Drop down lists? Even though this is more of a user interface issue, it raises the important point of getting people who don’t care about linked data to be accurate about the data they’re entering.
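A very small sketch of the auto-completion idea: the person typing only ever sees supplier names, while the form stores the URI behind the name they pick. The registry contents, the `suggest` function and its parameters are all made up for illustration; in practice the labels would come from the finance system or a triplestore.

```python
# Hypothetical registry mapping supplier labels to their URIs.
SUPPLIERS = {
    "Acme Widgets Ltd": "http://example.org/council/supplier/acme-widgets-ltd",
    "Acme Catering": "http://example.org/council/supplier/acme-catering",
    "Borough Cleaning Services": "http://example.org/council/supplier/borough-cleaning-services",
}


def suggest(prefix, registry=SUPPLIERS, limit=5):
    """Return up to `limit` (label, uri) pairs whose label starts with `prefix`."""
    needle = prefix.strip().casefold()
    if not needle:
        return []
    matches = [
        (label, uri)
        for label, uri in registry.items()
        if label.casefold().startswith(needle)
    ]
    # Sort alphabetically by label so suggestions appear in a stable order.
    return sorted(matches)[:limit]
```

Prefix matching is the simplest option; fuzzy matching would cope better with typos, at the cost of more wrong suggestions for the user to reject.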
Which URIs should we use to describe currency?
While ISO maintains a list of currency codes, they’re not available in an open form, and the data set isn’t available without paying.
How should data from separate councils be aggregated?
There are hundreds of local councils in the UK, and collecting data from all of them, or querying 100+ triplestores to get at data for comparison just isn’t feasible.
I’m guessing this is something we’ll eventually have to face in the Higher Education linked data world (e.g. someone wanting to query Universities for course data won’t want to have to connect to, and download data from, dozens of institutions).
Should there be a central registry? Should data.gov.uk pull local council data into a central triplestore? Should UNIT4 be pulling in the data as a service to their customers? Are any/all of these methods sustainable?
What next?
I’m sure we’ll be hearing more about UNIT4 and linked data in the near future (assuming the Windsor and Maidenhead pilot goes well!). If the strategy and the data produced are successful, we may well see a number of councils adopt it.
If this happens, it would be a great starting point for producing institutional open financial data – choices of identifiers and ontologies to use will be much clearer if there’s a large body of homogeneous data out there.
Excellent summary Dave. In working with UNIT4 on this myself, I share an interest in answering these same questions.