When I was developing the Graphite PHP Library I added a simple function called $graph->allOfType( $type ) which returns a list of all the things of a given type in the current graph. For example, the list of all foaf:Person resources or all Buildings.
It’s also very tempting to do this when presented with a SPARQL endpoint, and a totally legitimate thing to do when exploring the data.
However…
Leaving applications lying around which use this, either as SPARQL or otherwise, is a ticking time bomb. Here’s an example:
At the university, I’ve got a list of all the buildings in our endpoint. So I can get all our buildings by doing this query:
(Note that rooms: is just a prefix for a vocabulary to describe rooms and buildings, sorry if that’s confusing)
SELECT ?thing WHERE { ?thing a rooms:Building . }
OK, that’s great. But as our system has grown, we’ve now got some buildings which are in the data but are no longer part of the university estate. I’d also like to add some buildings which are in the city, but I can’t, because my stupid naive coding will assume all buildings in the store are our buildings.
Easily Solved
The solution is to add some semantics to say what I really mean, which is to have a list of buildings which are occupied by the University of Southampton. I guess I just need to add:
<http://id.southampton.ac.uk/building/32> <http://vocab.deri.ie/rooms#occupant> <http://id.southampton.ac.uk/> .
(That last URI is the identifier for the university). Then the question becomes:
SELECT ?thing WHERE { ?thing a rooms:Building . ?thing rooms:occupant <http://id.southampton.ac.uk/> . }
Which is a tiny bit more work, but much more future-proof.
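To make the difference concrete, here is a minimal sketch in plain Python, with triples held as simple tuples rather than in a real RDF store. The function names all_of_type and occupied_by are made up for illustration (this is not how Graphite works), and the city library URI is a hypothetical placeholder; only building 32, the rooms vocabulary namespace and the university URI come from the example above.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library is assumed.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ROOMS = "http://vocab.deri.ie/rooms#"
SOTON = "http://id.southampton.ac.uk/"

triples = [
    (SOTON + "building/32", RDF_TYPE, ROOMS + "Building"),
    (SOTON + "building/32", ROOMS + "occupant", SOTON),
    # Hypothetical city building: typed as a Building, but with no occupant triple.
    ("http://example.org/city/library", RDF_TYPE, ROOMS + "Building"),
]


def all_of_type(triples, type_uri):
    """Naive pattern: every subject that has the given rdf:type."""
    return sorted({s for s, p, o in triples if p == RDF_TYPE and o == type_uri})


def occupied_by(triples, type_uri, occupant_uri):
    """Safer pattern: subjects of the type that also name the given occupant."""
    occupied = {
        s for s, p, o in triples
        if p == ROOMS + "occupant" and o == occupant_uri
    }
    return [s for s in all_of_type(triples, type_uri) if s in occupied]


# The naive listing picks up the city library; the occupant filter does not.
print(all_of_type(triples, ROOMS + "Building"))
print(occupied_by(triples, ROOMS + "Building", SOTON))
```

The second function does the same job as the extra triple pattern in the query: it only trusts a building to be ours when the data says so.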
App Builders are Lazy
Well, I am. So people will only do just enough to make things work. The first version of an app may well use a naive all-things-of-type-X pattern, as it’ll solve their immediate problem. When it starts to break, they’ll look for a new pattern, so it’d be good if data providers made sure there are solid triples giving these simple facts.
There’s a really cheap-and-cheerful alternative solution, which is to solve the problem at the ‘graph’ level, i.e. state that all the buildings in triples you get from <http://id.southampton.ac.uk/dataset/places/latest> have certain properties. This is handy for hacking, but lousy for data aggregation.
This came about as I was thinking about making a version of my building finder web-app which would aggregate together buildings from Southampton, Oxford & Lincoln. I realised I could easily do that, but the graph-level shortcut doesn’t give me a way to indicate who each building belongs to, as the assumption breaks down when we merge the data.
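The merge problem can be sketched in the same plain-Python style. This is only an illustration under stated assumptions: the second institution and its URIs are hypothetical (example.org placeholders, not real Oxford or Lincoln identifiers), and buildings_by_occupant is a made-up helper name. Once the two sets are unioned, nothing records which source a triple came from, so ownership is only recoverable if each building carries its own occupant triple.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ROOMS = "http://vocab.deri.ie/rooms#"

# One set of triples per source. The second source is entirely hypothetical.
southampton = {
    ("http://id.southampton.ac.uk/building/32", RDF_TYPE, ROOMS + "Building"),
    ("http://id.southampton.ac.uk/building/32", ROOMS + "occupant",
     "http://id.southampton.ac.uk/"),
}
other_uni = {
    ("http://example.org/uni-b/building/1", RDF_TYPE, ROOMS + "Building"),
    ("http://example.org/uni-b/building/1", ROOMS + "occupant",
     "http://example.org/uni-b/"),
}

# After the union there is no trace of which source each triple came from.
merged = southampton | other_uni


def buildings_by_occupant(triples):
    """Group buildings by their explicit occupant triple, ignoring the source."""
    buildings = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == ROOMS + "Building"
    }
    owners = defaultdict(list)
    for s, p, o in triples:
        if p == ROOMS + "occupant" and s in buildings:
            owners[o].append(s)
    return {occupant: sorted(found) for occupant, found in owners.items()}


for occupant, found in sorted(buildings_by_occupant(merged).items()):
    print(occupant, found)
```

If the occupant triples were missing and ownership were only implied by which dataset a building came from, the merged set would just be an undifferentiated pile of buildings.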
In the short term, there’s some value in agreeing that an App accepts a target RDF URL and should show everything it understands. If there’s a SPARQL endpoint, then maybe the owners need to write a CONSTRUCT query to give that app what it needs. This isn’t an ideal solution, but it works.
I think right now it’s just important for us to notice what assumptions we make. RDF & SPARQL are not like Tables & SQL. There are some new techniques to learn…