July 21, 2015
by Ash Smith
One of the biggest drawbacks of open data is that if the data is truly open, there is no reliable way of measuring its use. One of the most common questions we get is “what about analytics?” We have no answer… for our data to be open, it must be freely available and under a license that allows reuse and, specifically, republishing. Once we put it on the web we can track who downloads individual datasets from our domain, but once the data has been downloaded we can’t tell where it goes from there. It may get downloaded by someone who looks at it and deletes it, thinking it useless, or it may end up becoming part of a massively successful startup business. This is actually more of a problem than it seems, because a lot of people who use words like ‘impact’ and ‘synergy’ would rather have an average service that’s measurable than a fantastic service that isn’t.
But although we can’t tell exactly how much of our data is being used, or how much people rely on our tools, we have ways of making educated guesses, and I’ll cover three important ones here.
1. Someone complains if it breaks
I wrote whatweekisit around September 2013, just before the change of academic year. The odd thing about the academic year, at least at Southampton, is that the year starts on a Thursday, which means that there’s always one week per calendar year spread across two academic years. The period between Monday and Wednesday of this week is known as Week 52, and the Thursday onwards is known as Week 0 of the new year. I made absolutely sure this happened correctly in whatweekisit.
Come September 2014 and I got a ticket on our bug report system. Not just that, but I’ve also had an email and a coffee-room discussion about how whatweekisit has gone wrong. I inform all of them that this is, in fact, the way the academic year works, and whatweekisit is correct to split the week into two in this way. But in doing so, I feel satisfaction in the knowledge that enough people are using my service that I get multiple complaints when it behaves unexpectedly.
2. Someone hacks it
It’s a known fact that the more popular something is, the more of a target it becomes. Frankly, if I was writing a piece of malware that I wanted to infect as many machines as possible, I’d write it for Windows rather than Linux, purely because it’s a more common desktop operating system, not because I like programming Windows. So it’s for this reason that you see MySQL exploits, Drupal vulnerabilities and Internet Explorer flaws in the news all the time, but don’t hear anything about malware targeted at triplestores, for example. So it’s a testament to the popularity of a platform or service when people are writing malware for it, or trying to exploit it in some way.
Some years ago, Chris actually built several nonsense datasets – one asserting that all people in the world named Dave are the same person, for example – in order to illustrate this problem. I actually look forward to the day that someone works out how to do a SPARQL injection attack, as it’ll mean that linked open data is being taken seriously enough that people want to abuse it.
3. It wins an award!
We run a multi-award-winning service. The most recent award we received was for innovation in catering, a field I personally know nothing about. For the last few months I’ve been perfecting a mutually beneficial system for the University’s catering department. Last year the law changed and all catering outlets in the UK now need to provide accurate information on what allergens are contained within their food. My system takes a spreadsheet, filled in at source by a chef, and converts it into RDF. Once it’s in RDF, it’s used to generate the catering website, some printable menus in Word format, and, more recently, some nice digital signs around the staff club showing today’s menu. We get up-to-date data on the food being served in the University, and Catering get their menus completely automated and remain within the law, as all food allergens are clearly printed on the menus, and the only human effort involved is a chef updating a spreadsheet once a day, something that already happened before I got involved. Whenever I explain the system to someone in my team it’s met with a kind of “is that all?” sort of look, but to a Catering department, who do not program computers for a living, it’s a dream come true. I for one didn’t really comprehend that until I was stood on a stage in a posh hotel in London being congratulated by several hundred catering professionals.
This list is in no way scientific, or exhaustive, and hopefully we’ll have more evidence of success to add in the future.