I’m writing this post to highlight a recurring anti-pattern I’ve seen when people new to the open data field are asked to come up with a project for coursework, a group project, or a hackathon.
What happens time and time again is that people set their heart on an idea which requires data which is just not available. At http://data.southampton.ac.uk/ we make every effort to provide all available data in a timely and linked way. If it’s not there it’s because we don’t have access to it, we don’t have the right to publish it, or it doesn’t exist.
We are often asked for data such as class timetables. This gets very close to data about people, so we are far more cautious about it, as we have a duty to protect our students’ privacy. We hope one day to find a way to provide secure, consent-based API access to such data, but it’s a bit of a pipe dream.
Other datasets people request just don’t exist. This summer our open data intern was sent on a mission to create a dataset of all building entrances, as amazingly there was no such dataset that we could locate! Our “Buildings and Estates” department are very helpful, but their system thinks in terms of site/building/room, so we had to build our own dataset. That took a lot of time and effort, but it was worth it, as buildings and building entrances hardly alter from year to year. You can see the new building entrances layer on Ash’s interactive university map. (Click an entrance to see a photo!)
If wishes were open data we’d all have full hard drives, but they’re not.
The lesson here is that we need to better communicate to open data newbies that it is unwise to plan a project which requires data you don’t know to be available. If it’s not available, it’s virtually certain there will not be enough time to make it available in the hours, weeks or months of your project.
We need to teach our students and hackathon participants:
Don’t start from the application you want to build and then go looking for open data that “should be” there.
Start with the data that is there, and invent the application that can be!