Our open data service, data.soton.ac.uk, has been around for a long time now. Most of our sister services at other UK universities have had their investment withdrawn and are gone, or limping along quietly. We, by contrast, still have a full-time open data specialist, Dr Ash Smith.
What matters to our “decision makers” isn’t open data. It’s reputation, student satisfaction, sustainability and saving money. Our open data service enables most of these, with the possible exception of sustainability. I’ve been thinking about how to reframe what we actually have from these perspectives, rather than from the viewpoint of an idealistic early adopter.
What we’ve actually built is a corporate knowledge graph that only contains public information, and primarily current information. Our archives actually hold the lunch menu for the staff restaurants and the student Piazza for every single day of the last few years. Nobody cares. It’s only what’s for lunch today that lives in the live knowledge graph (a.k.a. the SPARQL database, a.k.a. the triplestore).
What has open data done for us, anyway?
Having this information all in one place has enabled some valuable services to be produced by our own team. As it was already cleared as open data, there’s no hassle getting permission to use it to make a new service, even though it contains data from several different parts of the organisation.
The crown jewel of the services it enables is maps.soton.ac.uk, but there are a number of others. The actual “open data” can be viewed as just another service enabled by this knowledge graph. One of the easily missed but useful features is the ability to view and download simple lists of things, like buildings or parts of the organisation, and to view a page per, er, thing, with a summary of the information available about that thing. Of these pages, the most valuable are those for rooms used for student teaching: the timetable system links to them, so they are now part of our information infrastructure.
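To give a flavour of what those lists mean in practice, here’s a minimal sketch of pulling the buildings list straight out of the knowledge graph. The endpoint URL and the soton:Building class name are illustrative assumptions, not the definitive schema:

```python
# A minimal pull of the "list of buildings" from the knowledge graph.
# The endpoint URL and the soton:Building class are assumptions for
# illustration, not the definitive schema.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://sparql.data.southampton.ac.uk/")  # assumed endpoint
sparql.setQuery("""
    PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX soton: <http://id.southampton.ac.uk/ns/>
    SELECT ?building ?label WHERE {
        ?building a soton:Building ;
                  rdfs:label ?label .
    } ORDER BY ?label
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["building"]["value"], row["label"]["value"])
```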
The open data has enabled several useful developments, most notably the excellent campus maps produced by Colin Williams (PhD student) and Chris Baines (undergraduate). The problem with these maps was that they were so useful we needed to keep supporting them after their creators left, and the best approach for us was to nick all the good ideas and rebuild our map from scratch. The current map.southampton.ac.uk wouldn’t exist without their work, which only happened because the data was, and is, open, so they could play.
Another innovation Colin inspired was the augmentation of corporate data. Our university didn’t have a good database of building shapes (as polygons of lat/long points), a reference lat/long per building, a photo of each building, and so on. Colin started producing these as datasets which augment the official data we get from Planon, and since then we’ve hired summer interns to maintain and improve them. This now includes the lat/long and a photo of most building entrances, which wasn’t that much work for interns to create and needs little curation, as not many entrances move or change each year. Once the entrances have ID codes, we can then start to make more data about them, such as which entrance is best for getting to a given lecture room.
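Here’s what that augmentation looks like as triples, sketched with rdflib. Every name below, from the namespace to the preferredEntrance predicate, is invented for illustration; the real vocabulary may well differ:

```python
# A sketch of augmentation as triples: once an entrance has its own
# URI, new facts can be asserted about it alongside the official
# Planon data. Every name below (the namespace, the URIs, the
# preferredEntrance predicate) is invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

SOTON = Namespace("http://id.southampton.ac.uk/ns/")  # assumed namespace
GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")

g = Graph()
entrance = URIRef("http://id.southampton.ac.uk/entrance/59-south")  # hypothetical
room = URIRef("http://id.southampton.ac.uk/room/59-1257")           # hypothetical

g.add((entrance, RDFS.label, Literal("Building 59, south entrance")))
g.add((entrance, GEO.lat, Literal("50.9371")))
g.add((entrance, GEO.long, Literal("-1.3976")))
# The kind of fact that only becomes possible once entrances have IDs:
g.add((room, SOTON.preferredEntrance, entrance))

print(g.serialize(format="turtle"))
```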
Where we’ve seen less return on our investment is in providing resolvable URIs that give data on a single entity. These return RDF, and the learning curve is too steep for casual users. I’ve spoken to people using regular expressions to extract data from an RDF/XML page, which is a clear mismatch between what our users need and what we provide.
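For anyone tempted to reach for a regular expression: a proper parser is barely more code. A minimal sketch, using a hypothetical entity URI of the kind we publish:

```python
# Let a parser handle RDF/XML rather than regexing it. rdflib will
# fetch the URI and work out the serialisation for you.
import rdflib

g = rdflib.Graph()
g.parse("http://id.southampton.ac.uk/building/59")  # hypothetical entity URI

# Every statement the page makes, as (subject, predicate, object).
for subject, predicate, obj in g:
    print(subject, predicate, obj)
```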
Sadly, organisational open data services have not caught on. Yet. It’s still not normal, and I suspect Open Data is just starting its way up the “Slope of Enlightenment”. The recent work on UK Government open registers is a great example. It’s simple and knows what it’s there to do. It’s learned lessons from data.gov.uk and gov.uk, and it’s built on a really well designed API model, so unobtrusive that unless you look you wouldn’t notice how simple and elegant it is. It’s a normal and sensible thing for any government to provide in the digital age. It provides official lists of things and the codes for those things. This is simple and valuable, like having a standard voltage for mains and the same shaped plugs, or the train tracks in Southampton and London being on the same gauge. It’s clearly good sense, but it didn’t happen by luck.
Our work on the open data service has also taught us loads. I’m proud to have helped lead a session at Open Data Camp in Belfast, which produced a document listing crowd-sourced solutions for improving open data services; a few years back Alex Dutton (data.ox.ac.uk) and I produced a similar document recording our experiences of the challenges of setting up an open data service. I’m really proud of both of those. The meta-skill I’ve learned is to be more introspective, both as an individual and as a community, so we can work out what we’ve learned and share it effectively. Hey, like this blog post! Meta!
Where are we now?
Where we’ve stalled is that we now have all the corporate data that’s practical to get, so new datasets and services are becoming rarer. One of our more recent additions was a list of “Faith-related locations in Southampton”, which has value both to current students and to students considering moving to the city, but which from a technical point of view was an identical dataset to the one listing “waste pickup points” for the university, with the exception that a picture of a place of worship is usually quite nice, and a picture of a bin store is… less so.
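“Identical from a technical point of view” means both datasets reduce to the same shape, something like this sketch (the field names and example values are mine, not the published schema):

```python
# Faith-related locations and waste pickup points share one schema:
# a labelled place with coordinates and, optionally, a photo.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PointOfInterest:
    label: str
    lat: float
    long: float
    photo_url: Optional[str] = None  # nice for a place of worship, less so for a bin store

place_of_worship = PointOfInterest("Example chapel", 50.935, -1.396,
                                   "https://example.org/chapel.jpg")
bin_store = PointOfInterest("Example waste pickup point", 50.936, -1.397)
```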
Over the summer of 2017 we had our intern, Edmund King (see this blog for his experiences), experiment with in-building navigation tools. The conclusion was that the work to create and maintain such information for the university estate was too expensive for the value it would provide. When we did real tests we discovered lots of quirks, like “that door isn’t to be used by students” or “that internal door is locked at 5pm”, and these all massively complicate the cost of providing a good, useful in-building navigation planner. Nice idea, but it can’t be done as a skunkworks project, and that’s a perfectly good outcome.
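To see why those quirks hurt, consider what they do to the route graph: every edge grows policy metadata, and the shortest path suddenly depends on who is asking and when. A toy sketch, with all the rooms, doors and rules invented:

```python
# A route graph where doors carry policy metadata. Routing now needs
# to know who is asking and at what time, which is where the cost of
# a real in-building navigation planner comes from.
import networkx as nx

G = nx.Graph()
G.add_edge("room_1057", "corridor", kind="internal door")
G.add_edge("corridor", "south_entrance", kind="external door", locked_after=17)
G.add_edge("corridor", "north_entrance", kind="external door", staff_only=True)

def usable(u, v, *, is_staff: bool, hour: int) -> bool:
    data = G[u][v]
    if data.get("staff_only") and not is_staff:
        return False
    if "locked_after" in data and hour >= data["locked_after"]:
        return False
    return True

# A student at 6pm: both entrances are ruled out, so no route exists.
view = nx.subgraph_view(G, filter_edge=lambda u, v: usable(u, v, is_staff=False, hour=18))
print(nx.has_path(view, "room_1057", "south_entrance"))  # False
```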
As new datasets are getting rarer, we’ve been looking more at improving rather than expanding. Part of this has been work to harden each part of the service and get it running on better-supported infrastructure. The old VMs, Edward and Bella, have lots of legacy and cruft. The names come from the fact that Edward used to do all the SPARQL, but then the SPARQL moved to Bella. I suggested Beaufort and Edythe as names for the new servers, but that’s mostly got me funny looks.
Another part of our current approach is the shocking move to retire datasets! Now that we’re focused on quality over quantity, the “locations of interest for the July 2015 open day” dataset needs to just go away. It’s not been destroyed, just removed from public view as not-very-helpful. There are also a few other datasets that seemed a good idea at the time but are now more harmful than useful because they are woefully out of date, like our “list of international organisations we work with”, which is about six years out of date.
Where do we go from here?
The biggest issue is “how do we move forward as a service?”, or maybe even “should we?”. My current feeling is that yes, we should, but with the focus on the knowledge graph as the enabler of joined-up and innovative solutions, and with open data as just another service depending on it, not the raison d’être of the project. Open data, done right, will continue to enable our staff and students to produce better solutions than we could have thought of, which we can sometimes incorporate back into our offerings. Last year a student project on the open data module produced a Facebook chatbot you could ask questions about campus, and it would give answers based on your current location. For example, if you asked it “where can I get a coffee?”, it would identify that “coffee” was a product or service in our database, look at the points of service that provided it, filter out the ones that were not currently open, and send you a list of suggestions starting with the one physically closest to you. I investigated the complexities of running it for real, and found it was a bit brittle, needing third-party APIs and lots of nursing to understand the different ways people ask questions. Also, there are big data protection implications in asking where people are and what they want in a machine-readable way!
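For the curious, the lookup at the heart of that chatbot goes roughly like this. The endpoint URL and the vocabulary (ex:offers in particular) are assumptions for illustration, not the real schema:

```python
# Roughly the chatbot's core lookup: find points of service offering
# a product, then rank by distance from the asker.
import math
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX ex:   <http://id.southampton.ac.uk/ns/>
SELECT ?pos ?label ?lat ?long WHERE {
    ?pos ex:offers ?product ;
         rdfs:label ?label ;
         geo:lat ?lat ;
         geo:long ?long .
    ?product rdfs:label "coffee"@en .
}
"""

def rough_distance(lat1, long1, lat2, long2):
    # Planar approximation; plenty good enough at campus scale.
    return math.hypot(lat1 - lat2, (long1 - long2) * math.cos(math.radians(lat1)))

sparql = SPARQLWrapper("https://sparql.data.southampton.ac.uk/")  # assumed endpoint
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
rows = sparql.query().convert()["results"]["bindings"]

here = (50.9353, -1.3961)  # the asker's location, from the chat client
rows.sort(key=lambda r: rough_distance(*here,
                                       float(r["lat"]["value"]),
                                       float(r["long"]["value"])))
for r in rows[:3]:
    print(r["label"]["value"])
# Filtering on opening hours is left out; that, and parsing the many
# ways people phrase the question, is where the brittleness came from.
```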
The point is that the open data stimulates innovation. Not as much as we’d like, and it doesn’t do our job as uni-web-support for us; it just helps us find ways to do it better.
Long term, I think the service needs to stop being a side-project. We should strip back everything that we can’t justify and just have the knowledge graph be part of our infrastructure, like BizTalk. We then turn the things built on top of it into normal parts of IT infrastructure. Ideally the pages for services, rooms, buildings etc. would merge into the normal corporate website, but this raises odd issues. We have been asked what the value is in providing a page on a shed. For me, it’s obvious, and that makes me bad at explaining it.
We could keep a separate “innovation graph” database which includes zany new ideas and sometimes breaks, but the core graph database should be far more strictly managed, with new datasets carefully considered and tested to make sure they don’t break existing services.
What does the future hold?
In the really long term, well-structured, auto-discoverable open data should be the solution to the 773 frustration. If you look at the right-hand side of that diagram, almost everything is lists of structured information. That information isn’t special, either; it’s information many other organisations would provide, and with the same basic structure. One day maybe we can have nice discoverable formats for such information and get over using human-readable prose documents to convey it. We did a bit of work early on suggesting standards for such information from organisations, but this was trying to answer a question that nobody was yet asking. I still think that time will come, and when it does we’ll look back and laugh at how dumb 2018 websites were, still presenting information as lists in HTML. The middle ground is schema.org, with which I have a bit of a love-hate thing going. It’s excellent, but answering the wrong question: it helps you get your data into Google. I don’t want my data needlessly mediated by corporations, but I get that most people don’t care so much about that.
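For anyone who hasn’t met it, the schema.org middle ground amounts to embedding a blob of JSON-LD in each page, something like this minimal sketch with placeholder values:

```python
# Build and print a schema.org JSON-LD blob of the kind a page embeds
# so that search engines can pick up the structured data.
import json

markup = {
    "@context": "https://schema.org",
    "@type": "CollegeOrUniversity",
    "name": "University of Southampton",
    "url": "https://www.southampton.ac.uk/",
}
print(f'<script type="application/ld+json">{json.dumps(markup)}</script>')
```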
The good news is once people have seen something done a sensible interoperable way it’s hard to go back. I can’t imagine people buying a house with just “Apple” sockets that didn’t fit normal appliances. Then again, computer systems are less compatible now than 10 years ago, so who knows for sure?
I’m optimistic that eventually we’ll achieve some sea-change moment in structured data that will be impossible to backtrack from. But such “luck” requires a lot of work, and we may fail many times before we succeed.
We didn’t quite change the world with data.southampton, but the by-products are valuable enough to have easily repaid the investment.