Southampton Open Data Blog

Getting Real

December 18, 2012
by Christopher Gutteridge

Up until now the open data service has been run on a pretty much seat-of-our-pants approach. We’re actually at the point where one of our services, the events calendar, really needs to graduate into a normal university service. It requires a little regular TLC to deal with broken feeds. There are 74 feeds so some break now and then. They were always breaking, but now at least someone notices. I (Chris) recently attended a course on the University “Change management” process (which is basically getting sign-off to modify live services to reduce the impact and manage risk). I was pleasantly surprised to hear that the change management team actually use the events calendar to check if a change to live IT services might cause extra issues (e.g. don’t mess with the wifi the weekend we’re hosting an international conference).

I always said that the success criteria for data.soton.ac.uk was that it becomes too important to trust me with (tongue in cheek, but not actually a joke). And, lo and behold, management has asked me to start looking at how to start the (long) journey to having it be a normal university service.

I feel some fear, but not panic.

I’ve been trying to think about how to divide the service into logical sections and consider them separately.

I’ve discussed the workflow for the system before, but here’s a quick overview again.

Publishing System: This downloads source data from various sources and turns it into RDF, publishes it to a web enabled directory then tells the SPARQL database to re-import it. This has just been entirely re-written by Ash Smith in command line PHP. An odd choice you might think, but it’s a language which many people in the university web systems team can deal with, so beats perl/python/ruby on those grounds. We’ve put it on github. The working title is Hedgehog (I forget why) but we’ve decided that each dataset workflow is a quill, which sounds nice.

SPARQL Database: This is 4store. It effectively just runs as a cache of the RDF documents the publishing system spits out; it contains nothing that can’t be recreated from those files.

SPARQL Front End: This is a hacked version of ARC2’s SPARQL interface but it dispatches the requests to the 4store. It’s much friendlier than the blunt minimal 4store interface. It also lets us provide some formats that 4store doesn’t, such as CSV.

URI Resolver: This is pretty minimal. It does little more than look at the URI and redirect you to the same path on data.soton. It currently does some content negotiation (decides if /building/23 should go to /building/23.rdf or /building/23.html) but we’re thinking of making that a separate step. Yeah, it’s a bit more bandwidth, but meh.

Resource Viewers: A bunch of PHP scripts which handle all the different types of resources, like buildings, products, bus-stops etc. These are a bit hacky and the apache configuration under them isn’t something I’m proud of. Each viewer handles all the formats a resource can be presented in (RDF, HTML, KML etc.)

Website: The rest of the data.soton.ac.uk website is just PHP pages, some of which do some SPARQL to get information.
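
To make the content negotiation step in the URI Resolver a bit more concrete, here is a rough sketch in Python. It is not the resolver's real code (which isn't shown in this post); the function names, the media-type table and the fallback to HTML are all assumptions made for illustration.

```python
# Hypothetical sketch: pick a document suffix from an HTTP Accept header.
# The media types and suffixes listed here are assumptions for illustration.
SUFFIX_BY_MEDIA_TYPE = {
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}
DEFAULT_SUFFIX = ".html"  # assumed fallback when nothing recognisable is asked for


def parse_accept(header):
    """Return (media_type, q) pairs, highest q first; ties keep header order."""
    choices = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat an unreadable q value as "not wanted"
        choices.append((media_type, q))
    # sorted() is stable, so equal q values stay in the order the client sent.
    return sorted(choices, key=lambda choice: -choice[1])


def choose_document(path, accept_header):
    """Map e.g. /building/23 to /building/23.rdf or /building/23.html."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        suffix = SUFFIX_BY_MEDIA_TYPE.get(media_type)
        if suffix is not None:
            return path + suffix
    return path + DEFAULT_SUFFIX
```

A browser sending `text/html,application/xhtml+xml;q=0.9` would be sent to the `.html` document, while a client asking for `application/rdf+xml` would get the `.rdf` one. Wildcards such as `*/*` are simply ignored in this sketch and fall through to the default.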

 

So here’s what I’m thinking about getting some of this managed appropriately by business processes.

As a first step, create a clone of the publishing system on a university server and move some of the most stable and core datasets there. Specifically the organisation structure: codes, names, and parent groups in the org-chart, and also the buildings data — just the name, number and what site they are on. These are simple but critical. They also happen to be the two datasets that the events calendar depends on and so would have to be properly managed dependencies before the calendar could follow the same route.

The idea of this second data service, let’s call it reliable.data.soton.ac.uk, is that it would only provide documents for each dataset. All the fun stuff would stay (for now) on the dev server, and I really don’t want to get iSolutions monkeying around with SPARQL until they’ve got at least a little comfortable with RDF. The hedgehog instance on reliable.data would still trigger the normal “beta” SPARQL endpoint to re-import the data documents when they change.
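
The “re-import when they change” part could be as simple as fingerprinting each published document. The following is a minimal sketch in Python, not Hedgehog's actual logic; `publish_if_changed` and the `request_reimport` callback are hypothetical names, and how the SPARQL endpoint is really told to reload a document is not covered here.

```python
import hashlib


def document_digest(content: bytes) -> str:
    """Fingerprint a published document so changes can be detected."""
    return hashlib.sha256(content).hexdigest()


def publish_if_changed(name, content, known_digests, request_reimport):
    """Ask for a re-import only when the document content has changed.

    known_digests: dict mapping document name -> digest of the last copy
        for which a re-import was requested.
    request_reimport: callable standing in for however the endpoint is
        notified (hypothetical; supplied by the caller).
    Returns True if a re-import was requested.
    """
    digest = document_digest(content)
    if known_digests.get(name) == digest:
        return False
    # Notify first: if this raises, the digest is not recorded and the
    # next run will try again.
    request_reimport(name)
    known_digests[name] = digest
    return True
```

Publishing the same bytes twice only triggers one re-import, which fits a dataset like buildings that changes a few times a year.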

We could make sure that the schema for these documents was very well documented and that changes were properly managed, and could be tested prior to execution. I’m not sure how, but maybe university members could register an interest so that they could be notified of plans to change these. That would be getting value out of the process. For the buildings dataset, which is updated a few times a year, maybe even the republishing should have a prior warning.

The next step would be to move the event calendar into change management, and ensure that it only depended on the ‘reliable’ documents. This service is pretty static now in terms of functionality, although we’ve got some ideas for enhancements, these could be minor tweaks to the site, with the heavy lifting done on the ‘un-managed’ main data server.

Don’t get me wrong, I don’t love all this bureaucracy, but if open data services are to succeed they need to be embedded in business processes, not quick hacks.

Apps wanted!

December 13, 2012
by Ash Smith

Southampton’s Open Data is really gathering momentum now, and is being used for many things. Personally, I like the “cool stuff” approach, as it allows people to see what’s really possible with Open Data. Recent additions to our “cool stuff” are the university events page, which gathers event information from all over the university and makes it available in one searchable index, and the workstation locator which allows members of the university to locate an available iSolutions workstation nearby, using a GPS-enabled smartphone if they prefer. I’m currently liaising with the providers of the council’s live bus data in order to make sure that no existing apps break when the system goes live again, which should be in a few weeks time. I’m making it my top priority to not inconvenience application developers, and data integrity is something I take very seriously. After all, if I were to develop a cool app based on external data and a week later the data format changed for no good reason, I probably wouldn’t trust that data not to change again. So if you’ve developed an app that uses our bus data, please feel free to get in contact if you find it’s suddenly started behaving strangely and I’ll do everything I can to help. The system may go up and down in the coming weeks while we iron out some bugs, but the last thing I want is everyone having to re-implement their apps because of something we’ve done.

With cool apps in mind, I’d like to take this opportunity to publicise the university’s Open Data Competition, an initiative designed to try and encourage developers to use our data. If you can’t program, don’t worry, you can submit an idea for a cool app without having to actually develop it yourself. The competition also accepts visualisations of our data, so if you’re into statistics or making mash-ups, this may be your chance to impress the judges. There’s a £200 Amazon voucher up for grabs for the winner of each category and £100 vouchers for the runners up. Don’t feel you have to restrict your ideas to the data we provide, data is best when linked. It’d be really nice to see something that provides a useful service by combining our data with that of, say, the government or the police.

Times Higher Education Award

December 3, 2012
by Christopher Gutteridge

The University of Southampton won the award for “Outstanding ICT Initiative of the Year” for the open data service.

Personally I feel rather smug about this, as you can imagine, but while I may have worked my socks off, there’s a hell of a lot of people who made it possible.

Obviously first of all is Professor Nigel Shadbolt & Dame Professor Wendy Hall for convincing the University it should have an open data service.

Next up is the team who created the original ECS open data service: Marcus Cobden, Alastair Cummings, Dr Nicholas Gibbins and Dr Colin Williams (who got his PhD last week).

Lots of general support from the Web and Internet Science research group and the members of the Enakting Project in particular.

There’s the project board, who’ve been very enthusiastic from the start: Malcolm Ace (Chief Operating Officer), Wendy Hall, Nigel Shadbolt, Debra Humphis (now sadly left the Uni to work for some place called “Imperial College”, sounds nice), Simon Peatfield (our head of Communications), Hugh Davis (head of eLearning) and Pete Hancock (our head of IT). The first meeting with this bunch had me really bloody scared but it went well, and they were all keen to see if we could prove this technology/approach in our day to day operations.

Dr Su White deserves special mention, as whenever I talked to any of the heads of services, it seemed she’d been chatting to them only a few days previous, talking up the benefits of open data.

Thanks to Paul Seabrooke in Buildings & Estates for help with navigating the subtleties of our list of buildings, and lots of other people in that department;  Jodie Barker and the energy team, and Neil Smith and the sustainability and recycling people, Adam Tewkesbury in the transport office (who was also part of a team shortlisted for a different THE Award).

A special thanks to James Leeming and his team in retail catering for being helpful, enthusiastic and patient when we’ve not yet delivered everything we promised.

In my own department, Tim Boardman who has now gone to some place up the road called “Oxford”, but was really helpful helping us learn to navigate the politics of databases in our University, Graham Robinson who did the cool feed which enables us to have workstations-in-use data. Lots of people who’ve given help, or had more work generated as a result of this project.

Nic Burns at the council, and both the previous and new real-time bus information contractors. We’re hoping to have that all up and running soon!

The Equipment sharing project team; Adrian Cox, Louise Payne, (and recently Adam Field has joined that mix), Don Spalinger, Hilary Smith, Pete Hancock (again), some helpful people from Finance whose names elude me right now but are helping get things hooked up to their data.

The other open data projects around the UK have been a source of inspiration (and occasionally the only other people who understand the weird new challenges these projects bring). Mathieu D’Aquin (data.open.ac.uk) who I’ve not always agreed with but have learned lots in our discussions, Alex Bilbie and Joss Winn at http://data.lincoln.ac.uk/. And a big thank-you for Dave Flanders for creating the UK community of developers that has meant we’ve started sharing ideas and solutions rather than stay buried in our institutional silos.

(I knew this was a long list, but wow! We’re down to the last few now…)

Dave Challis who kept the triplestores up and happy and worried about details I wouldn’t have had time for.

The company Garlik has a number of ex-Southampton staff who’ve been very helpful with advice on good practice. I’ll be gracious and still thank them even though they went and hired Dave Challis away from us. (he seemed happy when we had lunch on Saturday, so maybe the real world isn’t so bad).

Gavin Costigan actually put together our entry, and evidently did a good job: we won!

Charles Elder is the member of Communications who accompanied us to the awards, and was reassuring when we were rather out of our depth.

Naomi & Caroline, my and Dave’s girlfriends, who have been “RDF Widows” on a number of occasions when we were working silly hours to get everything working.

Colin Williams. What can I say about Colin? I think he’s the reason we won the award, without all the stuff he built on top of the open data, plus the events calendar. He’s had an amazing week with both the awards show and then successfully defending his PhD the day after. I’m gutted he’s leaving, but I’m sure we’ll see each other at the occasional hack day.

A wave to my new immediate team mates Patrick McSweeney and Ash Smith who both joined the team this year, Ash as full time Open Data Service development and Patrick as a “replacement” for Dave, although his facial hair is different enough to avoid people getting confused.

I think my biggest thanks goes to Alex Dutton at data.ox.ac.uk for being the sounding board, friend, and rival that we needed to make data.southampton.ac.uk what it is. It’s fair to say that I can see aspects of my designs in data.ox.ac.uk, and of Alex’s in our service.

I’ve not included everyone who’s been a help, but this post is already nearly a thousand words, and past the TL;DR point, so I’m going to call it to a halt. Thanks to everybody, as a child I read science-fiction. Now I implement it.

Christopher Gutteridge, 2012.

 

New additions to data.southampton.ac.uk

October 15, 2012
by Ash Smith

My name may not be familiar to people who follow this blog, so I’ll introduce myself first of all. I’m Ash, and I’m the new member of staff employed here at Southampton purely to manage the University’s linked open data. My official title, according to HR, is a ‘Data Management Specialist’, but Chris has been referring to me as a ‘Lodmaster’, which I think sounds far cooler. I used to work in ECS as a research fellow and did my PhD with Wendy Hall as my supervisor. I’m a lifelogger, and if you want some of my less thought-out comments, I can sometimes be found on Twitter as @DrAshSmith.

Apart from me, there are a few other additions to the site this week. Firstly, as users of the now-deprecated ECS EPrints system should be aware, the University has a central EPrints repository. So we now have a new regularly updated data set which maps users of the old system (and their papers) to the new system. The raw data is a massive list of owl:sameAs triples. On the subject of owl:sameAs, a new unofficial app that uses our data has been added to the Apps page of the site. SameAs.org is a service for finding equivalent URIs, and we now have a custom version of this service which looks for equivalent URIs according to our data only.
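
For anyone unfamiliar with owl:sameAs, it states that two URIs identify the same thing, and it is symmetric and transitive, so chains of statements collapse into sets of equivalent URIs. A small Python sketch of that idea follows. The example.org URIs used with it are placeholders rather than real identifiers from the dataset, the function names are invented, and this is not a description of how SameAs.org itself is implemented.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(old_uri, new_uri):
    """Format one owl:sameAs statement as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (old_uri, OWL_SAME_AS, new_uri)


def equivalence_sets(pairs):
    """Group URIs linked by sameAs pairs into sets of equivalent URIs."""
    parent = {}  # union-find forest: each URI points towards its set's root

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())
```

Given the pairs (old/1, new/1) and (new/1, other/1), all three URIs end up in one set even though old/1 and other/1 were never directly linked, which is the behaviour a lookup service for equivalent URIs needs.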

Shortlisted for a Times Higher Education Award

September 6, 2012
by Christopher Gutteridge

Some very exciting news. I’m proud to say that Southampton have been short-listed for the Times Higher Education awards for “Outstanding ICT Initiative of the year” and the submission was for our work with data.southampton.ac.uk!

This may involve me having to wear a dinner jacket, which may get a chuckle from people who know my usual, er, style.

While I’ve worked very hard on the open data service, none of it would be possible without the help of dozens of people from all around the University, so it really is an award to the whole university. That said, I’m hoping I’m the one who gets the tasty dinner!

 

Unplanned Downtime

September 6, 2012
by Christopher Gutteridge

Some of the data.southampton.ac.uk related services have been unavailable this morning due to an unplanned power cut.

Sorry about that.

As the service becomes more important to the University, it’s clear that we need to make sure it’s as robust as possible, and reduce the risk of incidents like this in future.

We are hiring!

August 4, 2012
by Christopher Gutteridge

This is very exciting news.

The university has created a full time position (initially 2 year fixed term) for data.southampton! This will involve taking the system towards maturity and “business as usual”. It’ll involve working closely with myself and Patrick.

I’m hoping to get someone enthusiastic about the technology and way it can improve how we all work, but with different skills to Patrick and me. My ideal candidate is the type of person who enjoys doing all the fiddling required to build a really good software package release. Part of the goal is to make open data not only practical for other organisations but actually easy.

I’m really chuffed that our university thinks it’s worth investing in Linked Data as infrastructure, not just as a research area.

Location: Highfield Campus
Salary: £27,578 to £33,884
Full Time Fixed Term
Closing Date: Sunday 19 August 2012
Interview Date: To be confirmed
Reference: 146112JF

More Information

Job Description and Person Specification [Word Document]: we will use the person specification to determine who gets the job. I anticipate we may know, or even be friends with, some of the applicants, so judging everybody by the person spec. helps keep it fair.

Easting & Northing

June 12, 2012
by Christopher Gutteridge

We’ve added a new dataset which adds Ordnance Survey style Easting and Northing data to everything which currently has a latitude and longitude (but only for items for which we are authoritative – University Buildings but not Bus-stops, basically).

If you get data from, say, http://data.southampton.ac.uk/building/59.rdf it now has Easting and Northing data in. I nicked the pattern from Ordnance Survey Postcode data documents.

Maybe this is useful, let us know if it is.

See the post on the Webteam Blog for the nitty gritty about how this works.

Launch of new University events calendar

May 29, 2012
by Christopher Gutteridge

The new University events calendar is now live and accessible via the current link http://www.events.soton.ac.uk/

The calendar was developed in conjunction with Electronics and Computer Science using Open Data as its foundation. It is automatically populated with events via RSS feeds from existing University websites, so minimal maintenance is required. Where a website does not have an RSS feed, staff can upload events manually via a SharePoint form (http://www.southampton.ac.uk/submitevent/). This page also contains a list of the feeds currently used.

For queries or further information please email digital@soton.ac.uk

Under the hood…

The data is aggregated once per hour into RDF from an assortment of RSS feeds, and a few stray websites, and a SharePoint Calendar, then presented as a pretty JavaScript driven website.

The data is also uploaded every hour to the open data service, where you can get the data.

The source code is available from github, and was paid for by the University Communications department, and mostly built by Colin Williams with some support from me.
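
As a rough illustration of the aggregation step, here is a minimal Python sketch that pulls event titles and links out of RSS 2.0 feeds and keeps going when one feed is broken. It is an assumption-laden toy, not the calendar's actual code: the function names are invented, downloading the feeds is left out, and it stops short of producing RDF.

```python
import xml.etree.ElementTree as ET


def events_from_rss(xml_text):
    """Extract (title, link) pairs from the <item> elements of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    events = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title:  # skip items with no usable title
            events.append((title, link))
    return events


def aggregate(feeds):
    """Collect events from every feed, noting broken feeds instead of stopping.

    feeds: dict mapping a feed name to its already-downloaded XML text.
    Returns (events, broken), where broken lists the names that failed to parse.
    """
    events, broken = [], []
    for name, xml_text in feeds.items():
        try:
            events.extend(events_from_rss(xml_text))
        except ET.ParseError:
            broken.append(name)  # report it so that someone notices
    return events, broken
```

Returning the list of broken feeds alongside the events means one malformed feed doesn't take down the whole hourly run, and gives whoever looks after the service something to check.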

Joining it up

The website uses the open data service data to let you filter by campus (it links building number to campus number to campus name), and filter by divisions of the university, get the name of buildings from their number and the homepages of the schools and faculties.

Thing is… the open data about university division homepages has not been maintained since we created the list a year ago. It was still mostly correct but some had moved and many divisions had been created, or merged and so forth.

The exciting thing is that there’s now value to the comms dept. to maintain this information as it provides them value. This may sound minor, but it means that there’s an incentive to the “right people” to maintain this data, and that’s always been part of the model we’ve been striving for!

There’s still a lot of missing features; rss, ical etc. We’re working on that.

data.ac.uk and some things to read

April 24, 2012
by Christopher Gutteridge

The really exciting news is that we’ve just registered data.ac.uk to act as a home for UK-wide data projects. If you want to contribute ideas, join the data.ac.uk mailing list.

This week is also the last chance to respond to the UK Government Consultation on Open Standards. Large companies have reportedly been pushing their agenda in this consultation, but anybody is allowed to voice their support, objections or suggestions to the proposals.

I’ve published a couple of relevant blog posts on the webteam blog:

There’s also been two recent blog posts from peer-projects I’d like to recommend: