Southampton Open Data Blog

Interview with Christopher Gutteridge

April 14, 2011
by Christopher Gutteridge

There’s an interview with Christopher Gutteridge (me!) on this week’s Ubuntu UK Podcast.

(If you’re wondering, this site runs on a virtual machine running Ubuntu.)

Actually, it’s worth giving a shout out to the technologies we use, but I’ll save that for a future post.

[April 1st Gag] PDF selected as Interchange Format

April 1, 2011
by Christopher Gutteridge

The following article is our prank for April 1st.

Just to be clear, PDF is a dreadful format to exchange data in. The prank was inspired, in part, by The Register website running the following picture and quote. Yes, I did say that, but I was talking about research and data communication.

It was fun working out how to make our site output PDF versions of the data, and we’ll leave those available, though no longer as the default. Also, I’ve now linked in the “.svg” format, which is basically the same as the PDF.

Hopefully this gave a few people a chuckle.

*** *** ***

We have had many complaints that RDF is complicated, unsupported and makes it difficult to control how people will reuse your data.

With this in mind, we have taken a big decision: PDF (Portable Document Format) has been selected as our preferred format for exchanging data on the site.

Many of the data.southampton team felt we should listen to the pro-PDF comments on the forum for the recent Register Article about Open Data in Southampton.

PDF is widely recognised as one of the most accessible document formats available today, and is ideally suited to both the publication and importing of data because of its ability to accurately maintain the layout of complex data sets in the browser on the desktop, and via printed hard copy. The immaturity of the Linked Data community means that there are still considerable technical overheads involved in the publication and use of data represented in less well supported formats, such as RDF or XML.
When we compared the number of search results for PDF with the number for RDF, the decision became far easier to justify.

Henceforth, the preferred method for both importing and exporting data from the site will be PDF. We will continue to provide other formats such as CSV & XML for the time being, but with a clear goal of removing these options as soon as is practical.

From May 1st onward we will only accept and export data in PDF and HTML formats. This allows us much more control and flexibility over how our data is presented. Data providers will be able to supply the Southampton OpenData team with data via PDF documents, or as printouts that we can scan and convert to PDF, and we will know exactly how to deal with it. To make things even easier, people will even be able to use the networked scanners anywhere on campus to directly upload data. Data providers at remote sites will be able to fax their data in.

As well as PDF, we are also working with owners of very large databases on an application that will allow them to dump their data into a view resembling a spreadsheet view; we will then republish this data via an interface a little like Google Maps. This will allow users to cast their eye over very large datasets and then zoom in to data values that look particularly interesting. We hope this will particularly enthuse library staff, as it is bringing a familiar micro-fiche style user interface to the web of open data.

Extending 4store

For now, we will be continuing to use 4store as our database server, but we have significantly improved on the default interface by adding a “PDF” output mode which users will find familiar.


Our extension will be made available, on request, under an open source license.

PDF Descriptions of Resources

Many of the resources in the site will now be available to download as PDF in addition to HTML, just by changing “.html” to “.pdf”. Look out for the “Get the data!” box on many pages which will offer a link to the PDF format.
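The suffix swap described above can be sketched in a few lines of Python. This is only an illustration: the `with_format` helper and the example.org URL are hypothetical, not part of the site, and the sketch assumes the format is always the final extension of the URL path.

```python
from urllib.parse import urlsplit, urlunsplit
import posixpath


def with_format(url: str, extension: str) -> str:
    """Return `url` with the extension of its path replaced by `extension`."""
    parts = urlsplit(url)
    # Split "/bus-stop/1234.html" into "/bus-stop/1234" and ".html".
    root, _old_extension = posixpath.splitext(parts.path)
    # Rebuild the URL, leaving the host, query and fragment untouched.
    return urlunsplit(parts._replace(path=f"{root}.{extension}"))


# Hypothetical resource URL, used for illustration only.
example = "http://example.org/bus-stop/1234.html"
print(with_format(example, "pdf"))
```

Under the same assumption, passing "svg" instead of "pdf" would produce the address of the “.svg” version.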

Real-time PDF data!

The most valuable data of all is accurate and up to date, and we are now able to provide this in a way you’ve never seen before! We’ve already created an HTML page for every bus-stop in the city, but that’s only in HTML format, which is well known to be inferior to PDF.

Imagine you’re at a bus-stop and want to know when the next bus is due. Now all you need to do is download the following link onto your phone and view it in the mobile PDF viewer of your choice, and hey-presto! – realtime bus data direct to you on your handset!

Positive Reactions

So far, all the feedback we have had has been massively positive. One user of data.southampton said:

“I’m so glad they have done this, and it’s easy to switch too, all I needed to do was change an “R” to a “P” – simples!”

Professor Nigel Shadbolt and Professor Sir Tim Berners-Lee were unavailable to comment as they are currently at the WWW2011 Conference, but we are confident they will have a very strong reaction when they hear about the decision.