Southampton Open Data Blog

[April 1st Gag] PDF selected as Interchange Format

April 1, 2011
by Christopher Gutteridge

The following article is our prank for April 1st.

Just to be clear, PDF is a dreadful format to exchange data in. The prank was inspired, in part, by The Register website running the following picture and quote. Yes, I did say that, but I was talking about research and data communication.

It was fun working out how to make our site output PDF versions of the data, and we’ll leave those as available, but no longer the default. Also, I’ve now linked in the “.svg” format which is basically the same as the PDF.

Hopefully this gave a few people a chuckle.

*** *** ***

We have had many complaints that RDF is complicated, unsupported and makes it difficult to control how people will reuse your data.

With this in mind, we have taken a big decision: PDF (Portable Document Format) has been selected as our preferred format for exchanging data on the data.southampton.ac.uk site.

Many of the data.southampton team felt we should listen to the pro-PDF comments on the forum for the recent Register Article about Open Data in Southampton.

PDF is widely recognised as one of the most accessible document formats available today, and is ideally suited to both the publication and importing of data because of its ability to accurately maintain the layout of complex data sets in the browser on the desktop, and via printed hard copy. The immaturity of the Linked Data community means that there are still considerable technical overheads involved in the publication and use of data represented in less well supported formats, such as RDF or XML.
Once we compared the number of search results for PDF with the number for RDF, the decision became far easier to justify.

Henceforth, the preferred method for both importing and exporting data from the site will be PDF. We will continue to provide other formats such as CSV & XML for the time being, but with a clear goal of removing these options as soon as is practical.

From May 1st onward we will only accept and export data in PDF and HTML formats. This allows us much more control and flexibility over how our data is presented. Data providers will be able to supply the Southampton OpenData team with data via PDF documents, or as printouts that we can scan and convert to PDF, and we will know exactly how to deal with it. To make things even easier, people will even be able to use the networked scanners anywhere on campus to directly upload data. Data providers at remote sites will be able to fax their data in.

As well as PDF, we are also working with owners of very large databases on an application that will allow them to dump their data into a view resembling a spreadsheet view; we will then republish this data via an interface a little like Google Maps. This will allow users to cast their eye over very large datasets and then zoom in to data values that look particularly interesting. We hope this will particularly enthuse library staff, as it is bringing a familiar micro-fiche style user interface to the web of open data.

Extending 4store

For now, we will be continuing to use 4store as our database server, but we have significantly improved on the default interface by adding a “PDF” output mode which users will find familiar.


Our extension will be made available, on request, under an open source license.

PDF Descriptions of Resources

Many of the resources in the site will now be available to download as PDF in addition to HTML, just by changing “.html” to “.pdf”. Look out for the “Get the data!” box on many pages which will offer a link to the PDF format.
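
As a rough illustration of the “.html” to “.pdf” swap described above, here is a minimal Python sketch. The function name `swap_extension` and the example path are hypothetical, made up for this illustration; only the idea of changing the extension comes from the text, and this is not the code the site itself runs.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def swap_extension(url: str, new_ext: str = ".pdf") -> str:
    """Return url with the ".html" at the end of its path replaced by new_ext.

    Hypothetical helper for illustration; not part of the site's code.
    """
    parts = urlsplit(url)
    root, ext = posixpath.splitext(parts.path)
    if ext != ".html":
        raise ValueError(f"expected a path ending in .html, got {parts.path!r}")
    # Rebuild the URL, leaving scheme, host, query and fragment untouched.
    return urlunsplit(parts._replace(path=root + new_ext))


if __name__ == "__main__":
    # The path below is an assumed example, not a real resource on the site.
    print(swap_extension("http://data.southampton.ac.uk/example/resource.html"))
```

Working on the parsed path rather than on the whole string means a query string or fragment after the file name is left alone, and a URL that does not end in “.html” is rejected instead of being silently mangled.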

Real-time PDF data!

The most valuable data of all is accurate and up to date, and we are now able to do this in a way you’ve never seen before! We’ve already created an HTML page for every bus-stop in the city, but that’s only in HTML format, which is well known to be inferior to PDF.

Imagine you’re at a bus-stop and want to know when the next bus is due. Now all you need to do is download the following link onto your phone and view it in the mobile PDF viewer of your choice, and hey-presto! – realtime bus data direct to you on your handset!

Positive Reactions

So far, all the feedback we have had has been massively positive. One user of data.southampton said:

“I’m so glad they have done this, and it’s easy to switch too, all I needed to do was change a “R” to a “P” – simples!”

Professor Nigel Shadbolt and Professor Sir Tim Berners-Lee were unavailable to comment as they are currently at the WWW2011 Conference, but we are confident they will have a very strong reaction when they hear about the decision.


30 thoughts on “[April 1st Gag] PDF selected as Interchange Format”

  1. James Munro says:

    The careful rationale you’ve set out for this change is compelling.
    We’ve decided to follow best practice and upgrade our existing legacy XML API to serve the more widely used PDF format by default also.

  2. I think it is a very smart and unavoidable step. Basically all major research conferences have already switched to PDF as a default format for accepting research results, like for example the WWW2011 that is mentioned in the end of the article. Thank you for boldly going this step! I will re-evaluate the projects I am involved in with this new perspective in mind.

  3. You are dropping RDF in favour of PDF??? Pure stupidity. A much better replacement would be RTF.

  4. Andrew Paul Landells says:

    I’m very disappointed with the PDFs as they currently stand. I assume the plain style with Computer Modern Roman is only a temporary design decision. Linked data could be much more ‘sexy’ if a graphic designer were involved in the production of the data. I look forward to seeing font choices and nice images on future documents! As it stands currently, there’s no way I could print these as a glossy brochure for placing on coffee tables around campus.

  5. Colin Williams says:

    This is clearly a great improvement to the open data service. However, I would suggest that for each row, the source of the data should be referenced in the usual manner. Does the PDF output mode support this?

    I expect that the live bus data will become particularly popular in this format, as users become more familiar with its layout.

  6. Richard Reid says:

    APRIL FOOLS!

  7. Nigel Shadbolt says:

    where is the fax number – I’ve got data to upload!

  8. stuart says:

    what’s wrong with parchment?

  9. Adrian Short says:

    Many’s a time I’ve thought — Why mess around with all these triplestores and SPARQL endpoints when you could just open up Adobe Reader?

    It’s bundled with most PCs, too.

    Good luck with this. I hope your data will be available in both A4 and US Letter sizes for maximum compatibility.

  10. Martin says:

    Ha, like good AFJs this is actually quite believable, and in a way, almost sensible. My concern is that someone, somewhere, will see this, and actually use it as justification for switching to PDF…

  11. JamesR says:

    We ought to send a PDF of commiseration to data.gov ??

  12. Matt Palmer says:

    Could you include an option to split the dataset into separate PDFs for each data item? This would make it easier to share just the bits of data you want to with others.

  13. This is an impressive and compelling step forward; congratulations!

    Note that I tried but failed to post this comment in PDF…

  14. As someone who has, I’m sorry to say, had problems with fixing the font face and colours in RDF, I’m glad to hear that others will no longer be able to mess up the look and feel of my data after I’ve published it. Thanks.

  15. Philip Hunter says:

    How the aliens will snigger :)

  16. Lemesmer says:

    Some Government departments have been using this philosophy for many years, hence the reluctance to ever move to RDF. #onestepahead

  17. Mark Braggins says:

    Why haven’t Word documents been considered? Word offers a highly versatile ‘save as’ facility. Is this decision an attempt to monopolise open data with a single proprietary format?

  18. Terry Payne says:

    Surely PDF provides only the derived model? Wouldn’t Postscript provide a better axiomatisation… you could then reason using the PS->PDF distiller ???

  19. Steve Peters says:

    Could you bring back frisbee-net too please. Broadband is so over-rated when compared to throwing DVDs across the office.

  20. Rob says:

    We have some students to tippex out the little leg of the R in RDF, to transcode to PDF, on all our printouts.

  21. Andy Turner says:

    I think this is fun, but it is now afternoon where I am, so I will resist taking this any further :)

  22. Mr. Gunn says:

    If I write my articles in Comic Sans, you can be sure that’s how I want it read, too! PDF allows me to do this. Winning.

  23. Alex Wade says:

    Microsoft Research is working on an add-in that will allow you to type an RDF triple in Word (using natural language, if you like), and to export each triple as its own PDF. Hope this is useful. (Unfortunately, there will be no Kinect integration yet with Google Motion in v.1.)

  24. anonymouos says:

    great well done, the next step is to drop PDF and adopt JSON :)

  25. tim finin says:

    The meta joke is that for the past ten years Adobe has stored metadata in PDF using XMP (http://bit.ly/hJUWo5), which is commonly serialized and stored as RDF.

  26. Tom Blake says:

    I personally prefer .WTF!

  27. [...] Southampton Data Blog: PDF selected as Interchange Format [...]

  28. [...] We hear a lot about open data these days. Organisations embrace it differently- Southampton University have a huge amount available in machine-readable formats, others think that posting a PDF is the way to go, despite the fact that it isn’t. [...]
