[April 1st Gag] PDF selected as Interchange Format

April 1, 2011
by Christopher Gutteridge

The following article is our prank for April 1st.

Just to be clear, PDF is a dreadful format to exchange data in. The prank was inspired, in part, by The Register website running the following picture and quote. Yes, I did say that, but I was talking about research and data communication.

It was fun working out how to make our site output PDF versions of the data, and we’ll leave those available, though no longer as the default. Also, I’ve now linked in the “.svg” format, which is basically the same as the PDF.

Hopefully this gave a few people a chuckle.

*** *** ***

We have had many complaints that RDF is complicated, unsupported and makes it difficult to control how people will reuse your data.

With this in mind, we have taken a big decision: PDF (Portable Document Format) has been selected as our preferred format for exchanging data on the data.southampton.ac.uk site.

Many of the data.southampton team felt we should listen to the pro-PDF comments on the forum for the recent Register Article about Open Data in Southampton.

PDF is widely recognised as one of the most accessible document formats available today, and is ideally suited to both the publication and importing of data because of its ability to accurately maintain the layout of complex data sets in the browser on the desktop, and via printed hard copy. The immaturity of the Linked Data community means that there are still considerable technical overheads involved in the publication and use of data represented in less well supported formats, such as RDF or XML.

When we compared the number of search results for PDF with the number for RDF, the decision became far easier to justify.

Henceforth, the preferred method for both importing and exporting data from the site will be PDF. We will continue to provide other formats such as CSV & XML for the time being, but with a clear goal of removing these options as soon as is practical.

From May 1st onward we will only accept and export data in PDF and HTML formats. This allows us much more control and flexibility over how our data is presented. Data providers will be able to supply the Southampton OpenData team with data via PDF documents, or as printouts that we can scan and convert to PDF, and we will know exactly how to deal with it. To make things even easier, people will even be able to use the networked scanners anywhere on campus to directly upload data. Data providers at remote sites will be able to fax their data in.

As well as PDF, we are also working with owners of very large databases on an application that will allow them to dump their data into a view resembling a spreadsheet view; we will then republish this data via an interface a little like Google Maps. This will allow users to cast their eye over very large datasets and then zoom in to data values that look particularly interesting. We hope this will particularly enthuse library staff, as it is bringing a familiar micro-fiche style user interface to the web of open data.

Extending 4store

For now, we will be continuing to use 4store as our database server, but we have significantly improved on the default interface by adding a “PDF” output mode which users will find familiar.

Examples:

Our extension will be made available, on request, under an open source license.

PDF Descriptions of Resources

Many of the resources in the site will now be available to download as PDF in addition to HTML, just by changing “.html” to “.pdf”. Look out for the “Get the data!” box on many pages which will offer a link to the PDF format.

Module described in PDF
Where to buy booze (popular with some students!)

Real-time PDF data!

The most valuable data of all is accurate and up to date, and we are now able to provide it in a way you’ve never seen before! We’ve already created an HTML page for every bus-stop in the city, but that’s only in HTML format, which is well known to be inferior to PDF.

http://data.southampton.ac.uk/bus-stop/SNA19777.html

Imagine you’re at a bus-stop and want to know when the next bus is. Now all you need to do is download the following link onto your phone and view it in the mobile PDF viewer of your choice, and hey-presto! – realtime bus data direct to you on your handset!

http://data.southampton.ac.uk/bus-stop/SNA19777.pdf
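
The two bus-stop URLs above differ only in their suffix, which is the “.html” to “.pdf” swap described earlier. For anyone who would rather script the switch than retype it, here is a minimal sketch in Python. The function name `to_pdf_url` is made up for illustration and is not part of the site, and the sketch assumes the resource path really does end in “.html”.

```python
from urllib.parse import urlsplit, urlunsplit


def to_pdf_url(html_url: str) -> str:
    """Return html_url with a trailing '.html' in its path replaced by '.pdf'.

    Hypothetical helper: it only rewrites the URL string and does not
    check that the site actually serves a PDF at the resulting address.
    """
    parts = urlsplit(html_url)
    if not parts.path.endswith(".html"):
        raise ValueError(f"expected a path ending in .html: {html_url}")
    # Swap only the suffix of the path; scheme, host and query are untouched.
    new_path = parts.path[: -len(".html")] + ".pdf"
    return urlunsplit(parts._replace(path=new_path))


if __name__ == "__main__":
    print(to_pdf_url("http://data.southampton.ac.uk/bus-stop/SNA19777.html"))
```

Working on the parsed path rather than doing a blanket string replace means a “.html” that happens to appear elsewhere in the URL is left alone.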

Positive Reactions

So far all the feedback we have had has been massively positive. One user of data.southampton said

“I’m so glad they have done this, and it’s easy to switch too, all I needed to do was change a “R” to a “P” – simples!”

Professor Nigel Shadbolt and Professor Sir Tim Berners-Lee were unavailable to comment as they are currently at the WWW2011 Conference, but we are confident they will have a very strong reaction when they hear about the decision.


30 thoughts on “[April 1st Gag] PDF selected as Interchange Format”

James Munro says:

April 1, 2011 at 7:33 am

The careful rationale you’ve set out for this change is compelling.
We’ve decided to follow best practice and upgrade our existing legacy xml api to serve the more widely used pdf format by default also.

uoccou says:

April 1, 2011 at 7:48 am

Phew !

Denny Vrandecic says:

April 1, 2011 at 8:21 am

I think it is a very smart and unavoidable step. Basically all major research conferences have already switched to PDF as a default format for accepting research results, like for example the WWW2011 that is mentioned in the end of the article. Thank you for boldly going this step! I will re-evaluate the projects I am involved in with this new perspective in mind.

Richard Cyganiak says:

April 1, 2011 at 8:29 am

You are dropping RDF in favour of PDF??? Pure stupidity. A much better replacement would be RTF.

Andrew Paul Landells says:

April 1, 2011 at 9:03 am

I’m very disappointed with the PDFs as they currently stand. I assume the plain style with Computer Modern Roman is only a temporary design decision. Linked data could be much more ‘sexy’ if a graphic designer were involved in the production of the data. I look forward to seeing font choices and nice images on future documents! As it stands currently, there’s no way I could print these as a glossy brochure for placing on coffee tables around campus.

Colin Williams says:

April 1, 2011 at 9:22 am

This is clearly a great improvement to the open data service. However, I would suggest that for each row, the source of the data should be referenced in the usual manner. Does the PDF output mode support this?

I expect that the live bus data will become particularly popular in this format, as users become more familiar with its layout.

Richard Reid says:

April 1, 2011 at 9:24 am

APRIL FOOLS!

Nigel Shadbolt says:

April 1, 2011 at 9:29 am

where is the fax number – I’ve got data to upload!

stuart says:

April 1, 2011 at 10:14 am

what’s wrong with parchment?

Adrian Short says:

April 1, 2011 at 10:20 am

Many’s a time I’ve thought — Why mess around with all these triplestores and SPARQL endpoints when you could just open up Adobe Reader?

It’s bundled with most PCs, too.

Good luck with this. I hope your data will be available in both A4 and US Letter sizes for maximum compatibility.

Martin says:

April 1, 2011 at 10:40 am

Ha, like good AFJs this is actually quite believable, and in a way, almost sensible. My concern is that someone, somewhere, will see this, and actually use it as justification for switching to PDF…

JamesR says:

April 1, 2011 at 10:47 am

We ought to send a PDF of commiseration to data.gov ??

Matt Palmer says:

April 1, 2011 at 10:48 am

Could you include an option to split the dataset into separate PDFs for each data item? This would make it easier to share just the bits of data you want to with others.

John S. Erickson, Ph.D. says:

April 1, 2011 at 11:14 am

This is an impressive and compelling step forward; congratulations!

Note that I tried but failed to post this comment in PDF…

David Pidsley says:

April 1, 2011 at 11:22 am

As someone who has, I’m sorry to say, had problems with fixing the font face and colours in RDF, I’m glad to hear that others will no longer be able to mess up the look and feel of my data after I’ve published it. Thanks.

Philip Hunter says:

April 1, 2011 at 11:28 am

How the aliens will snigger 🙂

Lemesmer says:

April 1, 2011 at 12:39 pm

Some Government departments have been using this philosophy for many years, hence the reluctance to ever move to RDF. #onestepahead

Mark Braggins says:

April 1, 2011 at 1:49 pm

Why haven’t Word documents been considered? Word offers a highly versatile ‘save as’ facility. Is this decision an attempt to monopolise open data with a single proprietary format?

Terry Payne says:

April 1, 2011 at 2:07 pm

Surely PDF provides only the derived model? Wouldn’t Postscript provide a better axiomatisation… you could then reason using the PS->PDF distiller ???

Steve Peters says:

April 1, 2011 at 2:09 pm

Could you bring back frisbee-net too please. Broadband is so over-rated when compared to throwing DVDs across the office.

Rob says:

April 1, 2011 at 2:10 pm

We have some students to tippex out the little leg of the R in RDF, to transcode to PDF, on all our printouts .

Andy Turner says:

April 1, 2011 at 2:30 pm

I think this is fun, but it is now afternoon where I am, so I will resist taking this any further 🙂

Mr. Gunn says:

April 1, 2011 at 2:41 pm

If I write my articles in Comic Sans, You can be sure that’s how I want it read, too! PDF allows me to do this. Winning.

Alex Wade says:

April 1, 2011 at 2:50 pm

Microsoft Research is working on an add-in that will allow you to type an RDF triple in Word (using natural language, if you like), and to export each triple as its own PDF. Hope this is useful. (Unfortunately, there will be no Kinect integration yet with Google Motion in v.1.)

anonymouos says:

April 1, 2011 at 3:26 pm

great well done, the next step is to drop PDF and adopt JSON 🙂

tim finin says:

April 2, 2011 at 5:43 pm

The meta joke is that for the past ten years Adobe has stored metadata in PDF using XMP (http://bit.ly/hJUWo5), which is commonly serialized and stored as RDF.

Tom Blake says:

April 4, 2011 at 3:55 pm

I personally prefer .WTF!

- Christopher Gutteridge says:
  
  April 4, 2011 at 5:10 pm
  
  We considered WTF, but it was rejected as not being well enough supported, and some people found it offensive. 🙂
  
eLearning@Ed 2011 – Liveblog » Nicola Osborne says:

April 4, 2011 at 9:34 pm

[…] Southampton Data Blog: PDF selected as Interchange Format […]

PigBlog » Blog Archive » Open Data says:

May 12, 2011 at 8:46 pm

[…] We hear a lot about open data these days. Organisations embrace it differently- Southampton University have a huge amount available in machine-readable formats, others think that posting a PDF is the way to go, despite the fact that it isn’t. […]

News and Ideas from the Southampton Open Data Team

Southampton Open Data Blog