Southampton Open Data Blog

New Formats

March 25, 2011
by Christopher Gutteridge

New ways to enjoy our data.

We’ve added some links to the “Get the Data” box which let you see what formats are available. Some pages let you download RDF; others can be downloaded as tabular data, suitable for loading into Excel, amongst other things. Roughly speaking, pages about things have RDF versions, while pages about lists of things (places, buildings etc) have a tabular download available.

eg.

Grasping the nettle and changing some URIs

March 24, 2011
by Christopher Gutteridge

We’ve realised that using UPPER CASE in some URIs looked fine in a spreadsheet but makes for ugly URLs, and since we’re going to be stuck with them, we want them to look nice.

Hence I’ve taken an executive decision and renamed the URIs for all the Points of Service from looking like this

http://id.southampton.ac.uk/point-of-service/38-LATTES

to this

http://id.southampton.ac.uk/point-of-service/38-lattes

meaning the URL is now

http://data.southampton.ac.uk/point-of-service/38-lattes.html
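For anyone wanting to apply the same tidy-up to their own identifiers, here is a minimal sketch in Python. It assumes the rule is simply "lowercase the last path segment of the URI, then swap the id host for the data host and add an extension". The function names are hypothetical and this is not the code the site actually runs.

```python
from urllib.parse import urlsplit, urlunsplit

ID_HOST = "id.southampton.ac.uk"
DATA_HOST = "data.southampton.ac.uk"


def lowercase_identifier(uri: str) -> str:
    """Lowercase only the last path segment, e.g. 38-LATTES -> 38-lattes."""
    parts = urlsplit(uri)
    head, _, tail = parts.path.rpartition("/")
    return urlunsplit(parts._replace(path=f"{head}/{tail.lower()}"))


def document_url(uri: str, extension: str = "html") -> str:
    """Map an id URI to its page on the data host (assumed rule, see above)."""
    parts = urlsplit(uri)
    if parts.netloc != ID_HOST:
        raise ValueError(f"not an {ID_HOST} URI: {uri}")
    return urlunsplit(
        parts._replace(netloc=DATA_HOST, path=f"{parts.path}.{extension}")
    )


old = "http://id.southampton.ac.uk/point-of-service/38-LATTES"
new = lowercase_identifier(old)
print(new)
print(document_url(new))
```

Only the final segment is lowercased, so the rest of the path is left exactly as it was.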

This actually matters, as these are going to become the long term web pages for the catering points of service, so aesthetics are important, and “If t’were to be done, t’were best done quickly”.

We’ve seen lots of visitors as a result of the Register article, which is nice: roughly a 10x increase.

I’ve just added in the lunchtime menu for the Nuffield. They are not yet quite taking ownership of their data, but that’s just a case of getting them some training. I’ve also talked today to the manager of the on-campus book shop to see if they want to list some prices and products. I’m thinking they could do well to list the oddball stuff they sell like memory sticks & backpacks.

Mostly I’m preparing to tidy up the back-end code — it needs to be a bit more slick and logical, more on this later.

Also today our very own Nigel Shadbolt is featured in the first ever edition of the Google Magazine. (It’s a PDF!)

We are featured in The Register

March 22, 2011
by Christopher Gutteridge

I recently had the slightly scary experience of giving an interview to the Register, along with my old friend John Goodwin. I appear to have made it onto the front page of the site, along with my comment about how much I hate to see people still using PDF to simulate A4 paper in documents never destined to be printed.

Knowing that The Register tends to quickly puncture pretentiousness, I did my best to be as straight-talking as I could. The article has come out well, but with slightly more colourful language than I’d have used talking to the BBC!

The Register: Southampton Uni shows way to a truly open web.

A question of policy

March 18, 2011
by Christopher Gutteridge

To make this site sustainable we’re going to have to work out some policies about scope. The student-run Southampton Open Wireless Network Group (SOWN) have produced a dataset about their wireless nodes, and the council has more data sources we could wrap into the site (eg. number of spaces in carparks).

This leads to a number of interesting policy questions which I’ve not got an easy answer for.

  1. What data should we host on data.southampton.ac.uk (ie. allow it to be the primary source of the data and host a copy of the data dump)?
  2. What should we allow (or insist) to use id.southampton.ac.uk URIs?
  3. Is data about the council a special case?
  4. What data should we list as part of the data catalog?
  5. What data should we import into the triple store?
  6. What data should we recommend (via links)?

Right now it’s easy to say yes to lots of things, but we need to think about the future maintenance too.

I’m currently thinking that what we should do, for now, is say yes to council and other useful local data such as SOWN under points ‘5’ and ‘6’ above only, with the intention of later having a second ‘authoritative’ triple store which only imports our authoritative datasets.

SOWN is a good test case as it’s a grey area. It’s a university society run by university members, but certainly not part of the university administration. As it’s coming from the owners of the data it *is* authoritative, but it’s not authoritative AND published by University of Southampton.

Best dataset for the job

I’m also running into the question of how to divide data between datasets, for example I’ve got

  • points of service & opening hours for SUSU and catering provided from the catering manager
  • menus for catering points of service, provided by the catering manager
  • I’m hoping to get daily menus for a few catering points of service provided by the catering manager
  • I’ve got opening hours for the theatre bar provided by their manager
  • I’ve got menus for the theatre bar (from their menu!)
  • Opening hours for local amenities (provided by a small group of postgrad volunteers)
  • Student services points of service and hours, provided by the university student services and therefore authoritative
  • Waste & recycle points (currently run by the student volunteers but we hope to hand that over to the authoritative source)
  • Transport points such as the travel office, bike racks, parking etc. which were created by the student volunteers, but now are being curated by the data owner (the transport office).
  • List of vending machines, sourced from our contractors, via catering, and then annotated with building numbers by me.
  • Bus stops, taken from a list provided by the council.

It’s really hard to work out if these should be one dataset each, or if not, how to deal with them. Do I move the data out of the amenities (student sourced) dataset when rows of data are taken over by the data owner? Should I have an ‘authoritative University of Southampton’ dataset including everything that is thus, and a non-authoritative amenities dataset? Also, the bigger the dataset, the more often it’ll need to be republished.

I am almost certainly going to make the ‘today’s menu’ dataset separate due to it having to be updated daily.

A key reason to use separate datasets has been to filter things. I think it makes more sense to include this in the data itself than to rely on the dataset. My current thinking is that we should rearrange the data to be based around provenance, so:

  • Authoritative Services including buildings & estates & catering and menus and vending machines.
  • Today’s Menus (because they change so fast); it’s a daily addendum to the previous set.
  • Nuffield Theatre Bar times & menus (authoritative, but not from the University)
  • Non-authoritative (Colin-sourced) amenities
  • Bus Stops

Menus for the local coffee shop and the nearest pubs (Brewed Awakening, Crown, Stile) can be included in the non-authoritative datasets.

It leads to a change in some underlying technology for me, as currently each dataset only contains one “type” of record, eg. a set of prices OR a set of points-of-service.
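To make the idea concrete, here is a rough Python sketch of carrying provenance in the data itself, so that one dataset can mix record types and still be filtered. The field names (`kind`, `source`, `authoritative`) and the rows are purely illustrative assumptions, not the site's actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str            # e.g. "point-of-service" or "price"; kinds may be mixed
    subject: str         # the thing the record is about
    source: str          # who supplied the row
    authoritative: bool  # True when the source is the data owner


# Illustrative rows only.
RECORDS = [
    Record("point-of-service", "catering outlet", "catering manager", True),
    Record("price", "catering outlet menu item", "catering manager", True),
    Record("point-of-service", "local amenity", "postgrad volunteers", False),
    Record("point-of-service", "travel office", "transport office", True),
]


def select(records, kind=None, authoritative=None):
    """Filter on fields stored in the data, not on which dataset a row is in."""
    return [
        r for r in records
        if (kind is None or r.kind == kind)
        and (authoritative is None or r.authoritative == authoritative)
    ]


for record in select(RECORDS, kind="point-of-service", authoritative=True):
    print(record.subject, "from", record.source)
```

With the flag stored on each row, a row taken over by its data owner only needs its `source` and `authoritative` values changed, rather than being moved between datasets.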

Hopefully once we settle on a workable pattern for this it’ll save other people making the same false starts we have.

Jargon File

March 15, 2011
by Christopher Gutteridge

I’ve added a new dataset;

It’s semi-crowd sourced; I’ll give any member of iSolutions, or other professional services, the ability to edit it. It could use a search tool similar to the phonebook, but we’ll get to that at some point.

Improvements to the Embeddable Map Tool

March 13, 2011
by Christopher Gutteridge

I’ve added an option for ‘terrain’ instead of map/satellite. This only works when a bit more zoomed out than the other views.

More importantly, I’ve added numbered placemarkers. This only works for buildings with a simple one or two digit number. If it ever becomes massively popular we’ll build a custom placemark generator.

Where does the Money Go?

March 12, 2011
by Christopher Gutteridge

After many battles with Excel, pivot tables and the IBM “Many Eyes” site, I’ve had a go at visualising our Payments Dataset. I’m now an armchair auditor!

Please note that I am far from an expert in working with such data, so the graphs below should not be considered “official” data from the university, as I may have made mistakes in my processing. The data is not entirely complete as it contains no payments to individuals, and nothing commercially sensitive.

Here’s who we’ve paid money to in that dataset… I had to trim the data down to payments of £10K+ as otherwise it seemed to crash their Java!

U. of Southampton Spending by company 10K+ Many Eyes
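If you would rather skip the spreadsheet battles, the trimming step can be sketched in a few lines of Python. This assumes each payment is a row with `payee` and `amount` columns (my guess at a convenient shape, not the published column names), and the rows below are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal

THRESHOLD = Decimal("10000")  # the £10K cut-off mentioned above

# Made-up rows for illustration; the column names are assumptions.
payments = [
    {"payee": "Example Supplier A", "amount": Decimal("12500.00")},
    {"payee": "Example Supplier A", "amount": Decimal("800.00")},
    {"payee": "Example Supplier B", "amount": Decimal("43000.00")},
    {"payee": "Example Supplier C", "amount": Decimal("9999.99")},
]


def totals_by_payee(rows, minimum=THRESHOLD):
    """Drop payments below `minimum`, then total what is left per payee."""
    totals = defaultdict(Decimal)
    for row in rows:
        if row["amount"] >= minimum:
            totals[row["payee"]] += row["amount"]
    return dict(totals)


ranked = sorted(totals_by_payee(payments).items(), key=lambda kv: kv[1], reverse=True)
for payee, total in ranked:
    print(f"{payee}: £{total:,.2f}")
```

Note that this filters individual payments before totalling, so a payee with many small payments drops out entirely. Whether that is the right cut depends on what you want the chart to show.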

This shows a break down of the broad categories and sub categories of what we paid money for.

Where does our (U of Southampton) Money Go? Many Eyes

I hope that we’ve got some budding statisticians, accountants or data visualisers who can do something better than me!

One cool idea; find out what payees we have in common with the local hospitals and council:

Interview on semanticweb.com

March 11, 2011
by Christopher Gutteridge

I’m very pleased with this interview published on semanticweb.com. It represents what I said pretty clearly, and gets out the message that I planned – this is a part of business as usual, not a gimmick.

More RDF

March 11, 2011
by Christopher Gutteridge

I’ve improved the back-end tools which provide RDF when you request a .rdf or .ttl file. By default the system just gives any facts which have the current resource as the start of the fact. This sort of sucks, as when looking at a building it’ll tell you BUILDING-X is within SITE-Y and BUILDING-X is called “The building of advanced science stuff”. What it won’t do is give any facts that go backwards, eg. if I know ROOM-Z is within BUILDING-X, it won’t mention that by default.

So I’ve made a way to make it relatively easy to add this information. I can also tell it to follow several hops to find all the useful information. The art is going to be, for each class of item in our system, working out the balance between utility and brevity. The very simple rule of thumb is to get all the information you need to display an HTML page about that thing.
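Here is a toy illustration of the difference, written in Python over an in-memory list of triples. It is a sketch of the general idea only, not the back-end code. The `describe` function, its parameters and the sample triples are all hypothetical, reusing the placeholder names from above.

```python
from collections import deque

# Hypothetical triples using the placeholder names from the post.
TRIPLES = [
    ("BUILDING-X", "within", "SITE-Y"),
    ("BUILDING-X", "label", "The building of advanced science stuff"),
    ("ROOM-Z", "within", "BUILDING-X"),
    ("ROOM-Z", "label", "Room Z"),
    ("BUILDING-Q", "within", "SITE-Y"),
]


def describe(resource, triples, follow_inverse=(), hops=1):
    """Collect facts about `resource`.

    Forward facts (resource as subject) are always included. Facts that
    point back at the resource are included only for predicates listed in
    `follow_inverse`, and the subjects found that way are then described
    in turn, up to `hops` steps away from the starting resource.
    """
    found = []
    seen = {resource}
    queue = deque([(resource, 0)])
    while queue:
        node, depth = queue.popleft()
        for triple in triples:
            subject, predicate, obj = triple
            if subject == node and triple not in found:
                found.append(triple)
            elif obj == node and predicate in follow_inverse and depth < hops:
                if triple not in found:
                    found.append(triple)
                if subject not in seen:
                    seen.add(subject)
                    queue.append((subject, depth + 1))
    return found


default_view = describe("BUILDING-X", TRIPLES)
richer_view = describe("BUILDING-X", TRIPLES, follow_inverse={"within"})
print(len(default_view), len(richer_view))
```

The default view only knows the building's site and label. Following `within` backwards for one hop also pulls in ROOM-Z and its label, which is roughly what an HTML page about the building would want to show.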

And that example leads to another point, I really need to give an example of every type of data item under the hood. This site is all iceberg-like right now. Only I know for sure what lurks in the SPARQL… I’ll get to it, I promise.

Friends, Romans, Countrymen…

…send me your data. But maybe don’t hurry, as I’ve got a back-log already! Yesterday I got an email about some data we have a legal obligation to publish… am I the right person? I guess I am! But it’s not my only responsibility and I had to put all my other work on the back-burner to get this site up, so things will now move slower but always forward. Maybe a little sideways.

What I won’t accept are things which we have no hope in hell of keeping up to date, so we only really want data which it is already someone’s job to maintain. I’ve made a couple of exceptions, most notably that the building position and footprint data is created by volunteers; however, this data moves very slowly. We’ll learn what works as we go, this has never been done before!

Research data is a world of complicated and awesome all by itself. We’ll never add it to this site. It will want a very different form of collection and curation. If there’s research data that you want to publish right now, and it’s not crazy big, I recommend you put it in eprints.soton.ac.uk – this will give it metadata, a license and a permanent home on the university web.

RDF Bus Stop Data

March 10, 2011
by Christopher Gutteridge

The individual Bus Stop data is now available as RDF…

http://data.southampton.ac.uk/bus-stop/SNA19777.rdf

That took way longer than I expected to set up!