A question of policy
March 18, 2011
by Christopher Gutteridge
To make this site sustainable we’re going to have to work out some policies about scope. The student-run Southampton Open Wireless Network Group (SOWN) have produced a dataset about their wireless nodes, and the council has more data sources we could wrap into the site (e.g. the number of spaces in car parks).
This leads to a number of interesting policy questions which I’ve not got an easy answer for.
- What data should we host on data.southampton.ac.uk (i.e. allow it to be the primary source of the data and host a copy of the data dump)?
- What data should we allow (or require) to use id.southampton.ac.uk URIs?
- Is data about the council a special case?
- What data should we list as part of the data catalog?
- What data should we import into the triple store?
- What data should we recommend (via links)?
Right now it’s easy to say yes to lots of things, but we need to think about the future maintenance too.
I’m currently thinking that, for now, we should say yes to council and other useful local data such as SOWN only under questions 5 and 6 above (importing into the triple store and recommending via links), with the intention of later having a second ‘authoritative’ triple store which imports only our authoritative datasets.
SOWN is a good test case as it’s a grey area. It’s a university society run by university members, but certainly not part of the university administration. As it’s coming from the owners of the data it *is* authoritative, but it’s not authoritative AND published by the University of Southampton.
Best dataset for the job
I’m also running into the question of how to divide data between datasets. For example, I’ve got:
- points of service & opening hours for SUSU and catering, provided by the catering manager
- menus for catering points of service, provided by the catering manager
- I’m hoping to get daily menus for a few catering points of service provided by the catering manager
- I’ve got opening hours for the theatre bar provided by their manager
- I’ve got menus for the theatre bar (from their menu!)
- Opening hours for local amenities (provided by a small group of postgrad volunteers)
- Student services points of service and hours, provided by the university student services and therefore authoritative
- Waste & recycling points (currently run by the student volunteers but we hope to hand that over to the authoritative source)
- Transport points such as the travel office, bike racks, parking etc. which were created by the student volunteers, but now are being curated by the data owner (the transport office).
- List of vending machines, sourced from our contractors, via catering, and then annotated with building numbers by me.
- Bus stops, taken from a list provided by the council.
It’s really hard to work out if these should be one dataset each or, if not, how to group them. Do I move the data out of the amenities (student sourced) dataset when rows of data are taken over by the data owner? Should I have an ‘authoritative University of Southampton’ dataset including everything that qualifies, and a non-authoritative amenities dataset? Also, the bigger the dataset, the more often it’ll need to be republished.
I am almost certainly going to make the ‘today’s menu’ dataset separate because it has to be updated daily.
A key reason to use separate datasets has been to filter things. I think it makes more sense to record whatever we want to filter on in the data itself than to rely on which dataset a record sits in. My current thinking is that we should rearrange the data to be based around provenance, so:
- Authoritative Services including buildings & estates & catering and menus and vending machines.
- Today’s Menus (because they change so fast); it’s a daily addendum to the previous set.
- Nuffield Theatre Bar times & menus (authoritative, but not from the University)
- Non-authoritative (Colin-sourced) amenities
- Bus Stops
Menus for the local coffee shop and the nearest pubs (Brewed Awakening, Crown, Stile) can be included in the non-authoritative datasets.
This leads to a change in some underlying technology for me, as currently each dataset only contains one “type” of record, e.g. a set of prices OR a set of points-of-service.
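To make the idea of filtering on the data rather than on the dataset a bit more concrete, here is a rough Python sketch. It is not how the site is built; the `Record` class, its field names and the sample rows are all made up for illustration. Each record carries its own type and provenance, so a single dataset can mix record types and still be filtered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row of data. All field names here are hypothetical."""
    label: str
    kind: str                 # e.g. "point-of-service" or "menu"
    source: str               # who supplied the row
    authoritative: bool       # True if supplied by the owner of the data
    from_university: bool     # True if published by the University


# Illustrative rows only, loosely based on the examples in this post.
RECORDS = [
    Record("Student services", "point-of-service",
           "University student services", True, True),
    Record("Theatre bar menu", "menu",
           "Nuffield Theatre Bar", True, False),
    Record("Local amenity opening hours", "point-of-service",
           "Postgrad volunteers", False, False),
]


def select(records, *, kind=None, authoritative=None, from_university=None):
    """Return the records matching every criterion that is not None."""
    return [
        r for r in records
        if (kind is None or r.kind == kind)
        and (authoritative is None or r.authoritative == authoritative)
        and (from_university is None or r.from_university == from_university)
    ]


# Filter on properties stored in the data, whatever dataset a row lives in.
official = select(RECORDS, authoritative=True, from_university=True)
print([r.label for r in official])
```

With something like this, a row that is taken over by its data owner only needs its provenance fields changed; it does not have to move to a different dataset to show up in the ‘authoritative’ view.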
Hopefully once we settle on a workable pattern for this it’ll save other people making the same false starts we have.