New Formats
March 25, 2011
by Christopher Gutteridge
New ways to enjoy our data.
We've added some links to the "Get the Data" box which let you see what formats are available. Some pages let you download RDF; others you can get back as tabular data, suitable for loading into Excel, amongst other things. Roughly speaking, pages about things have RDF versions, and pages about lists of things (places, buildings etc.) have a tabular download available.
Grasping the nettle and changing some URIs
March 24, 2011
by Christopher Gutteridge
We've realised that using UPPER CASE in some URIs looked fine in a spreadsheet but makes for ugly URLs, and if we're stuck with them, we want them to look nice.
Hence I've taken an executive decision and renamed the URIs for all the Points of Service from looking like this
http://id.southampton.ac.uk/point-of-service/38-LATTES
to this
http://id.southampton.ac.uk/point-of-service/38-lattes
meaning the URL is now
http://data.southampton.ac.uk/point-of-service/38-lattes.html
This actually matters, as these are going to become the long-term web pages for the catering points of service, so aesthetics are important, and "If t'were to be done, t'were best done quickly".
We've seen lots of visitors as a result of the Register article: a 10x increase, which is nice.
I've just added in the lunchtime menu for the Nuffield. They are not yet quite taking ownership of their data, but that's just a case of getting them some training. I've also talked today to the manager of the on-campus book shop to see if they want to list some prices and products. I'm thinking they could do well to list the oddball stuff they sell, like memory sticks & backpacks.
Mostly I'm preparing to tidy up the back-end code; it needs to be a bit more slick and logical. More on this later.
Also today our very own Nigel Shadbolt is featured in the first ever edition of the Google Magazine. (It's a PDF!)
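The rename above boils down to lower-casing the last path segment of the URI. Here is a minimal Python sketch of that idea, plus the URI-to-page mapping as inferred from the two addresses quoted in this post. The function names are made up for illustration, and this is not the code that actually runs the site.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_last_segment(uri: str) -> str:
    """Lower-case only the final path segment, e.g. 38-LATTES -> 38-lattes."""
    parts = urlsplit(uri)
    head, _, last = parts.path.rpartition("/")
    return urlunsplit(parts._replace(path=f"{head}/{last.lower()}"))


def html_page_for(uri: str) -> str:
    """Guess the web page for an identifier URI.

    Assumption: id.southampton.ac.uk URIs map to data.southampton.ac.uk
    pages with ".html" appended, as the single example in the post suggests.
    """
    parts = urlsplit(uri)
    host = parts.netloc
    if host == "id.southampton.ac.uk":
        host = "data.southampton.ac.uk"
    return urlunsplit(parts._replace(netloc=host, path=parts.path + ".html"))


old_uri = "http://id.southampton.ac.uk/point-of-service/38-LATTES"
new_uri = lowercase_last_segment(old_uri)
page_url = html_page_for(new_uri)
```

Only the last segment is touched, so the host and the "point-of-service" part of the path stay exactly as they were.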
We are featured in The Register
March 22, 2011
by Christopher Gutteridge
I recently had the slightly scary experience of giving an interview to The Register, along with my old friend John Goodwin. I appear to have made it onto the front page of the site, along with my comment about how much I hate to see people still using PDF to simulate A4 paper in documents never destined to be printed.
Knowing that The Register tends to quickly puncture pretentiousness, I did my best to be as straight-talking as I could. The article has come out well, but with slightly more colourful language than I'd have used talking to the BBC!
The Register: Southampton Uni shows way to a truly open web.
A question of policy
March 18, 2011
by Christopher Gutteridge
To make this site sustainable we're going to have to work out some policies about scope. The student-run Southampton Open Wireless Network Group (SOWN) have produced a dataset about their wireless nodes, and the council has more data sources we could wrap into the site (eg. number of spaces in car parks).
This leads to a number of interesting policy questions which I don't have easy answers to.
1. What data should we host on data.southampton.ac.uk (ie. allow it to be the primary source of the data and host a copy of the data dump)?
2. What should we allow (or require) to use id.southampton.ac.uk URIs?
3. Is data about the council a special case?
4. What data should we list as part of the data catalog?
5. What data should we import into the triple store?
6. What data should we recommend (via links)?
Right now it's easy to say yes to lots of things, but we need to think about the future maintenance too.
I'm currently thinking that what we should do, for now, is say yes to council and other useful local data such as SOWN under sections "6" and "5" above only, with the intention of later having a second "authoritative" triple store which only imports our authoritative datasets.
SOWN is a good test case as it's a grey area. It's a university society run by university members, but certainly not part of the university administration. As it's coming from the owners of the data it *is* authoritative, but it's not authoritative AND published by the University of Southampton.
Best dataset for the job
I'm also running into the question of how to divide data between datasets. For example, I've got:
- points of service & opening hours for SUSU and catering provided from the catering manager
- menus for catering points of service, provided by the catering manager
- I'm hoping to get daily menus for a few catering points of service provided by the catering manager
- I've got opening hours for the theatre bar provided by their manager
- I've got menus for the theatre bar (from their menu!)
- Opening hours for local amenities (provided by a small group of postgrad volunteers)
- Student services points of service and hours, provided by the university student services and therefore authoritative
- Waste & recycle points (currently run by the student volunteers but we hope to hand that over to the authoritative source)
- Transport points such as the travel office, bike racks, parking etc. which were created by the student volunteers, but now are being curated by the data owner (the transport office).
- List of vending machines, sourced from our contractors, via catering, and then annotated with building numbers by me.
- Bus stops, taken from a list provided by the council.
It's really hard to work out if these should be one dataset each, or if not, how to deal with them. Do I move the data out of the amenities (student-sourced) dataset when rows of data are taken over by the data owner? Should I have an "authoritative University of Southampton" dataset including everything that is thus, and a non-authoritative amenities dataset? Also, the bigger the dataset, the more often it'll need to be republished.
I am almost certainly going to make the "today's menu" dataset separate due to it having to be updated daily.
A key reason to use separate datasets has been to filter things. I think it makes more sense to include this in the data itself than rely on the dataset. My current thinking is that we should rearrange the data to be based around provenance, so:
- Authoritative Services including buildings & estates & catering and menus and vending machines.
- Today's Menus (because they change so fast); it's a daily addendum to the previous set.
- Nuffield Theatre Bar times & menus (authoritative, but not from the University)
- Non-authoritative (Colin-sourced) amenities
- Bus Stops
Menus for the local coffee shop and the nearest pubs (Brewed Awakening, Crown, Stile) can be included in the non-authoritative datasets.
It leads to a change in some underlying technology for me, as currently each dataset only contains one "type" of record, eg. a set of prices OR a set of points-of-service.
Hopefully once we settle on a workable pattern for this it'll save other people making the same false starts we have.
Jargon File
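To make "put the filter in the data itself" a bit more concrete, here is a rough Python sketch. Each record carries its own type, source and an authoritative flag, so one dataset can hold mixed record types and still be filtered. The field names and the sample rows are illustrative assumptions loosely based on the lists above, not the real schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    label: str           # human-readable description (illustrative only)
    kind: str            # hypothetical record type, e.g. "opening-hours"
    source: str          # who provided the data
    authoritative: bool  # True if it comes from the data owner


# Sample rows paraphrasing the post; not real data.
RECORDS = [
    Record("Vending machine list", "point-of-service", "catering", True),
    Record("Theatre bar opening hours", "opening-hours", "theatre bar manager", True),
    Record("Local amenity opening hours", "opening-hours", "postgrad volunteers", False),
    Record("Bus stop list", "bus-stop", "council", True),
]


def select(records: Iterable[Record], *, kind: Optional[str] = None,
           authoritative: Optional[bool] = None) -> List[Record]:
    """Filter on properties of each record, not on which dataset it lives in."""
    chosen = []
    for record in records:
        if kind is not None and record.kind != kind:
            continue
        if authoritative is not None and record.authoritative != authoritative:
            continue
        chosen.append(record)
    return chosen


official_hours = select(RECORDS, kind="opening-hours", authoritative=True)
volunteer_data = select(RECORDS, authoritative=False)
```

With the provenance on every row, moving a row from the volunteers to the data owner becomes a change to that row rather than a move between datasets.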
March 15, 2011
by Christopher Gutteridge
I've added a new dataset;
It's semi-crowd sourced; I'll give any member of iSolutions, or other professional services, the ability to edit it. It could use a search tool similar to the phonebook, but we'll get to that at some point.
Improvements to the Embeddable Map Tool
March 13, 2011
by Christopher Gutteridge
I've added an option for "terrain" instead of map/satellite. This only works when a bit more zoomed out than the other views.
More importantly, I've added numbered placemarkers. This only works for buildings with a simple one or two digit number. If it ever becomes massively popular we'll build a custom placemark generator.
View an example: Full Screen
Where does the Money Go?
March 12, 2011
by Christopher Gutteridge
After many battles with Excel, pivot tables and the IBM "Many Eyes" site, I've had a go at visualising our Payments Dataset. I'm now an armchair auditor!
Please note that I am far from an expert in working with such data, so the graphs below should not be considered "official" data from the university, as I may have made mistakes in my processing. The data is not entirely complete, as it contains no payments to individuals and nothing commercially sensitive.
Here's who we've paid money to in that dataset... I had to trim the data down to payments of £10K+ as otherwise it seemed to crash their Java!
This shows a breakdown of the broad categories and sub-categories of what we paid money for.
I hope that we've got some budding statisticians, accountants or data visualisers who can do something better than me!
One cool idea: find out what payees we have in common with the local hospitals and council:
Interview on
March 11, 2011
by Christopher Gutteridge
I'm very pleased with this interview published on semanticweb.com. It represents what I said pretty clearly, and gets out the message I planned: this is a part of business as usual, not a gimmick.
More RDF
March 11, 2011
by Christopher Gutteridge
I've improved the back-end tools which provide RDF when you request a .rdf or .ttl file. By default the system just gives any facts which have the current resource as the start of the fact. This sort of sucks, as when looking at a building it'll tell you BUILDING-X is within SITE-Y and BUILDING-X is called "The building of advanced science stuff". What it won't do is give any facts that go backwards, eg. if I know ROOM-Z is within BUILDING-X, it won't mention that by default.
So I've made a way to make it relatively easy to add this information. I can also tell it to follow several hops to find all the useful information. The art is going to be, for each class of item in our system, working out the balance between utility and brevity. The very simple rule of thumb is to get all the information you need to display an HTML page about that thing.
And that example leads to another point: I really need to give an example of every type of data item under the hood. This site is all iceberg-like right now. Only I know for sure what lurks in the SPARQL... I'll get to it, I promise.
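As a toy illustration of the forwards/backwards/hops idea, here is a small Python sketch over an in-memory list of facts. It is not the real back-end: the `describe` function, the "within" and "label" predicate names and the CAMPUS-W resource are all invented for the example, and literals are simply marked with surrounding double quotes.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Placeholder facts based on the example in the post; CAMPUS-W is invented.
TRIPLES: List[Triple] = [
    ("BUILDING-X", "within", "SITE-Y"),
    ("BUILDING-X", "label", '"The building of advanced science stuff"'),
    ("ROOM-Z", "within", "BUILDING-X"),
    ("SITE-Y", "within", "CAMPUS-W"),
]


def describe(triples: Iterable[Triple], resource: str, *,
             backwards: bool = False, hops: int = 1) -> List[Triple]:
    """Collect facts about `resource`, optionally including facts that point
    at it, following up to `hops` links away from the starting resource."""
    triples = list(triples)
    seen = set()
    result: List[Triple] = []
    visited = {resource}
    frontier = {resource}
    for _ in range(hops):
        next_frontier = set()
        for triple in triples:
            subject, _, obj = triple
            if subject in frontier:
                neighbour = obj        # forward fact
            elif backwards and obj in frontier:
                neighbour = subject    # backward fact
            else:
                continue
            if triple not in seen:
                seen.add(triple)
                result.append(triple)
            # Only resources are followed on the next hop, never literals.
            if not neighbour.startswith('"') and neighbour not in visited:
                next_frontier.add(neighbour)
        visited |= next_frontier
        frontier = next_frontier
    return result


default_view = describe(TRIPLES, "BUILDING-X")
with_rooms = describe(TRIPLES, "BUILDING-X", backwards=True)
two_hops = describe(TRIPLES, "BUILDING-X", hops=2)
```

Here `default_view` holds just the two facts that start at BUILDING-X, `with_rooms` also picks up the ROOM-Z fact, and `two_hops` reaches the fact about SITE-Y. Picking `backwards` and `hops` per class of item is where the utility versus brevity trade-off shows up.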
Friends, Romans, Countrymen...
...send me your data. But maybe don't hurry, as I've got a back-log already! Yesterday I got an email about some data we have a legal obligation to publish... am I the right person? I guess I am! But it's not my only responsibility and I had to put all my other work on the back-burner to get this site up, so things will now move slower but always forward. Maybe a little sideways.
What I won't accept are things which we have no hope in hell of keeping up to date, so we only really want data which is already someone's job to maintain. I've made a couple of exceptions, most notably that the building position and footprint data is created by volunteers; however, this data moves very slowly. We'll learn what works as we go. This has never been done before!
Research data is a world of complicated and awesome all by itself. We'll never add it to this site. It will want a very different form of collection and curation. If there's research data that you want to publish right now, and it's not crazy big, I recommend you put it in eprints.soton.ac.uk; this will give it metadata, a license and a permanent home on the university web.
RDF Bus Stop Data
March 10, 2011
by Christopher Gutteridge
The individual Bus Stop data is now available as RDF...
http://data.southampton.ac.uk/bus-stop/SNA19777.rdf
That took way longer than I expected to set up!