Oooo, data
May 27, 2011
by Christopher Gutteridge
On Wednesday I gave a well-received talk to the university "Digital Economy" research group (a virtual group containing people from all over the university).
Yesterday I had the fun problem of lots of people getting in touch with ideas! For the next couple of months I still can't put my full focus on the Open Data, but here are some of the interesting things going on behind the scenes:
- Facilities / Equipment dataset to describe our cool toys. I've got people interested in contributing to this from all over the university. You can see a preview here. The idea is to help the left hand know what resources the right hand has, and who's allowed to use them. I've had provisional interest in this from medical imaging, the high voltage lab, the nano cleanrooms, archaeology, civil engineering and chemistry.
- Disabled Go reports - someone pointed me at this site which has detailed reports on disabled access for 98 of our buildings. Most of the data is too detailed to map into RDF, but what I was hoping to do is (1) just provide a link to the reports for each building from our data and /building/ pages, which alone gets far more value out of it, and maybe (2) pull out the headline data, eg. "has disabled loo", "allows guide dogs". We've been in touch with them and it sounds like they are pretty positive about the idea. I still need their permission to provide that information under OGL or another open license.
- Catering have updated all the menus to include coffee & other hot drinks (they were missing before), after noticing that the opendatamap didn't have any results for searching for "coffee" (the horror). Problem is, the menu says "Filter (Large)" now so still no match for coffee! We'll either rename it to "Filter Coffee (Large)" or consider adding a "Hidden Labels" field to help searches.
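Here is a minimal sketch of how a "Hidden Labels" field could help searches. It is not the opendatamap's actual search code: the `MenuItem` class, the `hidden_labels` field name and the substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MenuItem:
    label: str  # what the printed menu says
    # Hypothetical extra search terms that are never displayed.
    hidden_labels: list[str] = field(default_factory=list)


def matches(item: MenuItem, query: str) -> bool:
    """Case-insensitive substring match against the label and any hidden labels."""
    needle = query.strip().lower()
    haystack = [item.label, *item.hidden_labels]
    return any(needle in text.lower() for text in haystack)


plain = MenuItem("Filter (Large)")
renamed = MenuItem("Filter Coffee (Large)")
tagged = MenuItem("Filter (Large)", hidden_labels=["coffee"])
```

With this sketch, `matches(plain, "coffee")` is false while both the renamed and the tagged items match, so either fix would get coffee found. The hidden label has the advantage of leaving the visible menu text alone.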
I got asked what the success criteria for the Open Data project were. This is very difficult to define but for me it will be when the open-data-service is so much part of business-as-usual that people no longer want an enthusiastic hacker running it! I'm looking forward to talking about the good ol' days when open data was a new frontier and nobody even had an ontology for coffee types or bus timetables yet.
The Open Data is starting to get put to use too:
- People are using the bus times pages (I need to make the interface better, I know!)
- Our upcoming campus mobile phone app will use some of the location data
- I've been asked how the service could aid with student induction, eg. help people find what's available, and where it is.
The other thing ticking along is getting live hookups to databases. Right now it's all done with one-off dumps, but we want to be showing the living data. The dump-and-email approach is fine for getting started but now it's time to do the far less glamorous job of making the back-end more automated. I'm still working on getting energy use data per building, and I've a lead on recycling data!
Good times.
One final thing: you may notice that the Open Data Map is now not quite as pretty, and there's a good reason for this. We noticed that we may not own data traced using Google Maps, so Colin has re-created all the data from the Ordnance Survey instead. There is slightly less detail, but the functionality is all still there.
The slides from my talk are available on EdShare. I've never uploaded to EdShare before - they've done a really great job at making a streamlined submit process. It's far better than anything I've used in EPrints before, and I say this as the person who designed the EPrints 3.0 submit workflow!
Interview with Christopher Gutteridge
April 14, 2011
by Christopher Gutteridge
There's an interview with Christopher Gutteridge (me!) on this week's Ubuntu UK Podcast.
(If you're wondering, data.southampton.ac.uk runs on a virtual machine running Ubuntu)
Actually, it's worth giving a shout out to the technologies we use, but I'll save that for a future post.
[April 1st Gag] PDF selected as Interchange Format
April 1, 2011
by Christopher Gutteridge
The following article is our prank for April 1st.
Just to be clear, PDF is a dreadful format to exchange data in. The prank was inspired, in part, by The Register website running the following picture and quote. Yes, I did say that, but I was talking about research and data communication.
It was fun working out how to make our site output PDF versions of the data, and we'll leave those available, but no longer as the default. Also, I've now linked in the ".svg" format which is basically the same as the PDF.
Hopefully this gave a few people a chuckle.
*** *** ***
We have had many complaints that RDF is complicated, unsupported and makes it difficult to control how people will reuse your data.
With this in mind, we have taken a big decision: PDF (Portable Document Format) has been selected as our preferred format for exchanging data on the data.southampton.ac.uk site.
Many of the data.southampton team felt we should listen to the pro-PDF comments on the forum for the recent Register Article about Open Data in Southampton.
Henceforth, the preferred method for both importing and exporting data from the site will be PDF. We will continue to provide other formats such as CSV & XML for the time being, but with a clear goal of removing these options as soon as is practical.
From May 1st onward we will only accept and export data in PDF and HTML formats. This allows us much more control and flexibility over how our data is presented. Data providers will be able to supply the Southampton OpenData team with data via PDF documents, or as printouts that we can scan and convert to PDF, and we will know exactly how to deal with it. To make things even easier, people will even be able to use the networked scanners anywhere on campus to directly upload data. Data providers at remote sites will be able to fax their data in.
Extending 4store
For now, we will be continuing to use 4store as our database server, but we have significantly improved on the default interface by adding a "PDF" output mode which users will find familiar.
Examples:
- PDF query for a list of University Buildings
- PDF query for a list of Programmes taught at the University
Our extension will be made available, on request, under an open source license.
PDF Descriptions of Resources
Many of the resources in the site will now be available to download as PDF in addition to HTML, just by changing ".html" to ".pdf". Look out for the "Get the data!" box on many pages which will offer a link to the PDF format.
- Module described in PDF
- Where to buy booze (popular with some students!)
Real-time PDF data!
The most valuable data of all is accurate and up to date, and we are now able to do this in a way you've never seen before! We've already created an HTML page for every bus-stop in the city, but that's only in HTML format, which is well known to be inferior to PDF.
Imagine you're at a bus-stop and want to know when the next bus is, now all you need to do is download the following link into your phone and view it in the mobile PDF viewer of your choice, and hey-presto! - realtime bus data direct to you on your handset!
Positive Reactions
So far all the feedback we have had has been massively positive. One user of data.southampton said
"I'm so glad they have done this, and it's easy to switch too, all I needed to do was change a 'R' to a 'P' - simples!"
Professor Nigel Shadbolt and Professor Sir Tim Berners-Lee were unavailable to comment as they are currently at the WWW2011 Conference, but we are confident they will have a very strong reaction when they hear about the decision.
New Formats
March 25, 2011
by Christopher Gutteridge
New ways to enjoy our data.
We've added some links to the "Get the Data" box which let you see what formats are available. Some pages let you download RDF, others you can get back as tabular data, suitable for loading into Excel, amongst other things. Roughly speaking, pages about things have RDF versions, pages about lists of things (places, buildings etc) have a tabular download available.
eg.
Grasping the nettle and changing some URIs
March 24, 2011
by Christopher Gutteridge
We've realised that using UPPER CASE in some URIs looked fine in a spreadsheet but makes for ugly URLs, and if we're stuck with them, we want them to look nice.
Hence I've taken an executive decision and renamed the URIs for all the Points of Service from looking like this
http://id.southampton.ac.uk/point-of-service/38-LATTES
to this
http://id.southampton.ac.uk/point-of-service/38-lattes
meaning the URL is now
http://data.southampton.ac.uk/point-of-service/38-lattes.html
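The rename can be sketched in a few lines. This is an illustration and not the script that was actually run: the function names are hypothetical, and the rule for deriving the page URL (swap the `id.` host for `data.` and append `.html`) is simply generalised from the one example above.

```python
ID_PREFIX = "http://id.southampton.ac.uk/"
DATA_PREFIX = "http://data.southampton.ac.uk/"


def lowercase_slug(uri: str) -> str:
    """Lower-case only the final path segment, leaving the rest of the URI alone."""
    head, sep, slug = uri.rpartition("/")
    return head + sep + slug.lower()


def page_url(id_uri: str) -> str:
    """Derive the HTML page URL from an id URI (assumed mapping, see above)."""
    if not id_uri.startswith(ID_PREFIX):
        raise ValueError(f"not an id.southampton.ac.uk URI: {id_uri}")
    return DATA_PREFIX + id_uri[len(ID_PREFIX):] + ".html"


old_uri = "http://id.southampton.ac.uk/point-of-service/38-LATTES"
new_uri = lowercase_slug(old_uri)
```

Only the last path segment is lower-cased, so the rest of the URI is passed through unchanged.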
This actually matters, as these are going to become the long term web pages for the catering points of service, so aesthetics are important, and "If t'were to be done, t'were best done quickly".
We've seen lots of visitors as a result of the Register Article (a 10x increase), which is nice.
I've just added in the lunchtime menu for the Nuffield. They are not yet quite taking ownership of their data, but that's just a case of getting them some training. I've also talked today to the manager of the on-campus book shop to see if they want to list some prices and products. I'm thinking they could do well to list the oddball stuff they sell like memory sticks & backpacks.
Mostly I'm preparing to tidy up the back-end code - it needs to be a bit more slick and logical, more on this later.
Also today our very own Nigel Shadbolt is featured in the first ever edition of the Google Magazine. (It's a PDF!)
We are featured in The Register
March 22, 2011
by Christopher Gutteridge
I recently had the slightly scary experience of giving an interview to the Register, along with my old friend John Goodwin. I appear to have made it onto the frontpage of the site, along with my comment about how much I hate to see people still using PDF to simulate A4 paper in documents never destined to be printed.
Knowing that The Register tends to quickly puncture pretentiousness, I did my best to be as straight-talking as I could. The article has come out well, but with slightly more colourful language than I'd have used talking to the BBC!
The Register: Southampton Uni shows way to a truly open web.
A question of policy
March 18, 2011
by Christopher Gutteridge
To make this site sustainable we're going to have to work out some policies about scope. The student-run Southampton Open Wireless Network Group (SOWN) have produced a dataset about their wireless nodes, and the council has more data sources we could wrap into the site (eg. number of spaces in carparks).
This leads to a number of interesting policy questions which I've not got an easy answer for.
1. What data should we host on data.southampton.ac.uk (ie. allow it to be the primary source of the data and host a copy of the data dump)?
2. What should we allow (or insist) to use id.southampton.ac.uk URIs?
3. Is data about the council a special case?
4. What data should we list as part of the data catalog?
5. What data should we import into the triple store?
6. What data should we recommend (via links)?
Right now it's easy to say yes to lots of things, but we need to think about the future maintenance too.
I'm currently thinking that what we should do is, for now, say yes to council and other useful local data such as SOWN under sections "6" and "5" above only, with the intention later of having a 2nd "authoritative" triple store which only imports our authoritative datasets.
SOWN is a good test case as it's a grey area. It's a university society run by university members, but certainly not part of the university administration. As it's coming from the owners of the data it *is* authoritative, but it's not authoritative AND published by University of Southampton.
Best dataset for the job
I'm also running into the question of how to divide data between datasets, for example I've got
- points of service & opening hours for SUSU and catering provided from the catering manager
- menus for catering points of service, provided by the catering manager
- I'm hoping to get daily menus for a few catering points of service provided by the catering manager
- I've got opening hours for the theatre bar provided by their manager
- I've got menus for the theatre bar (from their menu!)
- Opening hours for local amenities (provided by a small group of postgrad volunteers)
- Student services points of service and hours, provided by the university student services and therefore authoritative
- Waste & recycle points (currently run by the student volunteers but we hope to hand that over to the authoritative source)
- Transport points such as the travel office, bike racks, parking etc. which were created by the student volunteers, but now are being curated by the data owner (the transport office).
- List of vending machines, sourced from our contractors, via catering, and then annotated with building numbers by me.
- Bus stops, taken from a list provided by the council.
It's really hard to work out if these should be one dataset each, or if not how to deal with them. Do I move the data out of the amenities (student sourced) dataset when rows of data are taken over by the data owner? Should I have an "authoritative University of Southampton" dataset including everything that is thus, and a non-authoritative amenities dataset? Also, the bigger the dataset, the more often it'll need to be republished.
I am almost certainly going to make the "today's menu" dataset separate due to it having to be updated daily.
A key reason to use separate datasets has been to filter things. I think it makes more sense to include this in the data itself than rely on the dataset. My current thinking is that we should rearrange the data to be based around provenance so;
- Authoritative Services including buildings & estates & catering and menus and vending machines.
- Today's Menus (because they change so fast); it's a daily addendum to the previous set.
- Nuffield Theatre Bar times & menus (authoritative, but not from the University)
- Non-authoritative (Colin-sourced) amenities
- Bus Stops
Menus for the local coffee shop and the nearest pubs (Brewed Awakening, Crown, Stile) can be included in the non-authoritative datasets.
It leads to a change in some underlying technology for me as currently each dataset only contains one "type" of record, eg. a set of prices OR a set of points-of-service.
Hopefully once we settle on a workable pattern for this it'll save other people making the same false starts we have.
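To make the "put it in the data, not the dataset" idea a bit more concrete, here is a rough sketch in which each record carries its own type and provenance, so filtering no longer depends on which dataset a row lives in. The `Record` fields and the `select` helper are hypothetical, and the sample rows are loosely based on the lists above rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    label: str
    kind: str            # e.g. "point-of-service", "menu", "bus-stop"
    source: str          # who supplied the row (hypothetical field)
    authoritative: bool  # True when supplied by the data owner (hypothetical field)


# Illustrative sample rows only.
RECORDS = [
    Record("Student services opening hours", "point-of-service", "student services", True),
    Record("Theatre bar opening hours", "point-of-service", "theatre bar manager", True),
    Record("Local amenities opening hours", "point-of-service", "postgrad volunteers", False),
    Record("Bus stops", "bus-stop", "council", True),
]


def select(records, *, kind=None, authoritative=None):
    """Filter on fields carried by each record, not on dataset membership."""
    return [
        r for r in records
        if (kind is None or r.kind == kind)
        and (authoritative is None or r.authoritative == authoritative)
    ]
```

A call such as `select(RECORDS, kind="point-of-service", authoritative=True)` then works the same whether those rows were published in one dataset or five, and a single dataset can hold more than one type of record.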
Jargon File
March 15, 2011
by Christopher Gutteridge
I've added a new dataset;
It's semi-crowd sourced; I'll give any member of iSolutions, or other professional services, the ability to edit it. It could use a search tool similar to the phonebook, but we'll get to that at some point.
Improvements to the Embeddable Map Tool
March 13, 2011
by Christopher Gutteridge
I've added an option for "terrain" instead of map/satellite. This only works when a bit more zoomed out than the other views.
More importantly, I've added numbered placemarkers. This only works for buildings with a simple one or two digit number. If it ever becomes massively popular we'll build a custom placemark generator.
View an example: Full Screen
Where does the Money Go?
March 12, 2011
by Christopher Gutteridge
After many battles with Excel, pivot tables and the IBM "Many Eyes" site, I've had a go at visualising our Payments Dataset. I'm now an armchair auditor!
Please note that I am far from an expert in working with such data, so the graphs below should not be considered "official" data from the university as I may have made mistakes in my processing. The data is not entirely complete as it contains no payments to individuals, and nothing commercially sensitive.
Here's who we've paid money to in that dataset… I had to trim the data down to payments of £10K+ as otherwise it seemed to crash their Java!
This shows a breakdown of the broad categories and sub categories of what we paid money for.
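For anyone who wants to redo this without a pivot table, the two summaries can be sketched as below. The rows are invented placeholders and not figures from the Payments Dataset; the column layout and the `totals_by` helper are assumptions, and the £10K cut-off is applied per payment, which is only one way to read the trimming described above.

```python
from collections import defaultdict

# Invented sample rows: (payee, category, amount in whole pounds).
PAYMENTS = [
    ("Supplier A", "Equipment", 12_500),
    ("Supplier A", "Equipment", 4_000),
    ("Supplier B", "Catering", 9_999),
    ("Supplier C", "Catering", 25_000),
]

THRESHOLD = 10_000  # the "£10K+" cut-off mentioned above


def totals_by(rows, key_index, minimum=0):
    """Sum amounts grouped by one column, skipping payments below `minimum`."""
    totals = defaultdict(int)
    for row in rows:
        amount = row[2]
        if amount >= minimum:
            totals[row[key_index]] += amount
    return dict(totals)


by_payee = totals_by(PAYMENTS, key_index=0, minimum=THRESHOLD)
by_category = totals_by(PAYMENTS, key_index=1)
```

Here `by_payee` only counts the individual payments at or above the threshold, while `by_category` sums everything.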
I hope that we've got some budding statisticians, accountants or data visualisers who can do something better than me!
One cool idea; find out what payees we have in common with the local hospitals and council:
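If each organisation publishes a list of payee names, the comparison is basically a set intersection. A minimal sketch, assuming the names only differ in case and spacing (real payee names would almost certainly need fuzzier matching); both function names are hypothetical.

```python
def normalise(name: str) -> str:
    """Fold case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def common_payees(*payee_lists):
    """Return the normalised payee names that appear in every list."""
    sets = [{normalise(n) for n in names} for names in payee_lists]
    return set.intersection(*sets) if sets else set()
```

Calling `common_payees(university, hospital, council)` with three lists of names would return only the payees found in all three.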