January 24, 2013
by Ash Smith
Over the last few weeks, Patrick has been exploring the university’s central data store looking for information on rooms and the features they contain. We’ve always had room features on data.southampton.ac.uk, but they were all generated from a single XML file given to Chris some years ago, and things change over time. So thanks to Pat’s fearless efforts investigating the central Oracle database, we now have a couple of scripts to pull not only room features, but booking information as well. A quick RDF generation script from me later, and we now have a method of ensuring the open data is as up to date as the university’s central database.
This is quite a big deal in my opinion – anyone planning a lecture or event can now view room information from the web and work out which rooms are suitable and available at the required time without having to phone Estates or walk across campus in the rain. Also, updating our data after such a long time is interesting for noting how things change over time; if nothing else, audio/visual technology is improving while chalk blackboards are definitely getting rarer!
October 15, 2012
by Ash Smith
My name may not be familiar to people who follow this blog, so I’ll introduce myself first of all. I’m Ash, and I’m the new member of staff employed here at Southampton purely to manage the University’s linked open data. My official title, according to HR, is a ‘Data Management Specialist’, but Chris has been referring to me as a ‘Lodmaster’, which I think sounds far cooler. I used to work in ECS as a research fellow and did my PhD with Wendy Hall as my supervisor. I’m a lifelogger, and if you want some of my less thought-out comments, I can sometimes be found on Twitter as @DrAshSmith.
Apart from me, there are a few other additions to the site this week. Firstly, as users of the now-deprecated ECS EPrints system should be aware, the University has a central EPrints repository. So we now have a new regularly updated data set which maps users of the old system (and their papers) to the new system. The raw data is a massive list of owl:sameAs triples. On the subject of owl:sameAs, a new unofficial app that uses our data has been added to the Apps page of the site. SameAs.org is a service for finding equivalent URIs, and we now have a custom version of this service which looks for equivalent URIs according to our data only.
June 12, 2012
by Christopher Gutteridge
We’ve added a new dataset which adds Ordnance Survey style Easting and Northing data to everything which currently has a latitude and longitude (but only for items for which we are authoritative – University Buildings but not Bus-stops, basically.
Maybe this is useful, let us know if it is.
See the post on the Webteam Blog for the nitty gritty about how this works.
September 26, 2011
by Christopher Gutteridge
The “Org” dataset has been updated with data provided by our HR department on the new university structure, and the proper codes for everything. Where possible we’re using the two character alpha code (eg. FP for ECS) in the URI, but for the lower levels we’re using the full 10 charater code from the HR database, (eg. F7FP090000 for Web & Internet Science). This provides canonical IDs for all parts of the university, which is nice.
Over the summer we’ve also added data on
- The location of student offices for each Faculty
- The location of the “deanery’ for each Faculty (although these usually are an entire floor, and our system doesn’t really do much with the concept of floors, yet)
- For faculties which have provided the data, a list of the buildings they ‘occupy’ (defined as controlling space in, rather than just having a member in that building). This has been provided by some faculties at the faculty level, and others at the Academic Unit level. We are happy to accept either. If we could complete this dataset it would be a very powerful resource.
- Where we can, we’ve added the homepage for that part of the University. We’ve not done this below the top two levels, but adding the data is very easy if people want it.
The spreadsheet in the get-the-data box on the organisation page is slightly unsual as I’ve set it to collect data we don’t show in the page. In addition to the URIs for the org. element and it’s parent element, and the label, it also includes a field for the code (FP or P2 or F5AF010000) — I figure that might be really useful for admin staff.
- Names of org. elements: This data belongs to HR and corrections should be sent to them. If they accpet it then I’ll update our data (please copy me the acceptance). Later, we’ll automatically source this data direct from their database.
- URLs for org. elements: data.southampton currently own this dataset, so it’s very easy to update. Just send us the correction to email@example.com — I don’t anticipate much confusion so anyone is welcome to send us these.
- Space Occupied by org. element — This must be sent to us by an official admin for the group, unit or faculty concerned. If it’s a list for lots of org. elements, such as every group in a faculty, please get in tocuh as firstname.lastname@example.org so we can send you a template spreadsheet to save us having to translate into the codes we use.
I should have done this ages ago, but the job got lost down the back of a sofa.
Every Sunday morning the EPrints datasets are now refreshed from the live archives.
July 17, 2011
by Christopher Gutteridge
First of all, I’ve also just added a new post over at the web team blog which might be interesting to our readers on the data blog, if you’ve ever been confused about the relationship between Open Data, Linked Data and RDF Data.
Secondly, I’ve just added in the sameAs links between our bus-stop data and data.gov.uk. I should have done this months ago, but kept forgetting. It’s up now and I imagine Hugh Glaser will import them into the sameAs.org service which will allow you to discover our data on Southampton bus-stops by resolving the government ID for a bus-stop in sameas.org (maybe we’ll link to a demo, as I don’t think I explained that very well!)
** UPDATE **: Turns out my sameAs links were wrong, but Colin has created a full set which also links our codes for train stations and I’ve added in the airport. I’ve published it as a separate linkset.
Lastly, I asked a few keys staff for comments about the value of Open Data, and here’s a great one:
The Open Day map, based on open data, amazed so many of our visitors, is was great example of how our leading edge research has translated into a very real an practical application, second only to Soton Bus!
– University of Southampton Pro Vice-Chancellor Education, Professor Debra Humphris
The open days pages aren’t actually linked from the data.southampton homepage; but they aren’t secret, just only valuable for the period of the now-passed event.
July 13, 2011
by Christopher Gutteridge
In celebration, I’ve added a new bus routes page to better navigate this data.
If you look deep in the data, sometimes the data identifies the exact vehicle which is coming.
I admit the RDF is shonky, is anybody working on an ontology about that should get in touch!
June 23, 2011
by Colin R Williams
Let me introduce Colin Williams, a postgraduate student who has been doing lots of interesting stuff to help the Open Data service. I’ve asked him to contribute to this blog. Over to you Colin…
Recently, I have been assisting the data.southampton.ac.uk team in gathering geographic data for their site. By geographic data, I am referring to the latitude and longitude of campus buildings and services.
This can be a simple point reference, as in:
<http://id.southampton.ac.uk/point-of-service/sculpture-hepworth-1968> geo:lat 50.935436
<http://id.southampton.ac.uk/point-of-service/sculpture-hepworth-1968> geo:long -1.398055
or it could be an outline of a building (or site) footprint, as in:
<http://id.southampton.ac.uk/building/32> dct:spatial “POLYGON((-1.3960952 50.9368069,-1.3958352 50.9368250,-1.3956962 50.9360329,-1.3959562 50.9360148,-1.3960952 50.9368069))”
One of the surprising discoveries made by the data.southampton.ac.uk team during their data gathering was the lack of any geographic data held by Estates and Facilities. So, I set out to gather this data… [Editors Note: Our Estates & Facilities service do have all the geo data they need, but it's not very useful to the open data project as they just don't need a reference lat/long point.]
First stop, Google Maps. Google allows users to create their own maps, by overlaying points and polygons on their maps (or their satellite imagery). Their tool is easy to use, using a web interface to add points (and polygons) to the map. This data can then be exported, as a .kml file, which we can easily convert to a form that can be imported into data.southampton.ac.uk.
This started off fine, until I started to think more about the licencing of the data. I had read in the past that, due to the copyright held by Google (or their mapping providers) over their map data, contributors to OpenStreetMap aren’t allowed to use Google’s data to determine the location of entities.
2. Restrictions on Use. Unless you have received prior written authorization from Google (or, as applicable, from the provider of particular Content), you must not:
(e) use the Products in a manner that gives you or any other person access to mass downloads or bulk feeds of any Content, including but not limited to numerical latitude or longitude coordinates, imagery, and visible map data;
So, that rules out the use of Google as a data source.
As its name suggests, OpenStreetMap is an open data street map, with its data being available under the CC BY-SA licence. OpenStreetMap is a great example of a collaborative, wiki-style geographic application. We could re-use their data, however, we wanted to generate authorative data, without making huge, possibly unnecessary changes to the OpenStreetMap data simply in order to achieve our goal. So, let’s look somewhere else. (I should probably contribute some of our building outlines back to OpenStreetMap when I find some time.)
The Ordnance Survey is Great Britain’s national mapping agency, which, in recent years, has released some open products. Confusingly, they seem to have two ‘Open’ products which could be relevant to our task.
Well, it seems that this the data used on OS OpenSpace is licensed under the ‘OS OpenData terms’, which ‘include’ the Open Government Licence.
However, the OpenSpace FAQs include this entry:
2.1 I am using OS OpenSpace to create a database of location based information. Does Ordnance Survey own this?
When you use OS OpenSpace to geocode data by adding locations or attributes to it that have been directly accessed from and/or made available by Ordnance Survey mapping data, then the resulting data is ‘derived data’, because it is derived from Ordnance Survey data.
Ordnance Survey would own such ‘derived data’, but we grant you a non-exclusive, personal licence to use it within your web application. Please refer to the definition of ‘Derived Data’ and Clause 5.4 of the OS OpenSpace Developer Agreement.
Well that’s not what we want. But, how about the data, that is under the Open Government Licence?
The OS OpenData site holds a variety of geographical datasets. For example, Code-Point Open is a dataset containing the latitude and longitude of 1.7 million postcodes, whilst OS VectorMap District is a vector based map of Great Britain. Unfortunately it’s not quite detailed enough to show individual buildings, which is what we’re really after.
So, the product we’re after is OS Street View (not to be confused by a similarly named, but completely different product offered by Google).
Can we use this data? The FAQ (which is in PDF format) has this to say:
11 Am I able to reuse “derived data” created from the OS OpenData products?
The licence allows the reuse of the derived data created from any OS OpenData products for commercial and non-commercial use. For more information on terms and conditions, read the OS OpenData Licence at www.ordnancesurvey.co.uk/opendata/licence.
OK, so we have found some mapping data that we are allowed to use. Is it in an easy-to-use form? Of course not, its in raster format. In other words, it’s a bitmap image (or rather, a series of images, each covering a 5km by 5km patch of Great Britain). How can we easily extract the information we need from these images?
Merkaartor describes itself as an OpenStreetMap editor for Unix, Windows and Mac OS X. It turns out that we can use it to export data rather than uploading that data to OpenStreetMap.
By default, Merkaartor has a number of data sources installed. In order to use the OS OpenData maps, we add http://os.openstreetmap.org/ as a data source, which uses the OS Street View data mentioned earlier.
All that remains to be done is to trace the shapes on the map and then export the data, as KML, which we then convert into a simple CSV file to be imported into data.southampton.ac.uk.
The data that has been generated as part of this process is available in the buildings and places dataset, and you can see it in use on the University’s open data map (which I have also been developing).
Thanks, Colin. I’ll just wrap this up by saying that University of Southampton Buildings & Estates will one day probably take over curation of this data, and they are aware of this work. They are happy to let us worry about it for the time being. This is fine with me as buildings don’t move much. Colin has done all of this for fun in his own time. I hope the other data.xxx.ac.uk projects are lucky enough to get some helpers like this. Be ready with a plan of how to let people help if they offer!
May 27, 2011
by Christopher Gutteridge
On Wednesday I gave a well-recieved talk to the university ‘Digital Economy’ research group (a virutal group containing people from all over the university).
Yesterday I had the fun problem of lots of people getting in touch with ideas! For the next couple of months I still can’t put my full focus on the Open Data, but here’s some of the interesting things going on behind the scenes:
- Facilities / Equipment dataset to describe our cool toys. I’ve got people interesting in contributing to this from all over the university. You can see a preview here. The idea is to help the left hand know what resources the right hand has, and who’s allowed to use them. I’ve had provisional interest in this from medical imaging, the high voltage lab, the nano cleanrooms, archaeology, civil engineering and chemistry.
- Disabled Go reports – someone pointed me at this site which has detailed reports on disabled access for 98 of our buildngs. Most of the data is too detailed to map into RDF, but what I was hoping to do is (1) just provide a link to the reports for each building from our data and /building/ pages. That alone gets far more value out of it and maybe (2) pull out the headline data, eg “has disabled loo”, “allows guidedogs”. We’ve been in touch with them and it sounds like they are pretty postitive about the idea. I still need their permission to provide that information under OGL or another open license.
- Catering have updated all the menus to include coffee & other hot drinks (it was missing before), after noticing the the opendatamap didn’t have any results for searching for ‘coffee’ (the horror). Problem is, the menu says “Filter (Large)” now so still no match for coffee! We’ll either rename it to “Filter Coffee (Large)” or consider adding a “Hidden Labels” field to help searches.
I got asked what the success criteria for the Open Data project was. This is very difficult to define but for me it will be when the open-data-service is so much part of business-as-usual that people on longer want an enthusiastic hacker running it! I’m looking forward to talking about the good ‘ole days when open data was a new frontier and nobody even had an ontology for coffee types or bus timetables yet.
The Open Data is starting to get put to use to:
- People are using the bus times pages (I need to make the interface better, I know!)
- Our upcoming campus mobile phone app will use some of the location data
- I’ve been asked how the service could aid with student induction– eg. help people find what’s available, and where it is.
The other thing ticking along is getting live hookups to databases. Right now it’s all done with one-off dumps, we want to be showing the living data. The dump-and-email approach is fine for getting started but now it’s time to do the far less glamorous job of making the back-end more automated. I’m still working on getting energy use data per building, and I’ve a lead on recycling data!
One final thing, you may notice that the Open Data Map is now not quite as pretty, there’s a good reason for this. We noticed that we may not own data traced using the Google Maps, so Colin has re-created all the data from the ordnance survey instead. There is slightly less detail, but the functionality is all still there.
The slides from my talk are available on EdShare. I’ve never uploaded to EdShare before — they’ve done a really great job at making a streamlined submit process. It’s far better than anything I’ve used in EPrints before, and I say this as the person who designed the EPrints 3.0 submit workflow!
March 12, 2011
by Christopher Gutteridge
After many battles with excel, pivot tables and the IBM “Many Eyes”s site, I’ve had a go at visualising our Payments Dataset. I’m now an armchair auditor!
Please note that I am far an expert in working with such data so the below graphs should not be considered “official” data from the university as I may have made mistakes in my processing. The data is not entirely complete as it contains no payments to individuals, and nothing commercially sensitive.
Here’s who we’ve paid money to in that dataset… I had to trim the data down to payments of £10K+ as otherwise it seemed to crash their java!
This shows a break down of the broad categories and sub categories of what we paid money for.
I hope that we’ve got some budding statisticians, accountants or data visualisers who can do something better than me!
One cool idea; find out what payees we have in common with the local hospitals and council: