Southampton Open Data Blog

Licenses

License Verification Link

October 25, 2011
by Christopher Gutteridge

A helpful chap named Glyn, from http://data.linkedgov.org/, pointed out that it’s hard to verify that the council really gave us permission to republish their bus data under the license we claim.

As a really simple and cheap solution to this, all pages on our site using this data (e.g. http://data.southampton.ac.uk/bus-stop/HA030013.html) now have a link at the bottom with the following text:

Bus stops, routes & live data provided under the Open Government License by the Southampton City Council ROMANSE office. To verify, see the Confirmation Page.

The important part is that the ROMANSE project have provided a nice, simple license verification page on their site at http://www.romanse.org.uk/sotonshared.htm, which allows a third party to verify in seconds that we’re telling the truth and are in agreement with the data provider. That page simply reads:

Southampton ROMANSE shared data

The ROMANSE Data on Bus Stops, Bus Routes and Live Bus Times is available via the University of Southampton Open Data Service http://data.southampton.ac.uk/bus-routes.html, Under the Open Government License http://www.nationalarchives.gov.uk/doc/open%2Dgovernment%2Dlicence/

When republishing someone else’s data with their permission, I suggest this is a good practice to establish, and much cheaper than any of the alternatives.
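For anyone who wants to script that check, here is a minimal Python sketch. It’s an illustration only, not something we actually run: the function name `confirms_licence` is made up, fetching the page is left out, and a plain substring test is no substitute for a person reading the page.

```python
def confirms_licence(page_text: str, republisher_url: str, licence_url: str) -> bool:
    """Naive check that a provider's confirmation page names both the
    republishing site and the licence. Hypothetical helper, not a real API."""
    # The ROMANSE page writes the licence URL with %2D in place of '-',
    # so normalise that before comparing.
    text = page_text.replace("%2D", "-").replace("%2d", "-")
    return republisher_url in text and licence_url in text
```

Called with the text of the ROMANSE page quoted above, `http://data.southampton.ac.uk/` and the OGL URL, it returns `True`.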

Bus Route Updates

July 13, 2011
by Christopher Gutteridge

The Southampton ROMANSE project has given us the go-ahead to put the Southampton bus times data under the OGL (Open Government License).

In celebration, I’ve added a new bus routes page to better navigate this data.

If you look deep in the data, you’ll find that it sometimes identifies the exact vehicle that is coming.

I admit the RDF is shonky; if anybody is working on an ontology for this, please get in touch!

Generating Open Geographic Data

June 23, 2011
by Colin R Williams

Let me introduce Colin Williams, a postgraduate student who has been doing lots of interesting stuff to help the Open Data service. I’ve asked him to contribute to this blog. Over to you Colin…

*****

Recently, I have been assisting the data.southampton.ac.uk team in gathering geographic data for their site. By geographic data, I am referring to the latitude and longitude of campus buildings and services.

This can be a simple point reference, as in:

<http://id.southampton.ac.uk/point-of-service/sculpture-hepworth-1968> geo:lat 50.935436

<http://id.southampton.ac.uk/point-of-service/sculpture-hepworth-1968> geo:long -1.398055

or it could be an outline of a building (or site) footprint, as in:

<http://id.southampton.ac.uk/building/32> dct:spatial "POLYGON((-1.3960952 50.9368069,-1.3958352 50.9368250,-1.3956962 50.9360329,-1.3959562 50.9360148,-1.3960952 50.9368069))"
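A footprint like this can also supply a single reference point of the kind used for the Hepworth sculpture. As a sketch only (not how the data.southampton.ac.uk points were actually produced), the Python below parses a simple single-ring WKT POLYGON and takes its area-weighted centroid. Note that WKT lists longitude before latitude, and that treating degrees as flat x/y is an approximation that only holds at the scale of a single building. The function names are hypothetical.

```python
import re


def parse_wkt_polygon(wkt: str) -> list:
    """Parse a single-ring WKT 'POLYGON((lon lat,...))' into (lon, lat) tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt)
    if match is None:
        raise ValueError("expected a single-ring POLYGON((...))")
    points = []
    for pair in match.group(1).split(","):
        lon, lat = (float(v) for v in pair.split())
        points.append((lon, lat))
    return points


def polygon_centroid(points: list) -> tuple:
    """Area-weighted centroid (shoelace formula) of a closed ring.

    Expects the first point to be repeated at the end, as WKT requires.
    Returns (lon, lat).
    """
    ox, oy = points[0]
    # Work relative to the first vertex so the large absolute coordinates
    # don't swamp the tiny cross products through floating-point cancellation.
    rel = [(x - ox, y - oy) for x, y in points]
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(rel, rel[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return ox + cx / (3 * area2), oy + cy / (3 * area2)
```

For the building 32 footprint above, this gives roughly longitude -1.3958957, latitude 50.9364199, which could then be published as `geo:long` and `geo:lat`.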

One of the surprising discoveries made by the data.southampton.ac.uk team during their data gathering was the lack of any geographic data held by Estates and Facilities. So, I set out to gather this data… [Editor’s Note: Our Estates & Facilities service do have all the geo data they need, but it’s not very useful to the open data project, as they just don’t need a reference lat/long point.]

Google Maps

First stop, Google Maps. Google allows users to create their own maps, by overlaying points and polygons on their maps (or their satellite imagery). Their tool is easy to use, using a web interface to add points (and polygons) to the map. This data can then be exported, as a .kml file, which we can easily convert to a form that can be imported into data.southampton.ac.uk.
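As a rough illustration of that kind of conversion (not the script we actually used), here is a Python sketch that reads point Placemarks from a KML file and writes Turtle triples like the `geo:lat`/`geo:long` ones above. The `uri_for_name` lookup from placemark name to URI is hypothetical; in practice that mapping would be maintained by hand. Namespaces are ignored when matching tags so that exports using different KML namespace URIs both work.

```python
import xml.etree.ElementTree as ET

GEO_PREFIX = "@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> ."


def local_name(tag: str) -> str:
    """Drop the '{namespace}' part of an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def placemark_points(kml_text: str):
    """Yield (name, lat, lon) for every Placemark that holds a Point."""
    root = ET.fromstring(kml_text)
    for pm in root.iter():
        if local_name(pm.tag) != "Placemark":
            continue
        name, coords = "", ""
        for el in pm.iter():
            tag = local_name(el.tag)
            if tag == "name" and not name:
                name = (el.text or "").strip()
            elif tag == "Point":
                for c in el.iter():
                    if local_name(c.tag) == "coordinates":
                        coords = (c.text or "").strip()
        if name and coords:
            # KML stores "longitude,latitude[,altitude]".
            lon, lat = (float(v) for v in coords.split(",")[:2])
            yield name, lat, lon


def points_to_turtle(kml_text: str, uri_for_name: dict) -> str:
    """Emit geo:lat / geo:long triples for placemarks we have a URI for."""
    lines = [GEO_PREFIX]
    for name, lat, lon in placemark_points(kml_text):
        uri = uri_for_name.get(name)
        if uri is None:
            continue  # no URI assigned to this placemark yet
        lines.append(f"<{uri}> geo:lat {lat} .")
        lines.append(f"<{uri}> geo:long {lon} .")
    return "\n".join(lines)
```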

This started off fine, until I started to think more about the licensing of the data. I had read in the past that, due to the copyright held by Google (or their mapping providers) over their map data, contributors to OpenStreetMap aren’t allowed to use Google’s data to determine the location of entities.

Time to check the Google Maps Terms of Use. Specifically, term 2.(e) states that:

2. Restrictions on Use. Unless you have received prior written authorization from Google (or, as applicable, from the provider of particular Content), you must not:

(e) use the Products in a manner that gives you or any other person access to mass downloads or bulk feeds of any Content, including but not limited to numerical latitude or longitude coordinates, imagery, and visible map data;

So, that rules out the use of Google as a data source.

OpenStreetMap

As its name suggests, OpenStreetMap is an open data street map, with its data available under the CC BY-SA licence. OpenStreetMap is a great example of a collaborative, wiki-style geographic application. We could re-use their data; however, we wanted to generate authoritative data without making huge, possibly unnecessary, changes to the OpenStreetMap data simply to achieve our goal. So, let’s look somewhere else. (I should probably contribute some of our building outlines back to OpenStreetMap when I find some time.)

Ordnance Survey

The Ordnance Survey is Great Britain’s national mapping agency, which, in recent years, has released some open products. Confusingly, they seem to have two ‘Open’ products which could be relevant to our task.

OS OpenSpace

The OS OpenSpace API, according to their website, is “free to access and lets developers create amazing web applications and online projects with Ordnance Survey maps”. Sounds good so far. Their web-map builder allows the user to add markers and routes, and then to export an HTML page (with JavaScript) that can be put on a web site. Not exactly what we’re after, but we could probably extract the data from it. Are we allowed to?

Well, it seems that the data used on OS OpenSpace is licensed under the ‘OS OpenData terms’, which ‘include’ the Open Government Licence.

However, the OpenSpace FAQs include this entry:

2.1 I am using OS OpenSpace to create a database of location based information. Does Ordnance Survey own this?
Yes.

When you use OS OpenSpace to geocode data by adding locations or attributes to it that have been directly accessed from and/or made available by Ordnance Survey mapping data, then the resulting data is ‘derived data’, because it is derived from Ordnance Survey data.

Ordnance Survey would own such ‘derived data’, but we grant you a non-exclusive, personal licence to use it within your web application. Please refer to the definition of ‘Derived Data’ and Clause 5.4 of the OS OpenSpace Developer Agreement.

Well, that’s not what we want. But what about the data that is under the Open Government Licence?

OS OpenData

The OS OpenData site holds a variety of geographical datasets. For example, Code-Point Open is a dataset giving the location of 1.7 million postcodes, whilst OS VectorMap District is a vector-based map of Great Britain. Unfortunately, it’s not quite detailed enough to show individual buildings, which is what we’re really after.

So, the product we’re after is OS Street View (not to be confused with a similarly named, but completely different, product offered by Google).

Can we use this data? The FAQ (which is in PDF format) has this to say:

11 Am I able to reuse “derived data” created from the OS OpenData products?
Yes.

The licence allows the reuse of the derived data created from any OS OpenData products for commercial and non-commercial use. For more information on terms and conditions, read the OS OpenData Licence at www.ordnancesurvey.co.uk/opendata/licence.

OK, so we have found some mapping data that we are allowed to use. Is it in an easy-to-use form? Of course not: it’s in raster format. In other words, it’s a bitmap image (or rather, a series of images, each covering a 5km by 5km patch of Great Britain). How can we easily extract the information we need from these images?
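Working out which 5km image covers a given spot is simple arithmetic on the National Grid. The Python sketch below assumes the tiles are named after the 10km grid square plus a quarter (e.g. SU41SW); check that against the actual file names before relying on it. It also assumes you already have an OSGB36 easting and northing, since converting from latitude and longitude needs a datum transformation that isn’t shown here. The function name is hypothetical.

```python
def os_tile_name(easting: int, northing: int) -> str:
    """Name of the 5km tile (e.g. 'SU41SW') covering an OSGB36 grid position."""
    if not (0 <= easting < 700_000 and 0 <= northing < 1_300_000):
        raise ValueError("position outside the OS National Grid")
    e100k, n100k = easting // 100_000, northing // 100_000
    # Standard lookup for the two-letter 100km square; the grid alphabet
    # skips 'I', hence the bump for indexes past 'H'.
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    letters = chr(ord("A") + l1) + chr(ord("A") + l2)
    # One digit each for the 10km square within the 100km square...
    digits = f"{(easting % 100_000) // 10_000}{(northing % 100_000) // 10_000}"
    # ...then which 5km quarter of that 10km square.
    ns = "N" if northing % 10_000 >= 5_000 else "S"
    ew = "E" if easting % 10_000 >= 5_000 else "W"
    return letters + digits + ns + ew
```

For example, `os_tile_name(442000, 114000)` gives `"SU41SW"`, a tile in Southampton.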

Merkaartor

Merkaartor describes itself as an OpenStreetMap editor for Unix, Windows and Mac OS X. It turns out that we can use it to export data rather than uploading that data to OpenStreetMap.

By default, Merkaartor has a number of data sources installed. In order to use the OS OpenData maps, we add http://os.openstreetmap.org/ as a data source, which uses the OS Street View data mentioned earlier.

All that remains to be done is to trace the shapes on the map and then export the data, as KML, which we then convert into a simple CSV file to be imported into data.southampton.ac.uk.
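A minimal sketch of the KML-to-CSV step, in Python. It assumes the export contains Polygon placemarks (I haven’t checked exactly how Merkaartor structures its KML) and uses a hypothetical two-column layout of name and WKT, with the polygon in the same POLYGON((...)) form as the building 32 example above. It isn’t the exact script used for the dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' part of an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def ring_to_wkt(coord_text: str) -> str:
    """Turn KML 'lon,lat[,alt] lon,lat[,alt] ...' into a WKT POLYGON string."""
    pairs = []
    for chunk in coord_text.split():
        lon, lat = chunk.split(",")[:2]
        pairs.append(f"{lon} {lat}")  # keep the digits exactly as exported
    if pairs and pairs[0] != pairs[-1]:
        pairs.append(pairs[0])  # WKT rings must end where they start
    return "POLYGON((" + ",".join(pairs) + "))"


def kml_polygons_to_csv(kml_text: str) -> str:
    """One CSV row (name, wkt) per Polygon outer boundary in the KML."""
    root = ET.fromstring(kml_text)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "wkt"])
    for pm in root.iter():
        if local_name(pm.tag) != "Placemark":
            continue
        name = next(((el.text or "").strip() for el in pm.iter()
                     if local_name(el.tag) == "name"), "")
        for outer in pm.iter():
            if local_name(outer.tag) != "outerBoundaryIs":
                continue
            for c in outer.iter():
                if local_name(c.tag) == "coordinates" and c.text:
                    writer.writerow([name, ring_to_wkt(c.text)])
    return buf.getvalue()
```

The csv module quotes the WKT column automatically, because it contains commas.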

The data that has been generated as part of this process is available in the buildings and places dataset, and you can see it in use on the University’s open data map (which I have also been developing).

****

Thanks, Colin. I’ll just wrap this up by saying that University of Southampton Buildings & Estates will one day probably take over curation of this data, and they are aware of this work. They are happy to let us worry about it for the time being. This is fine with me as buildings don’t move much. Colin has done all of this for fun in his own time. I hope the other data.xxx.ac.uk projects are lucky enough to get some helpers like this. Be ready with a plan of how to let people help if they offer!

Licenses in data.Southampton

June 14, 2011
by Christopher Gutteridge

I got the following enquiry a few days ago (reproduced with permission), and figured the response would make a good blog post (and it saves me answering people individually).

While developing our site about Tsinghua University OpenData, we met some questions about licence & copyright.
Some data we got are crawled from public homepages of our university’s organizations and faculties. And we are not sure if it’s proper to release these data.
In your project of Southampton Open Data, I noticed that most of the datasets are published under CreativeCommons, and I found Open Government Licence on your homepage.
Do you have any data source that may have copyright issue while collecting data? How do you deal with that?

Thanks a lot! Look forward to your reply!

I’m going to be honest in the response, as that will help people see where we are now. I am not a lawyer and can’t offer legal advice. We are doing our best to get it right without slowing down the progress we’re making.

We apply licenses per dataset. In some ways that helps define the scope of a dataset: a dataset is a bunch of data with shared metadata.
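In RDF terms, this means the license hangs off the dataset’s metadata rather than off each individual triple. Here is a minimal Python sketch. It assumes the Dublin Core `dcterms:license` property (this post doesn’t say which predicate we use) and uses made-up example.org dataset URIs. It writes one N-Triples statement per licensed dataset and deliberately says nothing about datasets that have no license yet.

```python
from typing import Optional

DCT_LICENSE = "http://purl.org/dc/terms/license"
OGL = "http://www.nationalarchives.gov.uk/doc/open-government-licence/"


def licence_statements(dataset_licences: dict) -> str:
    """One N-Triples line per licensed dataset.

    dataset_licences maps a dataset URI to a license URI, or to None when
    no license has been agreed with the data owner.
    """
    lines = []
    for dataset, licence in sorted(dataset_licences.items()):
        licence: Optional[str]
        if licence is None:
            # No permission from the data owner yet: say nothing rather than guess.
            continue
        lines.append(f"<{dataset}> <{DCT_LICENSE}> <{licence}> .")
    return "\n".join(lines)
```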

Open Government License

In general, we use the UK Government’s Open Government License (OGL), http://www.nationalarchives.gov.uk/doc/open-government-licence/, which really is a lovely bit of work. At first glance it’s very like the Creative Commons “cc-by” license, which is sometimes called “must attribute”.

However, it’s got some clever little restrictions which make it easier for your management to feel comfortable releasing the data, as they address some of the key concerns:

  • ensure that you do not use the Information in a way that suggests any official status or that the Information Provider endorses you or your use of the Information;
  • ensure that you do not mislead others or misrepresent the Information or its source;
  • ensure that your use of the Information does not breach the Data Protection Act 1998 or the Privacy and Electronic Communications (EC Directive) Regulations 2003.

So, suppose a railway used this license for its timetables. If someone took a train timetable under this license and published the train times on a porn site, that’s OK. But if they deliberately gave out slightly incorrect times to make the trains look bad, that’s not OK. If they claimed to be the train company in order to sell tickets on commission, that’s not OK either. The DPA bit doesn’t mean anything outside the UK, of course.

It gives people lots of freedom, but rules out the obvious malicious exploits that aren’t actually illegal.

NULL License

Another license we use is the lack of a license. Maybe I should add a URI for the deliberate, rather than accidental, omission of a license?

I have to be very careful about slapping a license on things. Without permission of the data owner, I don’t do it.

A couple of examples of datasets which at the time of writing have no licence:

  • EPrints.soton — people are still looking into the issues with this. The problem is that the database may at some point have imported some abstracts from a service without an explicit license to republish. It’s a small issue, but we are trying to be squeaky clean, because it would be very counterproductive to have any embarrassing cock-ups in the first year of the open data service. All the data’s been available via OAI-PMH for years, so it’s a low risk, but until I get the all-clear from the data owner I won’t do anything. The OGL has the lovely restriction of not covering “third party rights the Information Provider is not authorised to license;” but we shouldn’t knowingly put out such data. My ideal result here is guidance from the government that publishing academic bibliographic metadata is always OK, but I’ve not had that instruction yet.
  • Southampton Bus Routes & Stops — I’ve been told over the phone by the person running the system that he considers it public domain, but until I’ve got that in writing I’m not putting a license on it. Even if he says public domain, I’m inclined towards OGL as it prevents those kinds of malicious use I outlined earlier.

CC-BY

We may use this in a couple of places. Its only advantage over OGL is that it’s more widely understood, but I think the extra restrictions of OGL are a good thing.

CC-Zero

This is pretty much saying “public domain”. It’s giving an unlimited license on the data. We use this for the Electronics and Computer Science Open Data, which acted as a prototype for data.southampton (boy, we made some mistakes, read the webteam blog and this blog for more details).

We’ve never yet had anybody do anything upsetting with the ECS RDF, but I’m inclined to relicense future copies as OGL, as it adds the protection against malicious but non-illegal uses.

Creative Evil

Out of interest, I challenge the readers to suggest in the comments harmful, or embarrassing, things they could do with the data.southampton data if it was placed in the public domain, rather than having an OGL license. It’s useful to get some ideas of what we need to protect ourselves against.

If you have evil ideas about what you could do under the restrictions of the OGL, or with no license at all, please send them to me privately. I don’t want to actually bring my project into disrepute, just get some ideas of what spammers, and people after lulz, might do. Better to think about what bolts the stable door needs well in advance.

3rd Party Data

I’ve got a lovely dataset that I’ve added but not yet written metadata for. It maps the disability information hosted by a group called “disabledgo” to the URIs for buildings, sites and points of service. For example, http://www.disabledgo.com/en/access-guide/zepler-building/university-of-southampton is mapped to the URI for that building, and gets a neat little link in http://data.southampton.ac.uk/building/59.html

I created this dataset by hand by finding every URL and mapping it myself, so I have the right to place any license on it I choose. I also added in some data I screen scraped from their site (flags indicating disabled parking, good disabled toilets etc.). I checked with disabledgo and they asked me not to republish that data, so I can’t.

We pay them to conduct these surveys, and our contract does not specify the owner of the data. I’m hoping we might actually renegotiate next year to be allowed to republish the data, but it would be far better if *they* published under an open license and we just used their open data. Probably that’s still a few years off.

Either way, it’s a nice demo of the issues facing us. They are friendly and helpful; they just don’t want anyone diluting the meaning of their icons, which they give a strict meaning.

Screen Scraping

Very little data in data.southampton is screen scraped. Exceptions are the trivia about buildings (year of construction, architect etc.) and some of the information about teaching locations, including their photos, and the site which lists experts who can talk to the press on various subjects.

I have a clear remit from the management of the university to publish under an open license anything which would be subject to a “Freedom of Information” (FOI) request. In the long run we can save a fair bit of hassle and money by pointing people at the public website.

The advantage I have over most other Open Data projects is that I’m operating under direct instructions from the heads of Finance, Communications, iSolutions (the silly name we give our I.T. team, which I’m part of) and the VC’s office. This means that I can reasonably work with anything owned by the organisation.

Another rule of thumb I was given is that if it’s already on the web as HTML or PDF, then it might as well be data too! It’s not a strict rule, as obviously there are some things which might not be appropriate, but I’ve not had much to screen scrape yet.