June 27, 2013
by Ash Smith
Yesterday, 26th June, we held the University’s first ever ‘Open Data Open Day’. This was a day organised by the Open Data service to show the wider university community exactly what linked and open data can do for them. The University has recently announced that open data, specifically an ‘open by default’ attitude, is to be a core principle of its strategy for at least the next five years, so we decided to focus our open day on non-technical staff, who may not appreciate what this entails, while also holding a hack event in the Building 32 coffee room for anyone with programming knowledge.
We had three main objectives for the day.
- To raise the profile of Open Data among the non-technical members of the university.
- To connect people who have data with people who can do cool stuff with data.
- To come up with some cool demos that show off the power of Open Data.
The day was a success. I gave an introductory talk early in the day, which was well attended and ended with an extended question-and-answer session and a positive response. Pat and I later presented a ‘show and tell’ of all the cool features we’ve been working on that show the power of linked open data, and I followed this up with a talk based on my last blog post on good data practice. While the talks were going on, several teams of hackers were hard at work in the next room, building tools and demos using our data. They presented their work at the end of the day in the Terrace Restaurant, each with a well-earned pint. Patrick has written a more detailed post explaining what they got up to.
We also had two guest speakers – Cassie Robinson from Londonscape, a large civic project that uses data from four wards of a London borough, and Sina Samangooei from WAIS, who talked about streamed reasoning to an audience consisting mainly of hackers fresh from their presentations, ensuring a lively discussion.
Because the event was a success, we plan to run something like it again, possibly in term time, to get a feel for how students see Open Data.
June 10, 2013
by Christopher Gutteridge
Reena had some very nice things to say about Ash’s Room Finder tool, so I’ve asked her for a quote to publish on the blog:
“My job is to engage young people in technology and I spend a lot of time organising events and have to do a lot of room bookings for them. Using the Open Data website helps with knowing what the room is like, what equipment is in the room, where the room is and knowing if the room is best for my needs. Without it would make the process a lot longer and tedious. It means I can get on with my job and not worry about things I don’t need to worry about.”
– Reena Pau, outreach and diversity consultant
We’re aware that the room finder’s user interface could use some improvements, and it’s on our (very long) to-do list. But we’re gluttons for punishment, so always feel free to let us know what features would be useful to you.
June 7, 2013
by Ash Smith
While building the University’s Open Data, we’ve seen many different types of data. Some of the information is exported from Oracle and MySQL databases, or from enterprise systems like SharePoint, but the vast majority of what we use is in a tabular format such as a spreadsheet.
Spreadsheets are actually a really good way of producing linked open data without any technical knowledge. A technical person just needs to write a single program or script that converts a spreadsheet into a computer-readable format; anyone can then modify the spreadsheet to their heart’s content, and the script simply needs to be run again afterwards. But this also makes it easy to fall into a very common trap caused by bad spreadsheet discipline.
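As an illustration of the ‘single script’ idea, here is a minimal Python sketch, assuming the spreadsheet has been exported as CSV with an `id` column and simple column headings (no spaces). The `example.org` base URIs are placeholders, not the scheme data.southampton actually uses; each non-empty cell becomes one N-Triples statement.

```python
import csv
import io

# Placeholder URIs for illustration; a real service would use its own scheme.
BASE = "http://example.org/item/"
PROP = "http://example.org/prop/"


def escape(value):
    """Escape a cell so it is safe inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text):
    """Convert CSV text with an 'id' column into N-Triples, one line per cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id" or not value:
                continue  # the id names the subject; empty cells say nothing
            lines.append(f'{subject} <{PROP}{column}> "{escape(value)}" .')
    return "\n".join(lines)


sample = "id,name,building\n1,Water cooler,32\n2,Water cooler,59\n"
print(csv_to_ntriples(sample))
```

Because the script only reads the exported file, anyone can keep editing the spreadsheet and simply rerun it.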
Spreadsheets are generally designed for human use. Most modern spreadsheet packages, such as Excel, allow the user to include headings, cell colours and lines, and even to import images and other files. There are also no strict rules about data type, so you can type a list of numbers in a column and then enter “N/A” or “see below” as part of the list, and the spreadsheet will not complain. This is fine for spreadsheets that only need to be read by people. However, when generating information that might one day be read by a computer, there is one very important 1975 Doctor Who quote you should remember: “the trouble with computers is that they’re very sophisticated idiots”. They can only handle what they’re programmed to handle. So if I were to write a program that converts a spreadsheet into linked open data, and someone then updated a cell in the spreadsheet using the word ‘None’ rather than the number zero, the computer running my program would get confused and behave unexpectedly. This is why good data practice is essential when generating or updating data that may one day become linked open data.
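To make the ‘None’ problem concrete, here is a small Python sketch (the column names are hypothetical). A naive conversion that calls `int()` on every cell raises `ValueError` when it meets ‘None’; a slightly more careful script can collect those cells and report them, rather than producing odd output.

```python
def parse_count(cell):
    """Parse a cell that should hold a whole number; raises ValueError otherwise."""
    return int(cell.strip())


def check_numeric_column(rows, column):
    """Return (spreadsheet_row, value) for every cell that is not a whole number."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 holds the headings
        try:
            parse_count(row[column])
        except ValueError:
            problems.append((number, row[column]))
    return problems


rows = [
    {"room": "2065", "coolers": "1"},
    {"room": "2067", "coolers": "None"},  # should have been 0
    {"room": "2069", "coolers": "0"},
]
print(check_numeric_column(rows, "coolers"))  # [(3, 'None')]
```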
So how can we avoid this? Well, one way is to employ super hackers who can pre-empt every possible anomaly in the data. But in a world with time and financial constraints this isn’t always an option! Joking aside, it’s a really quick and cheap fix to make sure that if you’re designing or editing a spreadsheet, you keep it as computer-friendly as possible. To this end, we’ve come up with what we consider to be the four most important rules for making your spreadsheet ‘linked-data-friendly’.
- Standardise your data format
Values should be numerical or a simple yes/no as far as possible. For example, if you were producing a list of food, rather than putting ‘not suitable for vegetarians’ in a general comment field, add an extra column labelled ‘vegetarian’ and restrict the possible values to ‘yes’ or ‘no’. If this isn’t possible, keep to a small set of possible values and don’t deviate from these. ‘Red’, ‘Yellow’ and ‘Green’ is better than ‘Red’, ‘Burgundy’, ‘Yellow’, ‘Lime’, ‘Emerald’ and ‘Jade’, unless the exact shade of green is critically important.
- Keep free text to a minimum
There is always room for a comments column; sometimes we need to express something that can’t be represented as mere numbers. However, try not to put this in the actual data. The data should be as accurate as possible, and clarified by the comment field. So, for example, if you are maintaining a list of water coolers and their locations, you might have a ‘room’ column. If a cooler is in a corridor rather than a room, there are several ways you can represent this in a spreadsheet. You could leave the room empty and put ‘outside 2065’ in the comments, you could put ‘outside 2065’ as the room number, or you could put ‘2065’ as the room number and then write ‘outside’ in the comments. The third way is the linked data way! We still have consistent, numerical data to represent the room, but the comment clarifies to a human reader that the cooler is actually outside the room rather than within it. The computer may not be able to make sense of the ‘outside’ comment, but at least it can get the closest room correct.
- Consistent, unambiguous identifiers
Computer scientists often refer to ‘primary keys’, and information architects will talk about ‘controlled vocabularies’, but at the end of the day we’re all talking about the same thing: a way of identifying a specific thing unambiguously. A good example of this is buildings in the University estate. Some buildings have names, some more than one, but all buildings have a number, so if you have a ‘building’ column in your data, make sure to use the number rather than the name. The same applies to rooms. A computer doesn’t understand ‘level 4 coffee room’ (and indeed many buildings may have a level 4 coffee room) but it does understand ‘32/4032’ (for example).
- Style is nothing to a computer
Although you may like to use headers, coloured cells and so on, don’t rely on them for meaning. When you export a spreadsheet to its raw data form, all the styling is lost, so making the vegetarian options in a menu green is not a good way to identify them. If it’s important, it should have a column. By all means, make your spreadsheet as pretty as you like – just be aware that it’s not going to look like that to a computer.
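The first three rules lend themselves to automatic checking before a sheet is converted. The following Python sketch shows one way to do it, assuming a hypothetical catering sheet with `vegetarian`, `building` and `room` columns, and assuming rooms are written in the building/room form used above (e.g. 32/4032); real identifiers may not all fit that pattern.

```python
import re


def one_of(*allowed):
    """Rule 1: the cell must be one of a small, fixed set of values."""
    return lambda value: value in allowed


def matches(pattern):
    """Rule 3: the cell must look like a consistent identifier."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


# Hypothetical rules for a catering sheet; the column names are illustrative.
RULES = {
    "vegetarian": one_of("yes", "no"),
    "building": matches(r"\d+"),      # building number, not its name
    "room": matches(r"\d+/\d+"),      # assumed building/room form, e.g. 32/4032
}


def validate(rows, rules=RULES):
    """Return (row_number, column, value) for every cell that breaks a rule."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 holds the headings
        for column, check in rules.items():
            value = row.get(column, "").strip()
            if not check(value):
                problems.append((number, column, value))
    return problems


rows = [
    {"dish": "Soup", "vegetarian": "yes", "building": "32", "room": "32/4032"},
    {"dish": "Pie", "vegetarian": "not suitable for vegetarians",
     "building": "Library", "room": "level 4 coffee room"},
]
for problem in validate(rows):
    print(problem)
```

Rule 4 needs no code: once the sheet is exported to CSV, colours and other styling are simply not there for the script to see.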
There are other things, but these are the most important. Next time you start a spreadsheet, keep to these rules and your spreadsheet will be trivial to convert and add to the open data service. Once it’s in data.soton.ac.uk, it is really easy for us to add lots of value to your data. That added value makes your data more desirable, more accessible and more helpful. People use your data to make their lives easier, and that reflects positively on you and boosts your reputation.
June 6, 2013
by Christopher Gutteridge
Last year we built a system which aggregates event RSS feeds and makes a nice events calendar for the university. I was recently quite surprised to discover one way in which the information was being used.
“Change Management within iSolutions uses the Events Calendar open data in conjunction with other confidential data sources to build a rich understanding of critical University activities throughout the year. This enables iSolutions to schedule maintenance work to minimise service disruption to our users, and it also assists in understanding impacts to users when unplanned service outage occurs.” – D.J. Hampton, IT Service Management & QA Team Manager
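For readers wondering what aggregating event RSS feeds involves, here is a minimal Python sketch using only the standard library; it is not the calendar’s actual code. It assumes RSS 2.0 feeds and, as a simplification, treats each item’s `pubDate` as the event time, whereas real event feeds often carry the start time elsewhere. Events listed in more than one feed are merged by their link.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def read_events(rss_text):
    """Yield (time, title, link) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        # Simplifying assumption: pubDate stands in for the event time.
        when = parsedate_to_datetime(item.findtext("pubDate"))
        yield when, item.findtext("title"), item.findtext("link")


def aggregate(feeds):
    """Merge feeds, keep one entry per link, and sort the calendar by time."""
    events = {}
    for rss_text in feeds:
        for when, title, link in read_events(rss_text):
            events.setdefault(link, (when, title, link))
    return sorted(events.values())


FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Open Data Open Day</title><link>http://example.org/event/1</link>
<pubDate>Wed, 26 Jun 2013 10:00:00 +0100</pubDate></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Example seminar</title><link>http://example.org/event/2</link>
<pubDate>Thu, 20 Jun 2013 14:00:00 +0100</pubDate></item>
<item><title>Open Data Open Day</title><link>http://example.org/event/1</link>
<pubDate>Wed, 26 Jun 2013 10:00:00 +0100</pubDate></item>
</channel></rss>"""

for when, title, link in aggregate([FEED_A, FEED_B]):
    print(when.date(), title)
```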
May 22, 2013
by Christopher Gutteridge
When: June 26th, 10am-5pm
Where: Access Grid room, Level 3, Building 32 (and probably also the Level 4 coffee room for less formal stuff)
The data.southampton team will be hosting a hackaround day where we’ll give demos, take ideas, help you use the data and build neat things. The exact format of the day will be very loose, but anybody interested is welcome to drop in and have a chat, watch a demo or meet other interested people to start developing ideas and new uses.
If you’re definitely/possibly coming, you can indicate it on the Facebook page for the event.
Got some requests or ideas already? Leave them in the comments.
April 11, 2013
by Ash Smith
Just recently I’ve been looking for data we can publish as RDF with minimal effort, and without requiring any access to restricted services or taking up people’s time. I came across the University’s jobs site, jobs.soton.ac.uk. It uses a pretty cool system which exports all the vacancies as easily parsable RSS feeds, grouped into sensible categories. We have a feed for each campus, and a feed for each organisational unit of the University, so if a job appears in, for example, the feed for Highfield Campus as well as the feed for Finance, the job is a finance-based job on the Highfield Campus. Because of this, it’s trivial to write a script that parses all the RSS feeds on the jobs site and produces RDF. So that’s what I did, and you can see the results in our new Vacancies dataset.
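The inference described above (a job in both the Highfield Campus feed and the Finance feed is a finance job on Highfield Campus) can be sketched in Python as follows. The feed names and job links are made-up placeholders; fetching and parsing the real feeds, and writing the resulting records out as RDF, are left out.

```python
# Hypothetical feed contents: feed name -> links of the jobs it lists.
CAMPUS_FEEDS = {
    "Highfield Campus": ["http://example.org/job/101", "http://example.org/job/102"],
    "Southampton General Hospital": ["http://example.org/job/103"],
}
UNIT_FEEDS = {
    "Finance": ["http://example.org/job/101"],
    "Medicine": ["http://example.org/job/103"],
}


def classify(campus_feeds, unit_feeds):
    """Combine feed membership into one record per job (campus and unit)."""
    jobs = {}
    for campus, links in campus_feeds.items():
        for link in links:
            jobs.setdefault(link, {})["campus"] = campus
    for unit, links in unit_feeds.items():
        for link in links:
            jobs.setdefault(link, {})["unit"] = unit
    return jobs


for link, facts in sorted(classify(CAMPUS_FEEDS, UNIT_FEEDS).items()):
    print(link, facts)
```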
Normally when I produce a new dataset I like to provide a clever web tool or search engine to make use of the data, but this time I haven’t, because the jobs site already does this very well. So why republish the data at all? There are two reasons. Firstly, our colleague at Oxford University, Alexander Dutton, has already done this with Oxford’s vacancies. If we do the same, using the same data format, we’ve effectively got a standard, and if other organisations begin to do the same thing, the magic of linked open data can happen. Secondly, SPARQL queries are now possible. They’re a bit advanced for the layman, but if you are looking, for example, for a job at Southampton General Hospital paying £25K or higher, you can write a SPARQL query that does all the hard work for you, and the same query will work with Oxford’s data, although obviously you’ll need to replace the location URI with one of theirs.
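To give a flavour of such a query, here is a Python sketch that builds it as a string. The `ex:` vocabulary, its property names and the location URI are all placeholders rather than the terms used in the Vacancies dataset or Oxford’s; the structure (match vacancies at a location, then `FILTER` on salary) is the part meant to carry over.

```python
# Placeholder vocabulary: swap in the real vacancy terms before running this.
QUERY_TEMPLATE = """PREFIX ex: <http://example.org/vacancy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?job ?label ?salary WHERE {{
  ?job a ex:Vacancy ;
       rdfs:label ?label ;
       ex:location <{location}> ;
       ex:minimumSalary ?salary .
  FILTER (?salary >= {minimum})
}}
ORDER BY DESC(?salary)"""


def vacancies_query(location_uri, minimum_salary):
    """Build a query for vacancies at one location paying at least a minimum."""
    return QUERY_TEMPLATE.format(location=location_uri, minimum=int(minimum_salary))


# Hypothetical URI standing in for Southampton General Hospital.
print(vacancies_query("http://example.org/place/general-hospital", 25000))
```

Against another institution’s data, only the location URI argument should need to change, provided both datasets use the same vocabulary.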
Feel free to have a poke around at the data and, as always, if you manage to come up with a cool use for this data – even just an idea – then please let me know.
March 20, 2013
by Christopher Gutteridge
data.ac.uk launched today. It will provide a hub for linked data in .ac.uk open data services, and aggregate open data from UK academia. It’s been set up by the data.southampton team, but it’s owned by the community of .ac.uk open data services.
Our equipment dataset is now aggregated by equipment.data.ac.uk and there is a nifty search.
March 19, 2013
by Ash Smith
We now have a tool that allows anyone in the University to find a suitable room for their event. We call it the Room Finder and I for one am rather proud of it. The tool pulls data from the places dataset, the room features dataset and the new room bookings dataset, and is a really simple way of finding a room at the University of Southampton. Let’s say, for example, that you need a room for a lunchtime meeting on Friday somewhere on Highfield Campus – and by the way, the room must contain a data projector and a piano. Using the Room Finder, you can check to see if such a room is available at the time you need and, if so, click through to the room description pages to find out more. The tool doesn’t currently allow you to actually book the room, but it’s hoped that many phone calls to Estates and/or the central booking service can now be avoided as we continue our ongoing mission to get all the University’s useful data onto the web.
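The core of a search like this can be sketched in a few lines of Python. This is not the Room Finder’s code: the `Room` class, the room IDs and the bookings are hypothetical, and a room counts as available if none of its bookings overlaps the requested time.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Room:
    room_id: str                                   # hypothetical IDs below
    campus: str
    features: set = field(default_factory=set)
    bookings: list = field(default_factory=list)   # (start, end) datetime pairs


def is_free(room, start, end):
    """True if no booking overlaps the requested period [start, end)."""
    return all(end <= b_start or start >= b_end for b_start, b_end in room.bookings)


def find_rooms(rooms, campus, needed, start, end):
    """Rooms on the campus that have every needed feature and are free."""
    return [r.room_id for r in rooms
            if r.campus == campus and needed <= r.features and is_free(r, start, end)]


friday_noon, friday_two = datetime(2013, 3, 22, 12), datetime(2013, 3, 22, 14)
rooms = [
    Room("1/1001", "Highfield", {"data projector", "piano"},
         [(datetime(2013, 3, 22, 12), datetime(2013, 3, 22, 13))]),
    Room("2/2002", "Highfield", {"data projector", "piano", "whiteboard"}),
    Room("3/3003", "Highfield", {"data projector"}),
]
print(find_rooms(rooms, "Highfield", {"data projector", "piano"}, friday_noon, friday_two))
# ['2/2002']
```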
The Room Finder is still under development, so things will change in the coming days. Specifically, I’m not completely happy with the way it displays the features list; it’s still a little more technical than it needs to be. We’re also hoping to get a mobile version out soon, as it’s a bit fiddly to use on a small screen. But as with everything on this site, I hope it shows just how useful open data can be. If you find a problem with it, have a request for an additional feature, or just find it useful and want to let me know, feel free to drop me an email at email@example.com.
January 24, 2013
by Ash Smith
Over the last few weeks, Patrick has been exploring the university’s central data store looking for information on rooms and the features they contain. We’ve always had room features on data.southampton.ac.uk, but they were all generated from a single XML file given to Chris some years ago, and things change over time. So thanks to Pat’s fearless efforts investigating the central Oracle database, we now have a couple of scripts to pull not only room features, but booking information as well. A quick RDF generation script from me later, and we now have a method of ensuring the open data is as up to date as the university’s central database.
This is quite a big deal in my opinion – anyone planning a lecture or event can now view room information from the web and work out which rooms are suitable and available at the required time without having to phone Estates or walk across campus in the rain. Also, updating our data after such a long time is interesting for noting how things change over time; if nothing else, audio/visual technology is improving while chalk blackboards are definitely getting rarer!
January 18, 2013
by Christopher Gutteridge
Last week we had a visit from Paul Gibbons aka “FOI Man”. He works at SOAS and came down to Southampton to see what we’ve been up to with open data.
At Southampton, FOI handling and open data have only a nod-in-the-corridor relationship, but there are some obvious wins in working together.
In other news, we’ve got more data in the pipes and will be writing importers for it in the next few days. We’ve also had a meeting about moving some core critical parts of the open data service into “BAU” (business as usual), so that there are people outside our team who know how to maintain it, and the core is (change) managed more formally. This is essential if we want open data to be part of the long-term IT strategy and not a glued-on bit on the edge.
I’m also thinking about the fact that we have very spotty data on research group building occupation and so forth. By rights this data probably belongs to the “Faculty Operating Office”, but they are busy and don’t answer my questions very often. A cunning plan has entered my mind: make a ‘report’ URL for each faculty which provides a spreadsheet with what we know about that faculty, and let them download it and send it back to us. I think they could ‘colour in’ the missing information in a few minutes, and showing them a spreadsheet with blank cells will express the problem better to the management/administrator mindset. To me it’s just data, but then I’m a data nerd, and we’re learning that you have to let the data owner work with data in a way that makes sense to them.
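The ‘report’ idea could be sketched like this in Python: filter what we know down to one faculty and write it out as CSV, with unknown values left as blank cells for the faculty to colour in. The faculty, group and building values are placeholders, and serving the output at a per-faculty URL is not shown.

```python
import csv
import io

# Hypothetical snapshot of what we already know; None marks a gap.
KNOWN = [
    {"faculty": "Faculty A", "group": "Group 1", "building": "32"},
    {"faculty": "Faculty A", "group": "Group 2", "building": None},
    {"faculty": "Faculty B", "group": "Group 3", "building": "59"},
]


def faculty_report(rows, faculty):
    """Return CSV text for one faculty, leaving unknown cells blank to fill in."""
    out = io.StringIO()
    # extrasaction="ignore" drops the 'faculty' key, which is implied by the report.
    writer = csv.DictWriter(out, fieldnames=["group", "building"], extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        if row["faculty"] == faculty:
            writer.writerow({k: ("" if v is None else v) for k, v in row.items()})
    return out.getvalue()


print(faculty_report(KNOWN, "Faculty A"))
```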