Southampton Open Data Blog

Open Data Open Day – Hackers summary

June 27, 2013
by Patrick McSweeney

As you know from Ash’s previous post, we ran an Open Data Open Day for people in the University to come and learn about open data. As part of the event we invited the University’s top hackers to come and do some open data hacking. The template is one familiar to many of you who have attended our events or JISC hack days before. Simply get hackers to sit down together, form into teams of 2-5 people, give them a blank slate, float some ideas, keep coffee on tap and periodically wheel food in and out. At the end of the day, go to the bar so that each team can present what they have been working on over a hard-earned beer.

One of the key aims for the day was to link up people who had data with people who could do something cool with it. The hack day is a good way to do this because you have a room full of people who can do something cool. Over the course of the day, people with data drop by and talk to the hackers, tell them about their data and get ideas for cool things to do with it. It is a friendly and informal environment to work in and people come out with some really good ideas. What always surprises me, even though I have participated in at least 30 hack days and run at least 5, is that the outputs from the day are always so amazingly good. This hack day was no exception and our teams came out with a combination of the awesomely cool and the mind-bogglingly useful.

The outputs were:

LOD Search and the data.southampton.ac.uk usability study 

Collin Williams (CISCO Systems), Rikki Prince, Biscuits Newton and Andreas Galazis decided the usability of data.southampton.ac.uk was not good enough for non-technical people. They performed a usability study of the site and identified key areas of weakness to feed back to us. The biggest problem they found was that the search on data.southampton was nearly unusable. To combat the problem they created LOD Search, a SPARQL smart search indexing tool which can generate a usable search engine on any SPARQL endpoint. The demo they presented was very impressive and prompted a lot of questions from the audience. Attempts to trip up the system by asking for difficult things gave it no trouble at all, and the interface was surprisingly good given the short time available to work on it.

LOD Search demo

LOD Search on GitHub

 

 

Open Cycle Routes

Adam Field (iSolutions), Matt Smith (iSolutions) and Lisha Chen-Wilson (iSolutions) met Adam Tewksbury from the University transport office. He was looking for a cool map and video of cycle routes which he can embed in the transport website and attach to the open data pages and maps. The team took helmet cam videos and GPS data about the route and combined them to make a cool video which moves the pointer on an OpenStreetMap map as the video plays. The technique is very powerful and reproducible for any combination of video and GPS data in KML or CSV format. Hopefully this will result in more students and staff getting on to their bikes to cycle the safe routes of Southampton.
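
As a rough illustration of the idea (not the team's actual code, which is in their GitHub repository), the core of keeping a map pointer in step with a video is looking up the GPS position for the current playback time. The sketch below assumes the track has already been read from KML or CSV into a list of (seconds since the start of the video, latitude, longitude) tuples; the function name and the sample coordinates are made up for the example.

```python
from bisect import bisect_right


def position_at(track, t):
    """Return the (lat, lon) on the track at playback time t, in seconds.

    track is a list of (seconds, lat, lon) tuples sorted by time.
    Positions between samples are linearly interpolated; times outside
    the track are clamped to the first or last point.
    """
    times = [point[0] for point in track]
    if t <= times[0]:
        return track[0][1:]
    if t >= times[-1]:
        return track[-1][1:]
    i = bisect_right(times, t)  # index of the first sample strictly after t
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    frac = (t - t0) / (t1 - t0)
    return (lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0))


# Hypothetical sample points, purely for illustration.
track = [
    (0.0, 50.9340, -1.3960),
    (10.0, 50.9350, -1.3950),
    (20.0, 50.9350, -1.3930),
]
```

A video player would call something like position_at(track, current_time) each time the playback position changes and move the map marker to the result.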

Open Cycle Routes demo

Open Cycle Routes on GitHub

 

 

Exchange Calendar to iCal

Martin Chivers (iSolutions) spent a chunk of the morning in talks learning about the nuts and bolts of open data. In the afternoon he grabbed a laptop and cracked a really big data.southampton.ac.uk walnut. He created a command-line tool which exports an Exchange calendar as iCal. One of our big bugbears in open data has been getting data out of Exchange, and this tool hits the problem squarely on the head. In the past we have had to ask users with temporal data to set up an account on Google Calendar, since we can get data out of it but not from our own Exchange server. Now users will be able to work in their normal workflow without having to use a tool outside of the University to do a fairly ordinary task. As a demo he was kind enough to give us the iSolutions change management calendar as open data.
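
To give a feel for the output side of such a tool, here is a minimal sketch of writing events out in iCalendar format. It is not Martin's tool and it does not show how to fetch anything from Exchange; it assumes the events have already been retrieved into plain dictionaries with hypothetical keys (uid, start, end, summary) and that all times are in UTC. Folding of long lines, which the iCalendar standard asks for, is left out to keep the example short.

```python
from datetime import datetime, timezone


def escape_text(value):
    """Escape characters that are special in iCalendar text values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def to_ical(events, stamp):
    """Serialise a list of event dictionaries as an iCalendar string.

    stamp is the time the export was made; every datetime is assumed
    to be in UTC already.
    """
    fmt = "%Y%m%dT%H%M%SZ"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//calendar-export-sketch//EN",  # placeholder id
    ]
    for event in events:
        lines += [
            "BEGIN:VEVENT",
            "UID:" + event["uid"],
            "DTSTAMP:" + stamp.strftime(fmt),
            "DTSTART:" + event["start"].strftime(fmt),
            "DTEND:" + event["end"].strftime(fmt),
            "SUMMARY:" + escape_text(event["summary"]),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar uses CRLF line endings


# A made-up event, purely for illustration.
example = {
    "uid": "change-0001@example.org",
    "start": datetime(2013, 6, 26, 9, 0, tzinfo=timezone.utc),
    "end": datetime(2013, 6, 26, 10, 0, tzinfo=timezone.utc),
    "summary": "Server patching, phase 1",
}
```

Calling to_ical([example], datetime.now(timezone.utc)) returns text that can be saved as a .ics file and subscribed to from most calendar clients.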

Exchange Calendar to iCal on GitHub

 

 

Southampton Blackout: the real story

Tyler Ward (ECS) was more of a victim of circumstance than a hacking volunteer at our Hack Day. He was collocated with us to work on ECS’s media sensation Erica the Rhino, but since he is keen he got caught up in the open data hackery. The University of Southampton ran a media campaign called the Southampton Blackout to promote efficient power use at the University. The write-up from our Comms team had some interesting mathematical inaccuracies which made the integrity of the findings questionable. Tyler’s aim was to use open energy usage data to tell the real story of the Southampton Blackout. What he found is that some buildings were using more energy during the blackout, not less, to a level which almost eclipses the savings made in other buildings. He found some interesting trends in the data, particularly in the figure below for the ECS Mountbatten Silicon Fab lab. His close analysis of the data was able to deduce where future campaigns should be targeted next and where savings can be made most easily.

 

All in all the day was a great success and lots of fun for the hackers involved. To quote Adam Field:

I had fun.  It’s not often that I can take a problem and spend a day solving it.

 

Open Data Open Day – Summary

June 27, 2013
by Ash Smith

Yesterday, 26th June, we held the University’s first ever ‘Open Data Open Day’. This was a day organised by the Open Data service to show the wider university community exactly what linked and open data can do for them. The University has recently announced that open data, specifically an ‘open by default’ attitude, is to be a core principle of its strategy for at least the next five years, so we decided to concentrate our open day on non-technical staff, who may not appreciate what this entails, while at the same time holding a hack event in the building 32 coffee room for anyone with programming knowledge.

We had three main objectives for the day.

  1. To raise the profile of Open Data to the non-technical members of the university.
  2. To connect people who have data with people who can do cool stuff with data.
  3. To come up with some cool demos that show off the power of Open Data.

The day was a success. I ran an introductory talk early in the day, which was well-attended and concluded with an extended question and answer session and a positive response. Pat and I later presented a ‘show and tell’ of all the cool features we’ve been working on that show the power of linked open data, and I followed this up by giving a talk based on my last blog post on good data practice. All the time the talks were going on, several teams of hackers were hard at work in the next room, building some tools and demos using our data. They presented their work at the end of the day in the Terrace Restaurant, each with a well-earned pint. Patrick has written a more detailed post explaining what they got up to.

We also had two guest speakers – Cassie Robinson from Londonscape, a large civic project that uses data from four wards of a London borough, and Sina Samangooei from WAIS, who talked about streamed reasoning to an audience consisting mainly of hackers fresh from their presentations, ensuring a lively discussion.

Because of the success of the event, we plan to run something like this again, possibly in term time in order to get a feel for how students see Open Data.

More user Feedback: Room Finder

June 10, 2013
by Christopher Gutteridge

Reena had some very nice things to say about Ash’s Room Finder tool, so I’ve asked her for a quote to publish on the blog:

“My job is to engage young people in technology and I spend a lot of time organising events and have to do a lot of room bookings for them. Using the Open Data website helps with knowing what the room is like, what equipment is in the room, where the room is and knowing if the room is best for my needs. Without it would make the process a lot longer and tedious. It means I can get on with my job and not worry about things I don’t need to worry about.”

– Reena Pau, outreach and diversity consultant

We’re aware that the room finder could use some improvements to the user interface, and it’s on our (very long) todo list. But we’re gluttons for punishment, so always feel free to let us know what features would be useful to you.

Good Data Practice

June 7, 2013
by Ash Smith

While building the University’s Open Data, we’ve seen many different types of data. Much of the information is exported from Oracle and MySQL databases, or from enterprise systems like SharePoint, but the vast majority of what we use is in a tabular data format such as a spreadsheet.

Spreadsheets are actually a really good way of producing linked open data without any technical knowledge. A technical person just needs to write a single program or script that converts a spreadsheet into a computer-readable format. Anyone can then modify the spreadsheet to their heart’s content; you just need to run the script again afterwards. But this allows us to fall into a very common trap caused by bad spreadsheet discipline.

Spreadsheets are generally designed for human use. Most modern spreadsheet packages, such as Excel, allow the user to include headings, cell colours, lines, even import images and other files. There are also no strict rules about data type, so you can type a list of numbers in a column and then enter “N/A” or “see below” as part of the list, and the spreadsheet will not complain. This is fine for spreadsheets that only need to be read by people. However, when generating information that might one day be read by a computer, there is one very important 1975 Doctor Who quote you should remember: “the trouble with computers is that they’re very sophisticated idiots”. They can only handle what they’re programmed to handle. So if I were to write a program that processes a spreadsheet for converting into linked open data, and then someone were to update a cell in the spreadsheet using the word ‘None’ rather than the number zero, the computer running my program would get confused and behave unexpectedly. This is why good data practice is essential when generating or updating data that may one day become linked open data.
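
The ‘None’ versus zero problem is easy to demonstrate. The following is only a small sketch, not the code behind our converters: a hypothetical helper, parse_count, that turns a spreadsheet cell into a whole number and complains clearly when it cannot.

```python
def parse_count(cell):
    """Convert the text of a spreadsheet cell to an integer.

    Raises ValueError with a readable message if the cell holds
    anything other than a whole number.
    """
    try:
        return int(cell.strip())
    except ValueError:
        raise ValueError(f"expected a whole number, got {cell!r}") from None


# Hypothetical column of cells: the last one breaks the conversion.
cells = ["12", "0", "None"]
for cell in cells:
    try:
        print(parse_count(cell))
    except ValueError as error:
        print("Cannot convert:", error)
```

A person reading the column understands that ‘None’ means zero; the program only knows that it was promised a number and did not get one.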

So how can we avoid this? Well, one way is to employ super hackers who can pre-empt every possible anomaly in the data. But in a world with time and financial constraints this isn’t always an option! Joking aside, it’s a really quick and cheap fix to make sure that if you’re designing or editing a spreadsheet, you keep it as computer-friendly as possible. To this end, we’ve come up with what we consider to be the four most important rules for making your spreadsheet ‘linked-data-friendly’.

  1. Standardise your data format
    Values should be numerical or a simple yes/no as far as possible. For example, if you were producing a list of food, rather than put ‘not suitable for vegetarians’ in a general comment field, add an extra column labelled ‘vegetarian’ and restrict the possible values to ‘yes’ or ‘no’. If this isn’t possible, keep to a small set of possible values and don’t deviate from these. ‘Red’, ‘Yellow’ and ‘Green’ is better than ‘Red’, ‘Burgundy’, ‘Yellow’, ‘Lime’, ‘Emerald’ and ‘Jade’, unless the exact shade of green is critically important.
  2. Keep free text to a minimum
    There is always room for a comments column. Sometimes we need to express something that can’t be represented as mere numbers. However, try not to put this in the actual data. The data should be as accurate as possible, and clarified by the comment field. So, for example, if you are maintaining a list of water coolers and their locations, you might have a ‘room’ column. If a cooler is in a corridor rather than a room, there are several ways you can represent this in a spreadsheet. You could leave the room empty and put ‘outside 2065’ in the comments, you could put ‘outside 2065’ as the room number, or you can put the room ‘2065’ as the room number and then write ‘outside’ in the comments. The third way is the linked data way! We still have consistent, numerical data to represent the room, but the comment clarifies to a human reader that the cooler is actually outside the room rather than within it. The computer may not be able to make sense of the ‘outside’ comment, but at least it can get the closest room correct.
  3. Consistent, unambiguous identifiers
    Computer scientists often refer to ‘primary keys’, and information architects will talk about ‘controlled vocabularies’, but at the end of the day we’re all talking about the same thing, and that’s a way of identifying a specific thing in an unambiguous way. A good example of this is buildings in the University estate. Some buildings have names, some more than one, but all buildings have a number, so if you have a ‘building’ column in your data, make sure to use the number rather than the name. The same applies for rooms. A computer doesn’t understand ‘level 4 coffee room’ (and indeed many buildings may have a level 4 coffee room) but it does understand ‘32/4032’ (for example).
  4. Style is nothing to a computer
    Although you may like to use headers, coloured cells and so on, don’t rely on them for meaning. When you export a spreadsheet to its raw data form, all the styling is lost, so making the vegetarian options in a menu green is not a good way to identify them. If it’s important, it should have a column. By all means, make your spreadsheet as pretty as you like – just be aware that it’s not going to look like that to a computer.
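
To show how little effort it takes for a script to benefit from these rules, here is a minimal sketch of a checker for a spreadsheet exported as CSV. The column names (‘vegetarian’ and ‘room’) and the pattern used for room identifiers (digits, a slash, digits, as in ‘32/4032’) are assumptions made for the example rather than a description of any real dataset.

```python
import csv
import io
import re

ROOM_ID = re.compile(r"^\d+/\d+$")  # assumed form: building/room, e.g. 32/4032
YES_NO = {"yes", "no"}


def check_rows(csv_text):
    """Return a list of (line number, column, value) for cells that break the rules."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 of the file is the header, so data starts at line 2.
    for line_no, row in enumerate(reader, start=2):
        if row["vegetarian"].strip().lower() not in YES_NO:
            problems.append((line_no, "vegetarian", row["vegetarian"]))
        if not ROOM_ID.match(row["room"].strip()):
            problems.append((line_no, "room", row["room"]))
    return problems


# Made-up data: the second data row ignores rules 1 and 3.
sample = (
    "item,vegetarian,room\n"
    "Soup,yes,32/4032\n"
    "Pie,not suitable for vegetarians,level 4 coffee room\n"
)

for problem in check_rows(sample):
    print(problem)
```

Because the well-behaved row uses a fixed yes/no value and an unambiguous room identifier, it passes straight through; the free-text row is flagged on both columns.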

There are other things, but these are the most important. Next time you start a spreadsheet, keep to these rules and your spreadsheet will be trivial to convert and add to the open data service. Once it’s in data.soton.ac.uk it is really easy for us to give you loads of value add on your data. The value add increases the desirability and accessibility of your data and makes your data helpful. People use your data to make their lives easier, and that reflects positively on you and boosts your reputation.

How open data helps change management

June 6, 2013
by Christopher Gutteridge

Last year we built a system which aggregates event RSS feeds and makes a nice events calendar for the university. I was recently quite surprised to discover a way in which the information was being used.

“Change Management within iSolutions uses the Events Calendar open data in conjunction with other confidential data sources to build a rich understanding of critical University activities throughout the year. This enables iSolutions to schedule maintenance work to minimises service disruption to our users, and it also assists in understanding impacts to users when unplanned service outage occurs. ” – D.J. Hampton, IT Service Management & QA Team Manager