Skip to content

Open Data Internship: The First Week

Last friday marked the end of the first week of my Open Data Internship. It’s the first time the University of Southampton has had an open data intern.  Or, to my knowledge, any University for that matter. This puts me in the interesting position of figuring out what it is an open data intern does.

A Little Backstory

I’m Callum, a graduate going-on PHD at the University of Southampton. I started as a developer about 5 years ago, around 2011, scripting for a game called Garrysmod. I’d never touched a programming language before. For the most part, I learned by studying the works of other people.

I didn’t realise it then, but that was my first introduction to open source. Since then, I’ve come to understand how important open software is to driving innovation. It aids learning and provides a platform for other projects to grow on. All that said, the most important thing is does is bring the community together. It brings together total strangers to work towards a common goal. I think that’s vital in an increasingly insular society. In many ways, I believe open data can do the same thing.

With all that covered, what has my first week been like? Well, the open data service here has been running for around 6 years. Over that time, it’s amassed quite a large amount of data. I feel like a new-age explorer, navigating twisting jungles of information. I’ve come across some spectacular datasets I didn’t know existed. My favorite moment so far was discovering building 185. In the middle of the indian ocean. In other words, it’s pretty fun.

What have I been up to?

The team has had quite a few ideas on the back-burner for while, generally things that they haven’t had time to do. These ranged from straightforward data gathering to projects in their own right.

One of the more straightforward tasks was to fill in some of the bigger holes in the data about the University. One of which was in the building data.

The first phase was to identify which buildings were missing data. The tool to do this, without a shadow of a doubt, was the University’s SPARQL endpoint. SPARQL is a language that allows querying of the open data service, much like how databases use SQL. In fact, the syntax is similar. I spent the first part of my week learning SPARQL and creating queries to hunt for the missing data.

Next up, I imported the locations of the buildings missing data into a mapping tool. At that point, I realised the magnitude of the task.

A map showing the spread of buildings missing portal data. Several pins are in Malaysia, while many are in Southampton.

Unfortunately, the department was unable to sanction a photo-gathering trip to Malaysia. The sad result being I booked my holiday in Cornwall, instead.

The difficulty of the task assessed, I developed a cunning plan to gather the data. Naturally, being a Computer Scientist, I dislike the notion of writing on paper. Even more so, I dislike the idea of then copying that data to a database by hand. I thus proposed creating a tool that would allow quick gathering and labelling of data.

This aligned well with the aims of the team. They had been looking for the time to create a tool to crowdsource data from students on campus.

Thus, I set out on a quest to kill two rats with one high-precision stone.

Update: The SPARQL Code

PREFIX soton: <>
PREFIX foaf: <>
PREFIX geo: <>
PREFIX rdfs: <>

SELECT ?building ?label ?lat ?long WHERE {
    ?building a soton:UoSBuilding .
      ?image a foaf:Image ;
      foaf:depicts ?building
      ?building rdfs:label ?label 
      ?building geo:lat ?lat .
      ?building geo:long ?long
    FILTER (!BOUND(?image))


Posted in Open Source, Team.

Tagged with .

0 Responses

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.

Some HTML is OK

or, reply to this post via trackback.