Presenting my proof of concept – Week 7

Coming back from my week away, I found that This is Malware had moved along and was much closer to being released than when I’d worked on it two weeks ago. There were still a few features that needed adding and I got to help out with some of those, mostly hiding UI elements depending on user privileges. My next small task was to test some of the online eprints functionality after changes had been made to the plugin structure, reporting bugs back to Patrick, who then fixed them.

I also got to help write up the documentation for the profiles.soton architecture options. Profiles.soton is a profiles system that is in the early stages of development. The documents and diagrams I helped put together are going to be taken to TAG, the Technical Architecture Group, on Monday morning. I’m hoping that my diagrams and analysis of the options will encourage the group to go for what is, in my opinion, the cleanest solution. It involves the most changes to existing systems and storing more information in the Active Directory, but results in a cleaner, more easily maintainable overall system. Moreover, it will allow other systems to follow suit easily once that extra information is accessible from one source. Systems currently gather this information from various disparate sources, which could easily lead to inconsistency.

Next I carried out a small manual data management task for the eprints system. Since updating to Pure, the eprints “shelf” system broke, and instead articles now need to be added manually to a project. This was a simple task of copying and pasting publication ID numbers into the web application and adding the result to a project.

The most important part of my week, however, was preparing to present my proof of concept web application to Service Management on Friday morning. I’ve worked on this web application on and off for a few weeks, and this meeting originally seemed like the final full stop on the project. The original plan was that if Service Management wanted a system like this to be built that they’d request it from the team and have a project added to the pipeline for development in a few months, long after I’ve left.

Presentation Notes

However, the demo was met with a more positive response than I could have ever hoped for, and they asked if we could make this live as soon as possible. Suddenly, the code I’d thrown together in a language I’d not touched in years, using technology I hadn’t thoroughly researched, was being added to the team igit, and I began kitting it out with a test suite. In hindsight, I should have designed the system more carefully, and it may be the case that a lot of it gets rewritten to make it suitable for actually going live. I believe that by the time I’ve got this project ready for deployment, I’ll have thoroughly learnt to always keep my code clean and maintainable regardless of its purpose and scope. As difficult as it will be to clean this up and get it production ready, I’m thrilled that something I’ve worked on during this internship could be live by the time I leave. My goal is to have it ready in time for Freshers’ Week so that new students adapting to life at the University will be able to take advantage of an open, more accessible knowledge base.


Posted in Programming.

Open Data Internship – Week 7 – In the name of progress

“OK, so the plan is simple: we cut through the vent, squeeze down the air duct, take a left and avoid the IR lasers, before carefully dropping down onto the desk so as to prevent setting the pressure sensors off, then dive through the door and into the office. So, any questions?”

“Yes. Why don’t we just go in through the door behind you, which is wide open?”


The past two weeks have been quiet, and for quite a few days I was the only person in the office and as such have been left somewhat to my own devices. This has meant that I have primarily been working on tools for maintaining the data behind the campus navigation map, which is currently a large JSON file.

Whilst developing these tools I have been considering ways to improve the life span of the programs I am developing. The main way I have been working on this is by trying to keep the application as far abstracted from the data as possible. To achieve this I decided to create a schema for the data being used in the map, which is used to create the interface for the data view/editor that I have been building. A frame for the data is created from the schema, and the data is placed into it. This allows the data structure to change: a user could change the schema and the data would still display, albeit with holes where new unfilled fields exist.
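The frame idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code from the project: the schema layout, the field names and the helpers build_frame and fill_frame are all hypothetical.

```python
from typing import Any


def build_frame(schema: dict[str, Any]) -> dict[str, Any]:
    """Create an empty frame with the same shape as the schema, every leaf set to None."""
    return {
        key: build_frame(value) if isinstance(value, dict) else None
        for key, value in schema.items()
    }


def fill_frame(frame: dict[str, Any], record: dict[str, Any]) -> dict[str, Any]:
    """Copy values from the record into the frame; fields the record lacks stay None."""
    filled = {}
    for key, slot in frame.items():
        value = record.get(key)
        if isinstance(slot, dict):
            filled[key] = fill_frame(slot, value if isinstance(value, dict) else {})
        else:
            filled[key] = value
    return filled


# Hypothetical schema: "accessible" and "lon" are fields the old data does not have yet.
schema = {
    "id": "number",
    "label": "string",
    "position": {"lat": "number", "lon": "number"},
    "accessible": "boolean",
}
record = {"id": 7, "label": "Main entrance", "position": {"lat": 50.936}}

view = fill_frame(build_frame(schema), record)
print(view)
```

The unfilled fields come out as None, which is where an editor could show a hole rather than failing to display the record.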

To achieve this I created a text file which was essentially a JSON file with the ‘{‘ and ‘}’ characters replaced with tabs. Unfortunately, at the time I did not realise that was what I was doing, and went on to write a not insignificant amount of code to read this file into a Python dictionary. Mildly impressed with what I had done, I showed it to a friend, who pointed out that I was essentially reading in a dodgy JSON file and that if I changed the format to JSON I could read the file in using about three lines of code. The following morning I spent a good five minutes trying to justify to myself a way to keep the work I had done in my project, as it felt wasteful to delete it. However, in the end it was axed. This did get me wondering about things such as pair programming, which I usually dismiss as a waste of time and resources, since only one person can type at a time and people have the annoying habit of having unique(ish) thoughts, which could lead to conflicts over how to achieve tasks. Here, however, it would have saved me half a day’s work. Eventually I reconciled these two views with a compromise which I think balances efficiency with… well, efficiency: regular code reviews. You still get two people programming at the same time, but the reviewer can point out errors before a programmer runs too far down the wrong path on how to solve a problem, as happened in my case. This would further help with the maintainability of code, since if multiple people are regularly reading the code then the comments on it should be useful.
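For reference, the "about three lines" version looks roughly like this with Python's standard json module. The file name and the sample data are made up for the sketch, and the temporary directory is only there so the example runs on its own.

```python
import json
import os
import tempfile


def load_map_data(path):
    """Read a JSON file straight into Python dictionaries and lists."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Write a small sample file, then read it back (hypothetical data).
sample = {"nodes": [{"id": 1, "label": "Entrance", "links": [2]}]}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "map.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)
    loaded = load_map_data(path)

print(loaded["nodes"][0]["label"])
```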

Posted in Open Data, python.

Cleaning up old code, and learning PHP – Week 6

Following on from last week’s unit testing for eprints, this week I took on the gargantuan 120 line function. It sucked up almost an entire day all by itself, but eventually I had each and every condition and result tested. The next step was, of course, to break it down into readable, maintainable chunks of code. I took sections, some of which were originally marked out as separate sections in comments, and put each one in its own function. I also fixed a broken function called is_valid_date that had been returning true without carrying out any checks. Instead of using complicated regex and lots of if statements one after the other, as this function had been before it was given up on and prepended with “return 1;”, I decided to use a Dates library and let that attempt to parse the string being passed in. If it raised any errors, the string isn’t a valid date and I returned false; otherwise I returned true. This cut the function down to less than half its original length. It was great being able to put some of the ideas I read about in my first couple of weeks here into practice.
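The "let a library try to parse it" approach can be sketched as follows. The real function is Perl and uses a Dates library; this Python stand-in uses datetime.strptime instead, and the list of accepted formats is my own assumption rather than what eprints accepts.

```python
from datetime import datetime

# Assumed formats for the sketch: full date, year and month, or year only.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def is_valid_date(text):
    """Return True if the string parses as a date in any accepted format."""
    for fmt in ACCEPTED_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            # The parser rejected this format; try the next one.
            continue
    return False


print(is_valid_date("2017-08-04"))
print(is_valid_date("2017-02-30"))
```

The parser does the range checking, so an impossible date such as 30 February is rejected without any hand-written rules.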

I also got to contribute towards the development of ThisIsMalware, a system whose requirements capture I’d been a part of a few weeks ago. This got me programming in PHP, an entirely new experience for me. After having tried my hand at Perl, it didn’t seem too complicated on a small scale, but I did find that in a larger web application I started to lose track of the overall structure of the system.

Posted in Perl, PHP, Programming.

Open Data Internship – Week 5 – The Omnibus Edition

Alas there is no creative writing to go at the top of this blog, mostly because it would end up being very similar to the ‘What have the Romans done for us?’ scene from ‘Life of Brian’, but with ‘What have I done recently?’ being the main question. Whilst I have been busy for the last 3 weeks, it did not feel like it at the time, and so I neglected to write a blog post about what has happened. I shall now try to rectify this.

Lean 6 Sigma

Recent changes in management in iSolutions have led to a plan to introduce new management practices in the form of Lean 6 Sigma. This included a training afternoon that I was invited on along with the rest of the team. The system is made up of two parallel practices ‘Lean’ and ‘6 Sigma’. Lean was developed by Toyota about 70 years ago and its aim is to reduce waste, or as Lean calls it ‘non-value adding tasks’. Waste includes anything that the customer did not ask for such as testing, transport or waiting. Whilst this is clearly geared towards manufacturing processes, I can see how it could be mostly implemented in a software development environment.

6 Sigma, on the other hand, focuses on improving the quality of products delivered. It does this by modelling the delivery of the product/service as a normal distribution, and finding what percentage of these tasks are completed in an acceptable time/to an acceptable level. From this data, the process is given a ‘sigma rating’, which is the number of standard deviations that fit between the process mean and the nearest acceptable limit. The process is then improved upon with the aim of making it a ‘6 sigma’ process, meaning that the process is being completed such that 99.99966% of outputs are defect free. Personally, I am less positive about 6 Sigma than Lean, as 6 Sigma is aimed towards improving standard procedures, but I find there is little to be standardised about software development other than project management, and how would you go about determining a standard metric for “defect free” products?
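The 99.99966% figure can be checked with a short calculation. One detail that was not covered above and is my own addition: the usual Six Sigma convention allows the process mean to drift by 1.5 standard deviations over time, so a ‘6 sigma’ process is scored at 4.5 standard deviations on the normal curve. The function names here are just for the sketch.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def defect_free_fraction(sigma_level, shift=1.5):
    """Fraction of outputs within the limit, assuming the conventional 1.5 sigma shift."""
    return normal_cdf(sigma_level - shift)


def defects_per_million(sigma_level, shift=1.5):
    """Expected defects per million opportunities at a given sigma level."""
    return (1.0 - defect_free_fraction(sigma_level, shift)) * 1_000_000


for level in (3, 4, 5, 6):
    print(level, f"{defect_free_fraction(level) * 100:.5f}%", f"{defects_per_million(level):.1f}")
```

At level 6 this gives roughly 3.4 defects per million, which is where the 99.99966% comes from.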

Campus Navigation App

I have spent most of my time for the last 3 weeks building, and getting data for, an internal and external campus navigation app. The app itself is reasonably simple: each room, fork in a path and building entrance is a node, which is connected to other nodes in a simple graph. From there it is a simple case of traversing the graph to find a route. I did look for other existing solutions similar to this one, but found that most of them were paid solutions and the one that was not had some undesirable features. One such feature was that it required floors to have an image to represent them, typically a floor plan, which I didn’t have at the time, and even if I had I would not have wanted to place them on a publicly visible service.

Open Data Service Website

I took a few days off from developing my app to look at the development build for the website to find any issues with it. Since it was largely procedurally generated from the data the service acts as a front end to, it was unsurprisingly low in issues, with the main issue being dead links present in the data. So I chased down these dead links and, where possible, found new links for the data in question. Not a very interesting task, but one which will improve the quality of the affected data on the open data service.

A second task, which got pushed back behind app development, was to look into how to make the university’s open data more accessible to people. Whilst it exists and is quite comprehensive, it appears not to be regularly used by the university populace, whereas existing applications such as ‘Room Finder’ and ‘maps.soton’ are regularly used by people at the university even if they do not know that these are part of the open data service. The main conclusions I drew were that example SPARQL queries and easily accessible source code for current services would be a good place to start, but I am open to suggestions (as is the rest of the Open Data service).

Posted in Open Data, Programming, SPARQL.

Testing Eprints – Week 5

This week I finally got to sink my teeth into testing the Southampton specific eprints configuration files. This presented a pretty steep learning curve for me, having to read, understand, and test code written in a language I only picked up last week. I started small, realising that tackling the 120 line sub routine should probably wait until I’ve built up a little more experience. There were a few very short sub routines that I could start with, only a few lines long. For these shortest of short routines, writing tests almost felt like overkill. It seemed as simple as running a routine that set x = True, and then checking afterwards that x=True and patting myself on the back. As soon as I built this up to routines that were 15 lines long, or 20 lines long, I started to see the usefulness. There were changes I wanted to make immediately, nested if statements that could be cleaned up, variables that seemed redundant. Putting these tests in place means that people will have the freedom to do these things, and to improve the code without fear of breaking the overall functionality.
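As a small illustration of why the tests pay off once routines get longer, here is a sketch in Python rather than the Perl the eprints tests are written in. The routine, its rule and both versions of it are hypothetical; the point is only that the same set of cases pins the behaviour before and after a tidy-up.

```python
import unittest


def can_edit_nested(user):
    """Hypothetical routine in the 'before' style, with nested if statements."""
    if user.get("logged_in"):
        if user.get("staff"):
            return True
        else:
            return False
    else:
        return False


def can_edit_flat(user):
    """The same rule after tidying; the shared cases below must still pass."""
    return bool(user.get("logged_in") and user.get("staff"))


class CanEditTests(unittest.TestCase):
    CASES = [
        ({"logged_in": True, "staff": True}, True),
        ({"logged_in": True, "staff": False}, False),
        ({"logged_in": False, "staff": True}, False),
        ({}, False),
    ]

    def test_nested_version(self):
        for user, expected in self.CASES:
            self.assertEqual(can_edit_nested(user), expected)

    def test_flat_version(self):
        for user, expected in self.CASES:
            self.assertEqual(can_edit_flat(user), expected)


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CanEditTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```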

I also had a somewhat unexpected task set for me, to create a teaching workshop to introduce my team to machine learning. I’ve talked about my undergrad course and the PhD that I’ll be starting in a couple of months’ time, and apparently this sparked an interest in some other members of my team. I’m specialising in machine learning and computer vision, specifically for use in classification of species. My individual project for my undergraduate involved classifying bird species in audio files, and my PhD will be focusing on classifying crab and coral species in images of the sea floor. My workshop will start with an overall introduction to machine learning, very quickly focusing on the idea of classification. I’m not sure when I’ll be giving the workshop, but I’m excited to give it a go. I’m a volunteer teacher with Robogals and go into local schools to teach programming with lego mindstorms, but I haven’t had much opportunity to branch out and teach an older audience more technical topics before.

And of course, my week ended on a high with the University of Southampton staff party. It was a great opportunity to spend more time with some of the other interns at iSolutions, the food was wonderful, and I got to learn a little bit of poi from a good friend of mine at the circus society workshop.

Posted in Perl, Programming, testing.

Institutional Web Managers Workshop, 2017

I recently attended the annual Institutional Web Managers Workshop (IWMW) conference, this year held at the University of Kent. If you are unfamiliar with it, IWMW self-describes as the premier event for the UK’s higher educational web management community. This marks my second IWMW – I attended last year’s conference in Liverpool and was sufficiently impressed that I was keen to attend again this year.

I made rather more of the networking this time around, speaking to people from all manner of different institutions and organisations. It’s fascinating learning what themes are common across the sector and what’s unique to the University of Southampton. Spoiler: surprisingly little is unique — most institutions are going through similar challenges.

Built without a clear vision

Andrew Millar from University of Dundee on how we build websites. With so many stakeholders, is it any surprise we get such complexity?

One early revelation came from talking with some of the delegates from Dundee University. They have a UX specialist whose role includes ethnographic study of people using their ICT services. I’ve always felt that this is an area where we should start heading. This particular viewpoint was solidified on the third day when Paul Boag said that not only should we be studying people as they use our services, we should be video-recording it and compiling a lowlights video. In essence, put together a 2-3 minute video of all the parts where your users are swearing at their computer in frustration! That way you end up distilling the biggest UX problems your sites have. The University of Bath team also talked about product vision and how finding the true north of your products encourages focus on people’s needs and using data to make decisions. Our services should be simple and intuitive and releases should be iterative and frequent.

The plenaries from Bath and Greenwich both made the very good point that we should be stopping users from owning the design of their sites; they are content creators and we should be removing the distraction of presentation from them as much as possible. Business value comes from delivering content. Greenwich suggested that we should talk about content instead of pages to help distinguish the material from its presentation.

St. Andrews have run with this approach by publishing their Digital Pattern Library (DPL) to codify all aspects of their University’s brand and make the documentation and process accessible to all so that it’s as easy as possible for their staff to produce St. Andrews-branded websites.

Digital native does not mean tech-savvy

An insightful observation from Tom Wright, University of Lincoln: Digital native does not mean tech-savvy. Don’t assume that the younger generation are necessarily technology experts!

On a rather different tack, the University of Lincoln have made some astute observations about the current generation of undergraduates. It’s no secret that social media platforms, rich multimedia experiences and shared memes are a significant aspect of modern youth culture, but seemingly few organisations have sought to exploit that. Lincoln’s approach to marketing, by having current students create YouTube videos, is a nice touch and makes the experience much more engaging. I definitely recommend checking out their videos.

As well as the plenary talks, I also attended a workshop entitled How to Be a Productivity Ninja from Lee Garrett of Think Productive. Most of the IWMW workshops tend more towards the technical hard-skills end, but there are always one or two soft-skill management sessions and those are the ones I look out for. Claire Gibbons ran an excellent workshop at IWMW 2016 called Leadership 101 that I felt was the highlight of that conference. Productivity Ninja was this year’s equivalent for me; I learned a few great tricks to improve my productivity and have leads on some handy apps to help me organise things better. It was also nice to see one or two tricks mentioned that I already use.

In conclusion, I found attending IWMW 2017 a very worthwhile exercise and I am certainly looking forward to next year’s. I’m definitely keen to improve our UX testing with ethnographic studies and I’ll be investigating whether we can run a Productivity Ninja session at Southampton some time soon.

Posted in Best Practice, Community, Management, web management.

Lego, Lean six sigma, and Eprints – Week 4

This week got off to a fantastic start, with a team lego morning. Having been invited to talk about Robogals (a student society I’m currently the president of) on Monday lunchtime, Pat decided we should all go play with the lego robots together that morning. It was a great morning, it gave my team the chance to see a little bit of what I do outside of work, and it gave me a chance to prepare a couple of robots to showcase at the lunchtime event.

On Tuesday morning, I showed my web application to the team in our weekly meeting. This somewhat marked the end of the bulk of the work on that project for now, and this week took me in a new direction, away from making my proof of concept web application and on towards unit testing for eprints. The preparation before reaching that point was a much longer journey than I anticipated. Firstly, I attempted to install eprints onto an Ubuntu 16.04 virtual machine. This didn’t work, as eprints doesn’t have a Release file, so is completely incompatible with modern Ubuntu systems. Next I installed a Fedora virtual machine. The installation of standard eprints worked on this machine, but trying to use the Southampton University specific system brought up a handful of errors, mostly dependency issues. I counted, and noted down, at least 7 individual packages that were dependencies of the system that weren’t installed already. After installing each of these one by one, the installation eventually worked.

Seeing as I was going to be writing unit tests for Perl code, it made sense to get a little bit of practice with the language. I hadn’t seen much of Perl before, and had never coded in it at all. My first experience with it was writing a script to check for the missing packages that had made installing eprints such a difficult process for me. This script, once added in before installing the rest of the packages, checks for the necessary packages, prints a message about any that are missing, and will abort the process if there are unmet dependencies.
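The shape of that check is simple enough to sketch. The real script is Perl and looks for system packages; this Python stand-in checks for importable modules instead, and the module names and function names are placeholders of my own.

```python
import importlib.util


def find_missing(modules, is_installed=None):
    """Return the names from modules that cannot be found."""
    if is_installed is None:
        # Default check: can Python locate a top-level module with this name?
        def is_installed(name):
            return importlib.util.find_spec(name) is not None
    return [name for name in modules if not is_installed(name)]


def check_dependencies(modules, is_installed=None):
    """Print each missing dependency and abort if there are any."""
    missing = find_missing(modules, is_installed)
    for name in missing:
        print(f"Missing dependency: {name}")
    if missing:
        raise SystemExit(f"{len(missing)} unmet dependencies; aborting install")


# Both of these ship with Python, so the check passes silently.
check_dependencies(["json", "os"])
```

Passing in a different is_installed function would let the same loop check system packages instead of Python modules.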

The team also had a trip up to Boldrewood on Wednesday afternoon for our Lean Six Sigma white belt training. This is the first level of training in the course, and is all about improving efficiency in processes and reducing the number of defects per opportunity. I like the overall approach, especially the idea of blaming processes not people for the majority of problems, and the idea of getting management on board with changing processes to make them better for everyone involved. It is quite strongly driven by the idea of value for money, something that doesn’t quite resonate within iSolutions as we don’t have customers in the same respect, and aren’t being paid per product. I think this should be quite easy to get over though: instead of using money as a metric to measure against, we could use a feedback score or other performance metric to determine what a product or service is worth, striving to improve the day to day lives of students and academics rather than striving to turn a profit. It will be interesting to see how much of this will be implemented during the remainder of my time here on the internship and, if it works, the effect it might have on the rest of my time at the university.

Posted in Perl, Programming, testing, Training.

Lucene.NET index building – Week 3

This week I continued on my proof of concept project, and more specifically looked at ways of indexing the knowledge base articles in order to be able to search them and identify similar articles. There are many libraries and tools built for this, and one that I investigated early on was called Gigablast. This system is built for Linux, and integration with my .NET project was difficult. I decided instead to look into options native to .NET. The solution I found and integrated with my project is called Lucene.NET, an indexing tool originally written in Java, then later ported to C#. This tool indexes items and saves the index to disk, where it can be read at a later time by any application using Lucene with access to that disk space.

Once I had indexed the knowledge base, I then began to investigate other features of Lucene, such as the search features. I created a search function that uses Lucene to search the indexed articles, looking for articles containing exact matches to the search term. I then extended this to use fuzzy queries, where matches that are a set number of edits away from the search term are also returned. For example, if someone were to search VPM instead of VPN, that would count as one edit, so with a fuzzy query allowing at least one edit a search for VPM would still match VPN. This is a really useful feature to have, especially with less technical users who may misspell important terms. Once I’d experimented with different search features, I then looked into adding further functionality to my web application by providing links to related articles at the bottom of article pages. This was interesting to play around with, comparing similarities in the title and in the main body of text, with both a standard analyzer and a snowball analyzer, to see which resulted in more relevant articles being identified. The standard analyzer identifies terms and counts their frequency. A snowball analyzer does the same, only it allows for stemming of words, so the terms print, prints, printing, and printers are all identified as being the same word. There are drawbacks to the snowball analyzer: it also identifies terms like organization and organs as being the same, allowing for more mismatches in some cases. Despite this drawback, I found that the snowball analyzer did a better job in most cases at identifying similar articles, and that comparing just the titles gave more obviously similar results, whereas comparing the main body of text sometimes returned results that weren’t immediately obviously similar to the given article.
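To make the idea of an "edit" concrete, here is a minimal sketch of edit distance and a fuzzy title search built on it. It uses plain Levenshtein distance in Python and may differ in detail from what Lucene.NET does internally; the function names and the article titles are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(term, titles, max_edits=1):
    """Return titles containing a word within max_edits of the search term."""
    term = term.lower()
    return [
        title for title in titles
        if any(edit_distance(term, word) <= max_edits for word in title.lower().split())
    ]


titles = ["Setting up the VPN", "Printing on campus"]
print(edit_distance("VPM", "VPN"))
print(fuzzy_search("VPM", titles))
```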

The next step for my web application was to make it look professional enough to possibly be an outward facing application by applying the university branding rules to it. I was given a link to the new university branding guide. This website lists in detail how to brand posters, choose fonts, and even how to advertise the university on the side of a mini-bus, but very little on how to create a University of Southampton website. I decided, instead, to use the source code of the branding website as a kind of template, and to try to integrate it with my existing web application. This proved more difficult than I originally anticipated as the website wasn’t designed to be pulled apart and reused for a different purpose, but eventually I ended up with a website that looked vaguely professional. There are still a few small bugs, such as the navigation menu moving up half a centimetre when you hover over it, and the title shifting downwards if you hover over the breadcrumb links just above it, but all in all the web application is starting to look cleaner and more professional.


The next steps are to iron out those little visual bugs, probably moving the breadcrumb links as they’re not in a very aesthetically pleasing place at the moment, and ensuring this style works for article pages too. From there, I’d like to get other people’s opinions of the website, to see if they find it more or less useful than the existing knowledge base browser, and to work out how it can be further improved.

Posted in Programming.

Open Data Internship – Week 2 – Dusting off old skills

He ran his fingers across the dusty console, leaving a trail of clear glass showing the controls beneath. He paused. It had been quite some time since he had last flown this ship, so he took a moment to try and remember the controls. Gingerly he tapped some of the buttons. An explosion echoed through the ship and an alert displayed on the console, “Uncaught TypeError: Cannot read property ‘id’ of undefined”. “Hmm, how about this?” he mused, tapping the console again. This time the bridge came to life.

This week has seen me dusting off my JavaScript skills in aid of making a route finding map prototype for campus. In its simplest terms this is a graph traversal problem. But before I jumped into coding a graph traversal algorithm I needed to find some way to represent the map. The form I decided on was reasonably simple: I only modelled the nodes, and each node had a unique numerical ID, a label describing the location, a geometric point, and a list of nodes to which it was connected. I decided upon this because it allowed for one-way edges, which could represent exit only doors.

Next came the implementation of the graph traversal. The method I used was a modified breadth-first search: instead of enqueuing newly explored nodes at the end of a queue, I inserted them into an array ordered by cost (distance) travelled, so that the node with the least distance travelled was expanded next.
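A search that always expands the cheapest node next is usually called uniform-cost search (Dijkstra's algorithm without a fixed goal is the same idea). The sketch below shows it in Python rather than the JavaScript of the prototype, with a heap standing in for the sorted array. The four nodes, their coordinates and the function names are invented for the example; node 4 plays the part of an exit only door, since it links out but nothing links to it.

```python
import heapq
import math

# Hypothetical nodes: id -> label, point and the ids it links to (one-way).
nodes = {
    1: {"label": "Building entrance", "point": (0.0, 0.0), "links": [2]},
    2: {"label": "Path fork", "point": (3.0, 4.0), "links": [1, 3]},
    3: {"label": "Road crossing", "point": (3.0, 10.0), "links": [2]},
    4: {"label": "Exit only door", "point": (6.0, 14.0), "links": [3]},
}


def distance(nodes, a, b):
    """Straight-line distance between two nodes."""
    return math.dist(nodes[a]["point"], nodes[b]["point"])


def find_route(nodes, start, goal):
    """Return (path, cost) for the cheapest route, or None if there is no route."""
    frontier = [(0.0, start, [start])]  # heap ordered by distance travelled
    best = {start: 0.0}
    while frontier:
        cost, current, path = heapq.heappop(frontier)
        if current == goal:
            return path, cost
        if cost > best.get(current, math.inf):
            continue  # a cheaper way to this node was already expanded
        for neighbour in nodes[current]["links"]:
            new_cost = cost + distance(nodes, current, neighbour)
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                heapq.heappush(frontier, (new_cost, neighbour, path + [neighbour]))
    return None


print(find_route(nodes, 1, 3))
print(find_route(nodes, 1, 4))
```

Because the links are stored per node, a route out through node 4 exists while a route in to it does not.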

The underlying data I was using was a list of university building entrances, but this list was incomplete and also didn’t help with navigation, as just drawing a line from one door to another will not help anyone navigate anywhere unless they can fly. So I manually created intermediate nodes on paths and crossings, typically where the path forked or crossed a road. As I was only making this data for a prototype I did it sporadically, arbitrarily picking a spot on the route I was trying to add and working from there. Later on I decided this was not a good way to move forward once the main program was working, so I spent time using GIMP2 to simplify a university map down to the elements I needed to see to decide where to place nodes.

I started with this Map:

A map of the University

Then I spent time reducing the number of colours in the image, primarily using tools such as Posterize, which reduces the number of colours in an image, and the ‘Select by Colour Tool’, which I would use to select all of a specific colour on the map and recolour it. After several hours of this and some manual cleaning up of the image I produced this three colour map:

Three colour map of the University

Here we see roads in white, paths/paved areas in grey and impassable areas (grass and buildings) in black.

Now it was a matter of adding nodes to the map. I started by adding blue dots to represent doors and then added red dots for forks in paths and road crossings. Finally I connected these nodes with green lines to represent traversable paths. This gave me this map:

3 colour university map with simplified overlay.

Now that I had the graph I could remove the underlying map and see the graph I had created:

A graph representation of campus

Which I personally think looks quite nice, but more importantly will allow me to expand upon the data in my prototype in a logical and well thought out way.

Posted in Data, Geo, Javascript, Programming.

Open Data Internship – Week 1 – The code Jungle

Wandering through the ruins of this civilisation I come across a strange totem in the middle of the road. It makes no sense for it to be there so I examine it closely, its secrets no closer to being uncovered. Then I notice a small plaque, with some prehistoric writing on it. I was previously unaware that this civilisation had mastered writing so I eagerly try to read it: “//Down here as PHP 5.5 or less doesn’t support expressions as initializers.” And suddenly the totem’s position makes sense.

It is my first week here in the Technical Innovation and Development team working as an open data intern, and whilst looking for a starting point for my time here I decided to look to the past and see what last year’s intern did with their time here. Conveniently they wrote a pseudo-weekly blog about their time here and so I settled down with a cup of tea to read what they had done. This went well for about the first five minutes, until I came across a sample of a SPARQL query. I had never seen anything quite like it before so I started investigating. SPARQL is a Resource Description Framework (RDF) query language. I understand RDF to be a format by which data is expressed as triples containing a ‘subject’, ‘predicate’ and ‘object’. The ‘subject’ is a reference to the thing being described, such as ‘building32’, the ‘predicate’ names the property being described, such as ‘residential’, and the ‘object’ gives the value of that property, such as ‘false’. Having never come across either RDF or SPARQL before, it took a little while to get my head around this, but I got there in the end. Building on what was done last year, I was able to retrieve information about which buildings on campus do not have images or geo-data in the university’s ‘Building’ Open Data Set, removing all members of the ‘Item Hidden from Lists’ Data set. I achieved this using the MINUS operator, which was introduced in SPARQL 1.1 and allows the elements in set A which are also in set B to be removed from set A in the case ‘A MINUS {B}’.
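A rough way to picture triples and MINUS is with Python sets. This is only an analogy under my own assumptions, not SPARQL itself and not the real vocabulary of the open data service: the building names, predicates and the subjects helper are all made up, and real MINUS works on query solutions rather than plain sets.

```python
# Hypothetical triples: (subject, predicate, object).
triples = [
    ("building32", "type", "Building"),
    ("building32", "hasImage", "photo32.jpg"),
    ("building53", "type", "Building"),
    ("building99", "type", "Building"),
    ("building99", "memberOf", "ItemHiddenFromLists"),
]


def subjects(triples, predicate, obj=None):
    """Subjects that have the given predicate (and object, if one is given)."""
    return {
        s for s, p, o in triples
        if p == predicate and (obj is None or o == obj)
    }


buildings = subjects(triples, "type", "Building")
with_image = subjects(triples, "hasImage")
hidden = subjects(triples, "memberOf", "ItemHiddenFromLists")

# 'A MINUS {B}': start from all buildings, remove those with images, then the hidden ones.
missing_images = buildings - with_image - hidden
print(missing_images)
```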

The second thing I have worked on this week was working my way through the source code of the data gathering app that last year’s intern wrote, available here. Doing this made me feel a bit like a new age explorer, as whilst the application seems well coded, comments are few and far between, which can leave me wandering through the jungle of code longer than necessary and underlines the need for well documented code to improve its lifespan. This was particularly frustrating when I was starting out: having hosted a local server on my machine, I was trying to get the app to run and was running into a large number of database related problems. I eventually found the problem by sifting through error logs, which were complaining that the mysql libraries were unavailable. This was because I was using PHP 7, and these libraries were removed in PHP 7, having been deprecated in PHP 5.5 for security reasons. Putting the original code on a server with PHP 5.5 solved all the database connection problems and made the app functional again. I am now going through and trying to replace these mysql references with mysqli ones instead to improve the quality of the code.

Posted in Data, Open Data, PHP, SQL.