

rss.data.ac.uk v2: Frameworks, Refactorisation and Documentation

I’m quite a few weeks into my internship now and have been working with PHP for a while (I fear it’s very much becoming the Devil I know) – so it seemed like a prime time to go back and refactor the code behind rss.data.ac.uk, as well as implement a few extra features.

So the first thing was to refactor the back-end – shatter my_first_code.php into some (vaguely) sensible classes. My method was as follows:

  • Have my old code up on one screen, a fresh terminal up on the other.
  • Slowly copy things across – making changes, fixing errors and creating classes where necessary.
  • Make tea.
  • Repeat.

After a couple of hours the basic refactoring was done, and so it was time to implement some actual changes (improvements, some might say).

The main improvement made to the back-end was a more robust implementation for inserting institutions, feeds and posts into the database. Two different methods were used – one for institutions (because of their relatively simple nature), and one for both feeds and posts:

  • Institutions came with relatively little meta-data (ID, Name, Groups and PDomain), and so could be placed into the database with a simple INSERT IGNORE: if the PDomain was already present the entry was ignored, otherwise it was inserted (Name and Groups handling comes later).
  • Feeds and posts were a little more fiddly – they couldn’t be keyed purely on the URL, like institutions, as URLs were often re-used and so an INSERT IGNORE would fail to update any relevant meta-data. The solution was to store a hash of the feed/post title and its URL. Before attempting to insert a feed/post, the database was queried for an entry matching the URL of the item to be inserted, returning the stored title/URL hash. If no entry was returned, this was a completely new feed/post and could safely be put in with an INSERT IGNORE. If an entry was returned, its hash was compared against a hash of the title/URL of the item to be inserted: if they were equal, we could safely assume nothing had changed, and the insert attempt was abandoned. If they differed, the URL was the same but the title had changed – the URL had been repurposed – and so an UPDATE statement was used.
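
The two insertion strategies above can be sketched roughly as follows. This is a minimal illustration in Python with SQLite rather than the project's actual PHP/MySQL code: the table layout, the column names and the choice of SHA-256 are all assumptions, and SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE.

```python
import hashlib
import sqlite3

# Hypothetical schema: the real tables and column names are not given in the post.
SCHEMA = """
CREATE TABLE institutions (id INTEGER PRIMARY KEY, pdomain TEXT UNIQUE);
CREATE TABLE feeds (url TEXT PRIMARY KEY, title TEXT, hash TEXT);
"""


def insert_institution(conn: sqlite3.Connection, pdomain: str) -> bool:
    """Insert an institution unless its PDomain already exists; True if a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO institutions (pdomain) VALUES (?)", (pdomain,)
    )
    return cur.rowcount == 1


def item_hash(title: str, url: str) -> str:
    """Hash the title together with the URL, so a changed title changes the hash."""
    return hashlib.sha256(f"{title}\n{url}".encode("utf-8")).hexdigest()


def upsert_feed(conn: sqlite3.Connection, title: str, url: str) -> str:
    """Insert, skip or update a feed; returns 'inserted', 'unchanged' or 'updated'."""
    new_hash = item_hash(title, url)
    row = conn.execute("SELECT hash FROM feeds WHERE url = ?", (url,)).fetchone()
    if row is None:
        # No entry for this URL: a completely new feed.
        conn.execute(
            "INSERT OR IGNORE INTO feeds (url, title, hash) VALUES (?, ?, ?)",
            (url, title, new_hash),
        )
        return "inserted"
    if row[0] == new_hash:
        # Same URL and same title: nothing to do.
        return "unchanged"
    # Same URL but a different title: the URL has been repurposed.
    conn.execute(
        "UPDATE feeds SET title = ?, hash = ? WHERE url = ?", (title, new_hash, url)
    )
    return "updated"
```

The same function would serve for posts with a different table name; it is kept to feeds here for brevity.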

Aside from this more robust insertion method, the main change was to tie in meta-data to the institutions (a database ID/URL pair doesn’t really say much about a place). data.ac.uk stored such meta-data, and so it was a simple matter of writing a script to download the data, compare the URLs of the downloaded data to those in the database, and update the Name/Groups columns of the table wherever the URLs matched.
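
As a rough sketch of that matching step (again in Python with SQLite, not the project's PHP): the record format, the column names and the comma-separated storage of groups are assumptions made for illustration, and the download itself is left out because the post does not describe the data.ac.uk format.

```python
import sqlite3
from typing import Iterable, Mapping


def sync_metadata(conn: sqlite3.Connection, records: Iterable[Mapping]) -> int:
    """Copy name/groups onto institutions whose PDomain matches a downloaded record.

    `records` is a hypothetical, already-parsed form of the downloaded data:
    dicts with 'pdomain', 'name' and 'groups' keys. Returns the number of rows updated.
    """
    updated = 0
    for rec in records:
        cur = conn.execute(
            "UPDATE institutions SET name = ?, group_names = ? WHERE pdomain = ?",
            (rec["name"], ",".join(rec["groups"]), rec["pdomain"]),
        )
        # rowcount is 0 when no institution in the database has this PDomain.
        updated += cur.rowcount
    return updated
```

Records with no matching row in the database are simply skipped, which mirrors the "wherever the URLs matched" behaviour described above.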

The web-facing end of rss.data.ac.uk was then sanitised by placing it into a framework – Fat-Free PHP to be specific (our good friend from Honey Badger). This involved separating the page out into the MVC design pattern, which it definitely didn’t follow beforehand. After this, there was some fiddling around with CSS to make the page look right. The ability to filter results on specific university groups (e.g. ‘The Russell Group’) was also added – through a combination of checkboxes on the homepage and multiple INNER JOINs in the database. Though using Fat-Free PHP felt difficult at first (my time with it during Honey Badger was fleeting), by the end it felt far more flexible than hand-writing everything – which is rather good, as that’s its intended purpose and it means I’m actually starting to get a grasp of it.
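
The group filter might look something like the sketch below, written in Python with SQLite rather than Fat-Free and MySQL. Every table and column name here is hypothetical (the post only says that checkboxes and multiple INNER JOINs were involved); the point is the shape of the query, with one placeholder per ticked checkbox.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_group_filter_query(group_names: Sequence[str]) -> Tuple[str, List[str]]:
    """Build a parameterised query for posts from institutions in any selected group."""
    if not group_names:
        # No checkboxes ticked: return every post.
        return "SELECT p.id, p.title, p.url FROM posts AS p ORDER BY p.id", []
    placeholders = ", ".join("?" for _ in group_names)
    sql = (
        "SELECT DISTINCT p.id, p.title, p.url FROM posts AS p "
        "INNER JOIN feeds AS f ON f.id = p.feed_id "
        "INNER JOIN institutions AS i ON i.id = f.institution_id "
        "INNER JOIN institution_groups AS ig ON ig.institution_id = i.id "
        "INNER JOIN uni_groups AS g ON g.id = ig.group_id "
        f"WHERE g.name IN ({placeholders}) ORDER BY p.id"
    )
    return sql, list(group_names)


def fetch_posts(conn: sqlite3.Connection, group_names: Sequence[str]) -> list:
    """Run the filter query and return (id, title, url) rows."""
    sql, params = build_group_filter_query(group_names)
    return conn.execute(sql, params).fetchall()
```

Only the placeholders are interpolated into the SQL string; the group names themselves are passed as bound parameters, so values coming from the form never end up in the query text.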

After this, various documentation was written for the project – namely proper commenting (partially done during the refactoring, partially after) and a README with installation instructions.

From here it was a ‘simple’ case of pushing dev to live – and fixing the inevitable torrent of problems caused by this (and updating the README so if anyone does ever implement/move this, hopefully they don’t suffer as I did):

  • Submodules not properly pulling from git with the project clone, so manually initialising and updating them.
  • Apache proving computers aren’t deterministic – restart Apache several times and you have the same issue, call over a colleague for assistance and show him (without making a single change) and it suddenly decides to work…
  • Deprecated flags in the MySQL my.cnf as the VM sneakily updated from 5.4 to 5.5.
  • The need to update MySQL (less sneakily) from 5.5.24 to 5.5.38 for a bug fix.

After this fiddling around, everything worked rather smoothly and I could go from a clone to a working website in a matter of minutes.

As this project was such a massive overhaul of the original code, it’s got its very own repository. If you feel so inclined you can find it here.


