
Love data week: COVID Data

It’s Love Data Week starting Monday 15th February. I like playing with data, but I’m far better at finding interesting ways to visualise it than I am at interpreting it!

As of the 6th of February, more people in Britain have died with a COVID diagnosis than the entire population of Ealing.

The data that impacts all of our lives right now is the UK COVID statistics. Helpfully, these are available from an API. My first idea for this data was to try to find a way for the numbers to land with people in an emotional way. I hit on the idea of showing milestones in UK COVID deaths in terms of city populations, getting the data from DBpedia. However, this was a bit sparse, so I expanded it to any “UK settlement of 1,000 people or more”. Sadly, we were already past that threshold when I wrote it. You can see my COVID cities tool on the web, and the code is available from GitHub.

I thought a lot about how R is a terrible value for public understanding. R=0.9 good, R=1.1 bad. It’s a measure of change in scale, but not in a format anybody uses in their daily lives. I figured the one place the public are comfortable with exponential growth rates is mortgages, so a clearer number would be, say, (R-1)*100, which would give a +/- percentage of new cases per case. But R is also very difficult to relate to what’s going on and what’s likely to happen next. It also really matters what the current number of cases is. Occasional spikes in R when there are 10 cases a day aren’t a big concern. When there are 1,000 cases a day they really are. So after a lot of thought I’ve made a visualisation that shows things I think are more useful for public understanding of what’s going on with COVID.
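
As a rough illustration of that conversion (a minimal sketch, not code from the actual page), with values rounded for readability:

    // Convert an R value into a +/- percentage of new cases per existing case.
    function rAsPercentage(r) {
      return (r - 1) * 100;
    }

    rAsPercentage(0.9); // ≈ -10, i.e. shrinking
    rAsPercentage(1.1); // ≈ +10, i.e. growing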

My COVID projection page shows the data I think is useful. The grey bar chart is the 7-day averaged number of cases. The green & red bar is the daily change (this is very smoothed out by working out the weekly change in the 7-day average and converting that to a daily change). I think this is a really good indicator of when things were getting worse or better, rather than the case numbers themselves. To make the recent values visible I truncated the really big growth from earlier in the year so it didn’t skew the scale too much.
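
I haven’t published the page’s internals here, but a sketch of that smoothing (the array of 7-day averages is a hypothetical input) might look like:

    // Estimate a smoothed daily growth rate by comparing today's 7-day average
    // with the 7-day average from a week ago, then converting that weekly
    // change into an equivalent per-day change.
    function smoothedDailyChange(sevenDayAverages, i) {
      const weeklyRatio = sevenDayAverages[i] / sevenDayAverages[i - 7];
      return Math.pow(weeklyRatio, 1 / 7) - 1; // e.g. 0.05 means +5% per day
    }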

Next I wrote a function to work out how long it is taking cases to double or halve. I think this is a really good way to explain why the virus is different to cancer. This virus can double its cases every three days, which sounds more like a loan shark than a mortgage. Writing this equation meant that I used log() for the first time in a really long time!
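
I don’t have the original function to hand, but the idea is just a ratio of logs; a sketch:

    // Days for cases to double (growth) or halve (decline) at a constant
    // daily growth rate. A dailyGrowth of 0.05 means +5% per day.
    function doublingOrHalvingDays(dailyGrowth) {
      if (dailyGrowth === 0) return Infinity; // flat: never doubles or halves
      return Math.log(2) / Math.abs(Math.log(1 + dailyGrowth));
    }

    doublingOrHalvingDays(0.26);   // ≈ 3 days to double ("loan shark" territory)
    doublingOrHalvingDays(-0.048); // ≈ 14 days to halve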

The final column is what I think of as the “EEEP!” factor. It projects today’s cases (not the 7-day average) forward by the current growth rate for 28 days and sees where we’re heading if things don’t change. For this to be a concern, cases have to be high AND increasing. I think that EEEP! factor is good for showing why lockdowns were needed.
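
Again as a sketch rather than the page’s actual code, the projection is just compounding today’s figure forward:

    // The "EEEP!" factor: project today's raw case count forward 28 days at
    // the current smoothed daily growth rate.
    function eeepFactor(todaysCases, dailyGrowth, days = 28) {
      return Math.round(todaysCases * Math.pow(1 + dailyGrowth, days));
    }

    eeepFactor(1000, 0.05); // ≈ 3,920: high and rising, a concern
    eeepFactor(10, 0.05);   // ≈ 39: rising, but from a low base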

The final feature I added is to mark weekends, the national lockdowns and other events – although this is complicated by the fact that NI, Wales, England & Scotland may not quite be doing the same thing as each other.

You can see that lockdowns followed peaks in the “EEEP!” factor, although lockdown #2 was several weeks after EEEP! #2.

What it does show is that, contrary to what you’d believe from my Facebook feed, there was a sustained fall in cases for many weeks after the Cummings incident. It makes it very visible when lockdowns started working, and it does show a few notable spikes in the rate of increase – red peaks in the red/green column – which suggest something caused them in the days or weeks beforehand.

Notable peaks in the rate of increase were July 8th, September 12th, October 9th, December 22nd and January 4th. I’m tempted to point to schools and universities reopening, and Christmas, to explain the September, October and January peaks, but correlation/causation and all that. I have no guess at what caused the spikes in July & December. Any ideas?

I’m not an expert at interpreting data so please don’t quote my interpretations of this data as a source of truth!

[UPDATE: I got inspired this morning to add a new column, “Doublings”, which shows how many times cases have to halve to get back to 1 case. Obviously by that point cases from borders will be more of an issue than community transmission, but it’s sobering to see that with lockdown it takes two weeks to halve, and to get to 1 case would take exactly half a year, so mid-August. Hopefully summer and vaccines will change this picture.]
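
The “Doublings” number is also just a log. A sketch with illustrative numbers (not the real case counts):

    // How many times the current count has to halve to reach 1 case, and
    // roughly how long that takes at a given halving time.
    const halvings = Math.log2(8192);    // 13 halvings (8,192 is illustrative)
    const daysToOneCase = halvings * 14; // 182 days at one halving per fortnight,
                                         // i.e. about half a year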

Posted in Uncategorized.

Still learning a lot – Denis’ Internship Adventures #2


What has been happening over the past 3 weeks:


Project work

The last few weeks have seen me jumping from one project to another, as we have been waiting on information, communication, or people who are on leave.

We finally received the schema for the towing tank project database, which means we can now start developing the back-end. My first task was to research graphing technologies that would allow us to make the needed line graphs, and then to start coding the API. After 2 days of research and trying stuff, as well as getting David’s input, we decided that a library called ApexCharts, which has a wrapper for Vue, is a very good candidate. I then started building the graph functionality into our already existing project (read: tried). I soon found that .NET Core and Vue.js don’t play well together, and that it’s not the easiest thing to integrate them. We decided to add both Vue.js and the wrapper inline, which is enough for our project, as we are only using Vue for the graphs. We now have the graphs done, so the next step is starting the API.
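
I can’t share the project code, but the core ApexCharts call for a line graph looks roughly like this (the element id and series values here are made up; in the real project this is wired up through the Vue wrapper):

    // Assumes the ApexCharts script has already been included inline on the page.
    const chart = new ApexCharts(document.querySelector('#tank-chart'), {
      chart: { type: 'line', height: 300 },
      series: [{ name: 'Water temperature', data: [20.1, 20.3, 20.2, 20.6] }],
      xaxis: { categories: ['09:00', '09:10', '09:20', '09:30'] }
    });
    chart.render();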

Another problem I faced was: how do we take the averages for the graphs? The issue came from how the data was sampled, as entries in the environmental database were only added if there was a recorded change. To get a correct reading we need more data points than the ones provided, so we decided to create “mock data” in between database entries to show more accurate graphs.
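
A sketch of the “mock data” idea (the readings format is hypothetical, not our actual schema): because a row is only stored when a value changes, we carry the last known value forward at regular intervals so the graphs and averages aren’t skewed.

    // readings: [{ time: epochMillis, value: number }], sorted by time and
    // recorded only when the value changed. Returns one point per stepMillis,
    // repeating the last known value so gaps don't distort the averages.
    function fillGaps(readings, stepMillis) {
      const filled = [];
      let i = 0;
      const end = readings[readings.length - 1].time;
      for (let t = readings[0].time; t <= end; t += stepMillis) {
        while (i + 1 < readings.length && readings[i + 1].time <= t) {
          i++;
        }
        filled.push({ time: t, value: readings[i].value });
      }
      return filled;
    }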

A lot of the last 2-3 weeks has also been spent working on QnA and learning Laravel. Working on tidying up code and converting it to be more professional/reusable/maintainable has been incredibly eye-opening, as it has made me think deeply about the code that I have written in the past. Most of the work on QnA lately has been reducing the code base by introducing gates and policies, creating factories for all models and changing the existing tests accordingly. If next year’s intern is reading this and is still working on QnA – heed my warning, use the tests (and write your own).


The latest thing I’ve been working on is migrating a service from the old ECS servers, as well as adding more security to it. It has to do with the occupancy screens that are used by the nanofabrication lab in the Mountbatten Building. It has also been a learning experience, as I’m working with another intern who is doing all of the project management.

Last Wednesday, Pat McSweeney, the internship coordinator, took all of us interns on a tour of the UoS data centre, which was really interesting.

University vs Industry

Being about a month and a half into my internship, I have started to see a great difference between what students are taught and what you need to do well in a professional environment. The biggest and most noticeable difference for me is the nature of the work professional developers do, as opposed to the work students do. On my course at university, we have strict, well-defined problems, the solutions to which can be provided within a short, well-defined time span, with a very specific (known) set of tools.

That, however, is never the case in an industrial environment. In my experience, industry problems are quite complex, multifaceted beasts, whose (perhaps contradictory) requirements change over the development process. Another thing I experienced is that the problems developers have often don’t have anything to do with programming or development at all. Issues like miscommunication and waiting times, be it for bureaucracy or for confirmation from other people, can stall progress, drastically reducing efficiency. Overall, academic and industry experience are completely different.

Next few weeks

In the next few weeks I am aiming to finish the Clean-room Occupancy migration, with some extra security. After that, I will continue work on the Towing Tank, hopefully finishing all requested functionality within 2 weeks of David coming back.

Thanks for reading!!!

Posted in Uncategorized.


Learning the ropes – Denis’ Internship Adventures #1

Greetings, I’m Denis and I’m the new TID intern. A few details about me:

  • Just finished my second year at university, doing Computer Science.
  • Interested in Artificial Intelligence and Neural Interfaces, but also … pretty much anything tech related.
  • Avid gamer (both tabletop and computer games)
  • Proud Golden State Warriors fan.

For the next 12 weeks, I will be working on multiple projects of varying sizes, which I am very excited about. Currently, I’m working on a project for the Towing Tank (Boldrewood Campus) to visualise its data on a website.

My first day was spent installing tools and getting my workspace in order. The laptop already had a clean install of Windows 10, so everything else was quite straightforward.

The main thing I was doing during the first 2 weeks was learning ASP.NET Core. That proved to be a nice challenge, as I had never worked on web development before and my previous experience with the MVC architectural pattern had not really prepared me for this. I started from the basics and was soon able to get a functional example website (different from the built-in Bootstrap 3 website).

During my first week my teammates and I met with the product owner of the Towing Tank project. He showed us the facility and explained what he had in mind for the project. What he described is simply getting data from a database and then displaying it on a website, with some export functionality. Because of its size and the relative simplicity of its requirements, the project seems like a really good induction into web development and ASP.NET Core.

From that we created some tasks, assigned them out to everyone, and got to work. I mainly did project set-up:

  • Setting up Dependency Injections
  • Setting the university branding
  • Global Error Handling using middleware

The biggest issues I ran into were simply understanding how to do specific things in relation to the framework. While setting up the branding and the Dependency Injection (using Autofac) was relatively simple, the global error handling part was quite difficult, as I did not know enough about error handling to understand how to do it. A few days and about 20 hours of tutorials and asking around later, I was able to get the functionality working.

I also spent a day hopping from campus to campus, on a mission to find water fountains, gender-neutral toilets and baby changing/feeding facilities to put on UoS Maps. It was a fun and enjoyable experience, and I ended up gathering more data than I expected. I still have Winchester campus and the Oceanography Centre to go through if the opportunity presents itself.

What I will be doing in the next 2 weeks:

We are waiting for a database schema for the Towing Tank Project, so whenever that arrives (hopefully this week) we can start development. In the meantime, I will be doing different bits of work that need to be done, as well as doing more .Net Core training and testing.

Posted in Uncategorized.


Drupal Europe


In 2008 I was first introduced to this thing called Drupal. Ten years later, via a few changes in role, I eventually found myself at my first Drupal conference: Drupal Europe, in the German city of Darmstadt, near Frankfurt – home of Frankenstein’s Castle and fabled to be the inspiration for a certain book.

As with a lot of conferences these days, Drupal Europe is made up of numerous sessions running simultaneously in different themed tracks, from business to education and virtual reality.

The first session I attended was entitled Lessons learned from applying Drupal in Higher Education Projects. The talk showed us how students were using Drupal to build websites for external companies as part of their course at the University of Hildesheim. This was an interesting concept as it gave the students real-world experience, a commodity which is greatly desired by potential employers.

One trend that seemed to be a theme of a lot of the talks was the move towards decoupled systems. This is where Drupal is used only as a headless backend content management system and the frontend is deployed using a different framework, such as React from Facebook. There is also partial decoupling, where some elements (like a booking system) may use a different framework but the rest of the site continues to use Drupal for the front- and backend.

The totally decoupled approach gives you the flexibility to use one source of truth for a myriad of platforms, apps etc. On the flip side, all the frontend functionality that you get from Drupal itself or contributed modules needs to be created again from scratch on the respective frontend platform.

A highlight for me was the keynote given by Dries Buytaert, the founder of Drupal. It’s nice to hear someone talk so passionately about something, and the energy and buzz in the lecture theatre were great.

The networking aspect of these events shouldn’t be overlooked either. Being able to bounce ideas and discuss shared frustrations with fellow Drupalers over a Weißwurst (a traditional Bavarian white sausage) and Bier (hopefully you can work out what that is!) is definitely one of the most useful aspects.

As a result of these sessions, I have a list of must-have modules for most Drupal sites, depending on their purpose.

Coffee: If you’ve used Drupal or any other CMS before you will know that sometimes finding something in an admin menu can be a frustrating experience.

“I know I’ve seen x function here somewhere”

Coffee aims to take some of this pain away by allowing you to search and switch between admin pages really fast. Think Spotlight search on an Apple Mac. Pressing Alt + D pops up a modal window and you just start typing.

Paragraphs: Allows site builders and content creators to add content to a website in a fast, consistent way using pre-defined paragraph types. These types could be anything from a block of text or an image to a complex slideshow.

Although I haven’t had the opportunity to use Paragraphs on a live site I feel it’s the way forward to give content editors in Drupal the experience they expect.

In summary, I found Drupal Europe to be a useful experience, not only in learning about new trends and ways of doing things but also in affirming that what we are doing with our websites is correct.


Below is a list of other notable modules every site should consider.

Posted in Conference Website, Drupal, Events.

Life-cycle of a university website

There are a few phases in the life cycle of a website. Some we do better than others.

The website can go offline or be erased for good at any point. In my lifecycle websites often skip forward to later phases but very rarely step back to earlier ones.

Phases 3 & 5 rarely happen, but I think they are things we need to encourage in the future. Sorry about the rather morbid titles, I’m open to more cheerful alternatives.

  1. Go live
  2. Actively maintained
  3. Embalmed for preservation
  4. Fossilisation
  5. Placed in a repository
  6. Interred
  7. Cremated

Phase 1: Go live

Before it’s visible to the public, most websites spend a few days, weeks or months in an inchoate state. Some websites never actually go live as the need for them goes away, or they turn out to require more work than anticipated.

Phase 2: Actively maintained

This is the time when the owners of the site care about it and keep it updated. The end of this phase can be very clear-cut, or it can simply be the point where interest wanes.

For research project websites, funding ends on a fixed date, and few people want to put much effort in after that date. For conferences and other events, the actively maintained phase ends at the end of the event or soon after.

Many sites should be formally retired at the end of this phase, but people keep them around, just because.

Phase 3: Embalming

Why would a website need to exist past the period it’s actively maintained? Often it shouldn’t, and the lifecycle should proceed rapidly to the “interred” and “cremated” phases, but there are plenty of good reasons to keep some sites around:

  • Many research funders require that a project’s website stick around for 3, 5 or 10 years after the end of the project.
  • Some sites provide a public record that an event or project happened, what the outcomes were and who was involved. For major events and projects, this may be the source-of-truth for Wikipedia, historians, and future researchers.
  • Extending the previous point, discovering past work can lead researchers to new research contacts, collaborations and other opportunities.
  • Finally, and the most important one: As a university, our job is to increase and spread knowledge. The cream of our websites do this in large and small ways. A simple example is the “Music in the Second Empire Theatre” website I just worked on. You can read my blog post about it. The information on this website could be valuable to music researchers centuries from now.

Appropriate steps for the preservation of a website can prevent it going rotten later on.

Something I’ve given a lot of thought to lately is how to reduce the long-term support costs of interactive research outputs. Right now we’re having an incident every two weeks or so where some ancient VM goes to 100% CPU, and it’s hard to resolve because it’s the research output of someone who’s left or retired: still of value, but maybe not worth the cost if it’s causing lots of work to keep up and running.

Old systems like that are also a bit of a horror story we tell young information security specialists. They can also be a, er, challenge to GDPR audit and secure against hackers.

Some sites and services should be preserved indefinitely, but to do that they need to have no back-end dynamic code and to be easy to shift to a new URL, as really long-term preservation will probably mean moving into some kind of repository.

When a site is at the end of its “active” life we need the site owner to decide if it’s going to be shut off or preserved. If it’s to be preserved then we should be expecting the site owner to do some work to prepare it. This is where we really drop the ball: we need to make this chore understood and accepted as the price of having your website preserved beyond its “actively maintained” phase.

For conferences, this means removing irrelevant information and removing future tense where it looks silly.

For research projects this means removing any private areas (not an issue for more recent projects, which tend to use services for sharing files and conversation rather than .htaccess-protected directories and wikis). Research projects also need to remove any placeholder pages for planned information which will never be created, and to link to outcomes and publications.

This step is a big challenge as it’s boring and unrewarding for the people who need to do it. Often the end of the project is either a race to hit deliverables or a time when it’s hard to care as you’re about to lose your job. After a big event like a conference, it can be hard to find the energy to tidy up the website for posterity.

Phase 4: Fossilisation

For dynamic websites and blogs using things like Node.js, .NET or PHP, this would mean turning the site into static files. Static .html files are at virtually no risk of being exploited by hackers and at much less risk of breaking when a server has to be upgraded to a new version of PHP (or whatever back end you use), making long-term support much lower cost.

I like to think of this process as fossilisation. The new site is the same “shape” as the site you copied, but it’s lifeless and rock solid and should last a very long time.

The tool “wget” (as seen in The Social Network) is great for turning websites into fossilised versions. It can even rewrite the links to pages, images, JavaScript etc. to be relative links so they work in a new web location, or even on a local filesystem. The one thing it can’t do is edit the filenames inside JavaScript code, so really fancy stuff may require some manual intervention after the site is crawled.

When serving a fossilised site, the MIME type of each file is not recorded, and the new webserver will guess the MIME type based on the filename. This can be a bother if you had something odd going on, like PDF files that didn’t end in “.pdf”. A more long-term solution is to save the website as a single huge WARC file. I’ll explain more about that in the repository phase, but the important thing to know here is that it stores the retrieved MIME type with each file, and wget will generate such a file.
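
For the record, the sort of wget invocation I mean is along these lines (flags will vary site by site, and example.org is a placeholder):

    # Static "fossil" copy with links rewritten so it works from a new location:
    wget --mirror --page-requisites --convert-links --adjust-extension https://example.org/

    # Or capture the crawl, MIME types included, into a single WARC file:
    wget --mirror --page-requisites --warc-file=example-site https://example.org/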

One tip, from embarrassing experience: before creating such a “fossilised” site, make sure that hackers haven’t already done anything to the site!  I was responsible for a website for a major conference, which had a very naive “upload your slides” PHP function. Many years after I turned it all to static files, we discovered some cheeky person had uploaded some spam documents amidst the valid content! So a check for stuff like that is recommended, especially for old wikis.

For sites at this stage I have started using apache to inject a <script> tag which

Phase 5: Placed in a repository

The very long-term preservation of things isn’t really a job for the IT or Comms department. This is a job for librarians and archivists. While this is currently rare, in future more websites may be preserved in institutional repositories, like EPrints, DSpace, PURE etc.

These repositories may or may not make the deposited website visible to the general public. The long-term value of preserving the information then becomes a decision for people with appropriate training. Some sites may be important to preserve for future generations, and deciding which ones may be hard, but the IT department can help by making such preservation cheaper and lower risk.

Such preservation could be of any combination of the original site, fossilised site files, or a WARC file. I’m hoping that, in future, systems like EPrints might start accepting a .warc format file and serving it a bit like the Wayback Machine does. This seems a good idea for when websites can no longer stay on their old URLs due to domains expiring without budget to renew, and hosting servers reaching end of life.

Phase 6: Interred

People can be very nervous about destroying data, so it may be useful to offer to take a site offline but keep the data, and place a useful message at its previous location for a period of time, say 6 or 12 months. At the end of this time, if nobody has actually needed the content, it gets destroyed.

A word of caution: I’m often asked by academics for a copy of a soon-to-be-erased website, just so they know they still have a copy if they need it. That’s usually fine, but might be inappropriate if the site contains personal or otherwise protected data.

Phase 7: Cremated

I like the metaphor, even if it’s a bit morbid. There is a point where a website is gone and we have destroyed all copies of it under our control. This sounds a bit dramatic, and it is. We want people to understand that after this point it’s gone and they can’t get it back. This step is most important on sites with a data protection issue.


I’ve been working on this workflow based on a mix of what we do and what we should do. This is a task from my Lean Six Sigma Yellowbelt project, for which the problem statement was “we don’t turn off websites when we should”.

From having put some thought into it, and had lots of conversations with people, it seems to me that the place we need to put the effort in is ensuring we can identify when a site is nearing the end of its “actively maintained” phase, and then have the site owner either elect to have it taken offline at a certain date and destroyed automatically 6 months later, or declare that they want it preserved, in which case both they and we (IT) will need to take steps to ensure it’s in a suitable shape.

For most sites that will be a cleanup and then turning to static files. For a few oddball sites, like ones running research code, we need to think really hard about how to sustain them. Old research websites running unmaintained code to provide cool demos and services seem to be the cause of at least two “old VM at 100% CPU” incidents this month!

My plan is to turn this lifecycle into a document a bit more formal than this blog post, so that it can become a first version of our official process and be sent out to website owners to explain both what we can do for them and their responsibilities if they want us to support them.

How can it be improved, simplified, extended? How on earth do we get academics to do the step to prepare sites for preservation when they’ve already got 50 other little jobs?




Posted in Best Practice, Conference Website, Internet Archive, web management.

Pure Javascript Microrepository: Planning for sustainable websites

We’ve just launched a microrepository titled “Music in the Second Empire Theatre”, which is to say “Opera in France between 1848 and 1873”.

Many years ago we had a team member called Adam Field who built a number of microrepositories using EPrints, which produced great results. These took research datasets and imported them into EPrints to take advantage of the search and browse-by-value functions.

However, EPrints requires quite a lot of infrastructure (a dedicated LAMP server with mod_perl and a MySQL database). That felt like overkill to me, so I did some experiments to see if it was possible to load all 10,000 records into a single webpage with all the JS libraries and templates it needed. To my surprise that worked, but I never followed up on it as nobody has wanted a microrepository… until now.

This month we’ve produced the Music in the Second Empire site as a single-page web application. It’s not quite self-contained in a single file but a future version probably would be.

How our JS microrepository works

This site will have two phases of life: an initial phase where it will still be tweaked and have new data added, and then a preservation phase, for which we have a plan that should let it survive more-or-less indefinitely.

In its initial phase, Professor Mark Everist is still updating and adding to his dataset. He exports an Excel file from the tool he uses and uploads it to a /data/ folder on the website with a filename like opera-2019-01-25.xlsx; the use of the ISO date format (YYYY-MM-DD) has the handy result that the alphabetically last file is always the one we want.

When the page loads, the first thing it loads is a file called local.js, which is used to say where to get its dataset from and whether this is the live site. If it’s “development” or “pre-production” then the site shows a big notice to say this is not the production version. It’s also used to turn on/off the debugging versions of the Vue.js library without us having to fiddle further.
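
The real local.js is tiny. A hypothetical version (the names here are illustrative, not the actual file contents) looks something like:

    // local.js: per-installation settings, the only file that differs between
    // the development, pre-production and live copies of the site.
    var LOCAL_CONFIG = {
      dataUrl: 'dynamic/dataset.php', // where to fetch the combined config + records JSON
      environment: 'development',     // anything other than 'production' shows the warning banner
      debugVue: true                  // load the debugging build of vue.js
    };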

While the dataset is in the “still changing” phase, we get the data from a PHP script which loads the latest Excel file from the /data/ directory and uses a config file to turn the tabular data into more structured data. It also includes the config file in the resulting JSON, as it contains the information the site needs to render the dataset. You can see the combined JSON file with config & records.

After that it’s pure JavaScript. We use a bunch of common tools (jQuery, Vue, Bootstrap). Most of the templates are in index.html, and if you view the source you can see them in <script type="text/x-template"> tags. The index.html + config.json + local.js files are the site’s configuration, and the rest of the code is more part of the software, which we can reuse.
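
If you haven’t met x-templates before, the pattern is roughly this (simplified, with made-up field and component names rather than the site’s real ones):

    <script type="text/x-template" id="record-summary-template">
      <li class="record">{{ record.title }} ({{ record.year }})</li>
    </script>

    <script>
      // Register a Vue component whose template lives in the x-template block above.
      Vue.component('record-summary', {
        props: ['record'],
        template: '#record-summary-template'
      });
    </script>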

Preparing for preservation

Where our approach really shines is that all that’s needed to move this to a long-term preservation phase is to:

  • save the JSON output of the PHP file as a .json file,
  • update local.js to point to that file instead
  • delete the /dynamic/ directory which contains the PHP and libraries to convert the .xlsx to JSON

Single file repository

One File to rule them all, One File to find them,
One File to bring them all, and in the darkness bind them.

A normal HTML file often has a whole bunch of files which go with it, even if it’s just a single page. These are usually images, stylesheets and JavaScript. Our system also has the JSON file containing the configuration and the dataset. It’s possible to embed all of this into a single .html file, and doing that would make sense for this approach as it makes the site easy to curate. You can even embed the images as data URIs. In addition to the mega HTML document containing HTML+CSS+JS+JSON+images, I’d be tempted to also store the JSON file as a separate document so that people far in the future who just want to get the data & schema can do so easily.
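
Turning an image into a data URI is a one-liner; a quick Node sketch (the file names are illustrative):

    // Produce a data URI that can be dropped straight into an <img src="...">
    // attribute inside the single self-contained HTML file.
    const fs = require('fs');

    function toDataUri(path, mimeType) {
      const base64 = fs.readFileSync(path).toString('base64');
      return 'data:' + mimeType + ';base64,' + base64;
    }

    toDataUri('images/logo.png', 'image/png');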

Open source?

Well… not yet. This would be an option for the future. In days gone by, this would have made a classic JISC project!

We hope to reuse much of this code on future University of Southampton projects and aspire to making it a generic open source tool.

If you have a suitable University of Southampton dataset, or you’re champing at the bit to reuse this code yourself, get in touch!

Posted in Best Practice, Javascript, Open Data, Repositories, Research Data.

ImPURE thoughts

A while back now we made the decision to use PURE as the university CRIS system. This was a bit of a wrench for me as the ex-lead developer of EPrints, and someone who’s been a long-time member of the open access and open source communities. However, I made the case for using EPrints as the front end for the open access repository.

PURE isn’t something I’ve learned too much about, but there are some things that frustrate me. One is that doing “batch” processes, like removing a tag from thousands of records, isn’t possible with our contract. It turned out to be easier and cheaper to get an intern to do it by hand.

Another disappointment is that the API doesn’t expose all the data. As a result of the open data & equipment data projects we now have lists of most major facilities and equipment at this university and, critically, they all have IDs. This means we were able to add this as a field in PURE so we can tag research datasets and papers with facilities and equipment involved in their creation. This has some internal uses, I guess, but I was really keen on getting it into the public repository.

We’ve got nice pictures of much of our research equipment, and descriptions. I thought it would add an interesting new dimension to our repository, but it’s all borked unless PURE makes it available in their API.

It did give me the opportunity to redesign the metadata page, which I’m really proud of. Here’s an example. I tried to organise the page so that the most important information was at the top (the bibliographic stuff), with IDs lower down and more specialist tools like export and stats at the bottom.

I also managed to add links to the profile pages of authors currently at the university and to the research pages of relevant schools or research groups. This has helped make our repository pages feel more important to people, rather than a dead end.

NB: the text of the links to divisions on the right-hand side is a bit of a mess. It’s on my TODO list.

Posted in Open Data, Repositories.

Testing Strategy in TID Projects

In TID we tend to have a few problems with how we do software testing.

  • Someone does some development but they forget to run and update the tests, so next time someone comes along they are broken and no one can remember why.
  • Testing takes a lot of development time, but we still have simple bugs in the code which only turn up with manual testing or after going live.
  • Our tests don’t help us enough when refactoring because they are too fragile and rely too closely on the internal structure of the methods, class or package.
  • We are often updating our tests when making even small changes to code.

These problems indicate that we have holes in our test strategy that we should be aiming to resolve.

The Shotgun Test Strategy

With a shotgun test strategy, you […] assume that everything and anything can and will be buggy. However you accept that you cannot test everything. Since you lack any solid idea on where to find bugs, you test wherever and whatever comes to mind. You attempt to randomly distribute the test effort, like pellets from a shotgun, within the given resource and schedule boundaries.

Pragmatic Software Testing (Rex Black)

This sounds very similar to the strategy that I have followed in the past, except I don’t even feel like I have necessarily made efforts to randomly distribute the test effort. Often the main determiner of whether something gets tested is how easy it is to test. When I know I can’t test everything in as much detail as I want, I fall back to testing as much as I can with the time available, even if that is not a very useful place to be adding tests.

This results in a great deal of tests, but little planning as to what gets tested and some rather obvious holes in test coverage. Examples of this include:

  • View-Models in Aurelia projects are mostly untested because it’s difficult and we don’t understand it.

A cause of this is that we use technology stacks in which unit testing is not straightforward out of the box, and we have not put effort into working out how to write testable code in these environments. The PHP frameworks in use (Laravel and CakePHP) and JavaScript front-end frameworks are the worst culprits here.

As a result, I believe we should put some research and work into finding the best way to make sure our framework code is testable at a unit level. This may require a change in how we write the code for these, including making more use of abstraction layers.

Long Term Value of Tests

In theory the value of tests comes in several parts:

  1. They confirm that the code you have written does what you expect it to do at the time of development.
  2. They act as regression tests to confirm that you have not broken the system when adding new features or fixing other bugs.
  3. They allow refactoring with confidence that the system will still do the same thing afterwards.

Currently our tests do 1 reasonably well and do 2 in a limited sense, and are often almost useless for 3.

Writing unit tests along with the code allows us to use them as verification of the code we are writing, which is good as it allows us to quickly check assumptions and behaviour. Where this falls down is when we have well-tested units but have neglected to test the interfaces and boundaries between units. Mocks (and other testing helpers) are good as they allow us to test a unit in isolation, but when a unit requires an excessive amount of mocking that is often a sign that it may have complex interfaces behind which bugs missed by unit testing may hide. It also makes the tests more complex, which increases the chance that there are errors in the tests hiding errors in the actual code.

On 2, it is useful to be able to run tests when making changes in the system to check that you have not broken anything, but a problem that frequently arises is when changes in the system do cause regression tests to fail. Often these failures are red herrings, and the new code is still correct but has broken the tests because they were fragile and reliant on something working in a specific way when we didn’t really care about the code behaving in that way. Number 3 is very similar, but in most cases refactoring causes test failures anyway, some of which may be legitimate and some erroneous. When there are legitimate and illegitimate test failures happening at the same time it can be very difficult to rely on your unit tests as confirmation that you are doing things right or wrong.

One solution to this problem is to test at multiple levels. Currently unit tests are the priority for testing business logic. However, higher-level integration tests for testing subsystems are often used, especially when using frameworks which make pure unit testing difficult, or when retrofitting unit tests to code which was not written with testability in mind (for example legacy code). A more appropriate approach might be a multi-level testing strategy where all code is tested with unit tests for detail, integration tests to check that the subsystem is behaving correctly from the perspective of an external observer, and system-level tests which test from the very outside of the system (at a UI or API level, with a full stack in an operational server environment). Higher-level tests should be less fragile and less likely to break (unless the external interface changes, which should be a relatively rare occurrence).

Integration and system tests should be fully Black Box tests, where the system internals are ignored and inputs and outputs are checked only via the specified endpoints that are part of the system interface. This is to remove any dependence on implementation details. UI tests should be run against a deployed test system. To facilitate this we should investigate UI testing technologies, for example Selenium and PhantomJS.
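
As an illustration of the system-level end of that, a black-box UI test written with the selenium-webdriver package might look like this (the URL and page details are placeholders, not one of our real systems):

    // Black-box UI test: drive a real browser against a deployed test instance
    // and assert only on what an end user can see.
    const { Builder, By, until } = require('selenium-webdriver');

    (async function loginSmokeTest() {
      const driver = await new Builder().forBrowser('firefox').build();
      try {
        await driver.get('https://test.example.org/login');
        await driver.findElement(By.name('username')).sendKeys('test-user');
        await driver.findElement(By.name('password')).sendKeys('not-a-real-password');
        await driver.findElement(By.css('button[type="submit"]')).click();
        await driver.wait(until.titleContains('Dashboard'), 5000);
      } finally {
        await driver.quit();
      }
    })();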

Continuous Integration

We have a number of systems which are under relatively continuous development, the key example being Choices. There would be some value in trying out Continuous Integration for a small number of systems such as this. There is a risk that maintaining the CI system and making sure that the tests keep working in it becomes a task unto itself, and we should be careful to avoid this.

A CI system would be able to catch any deployments where the tests were not run properly beforehand. Despite the fact that our development processes include the instruction to run all tests before committing, this is still something that causes occasional problems, and CI would prevent these.


Posted in Uncategorized.

Rayna Episode 9: Does anyone even read these articles?

Good afternoon! Today we’re talking in Bulgarian. I was going to write in Bulgarian last week as well, but I couldn’t due to technical problems.

This week I worked solely on my project. I was supposed to finish the second sprint on Wednesday, but I finished it on Friday, because I thought I had to finish it on Thursday. In practice, I didn’t even finish it… 🙁


I didn’t expect it to take me so long to work on attaching images to questions and answers. It turned out I actually had to do all of these things:

  • Learn how image uploads work in PHP/Laravel/HTML
  • Learn how to write tests for images in Laravel
  • Learn how to generate new elements with JavaScript
  • Learn how to render an image from a file, instead of just putting the file in a public folder, which would be a security risk
  • Figure out how to let users selectively delete already-uploaded images when editing a question or answer
  • And so on

So my initial assumptions about how long it would take were thrown out the window. I expected it to take a day, or two at most, but in reality it took… I don’t want to know how many days. Even today, while I was demonstrating the project to the other interns, there was a bug and the images didn’t come out properly. AAAAH!

In the end they turned out well, but it was very annoying.

Other new additions

The other improvements to the system include:

  • Students can “like” public questions
  • Students can delete their question if it hasn’t been answered yet
  • Students can edit their questions
  • Lecturers can edit their answers
  • Lecturers get an email when a question is submitted
  • Students get an email when their question has been answered
  • Lecturers can make a question public or private after the fact
  • Users can “tag” a previous question by mentioning its number (e.g. #15), and this automatically turns into a link to the old question (a small sketch of this follows the list)
  • Lecturers can mark a question as spam or abuse
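
The question-number linking in particular is just a small regular expression; a sketch (the /questions/15 URL shape is made up):

    // Replace references like "#15" in a question or answer body with a link
    // to that question.
    function linkQuestionReferences(text) {
      return text.replace(/#(\d+)\b/g, '<a href="/questions/$1">#$1</a>');
    }

    linkQuestionReferences('See #15 for a similar problem.');
    // -> 'See <a href="/questions/15">#15</a> for a similar problem.'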

It’s actually been quite a productive sprint!

Next week: the final sprint

Next week and the start of the week after are the last days of my internship. crying.gif

It’s going to be a mad rush, since I still have the following things to finish:

  • Connecting the system to Banner
  • Using our fresh single sign-on plugin, which works with mod_auth_mellon
  • Smart searching of questions
  • Deploying the site to the servers we have asked to be provided for us
  • Cosmetic fixes and finishing touches
  • Unforeseeable problems :)))))))))))))))))

I’m pleased with what I’ve achieved so far; I hope I can get it to a state where it’s ready for use!

Thank you for your attention, and long live Bulgaria and yogurt!

Posted in Uncategorized.

Rayna Weekly Number 8

Quickest ever!!

This week I was purely working on sprint 2 tasks. It is literally time for me to leave so excuse the fact that I’ll use a screenshot to describe what I’ve done:


Posted in Uncategorized.