In TID we tend to have a few problems with how we do software testing.
- Someone does some development but they forget to run and update the tests, so next time someone comes along they are broken and no one can remember why.
- Testing takes a lot of development time, but we still have simple bugs in the code which only turn up with manual testing or after going live.
- Our tests don’t help us enough when refactoring because they are too fragile and rely too closely on the internal structure of the methods, class or package.
- We are often updating our tests when making even small changes to code.
These problems indicate that we have holes in our test strategy that we should be aiming to resolve.
The Shotgun Test Strategy
With a shotgun test strategy, you […] assume that everything and anything can and will be buggy. However you accept that you cannot test everything. Since you lack any solid idea on where to find bugs, you test wherever and whatever comes to mind. You attempt to randomly distribute the test effort, like pellets from a shotgun, within the given resource and schedule boundaries.
– Pragmatic Software Testing (Rex Black)
This sounds very similar to the strategy that I have followed in the past, except I don’t even feel like I have necessarily made efforts to randomly distribute the test effort. Often the main determiner of whether something gets tested is how easy it is to test. When I know I can’t test everything in as much detail as I want, I fall back to testing as much as I can with the time available, even if that is not a very useful place to be adding tests.
This results in a great deal of tests, but little planning as to what gets tested and some rather obvious holes in test coverage. Examples of this include:
- View-Models in Aurelia projects are mostly untested because it’s difficult and we don’t understand it.
A cause of this is that we use technology stacks in which unit testing is not straightforward out the box, and we have not put effort into working out how to write testable code in these environments. PHP frameworks in use (Laravel and CakePHP) and Javascript front-end frameworks are the worst culprits here.
As a result I believe we should put some research and work into working out the best way to make sure our framework code is testable at a unit level. This may require a change in how we write the code for these, including making more use of abstractions layers.
Long Term Value of Tests
In theory the value of tests come in several parts:
- They confirm that the code you have written does what you expect it to do at the time of development.
- They act as regression tests to confirm that you have not broken the system when adding new features or fixing other bugs.
- They allow refactoring with confidence that the system will still do the same thing afterwards.
Currently our tests do 1 reasonably well and do 2 in a limited sense, and are often almost useless for 3.
Writing unit tests along with the code allows us to use them as verification of the code we are writing, which is good as it allows us to quickly check assumptions and behaviour. Where this falls down is when we have well tested units but have neglected to test the interfaces and boundaries between units. Mocks (and other testing helpers) are good as it allows us to test a unit in isolation, but when a unit requires an excessive amount of mocking that is often a sign that it may have complex interfaces behind which bugs in the code missed by unit testing may hide. It also makes the tests more complex, which increases the chance that there are errors in the tests hiding errors in the actual code.
On 2, it is useful to be able to run tests when making changes in the system to check that you have not broken anything, but a problem that frequently arises is when changes in the system do cause regression tests to fail. Often these failures are red herrings, and the new code is still correct but has broken the tests because they were fragile and reliant on something working in a specific way when we didn’t really care about the code behaving in that way. Number 3 is very similar, but in most cases refactoring causes test failures anyway, some of which may be legitimate and some erroneous. When there are legitimate and illegitimate test failures happening at the same time it can be very difficult to rely on your unit tests as confirmation that you are doing things right or wrong.
One solution to this problem is to test at multiple levels. Currently unit tests are the priority for testing business logic. However often higher level integration tests for testing subsystems are used, especially when using frameworks which make pure unit testing difficult, or when retro-fitting unit tests to code which was not written with test-ability in mind (for example legacy code). A more appropriate testing strategy might be to use a multi-level testing strategy where all code is tested with unit tests for detail, integration tests to check that the subsystem is behaving correctly from the perspective of the external observer, and system level tests which test from the very outside of the system (at a UI or API level with a full stack in an operational server environment). Higher level tests should be less fragile and less likely to break (unless the external interface changes, which should be a relatively rare occurrence).
Integration and system tests should be fully Black Box tests, where the system internals are ignored and inputs and outputs are checked only via the specified endpoints that are part of the system interface. This is to remove any dependence on implementation details. UI tests should be run against a deployed test system. To facilitate this we should investigate UI testing technologies, for example Selenium and PhantomJS.
Continuous Integration
We have a number of systems which are under relatively continuous development, the key example being Choices. There would be some value in trying out Continuous Integration for a small number of systems such as this. There is risk that maintaining the CI system and making sure that the tests keep working in it becomes a task unto itself, and we should be careful to avoid this.
A CI system would be able to catch any deployments where the tests were not run properly beforehand. Despite the fact that our development processes include the instruction to run all tests before committing, this is still something that causes occasional problems, and CI would prevent these.