Working with low test coverage

Having good test coverage is invaluable to a software team, because it gives them the ability to make changes and quickly see the results. A good test suite serves two main purposes: it provides a safety net that shows the impact (intended or otherwise) of changes to the codebase, and it documents both the intended behavior and the intended usage of the modules. Working without this lower-level automated coverage means not being able to make changes with confidence, and spending longer than necessary finding problems by other means – normally debugging and manual testing. However, simply downing tools and tackling the technical debt that comes from not having tests is not usually a pragmatic or viable approach, so there’s a need to find a way to improve the situation while still delivering.

Finding hotspots

In a codebase of any size, one of the first challenges is knowing where to start. There are a few techniques I use to quickly find a way in. The first is static analysis tools. Things like ReSharper, NCrunch’s coverage features and NDepend will give me a very quick overview of the playing field. I’m looking for what coverage exists (if any), and what that coverage looks like. Do the tests express intent? Do they have complicated setup code? Are they at a sensible level, i.e. describing behavior rather than implementation? Do they lean heavily on mocks and stubs for anything other than infrastructure?

Another great source of insight is logging – however, this is also often missing! The ideal here is to have, at the very least, logging that captures the origin and a description of all unhandled errors, in both the UI and the servers. For capturing client-side errors, tools like Track.js are great – they send error logs back to a server for analysis. For web server errors, things like the ELMAH plugin for ASP.NET offer a very low-cost way to get started. Cloud platforms like Azure and Amazon Web Services provide some level of logging, which is a good starting point, but if there are many components then you’ll want to aggregate the logs into a single store so that they are easier to work with. The logs will tell me what the general level of error handling is in the codebase, and whether any areas of the application are particularly noisy. Equally, if the application is simply swallowing exceptions, the log will be eerily quiet!
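As a rough sketch of the minimum worth having on the client side, an unhandled-error hook can turn every uncaught error into a structured entry and post it somewhere queryable. The `/api/client-errors` endpoint here is a hypothetical route you would provide yourself; tools like Track.js do this plumbing for you.

```typescript
// Minimal sketch of a client-side unhandled-error logger.
// The endpoint and field names are illustrative, not from any library.

interface ErrorLogEntry {
  message: string;
  source: string;    // file or component where the error originated
  timestamp: string; // ISO 8601, so entries sort and aggregate easily
}

// Turn a caught error into a structured entry suitable for aggregation.
function toLogEntry(error: Error, source: string): ErrorLogEntry {
  return {
    message: error.message,
    source,
    timestamp: new Date().toISOString(),
  };
}

// In a browser you would wire this up once at startup, along the lines of:
// window.addEventListener("error", e =>
//   fetch("/api/client-errors", {
//     method: "POST",
//     body: JSON.stringify(toLogEntry(e.error, e.filename)),
//   }));
```

Keeping the entry-building pure, as above, means even the logging code itself is trivially testable.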

Cheap exploratory testing

In the absence of automated tests, the most expensive thing to do is to have people perform manual regression tests. They simply won’t be fast enough, and will cost too much. A cheap way to get some of the benefits is to deploy “monkey testing”. The idea is that an automated tool simulates random user input, exercising the system in ways it was probably never intended to work. These random inputs shouldn’t break the system – it should be robust enough to withstand them. The monkey test continues, capturing its findings (perhaps in the application logs), until a certain number of errors have been found. The team can then look at these errors, and fix their causes if needed. Monkey testing alone can be very quick to put in place – check out tools like Gremlins.js – and while the technique has obvious limitations, it usually offers a good return on a very low investment.
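The mechanism can be sketched in a few lines: hammer the system with random input, collect failures, and stop after a set number of errors. This is a toy illustration of the idea that tools like Gremlins.js apply to a real UI; all the names here are illustrative.

```typescript
// A toy "monkey": feeds random strings to a function and records failures,
// stopping once a set number of errors has been collected.

type Finding = { input: string; error: string };

function monkeyTest(
  systemUnderTest: (input: string) => void,
  iterations: number,
  maxErrors: number
): Finding[] {
  const findings: Finding[] = [];
  const alphabet = "abc123!@# ";
  for (let i = 0; i < iterations && findings.length < maxErrors; i++) {
    // Build a random string of random length -- random keyboard mashing.
    let input = "";
    const length = Math.floor(Math.random() * 20);
    for (let j = 0; j < length; j++) {
      input += alphabet[Math.floor(Math.random() * alphabet.length)];
    }
    try {
      systemUnderTest(input);
    } catch (e) {
      findings.push({ input, error: String(e) });
    }
  }
  return findings;
}
```

A robust system yields an empty findings list; a fragile one quickly hands the team a list of concrete failing inputs to investigate.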

To get more from monkey testing, it can be combined with automation of user journeys. This is particularly helpful when there is a complex workflow, and random interaction would be unlikely to get the application into meaningful states beyond the default. Simply automate a path to a waypoint on the user journey, and start the monkey testing from that point. You can quickly add new partial journeys by combining parts of previous ones, building up a suite of monkey testing scenarios. Ideally these partial journeys should combine to give an end-to-end flow of the happy path of application usage, which also serves as validation that all components are deployed correctly and able to interact with each other.
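One way to picture this is to treat each partial journey as a list of steps that advance the application state to a waypoint, at which point the monkey is released. Steps then compose into longer journeys. The types and step names below are illustrative only.

```typescript
// Sketch: partial user journeys as composable lists of steps, each step a
// function that advances some application state towards a waypoint.

type Step<S> = (state: S) => S;

// Run the steps of a journey in order, returning the state at the waypoint.
function runJourney<S>(initial: S, steps: Step<S>[]): S {
  return steps.reduce((state, step) => step(state), initial);
}

// Partial journeys compose: reuse the login steps inside a longer
// checkout journey, then start monkey testing from the resulting state.
type Shop = { loggedIn: boolean; itemsInBasket: number };

const login: Step<Shop>[] = [s => ({ ...s, loggedIn: true })];
const addItem: Step<Shop> = s => ({ ...s, itemsInBasket: s.itemsInBasket + 1 });
const checkoutJourney: Step<Shop>[] = [...login, addItem, addItem];
```

Because journeys are just data, building a new scenario is a matter of splicing existing step lists together rather than scripting from scratch.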

The aim at this stage is to identify the areas most in need of further attention. There’s no value in trying to build an extensive catalogue of every fault, because there aren’t the resources to deal with it. Simply establish two or three areas that are draining time in debugging, or where the end user experience is particularly compromised, and tackle those. You can always repeat the exercise later with two or three different areas.

Quickly abandoned TDD

What I often see is a handful of very superficial tests that look like they were written early on as a sign of good intentions that died out. Writing tests, and applying Test-Driven Development, are skills that take time to learn, and it’s quite common for engineers to start a project intending to follow TDD and write good tests, only for a skills gap to get in the way. While learning TDD, most developers will go slower, and if there is a perception of pressure to deliver quickly, tests are often the first thing to go “because TDD is slower”. Personally, I find that pragmatic TDD combined with the sensible application of software design principles is the fastest way to work once a developer is familiar and confident in applying them. There is a rhythm to the red/green/refactor cycle, and complex features can be delivered far more effectively with the forethought needed to construct scenarios before implementing. What is true, however, is that the first scenario often takes a little longer to write, because it takes time to set up the testing fixtures.

The trick here is all in the design of the code: code that is cleanly separated – logic isolated from coordination, coordination isolated from infrastructure – is very easy to test, while code that has logic, infrastructure and coordination intertwined is very difficult to cover with tests. The problem starts in the developer’s mind – they have an idea in their head of what they expect the code to look like, and each new component ends up matching that preconceived design. One of the benefits of TDD is to change this way of thinking, allowing testability to shape design decisions. By writing a test first, against the public interface of the production code, the shape of the components is nudged towards a simpler and more maintainable design. Once this cleanly separated style becomes the subconscious design in a developer’s head, and writing this way becomes second nature, TDD becomes very easy and doesn’t take any additional time. It is not that TDD causes things to take longer; rather, TDD exposes design problems, and it takes longer if you try to work around those problems instead of recognizing and resolving them. The challenge in retrofitting TDD is the absence of a testable design.
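As a hypothetical illustration of the kind of test that shapes design: it pins down behaviour through the public interface only, which pushes the component towards something pure and easy to construct – no database, no clock, no framework.

```typescript
// The test below (written first) forced this shape: inputs in, decision out.
function discountFor(orderTotal: number, isLoyalCustomer: boolean): number {
  if (orderTotal <= 0) return 0;
  return isLoyalCustomer ? orderTotal * 0.1 : 0;
}

// The "test first" that shaped it, inlined here for brevity. In a real
// codebase this would live in a test fixture.
function testLoyalCustomersGetTenPercent(): void {
  if (discountFor(100, true) !== 10) throw new Error("expected 10% discount");
  if (discountFor(100, false) !== 0) throw new Error("expected no discount");
}
```

Notice that nothing about the test mentions how the discount is stored or displayed – those concerns are pushed out to the edges, which is exactly the separation the paragraph above describes.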

The important outcome here is to change direction. If the way new features are being added creates further testing debt, then the team is chasing its tail. Techniques like pairing are very useful – ideally someone experienced with TDD and designing testable code paired with someone who is learning, though pairing two learners also brings benefits. A pair will keep each other honest, and bounce ideas off each other. It is important to instill a culture of learning, where it is OK to say “I don’t know”, and where the team looks to learn from its experiences.

Merciless refactoring

The method I use to bring systems under test is to establish a way of knowing that things still work at a high level – this might be an automated scenario, an end-to-end test, or a simple and fast manual test. I wouldn’t spend too long on this smoke test, because it’s only for my benefit while I put better tests in place. With this crude safety net, I refactor towards separate components, focusing on isolating the domain and logic from the infrastructure as quickly as possible. I try not to change the logic or the implementation at this stage, but simply add layers that isolate things. These layers can look pretty ugly, but again they won’t be around for long – their job is to enable better tests to be written, which in turn allows further refactoring. In many systems there is a “request-response” paradigm – for example, a server responding to an HTTP request – and this gives a high-level façade that can be easy to get in place.

Ideally I’m aiming for a logical function that takes an input request and produces an output response, with no side effects – this is known as a “pure function”. There will be infrastructure that maps the user’s inputs into a function call (perhaps HTTP endpoints or controllers), and infrastructure that processes the response (updating a database, returning a response to the user). Separating these out means that the bit in the middle becomes clearer and easy to test with simple, fast component-level tests. The infrastructure should also become quite simple – it takes data in, and produces a side effect that can be observed – normally with very little in the way of conditional branching (if statements, while loops and so on). It may not even be necessary to test the infrastructure with conventional “unit test” style fixtures: it should not change very often, an end-to-end test will quickly show if it’s not working, and so tests around every individual infrastructure boundary offer a lower return on a higher cost.
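The separation might look something like this sketch, where all the conditional logic sits in a pure function and the infrastructure (faked here as simple callbacks) merely feeds it and applies its decision. All the types and names are illustrative.

```typescript
// The pure core: request and current state in, decision out. No side effects,
// so it is trivially testable with fast component-level tests.
type TransferRequest = { from: string; to: string; amount: number };
type TransferResponse =
  | { ok: true; newBalance: number }
  | { ok: false; reason: string };

function decideTransfer(req: TransferRequest, balance: number): TransferResponse {
  if (req.amount <= 0) return { ok: false, reason: "amount must be positive" };
  if (req.amount > balance) return { ok: false, reason: "insufficient funds" };
  return { ok: true, newBalance: balance - req.amount };
}

// The infrastructure shell: reads state in, applies the decision, writes the
// side effect out. Almost no branching, so an end-to-end test covers it.
function handleTransfer(
  req: TransferRequest,
  loadBalance: (account: string) => number,
  saveBalance: (account: string, balance: number) => void
): TransferResponse {
  const response = decideTransfer(req, loadBalance(req.from));
  if (response.ok) saveBalance(req.from, response.newBalance);
  return response;
}
```

In a real system `loadBalance` and `saveBalance` would be database or service calls; passing them in as parameters is one of those ugly-but-temporary isolation layers that makes the core testable today.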

Slowly but surely

If there’s an immediate problem, I might argue for dedicating time to a specific fix – putting time aside purely to sort out the test coverage in that area and fix the issues that come out. Otherwise, the approach that is most palatable, both technically and for the business, is to mix improvements to test coverage in with feature delivery. If a feature modifies an existing area, take a little time to improve that area as part of the feature. It will take longer as a one-off investment, but once the area is done it will cease to cause problems. Repeat this as you go, and the coverage will soon improve considerably. New features should of course be done right first time – stop creating debt! If delivery takes longer while you learn to deliver with tests, be honest with stakeholders and take the time to improve. Putting a metric in place can help: if people see a graph of code coverage slowly creeping up, it can be a motivator. Beyond metrics, though, the better outcome is visible results – if the team can see and feel the improvement in test coverage, it takes pressure off them, and helps them push forward effectively and with confidence.