Types of testing

In order to deliver software, the team need to have confidence in its quality and robustness. The category of activities used to establish that confidence is broadly known as “testing”, but within this overall label there are many different forms of testing, which can deliver different outcomes with different immediate and ongoing costs.

In a more traditional model, testing and development were often separate activities, to the extent that they were done by different people. This approach is now mostly considered harmful, with the recognition that quality is something the whole team need to strive for, rather than something that can be retrofitted. Developer testing, including unit testing and test-driven development among other techniques, is becoming more common, although there is still a valuable role for specialist quality analysts or testers.

The main thing testing specialists bring to the table is not the mechanical activity of performing or scripting tests, but rather the way of thinking that identifies test cases and assesses the impact of anything that is found. Developers generally think in terms of making something, and tend to be fairly optimistic and err on the side of confidence, because they know how the product was made and have faith in that process. A good quality analyst is curious to a fault, always questioning everything around them. The two should be thought of as complementary rather than adversarial, and the positive impact a strong testing specialist can have on a team is very significant. The testing specialist’s input on what to test can be combined with an assessment of how to construct a test, and with the developer’s ability to implement that.

As a rule, the team should always aggressively target the cheapest and most maintainable form of testing that gives the fastest feedback. The emphasis should always be on delivering the most with the least cost – both in terms of immediate effort and deferred maintenance. It is crucial that the team defend their position on testing debt, especially when it can be tempting for some who aren’t directly involved to encourage corner cutting to deliver a feature quickly. In the long run, the only way to deliver quickly is to deliver thoroughly, and shortcuts – especially around quality or testing – will only lead to problems. It is the responsibility of those who do understand this (often as a result of bitter experience) to help their team avoid these traps.

Type system checking

Often overlooked, but incredibly powerful, is the technique of using a type system to enforce constraints on the codebase. This is becoming more widespread, with languages like F# particularly noted for their expressive type systems – check out Scott Wlaschin’s excellent talks on the subject. The idea here is simple: define the domain model, and the codebase in general, in such a way that invalid states will not even compile. Whereas every other form of testing here could technically allow faulty code to run and be shipped, using the compiler to defend against defects gives the clearest possible red light that cannot be worked around or go unseen. An example of this that is growing in popularity is the use of static type checkers and compilers for JavaScript, such as Flow or TypeScript, which can prevent entire categories of defect without even running the code. An extension of this is to use code generation to validate schemas across language boundaries – such as the schemas of API requests and responses, which could be implemented in different languages and be out of sync between the client and server; this gives a single “source of truth” for the schema rather than a duplication in different languages, and allows fast feedback if there are discrepancies.
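
As a rough illustration of invalid states that will not compile, here is a minimal TypeScript sketch. The order model and every name in it (DraftOrder, PaidOrder, ship and so on) are hypothetical, invented purely for this example. Each state carries only the data that is valid for it, so shipping an unpaid order is rejected by the compiler rather than by a test.

```typescript
// Hypothetical order model: each state carries only the data valid for it.
type DraftOrder = { status: "draft"; items: string[] };
type PaidOrder = { status: "paid"; items: string[]; paymentRef: string };
type ShippedOrder = {
  status: "shipped";
  items: string[];
  paymentRef: string;
  trackingCode: string;
};
type Order = DraftOrder | PaidOrder | ShippedOrder;

// Only a paid order can be shipped; the parameter type enforces the rule.
function ship(order: PaidOrder, trackingCode: string): ShippedOrder {
  return {
    status: "shipped",
    items: order.items,
    paymentRef: order.paymentRef,
    trackingCode: trackingCode
  };
}

// The `never` assignment makes the compiler reject this function if a new
// status is added to Order without a matching case here.
function describeOrder(order: Order): string {
  switch (order.status) {
    case "draft":
      return "draft with " + order.items.length + " item(s)";
    case "paid":
      return "paid, ref " + order.paymentRef;
    case "shipped":
      return "shipped, tracking " + order.trackingCode;
    default: {
      const unreachable: never = order;
      return unreachable;
    }
  }
}

const paid: PaidOrder = { status: "paid", items: ["book"], paymentRef: "PAY-1" };
const shipped = ship(paid, "TRACK-9");

// The following would be rejected at compile time, because a draft order
// has no paymentRef and is therefore not a PaidOrder:
//   const draft: DraftOrder = { status: "draft", items: ["book"] };
//   ship(draft, "TRACK-9");
```

The point of the design is that the rule “only paid orders ship” lives in a function signature, so no test has to be written or maintained for it.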

Component testing

Otherwise referred to as “unit” testing – a name I dislike because it is ambiguous and misunderstood – the idea here is to test components. One issue with the name “unit” testing is the disagreement over what a “unit” is; some consider a “unit” to be a class, or a method, but the limitation of this is that these are implementation trivia, coupled to the specific approach taken. People who follow the approach of testing at class or method level often find themselves down the line with a brittle test suite, because the tests have been applied to somewhat arbitrary boundaries that change too often to be maintainable. The definition I prefer is that a “unit” is a “unit of behavior” – a set of collaborating sub-components that deliver a business requirement. These tests should exercise as much of the logical complexity of the codebase as possible, but never touch infrastructure (such as a disk, database, network, or similar). The tests should execute completely in-memory, and provide the main feedback loop while developing.

The challenge for some developers in applying this technique is in being able to isolate infrastructure from logic, with leaky or inappropriate abstractions and coupling being the main issues. The best way to get better at making these design decisions is to apply strict test-driven development for a while; it will become more natural to use testable approaches by default. The discipline of test-driven development can be a very effective way to create a good suite of component tests; however, TDD is about much more than a test suite. A style that works very well here is the “ports and adapters” pattern, also known as “hexagonal” architecture and closely related to “onion” architecture, which places a very high value on having a domain that is isolated from infrastructure and connected to it only through ports and adapters. Done well, the domain has no reference to any form of infrastructure and the infrastructure has no logic about the specific domain.
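
To make the separation concrete, here is a small sketch of a port, a piece of domain logic that depends only on that port, and an in-memory adapter of the kind a component test would use. All of the names (AccountStore, withdraw, InMemoryAccountStore) are hypothetical and exist only for this illustration; a real system would also have a database-backed adapter implementing the same port.

```typescript
// Port: the only thing the domain knows about storage. (Hypothetical names.)
interface AccountStore {
  load(accountId: string): number | undefined; // balance in minor units
  save(accountId: string, balance: number): void;
}

type WithdrawResult = "ok" | "unknown-account" | "invalid-amount" | "insufficient-funds";

// Domain logic: pure business rules, no reference to a database or network.
function withdraw(store: AccountStore, accountId: string, amount: number): WithdrawResult {
  const balance = store.load(accountId);
  if (balance === undefined) return "unknown-account";
  if (amount <= 0) return "invalid-amount";
  if (amount > balance) return "insufficient-funds";
  store.save(accountId, balance - amount);
  return "ok";
}

// Adapter for component tests: keeps everything in memory, so the tests
// never touch real infrastructure.
class InMemoryAccountStore implements AccountStore {
  private balances: { [accountId: string]: number } = {};

  load(accountId: string): number | undefined {
    return Object.prototype.hasOwnProperty.call(this.balances, accountId)
      ? this.balances[accountId]
      : undefined;
  }

  save(accountId: string, balance: number): void {
    this.balances[accountId] = balance;
  }
}

// A component test exercises the behavior through the port.
const accounts = new InMemoryAccountStore();
accounts.save("acc-1", 100);
const outcome = withdraw(accounts, "acc-1", 30);
```

Because the domain function only sees the port, the same rules can be tested in memory in milliseconds, and the real adapter is left thin enough to be covered by a handful of integration tests.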

An approach I like when writing component tests is behavior-driven development, which, although often thought of in the context of UI automation or end-to-end tests, is merely a way of expressing test cases that can be implemented in many different ways. The idea here is to write test cases in language that a business expert can easily understand, and to separate the description of what the system does from how it does it. The big benefit is that these can replace “dead tree” specifications (that get out of date as the application evolves) with an executable specification that both verifies and documents the behavior of the application.
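
The following sketch shows the shape of this idea at component level. The scenario runner is hand-rolled for the example and is not a real BDD library; the delivery charge rule, its threshold and all of the names are hypothetical. The step text is what a business expert reads, and the function beside each step is how it is carried out.

```typescript
// Hypothetical domain rule: delivery is free at or above a threshold.
type Basket = { totalMinorUnits: number };

function deliveryCharge(basket: Basket): number {
  return basket.totalMinorUnits >= 5000 ? 0 : 395;
}

// A minimal scenario runner invented for this sketch (not a real library).
// It runs each step in order and returns the readable specification.
type Step = { text: string; run: () => void };

function scenario(title: string, steps: Step[]): string[] {
  const lines = ["Scenario: " + title];
  for (const step of steps) {
    step.run(); // throws if an expectation fails
    lines.push("  " + step.text);
  }
  return lines;
}

function expectEqual(actual: number, expected: number): void {
  if (actual !== expected) {
    throw new Error("expected " + expected + " but got " + actual);
  }
}

// The "what" is the step text; the "how" is the function next to it.
let basket: Basket = { totalMinorUnits: 0 };
let charge = -1;

const report = scenario("Free delivery on large orders", [
  { text: "Given a basket worth 50.00", run: () => { basket = { totalMinorUnits: 5000 }; } },
  { text: "When the delivery charge is calculated", run: () => { charge = deliveryCharge(basket); } },
  { text: "Then delivery is free", run: () => expectEqual(charge, 0) }
]);
```

In practice a team would more likely use an established BDD tool than a runner like this one; the sketch is only meant to show that the readable description and the implementation can be kept apart without any UI automation.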

The big benefits of component tests are their speed of feedback and low cost. They can cover a vast amount of the logical complexity of a system in a very short time (often measured in seconds for thousands of tests). Of course it’s possible to write component tests badly – as it is possible to do anything badly – and that can reduce the value. If you’re trying component testing and running into frustrations, it’s probably an indicator that something else isn’t quite right, most likely that infrastructure concerns have leaked or that the tests are too tightly coupled to the implementation. The cost of component tests is that they cover a relatively shallow stack; the trick is to move as much complexity into the parts of the system that can be easily component tested, leaving the hard-to-test parts as simple as possible, or delegating that testing by re-using a third party library that is already established as being robust.

Integration testing

Integration tests are great for exercising the boundaries of the application that can’t be tested by component tests, such as the integration with a database or other infrastructure. If you’ve been able to manage the complexity of the infrastructure code well, integration tests can be quite simple. Integration tests should test real infrastructure, and are designed to catch things like badly formed requests or queries, problems in a database schema, or other things that are external to the main codebase. They can often be convention-based, where the test will scan in some way and dynamically expand the test suite, for example scanning for all queries and executing them, or randomly generating objects against a schema to insert.
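
A convention-based test of the “scan for all queries and execute them” kind might look roughly like the sketch below. Everything named here is an assumption made for illustration: the QueryRunner interface, the query registry and the table names are hypothetical, and the runner is a stand-in that only knows two table names so that the example can run without a database. A real integration test would pass an adapter that talks to an actual database instance.

```typescript
// Hypothetical port over the database. In a real integration test the
// implementation would execute the SQL against real infrastructure.
interface QueryRunner {
  execute(sql: string): void; // throws if the database rejects the query
}

// Hypothetical registry: every query the application uses is declared in
// one place, so the test discovers them rather than listing them by hand.
type NamedQuery = { queryName: string; sql: string };

const queries: NamedQuery[] = [
  { queryName: "findCustomerById", sql: "SELECT id, name FROM customers WHERE id = 1" },
  { queryName: "listOpenOrders", sql: "SELECT id FROM orders WHERE status = 'open'" },
  { queryName: "brokenReport", sql: "SELECT total FROM ordrs" } // deliberate typo
];

type QueryCheck = { queryName: string; ok: boolean; error?: string };

// Runs every registered query and records which ones the database rejects.
function checkAllQueries(runner: QueryRunner, registry: NamedQuery[]): QueryCheck[] {
  return registry.map(function (query): QueryCheck {
    try {
      runner.execute(query.sql);
      return { queryName: query.queryName, ok: true };
    } catch (e) {
      return { queryName: query.queryName, ok: false, error: String(e) };
    }
  });
}

// Stand-in runner so the sketch is self-contained: it accepts queries
// against two known tables and rejects anything else.
const fakeRunner: QueryRunner = {
  execute(sql) {
    const match = /FROM (\w+)/.exec(sql);
    const table = match ? match[1] : "";
    if (table !== "customers" && table !== "orders") {
      throw new Error("no such table: " + table);
    }
  }
};

const results = checkAllQueries(fakeRunner, queries);
const failing = results.filter(function (r) { return !r.ok; });
```

A new query added to the registry is picked up automatically, which is what keeps this style of test cheap to maintain.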

Integration tests will take longer to run than component tests, but there should be far fewer of them, as a result of keeping the complexity of these parts of the system as light as possible. An anti-pattern that can really harm a team is to fail to segregate logic from infrastructure, which condemns the team to relying on too few component tests and too many integration tests, and to a severely impeded feedback loop. If you have to drop one type of testing, integration tests are probably the one that I’d choose, because the things they test can be tested in other ways; for example, if the integration between the application and database is broken, it’ll be visible through automation tests or by simply running the application. It’s also worth considering that the infrastructure code should change much more slowly than the domain logic around it, and should be relatively stable for most of the lifecycle of the project, with the exception of the start and whenever a major change is needed, at which point one-off testing activities can be scheduled.

Automation testing

Automation testing is great for giving an end-to-end health check of a deployed application. It is the best way of identifying problems with the configuration or deployment itself, and with the connection between components. However, this comes with a cost; these tests are often expensive to write and maintain, and are orders of magnitude slower than component and integration tests. Automation tests that execute by driving a user interface with simulated interactions are also famously brittle, although there are patterns that can manage that brittleness. The biggest pitfall is again to over-rely on automation tests, and try to use them to cover logical complexity – which can be tested much more effectively at component level.

One important consideration with automation tests is that they have to belong to the whole team; it’s not going to work if you have “the automation guy”, and nobody else has any awareness of the automation test suite. Everybody in the team should understand how the tests work, be able to write or amend a test, and be able to interpret the results. The results from automation tests are more likely to need interpretation; it’s more common for an automation test to fail as the result of transient issues or changes to the user interface, and the main way to debug this is to repeat the test manually and investigate. Many automation tools offer functionality like recording screen captures as they go, which can give a good reference point.

The big challenge of automation tests is usually data; the tests will need to know some examples of things they can browse to or click on. Ideally the tests will create and destroy their own data; otherwise they might not be repeatable or may not be able to run in parallel. Sometimes this will affect the prioritization of features: the ability to manage data through the user interface or an API might otherwise be deferred in favor of managing data manually, but automated tests are not possible without an automatable way to manage that data. It’s also really important to know which tests mutate data and which don’t; “safe” tests can be used to smoke-test the production system, while tests that mutate data need to be treated with a high degree of caution!
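
Both ideas can be sketched in a few lines. The ProductApi interface below is a hypothetical data-management API, with an in-memory stand-in so the example is self-contained; a real suite would call the application’s own API or user interface. The helper creates data for one test and always removes it afterwards, and each test carries a flag saying whether it mutates data so that only the “safe” ones are selected for production.

```typescript
// Hypothetical API for managing test data, with an in-memory stand-in.
interface ProductApi {
  create(productName: string): string; // returns the new product id
  remove(id: string): void;
  exists(id: string): boolean;
}

class InMemoryProductApi implements ProductApi {
  private products: { [id: string]: string } = {};
  private nextId = 1;

  create(productName: string): string {
    const id = "p" + this.nextId;
    this.nextId += 1;
    this.products[id] = productName;
    return id;
  }

  remove(id: string): void {
    delete this.products[id];
  }

  exists(id: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.products, id);
  }
}

// Creates a product for the duration of one test and always removes it
// afterwards, even if the test body throws.
function withTemporaryProduct<T>(
  api: ProductApi,
  productName: string,
  test: (id: string) => T
): T {
  const id = api.create(productName);
  try {
    return test(id);
  } finally {
    api.remove(id);
  }
}

// Each test declares whether it changes data, so that only read-only
// tests are chosen when smoke-testing production.
type UiTest = { title: string; mutatesData: boolean };

function safeForProduction(tests: UiTest[]): UiTest[] {
  return tests.filter(function (t) { return !t.mutatesData; });
}
```

Because each test owns the data it creates, tests do not depend on each other’s leftovers and can run repeatedly or in parallel.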

Manual exploratory testing

Having a person manually explore the application looking for defects is a very expensive way to test, because it by definition cannot be automated and does not follow any script, and takes a person’s time. However, the benefits of this technique are that it is the fastest way to expand the range of tests performed, and will find the most new defects, because the tests are hopefully covering previously untested areas of the application. Manual exploratory testing is a great activity to have going on in the background, using otherwise spare time to look for defects. It is also a really good way to smoke test and sense-check a newly implemented feature; for things that are highly subjective and hard to measure, such as aesthetics or user experience, having a person manually test may also be the best or only option. A golden rule for any form of manual testing though is to constantly assess the value added against the very high costs, and to use it mostly as a last resort, preferring repeatable automated tests.

Manual regression testing

Any form of manual testing following a script should be avoided wherever possible (and it’s nearly always possible!). Manual regression testing should always be considered technical debt; it exists because the team failed to create adequate automated tests in any other form at the time. Any new feature that does not have automated tests will have to be retested manually at the end of every future iteration until automated tests are added, which can cripple a team’s ability to deliver. It is the whole team’s responsibility to always think of the future health of the project and the consequences of every decision they make on a day to day basis, and allowing a buildup of low-value high-cost time-draining activities is in nobody’s interest!

User feedback

User feedback is a little different than the others in that it can be highly subjective, but that doesn’t make it any less valuable. This one will probably need some involvement from the business; perhaps a closed beta program with customers that are friendly, or workshops with super-users. Many products offer things like an early access program, where people can choose to use a product that they know isn’t 100% production ready yet, and can give feedback. The value for the user in these schemes is that they might get access to a shiny new feature sooner, or be able to influence the development process with their ideas. For the organization, managed properly, this can be great – it gives an opportunity to refine the product based on what people who will actually use it (or not!) are saying. One of the main points of agile delivery is to get and act on feedback as early as possible, rather than build up a big deferred release with high investment that might not be well-received.

Techniques such as A/B testing, where some users get a different version to others, are a more advanced form of using feedback. The point here is that the user doesn’t know whether they are an A or a B, but by collecting statistics about usage, the business can measure which is more effective. This can often apply to more subtle differences; for example the business might try a different logo or different description, and measure whether more people buy with this version.
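
As a rough sketch of the mechanics, the code below assigns each user to a group deterministically, so the same user always sees the same version without being told which, and then compares purchase rates between the groups. The hash function, the experiment name and the visitor numbers are all made up for illustration. A real decision would also need a check that the difference is statistically significant, which this sketch leaves out.

```typescript
type Variant = "A" | "B";

// Simple deterministic string hash, kept within 32 bits. It is good
// enough for an illustration, not a recommendation for production use.
function hashString(text: string): number {
  let hash = 0;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) % 4294967296;
  }
  return hash;
}

// The same user and experiment always produce the same group.
function assignVariant(userId: string, experiment: string): Variant {
  return hashString(experiment + ":" + userId) % 100 < 50 ? "A" : "B";
}

type Tally = { visitors: number; purchases: number };

function conversionRate(tally: Tally): number {
  return tally.visitors === 0 ? 0 : tally.purchases / tally.visitors;
}

// Hypothetical usage statistics collected for each group.
const tallies: { A: Tally; B: Tally } = {
  A: { visitors: 200, purchases: 18 },
  B: { visitors: 200, purchases: 26 }
};

const better: Variant =
  conversionRate(tallies.B) > conversionRate(tallies.A) ? "B" : "A";
```

Including the experiment name in the hashed text means a user who lands in group A for one experiment is not automatically in group A for every other one.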

Performance and other non-functional testing

Often overlooked, performance testing and other non-functional testing such as reliability or resilience testing should be considered early in a project’s lifecycle. They are often the most expensive tests to write, maintain and execute, and are highly specialized. The danger is that sometimes a team will only resort to these tests when they already have a problem; it is much better to write a simple test harness early on, and to run it on a fairly regular basis, to detect the point at which a problem starts to develop. There is often a finite set of inputs into a system – HTTP requests, messages on a queue or similar – and simulating these with a high volume is normally a good starting point.
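
A simple harness of the kind described can be very small. The sketch below is deliberately simplified and makes several assumptions: the handler is a plain synchronous function standing in for whatever receives the system’s inputs, the inputs are synthetic strings, and timings use millisecond clock resolution. A real harness would send HTTP requests or queue messages to a deployed system, usually concurrently. Tracking the same few numbers over time is what lets the team see a problem starting to develop.

```typescript
// Value at the given percentile (0-100) of an ascending-sorted list,
// using the nearest-rank method.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedMs.length);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index];
}

type LoadReport = {
  requests: number;
  failures: number;
  p50Ms: number;
  p95Ms: number;
};

// Calls the handler `requests` times with synthetic inputs, counting
// failures and recording how long each call took.
function runLoad(handler: (input: string) => void, requests: number): LoadReport {
  const timings: number[] = [];
  let failures = 0;
  for (let i = 0; i < requests; i++) {
    const started = Date.now();
    try {
      handler("message-" + i);
    } catch (e) {
      failures += 1;
    }
    timings.push(Date.now() - started);
  }
  timings.sort(function (a, b) { return a - b; });
  return {
    requests: requests,
    failures: failures,
    p50Ms: percentile(timings, 50),
    p95Ms: percentile(timings, 95)
  };
}

// Hypothetical handler standing in for the system under test.
const loadReport = runLoad(function (input) {
  JSON.parse('{"body":"' + input + '"}');
}, 1000);
```

Reporting percentiles rather than an average is a common choice here, because a small number of very slow requests can hide behind a healthy-looking mean.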