Managing software entropy

In physics, the second law of thermodynamics says that the entropy (loosely, the disorder) of an isolated system tends to increase over time. Software engineering has borrowed the concept: left unchecked, any codebase tends to grow in complexity as time goes on. Each new feature or bug fix tends to need tweaks in more than one place, and these tweaks accumulate. Some growth in complexity is unavoidable, but without conscious awareness from every member of the team and a coordinated effort to contain it, unnecessary complexity can quickly take hold and grow.

Design for change

One of the most effective ways to control software entropy is to ruthlessly apply good design principles, especially early on. The SOLID principles have unfortunately become a buzzword, but the Single Responsibility principle in particular is a powerful tool for minimizing entropy. It states that a component should have a single responsibility and a single reason to change. The rule is fairly simple, but it is often misunderstood and misapplied, especially because it can seem to directly contradict the principle Don’t Repeat Yourself. According to Single Responsibility, things should be grouped or separated according to their reason for change. In practice this means that some components may look very similar at a given point in time, and a well-intentioned developer may see an opportunity to combine them or introduce an abstraction. The danger is that if the two components have different reasons to change, they may need to diverge over time, and combining them, while it looks like a local improvement, is detrimental to the overall system. When components are grouped by how they are used and why they change, and each is given a small, concrete, clear purpose, they tend to be more stable once initially developed. The codebase may be larger, but its complexity is lower, and it is easier to reason about the scope and impact of a change to any component.
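As a minimal sketch of this trade-off, consider two functions that started out with identical bodies. The names `invoice_total` and `forecast_total`, and the rounding rule, are hypothetical and chosen only for illustration. Because billing and forecasting change for different reasons, the functions were kept separate, so a later billing-only rounding requirement touched exactly one of them.

```python
from decimal import Decimal, ROUND_HALF_UP


def invoice_total(amounts: list[Decimal]) -> Decimal:
    """Total printed on a customer invoice. Changes when billing rules change."""
    total = sum(amounts, Decimal("0"))
    # Added later for billing only (hypothetical rule): round to whole cents.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def forecast_total(amounts: list[Decimal]) -> Decimal:
    """Total fed into a sales forecast. Changes when forecasting rules change."""
    # Still the original body: forecasting keeps full precision.
    return sum(amounts, Decimal("0"))


amounts = [Decimal("1.005"), Decimal("2.001")]
print(invoice_total(amounts))   # rounded to cents
print(forecast_total(amounts))  # full precision
```

Had the two been merged into one shared helper while they still looked the same, the rounding change would have needed either a flag parameter or a split back into two functions, which is how the extra complexity creeps in.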

Have a plan

One of the pitfalls of Agile adoption is the misconception that, to follow Agile, a team should abandon design and the big picture in favor of small iterations and incremental delivery. This can be a recipe for serious entropy growth: in the absence of a big picture or plan for the system, every incremental delivery goes in a subtly different direction. There is a fine balance here, and the aim is not to impose an architecture on day one and constrain the evolution of the solution. Architecture in Agile is very different from the traditional model, and values constant adaptation of a plan based on real-world feedback. One of the challenges is that at the start of a project a team knows the least it will ever know, so any decisions made at this stage are unlikely to be optimal. The way to counteract this is to recognize the decisions that are being made, to isolate their impact, and to strive to make them reversible if needed. It is still really important for the team to share an understanding of the current desired end state as it evolves over time, and for the team to actively feed into this evolution. On a day-to-day basis, every modification to the system should either nudge the solution towards this desired state or be used to refine what the desired state is, with that refinement shared through the team and adopted as the new common approach. The best teams have an implicit understanding of how their change will fit into the big picture, and trigger discussions if they think a significant change of direction will be needed. It’s OK to deliberately change direction in response to new information, which is what Agile is all about, but meandering accidentally for lack of a big-picture understanding can be very harmful.

Test-driven development

On a day-to-day basis, the way each change to a system is implemented is what accumulates to eventually define that system. Teams that adopt methodologies such as test-driven development (TDD) will suffer far less from the adverse impacts of entropy, for a number of reasons. One of the artefacts of applying TDD is a test suite that tends to give good coverage of the solution; well-written tests describe clearly what the intended outcomes are and why. Confidence in the test suite gives developers a safety net, and the ability to make changes to a system with visibility of the impact of each change. Tools like continuous test runners (NCrunch for .NET and Wallaby.js for JavaScript are my current favorites) give me feedback with literally every keystroke as to whether the changes I’m making are causing tests to fail. The most common reason that a change introduces unexpected regressions is over-reuse of a component, which goes back to the Single Responsibility principle, but it is the presence of an effective test suite that makes such regressions visible. Applying TDD also encourages cleanly separated components and a more robust approach to dependency management, because these principles lead to code that is easy to isolate for testing.
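To make the idea of tests that describe intended outcomes concrete, here is a small sketch in the test-first style. The function `apply_discount` and its rules are hypothetical, invented for this illustration. In TDD the three tests would be written first and seen to fail, and the implementation above them would then be written to make them pass. The tests are plain functions with `assert` statements, in the form that a runner such as pytest discovers by name.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, in whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


# Each test name states one intended outcome of the (hypothetical) rules.
def test_ten_percent_off_reduces_price_by_a_tenth():
    assert apply_discount(2000, 10) == 1800


def test_zero_percent_leaves_price_unchanged():
    assert apply_discount(2000, 0) == 2000


def test_rejects_percent_outside_0_to_100():
    for bad_percent in (-1, 101):
        try:
            apply_discount(2000, bad_percent)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad_percent}")
```

Note that the function takes everything it needs as arguments and touches no shared state, which is the kind of isolation that writing the test first tends to push a design towards.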

Write malleable code

I often explain to new developers that if I could choose only one property between malleability and correctness, I would choose malleability every time. Correctness is a function of the requirements at the time, and those can change, while malleability allows me to make the changes needed to produce correct code, either now or in the future. Malleability describes how easily something can be worked with or molded, its opposite being brittleness: unfired clay is malleable and can be shaped easily, while baked clay is brittle, resists change, and is likely to break if you try to change its form. One of the books that has most shaped my career is Clean Code by Robert C. Martin (Uncle Bob). Its core tenet is that code is not written for the compiler but for humans, and that the readability of code, and how easy it is for others to work with, are first-class concerns. Applying the approaches described in Clean Code will result in code that is far easier to reason about and to change in future. A few examples of these approaches are to name things in a way that makes their intent and purpose obvious, to break code into subcomponents that are each small enough to reason about, and to write code that is “self-documenting”; that is, don’t add a comment when it would be possible to make the code describe itself better.
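A small before-and-after sketch of the naming and self-documenting points follows. The trial-expiry rule and all names are hypothetical. Both functions compute the same thing; the second replaces the comment and the magic number with names that carry the intent.

```python
from datetime import date


# Before: the comment has to explain what the names do not.
def chk(d, n):
    # returns true if the trial has run out (30 days)
    return (n - d).days > 30


# After: the names say what the comment used to say.
TRIAL_LENGTH_DAYS = 30


def trial_has_expired(trial_start: date, today: date) -> bool:
    days_since_start = (today - trial_start).days
    return days_since_start > TRIAL_LENGTH_DAYS


print(trial_has_expired(date(2024, 1, 1), date(2024, 2, 15)))
```

If the trial length changes, the second version has one obvious place to edit and no comment that can drift out of date.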

There are a couple of trends in the modern software ecosystem that help developers to write more malleable code. One is a trend towards favoring immutability and a declarative style; another is the growing use of higher-order functions. Immutability makes it much easier to reason about the state of a system, because you know instantly that what you see is what you get. In a system where state is mutated, changes can happen in other modules or on other threads, and it’s hard to know how an object’s state came to be. Some languages, notably Erlang, take this to the extreme of making all data immutable, which even allows compiler and garbage collection optimizations that rely on the knowledge that state cannot be changed. A declarative style values expressing what you want to happen rather than how to do it. In the .NET space, LINQ is a great example of a shift towards declarative (and, as it happens, immutable) models, and of using higher-order functions, or “functions as variables”; both normally result in code that is easier to reason about.
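The three ideas above can be sketched together in a few lines. This is an illustration in Python rather than LINQ, and the `Order` type and its fields are hypothetical. The frozen dataclass gives immutable values, the generator expression states what is wanted rather than how to loop, and `select_customers` is a higher-order function because it takes another function as its argument.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)  # instances cannot be modified after creation
class Order:
    customer: str
    total_cents: int
    shipped: bool = False


def mark_shipped(order: Order) -> Order:
    """Return a new Order; the original is left untouched."""
    return replace(order, shipped=True)


def select_customers(
    orders: Iterable[Order], predicate: Callable[[Order], bool]
) -> list[str]:
    """Higher-order: the caller passes in the rule as a function."""
    # Declarative: describes the result (sorted names of matching orders).
    return sorted(order.customer for order in orders if predicate(order))


orders = (Order("ada", 12_000), Order("grace", 4_500), Order("linus", 30_000))

large = select_customers(orders, lambda order: order.total_cents >= 10_000)
shipped_first = mark_shipped(orders[0])

try:
    orders[0].shipped = True  # mutation is rejected on a frozen dataclass
except FrozenInstanceError:
    print("orders cannot be changed in place")
```

Because no function here mutates its input, each one can be understood by reading only its own body, which is the property the paragraph above describes.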