Choosing databases

The recent explosion of database technologies gives teams a wide range of options, and the chance to tick off “NoSQL” on the buzzword bingo scorecard. Each paradigm, and each product, has its own set of features that may or may not make it suitable for a given project. The easiest trap to fall into is choosing a more exotic data store too quickly, perhaps one whose features and strengths aren’t needed but whose weaknesses significantly impede delivery of the functionality the business actually needs. Without a good architectural design that isolates the database and other infrastructure from the core of the application, this can be very expensive to rectify. I’ve seen this go well for teams, and I’ve seen it go very badly.

The main thing to understand about the plethora of database options is that each has its own strengths, areas where it is designed from the ground up to excel, but these usually come as a trade-off, sometimes with whole areas of functionality omitted completely or deliberately sacrificed. I try to think of each as a “radar chart” of its relative strengths – draw a circle with the key attributes or functionalities around the edge, and score each by placing a dot at the corresponding distance from the centre. A strong all-round database will produce something that looks like a circle – probably never the outright winner for any criterion, but never particularly bad either. A highly specialized database will have sharp spikes for its strengths, but may lack some other features altogether.

The tried-and-tested SQL databases are a good example of a generalist; they have evolved so much that very few use cases are particularly bad or impossible, but at the same time there are many use cases where a specialist will beat them, such as ease of development, where a schemaless document store can win, or retrieving a single complex object, where a key-value store can win. Meanwhile a specialist database like MongoDB is very strong in some cases, like loading and saving a single entity, but much more limited than other options in areas such as aggregation and querying, or modelling relationships between entities. If you have a business domain that needs lots of complex queries – such as a reporting tool – it’s highly likely that a document store won’t be the right fit, but if you’re making something like a mobile game, where each user acts in relative isolation from others, the overhead of joins and relational models is just not needed.
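
To make the radar-chart idea concrete, here is a toy Python sketch. The attributes, the 0 to 5 scores for a hypothetical “generalist” and “specialist” profile, and the per-project weights are all invented for illustration; they are not measurements of any real product. The point is only that a spiky profile wins or loses depending on which axes a project weights heavily, while a circle-like profile is rarely the best and rarely the worst.

```python
from typing import Dict

# Hypothetical radar-chart profiles: each attribute scored 0 to 5.
# Attributes and scores are illustrative assumptions, not benchmarks.
Profile = Dict[str, int]

PROFILES: Dict[str, Profile] = {
    # A generalist: no axis outstanding, none particularly weak (a "circle").
    "generalist": {"ad_hoc_queries": 4, "relationships": 4,
                   "single_entity_io": 3, "schema_flexibility": 3},
    # A specialist: sharp spikes on some axes, very weak on others.
    "specialist": {"ad_hoc_queries": 1, "relationships": 1,
                   "single_entity_io": 5, "schema_flexibility": 5},
}


def spikiness(profile: Profile) -> int:
    """Gap between the strongest and weakest axis; small means circle-like."""
    return max(profile.values()) - min(profile.values())


def fit(profile: Profile, needs: Dict[str, int]) -> int:
    """Weighted sum over the axes this project cares about."""
    return sum(weight * profile.get(axis, 0) for axis, weight in needs.items())


def best_fit(needs: Dict[str, int]) -> str:
    """Name of the profile with the highest weighted score for these needs."""
    return max(PROFILES, key=lambda name: fit(PROFILES[name], needs))


# Hypothetical project weightings (higher = matters more).
REPORTING_TOOL = {"ad_hoc_queries": 5, "relationships": 4, "single_entity_io": 1}
MOBILE_GAME = {"single_entity_io": 7, "schema_flexibility": 3}

if __name__ == "__main__":
    for label, needs in [("reporting tool", REPORTING_TOOL), ("mobile game", MOBILE_GAME)]:
        print(label, "->", best_fit(needs))
    # reporting tool -> generalist
    # mobile game -> specialist
```

With these invented numbers the reporting tool favours the generalist (39 against 14) and the mobile game favours the specialist (50 against 30); change the weights and the answer changes with them.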

By far the worst thing I’ve seen a team do is decide to write their own database. Serious considerations in choosing something that will hold your business’s valuable production data should include the less shiny but essential things – security, reliability, performance at scale, robustness, ease of maintenance and so on. Hundreds or even thousands of man-years go into battle-testing a database, and asking your business either to pay that cost again or to gamble on skipping it is irresponsible, to say the least. There will be lots of edge cases you just don’t have the time to engineer for, and frankly it’s a massive waste of money when there are so many options out there.

If you’re new to the world of NoSQL databases – which many interpret as “not only SQL” rather than “no SQL” – a great place to start is NoSQL Distilled, by Pramod Sadalage and Martin Fowler. The book describes some of the most popular database paradigms, with examples based on a popular product in each category. It’s also easy to find plenty of comparisons other people have already done, with big feature tables and opinions. Take every opinion in context, though; if the reviewer’s needs don’t match your project, the weighting they attach to features will be almost irrelevant. The opinions I find most valuable come from people who have experience of supporting the database in production – they might be able to save you those dreaded 2am phone calls!

Consider the delivery team you’re working with; what will give them the best delivery experience? Also, respect the skills and knowledge your organization has, and its willingness to pick up new things; if your organization has a more traditional operations setup, you as a developer will need to win the operations team over to anything unfamiliar – make sure time and resources for that knowledge sharing are planned in. It’s a good idea to start early and involve the people who will be supporting the application. Get a list of their requirements and the things they want to know how to do, and treat these support requirements fairly when prioritizing.

Another factor to consider is your overall solution architecture. One of the big benefits of adopting a service-oriented architecture, where each service is independent and responsible for its own behavior and data, is that the services don’t all have to be the same. If one service is particularly suited to a certain category of database, it can use one, and the other services shouldn’t even know, because the decision is encapsulated. The data owned by a service should be exposed only through application-level APIs, rather than by offering integration directly at database level. This also lowers the risk of changing database technology later, because you can reason about the impact of the change, which should be confined to the service that owns that database.
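
That encapsulation can be sketched in Python as a service whose storage sits behind a private interface. Everything here is hypothetical: CustomerService, CustomerStore and the two backends are illustrative names, SQLite (via Python’s standard sqlite3 module, in memory) stands in for a generalist SQL database, and a plain dictionary stands in for some other technology such as a key-value store. Callers only ever see the service’s methods and plain dictionaries, never tables or connections.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Customer:
    customer_id: int
    name: str


class CustomerStore(Protocol):
    """Storage interface private to the customer service (hypothetical)."""

    def save(self, customer: Customer) -> None: ...

    def get(self, customer_id: int) -> Optional[Customer]: ...


class SqliteCustomerStore:
    """Backend using a generalist SQL database (in-memory SQLite)."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, customer: Customer) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
                (customer.customer_id, customer.name),
            )

    def get(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


class DictCustomerStore:
    """Stand-in for a different technology, e.g. a key-value store."""

    def __init__(self) -> None:
        self._items: Dict[int, Customer] = {}

    def save(self, customer: Customer) -> None:
        self._items[customer.customer_id] = customer

    def get(self, customer_id: int) -> Optional[Customer]:
        return self._items.get(customer_id)


class CustomerService:
    """Application-level API: the only route other services have to the data."""

    def __init__(self, store: CustomerStore) -> None:
        self._store = store  # never exposed to callers

    def register(self, customer_id: int, name: str) -> Dict[str, object]:
        self._store.save(Customer(customer_id, name))
        return {"id": customer_id, "name": name}

    def lookup(self, customer_id: int) -> Optional[Dict[str, object]]:
        customer = self._store.get(customer_id)
        if customer is None:
            return None
        return {"id": customer.customer_id, "name": customer.name}


if __name__ == "__main__":
    # Swapping the backend changes nothing for callers of the service.
    for store in (SqliteCustomerStore(), DictCustomerStore()):
        service = CustomerService(store)
        service.register(1, "Ada")
        print(type(store).__name__, service.lookup(1))
```

In this arrangement, switching database technology means writing one more class with save and get; nothing outside CustomerService has to change.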

Often the start of a project is when teams know least about their requirements, which emerge over time. An approach I quite like is to start with a generalist database – normally SQL – as it will be familiar to the team, who are probably already busy learning the new domain, the project and any other new technologies that have been introduced. Make this choice in the full knowledge that it could be wrong, and encourage the team to make every effort not to “paint themselves into a corner”. As you learn more about the problem space, it will become obvious either that there is a better option, in which case switch to it when the time is right, or that the original choice was “good enough” and there is no need to change. It may also become obvious that some sub-parts of the solution have very different needs, which may be the point at which they become separate services, broken off from the others with their own separate database. Either way, this deferred decision is made with far more information to guide it, and that’s what agility is all about.