What if NoSQL means Not Only SQL

I was having a conversation earlier today with a fellow software consultant who felt he was in a predicament: he had a problem domain that needed the ability to write versatile queries over related data, but he felt he couldn’t recommend the team use SQL! The buzz and hype around the range of available NoSQL choices had got to a point where he found himself having to justify recommending a tool that he felt would fit the requirements well, but if he put forward an alternative – which after comparing feature sets he felt would be tougher to fit to the team’s needs – it would be an easier sell. Obviously, it’s unfortunate that many teams find themselves under pressure to go in a certain direction – sometimes this is a CTO with a particular itch to scratch, or a (often first time!) tech lead who wants to exercise authority, and this is not limited to data. However, it seems to me that the NoSQL movement needs to be taken in the right spirit.

There is a wide range of database paradigms out there – and that has been true for a long time. The fact that relational databases, specifically SQL, became the de facto choice for so many organizations for so long wasn’t healthy; it came about mostly in response to the emergence of major players in the field who were perhaps too effective at selling their products, and it is fair to say that for much of that time, the alternatives weren’t given enough consideration. What has been great about the NoSQL movement is that attention has been spread to these other technologies, but the danger is that this becomes an “anything but SQL” reaction, rather than a positive opportunity to consider SQL alongside other technologies on an equal footing.

All database paradigms have their own prioritized set of features, which translate to strengths and weaknesses at performing particular tasks. Understanding these trade-offs and how they apply to your team’s project is the key skill. If you’re going to take an alternative technology, and spend your whole time trying to work around it or turn it into something it’s not designed to be, that’s at best a huge waste of time, and more than likely going to be the cause of big problems. Whatever domain you have, the odds are there are a product – or some permutation of products – that suit it well, so why waste time choosing something that doesn’t fit when there are options out there!

Modern application design tends towards thinking of databases as an implementation detail; something that an application owns and uses for a specific task; this is in contrast to database-centric design thinking that had been prevalent before that. Paradigms like micro-services or service oriented architecture embrace the construction of large systems from several small ones, ideally with their own independent data stores. This is great for a number of reasons – firstly if you know that only one small system can make changes, then there are no “whodunit” investigations to try and understand what’s going on, and you also know that you can make changes and have a smaller surface area to ensure backwards compatibility with. Secondly, you have no technical need to make a “one size fits all” choice, but can choose based on the needs of each component. If you have a small problem space where most of the data usage is to work with 1 record at a time, perhaps a document store like MongoDb will suit you well; if you’re dealing with transient data like a shopping cart, maybe something like Redis will be a good match; if you’re dealing with a highly interconnected domain like a social network, then why not try Neo4J. However, if you’re building a component that is going to be working with rich queries and aggregations against related data, SQL is still probably a decent choice!

Another scenario I often see is where an organization has a trade-off between online and offline processing, and may be concerned about having a highly reliable and easy to maintain “source of truth”. There are many database technologies out there that perform very well in their chosen specialism, but simply are not suited to being the master source for mission critical data. Many other databases may well be technically able to do this job well, but finding people with the skills to confidently maintain these systems, or finding an adequate level of documentation is a different story. In this kind of environment it might be an avenue worth investigating to write data to multiple sources; write it to a specialized NoSQL data store for real-time performant usage, and write it to SQL in the background for disaster recovery and other operational purposes. Of course if you’re going to do this, make sure you finish the job and test restoring the other sources from the master backup source; after all it’s only a disaster recovery solution if you know for a fact the recovery works! One example of this that works really well is in a frequent read/infrequent write scenario, like a product catalogue, where you might chose to duplicate the catalogue data into a read-performant system, probably updating this store asynchronously as a background task, while maintaining an offline master store in SQL. One thing SQL really shines at is its transactional guarantees and backup/restore options, which are tried-and-tested and well known – but you also get the benefits of a secondary technology for the other use cases.

It’s important to not get too attached to a particular approach, and also not to unfairly or incorrectly stigmatize an alternative. Any potential solution is only as good as how well it fits the problem it’s being applied to, and there is no such thing as a silver bullet! Resist the temptation to always choose the shiny new thing, especially if you have a hunch on some level that it’s not going to provide all the functionality you need. Better still, find a way to break down the requirements in such a way that the best tool can be used for each job, and get the benefits from more than one technology.