Hammers, Screws and NoSQL

There’s a clichéd expression I like: “when all you have is a hammer, everything looks like a nail”. The meaning, of course, is that when a person only knows one way to solve problems, every problem looks as though it can be solved that way. In the modern software ecosystem, there is such a wide diversity of tools and technologies available that the main constraint in designing a solution is not whether a technology is available, but whether it suits the skills and ways of thinking of the people who will design, implement and maintain it.

When a new paradigm of a particular technology is very obviously different, the need to apply a different way of thinking is also obvious. The learning curve can be steep, but at least everyone will recognize that they need to learn and respect the different paradigm. I’ve seen teams struggle much more when a technology looks on the surface as though it can be bent to fit the team’s existing thinking, often because features that appear at a glance to map to other products have been added as a means of marketing the product.

A great example of this is that some people perceive the indexes offered by some document databases to be broadly equivalent to queries against a SQL database. There is some degree of similarity: both offer a means of searching the database by an arbitrary set of parameters. However, the mechanisms by which the two work are very different, and recognizing and leveraging those differences is key. I’ll stick with the indexing example, which happens to be based on RavenDb, a popular document database built on the .Net framework. RavenDb is deliberately designed to be welcoming to newcomers, and a lot of things that the designers don’t intend the database to be good at will still work, as a way to be accommodating… up to a point. Critics of the database often share the opinion that it would be preferable to present a more consistent experience by never allowing these things; that way there would be no surprises later on when a limit is hit.

One such thing that RavenDb does to help newcomers is to create indexes dynamically at runtime. Crucially, however, all indexing is asynchronous, so if a newly created index is queried immediately, it can return no results at all, because the index hasn’t been populated in the background yet. This creates a situation where it is possible to run an identical query twice on a database with no writes taking place and get vastly different results, which will of course be very confusing! The paradigm that newcomers need to embrace here is eventual consistency, whereby the system will settle to a consistent state given time, unlike SQL databases, where the consistency model for queries is much simpler. There is nothing inherently wrong with either behavior, but there are situations where one is preferable to the other, and understanding the difference and applying it to your own context is essential.
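To make the "same query, different results" effect concrete, here is a minimal sketch in plain Python. It is a toy model, not RavenDb's client API: the `ToyDocumentStore` class and all of its method names are hypothetical, and the background indexer is replaced by an explicit `run_background_indexing()` call so that the behavior is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocumentStore:
    """Toy model only: writes land immediately, the index catches up later."""
    documents: dict = field(default_factory=dict)
    index: dict = field(default_factory=dict)    # city -> set of document keys
    pending: list = field(default_factory=list)  # keys written but not yet indexed

    def store(self, key, doc):
        self.documents[key] = doc
        self.pending.append(key)  # indexing is deferred, not part of the write

    def run_background_indexing(self):
        """Stands in for the asynchronous indexer catching up."""
        while self.pending:
            key = self.pending.pop(0)
            city = self.documents[key]["city"]
            self.index.setdefault(city, set()).add(key)

    def query_by_city(self, city):
        """Answers from the index only; returns (matching keys, is_stale)."""
        return sorted(self.index.get(city, set())), bool(self.pending)


store = ToyDocumentStore()
store.store("customers/1", {"name": "Ada", "city": "London"})
store.store("customers/2", {"name": "Grace", "city": "London"})

first = store.query_by_city("London")   # the index has not caught up yet
store.run_background_indexing()         # time passes, no new writes happen
second = store.query_by_city("London")  # identical query, different answer

print(first)   # ([], True)
print(second)  # (['customers/1', 'customers/2'], False)
```

The staleness flag returned alongside the results is the useful part of the sketch: a caller that knows the answer may be stale can decide what to do about it, rather than being surprised.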

A temptation for newcomers who see this behavior and haven’t designed for it is to try to work around it rather than embrace it. RavenDb does offer an option on every query to “wait for non-stale results”, which gives the same behavior as SQL, in that the read results will be consistent every time. The trade-off is obvious and is in the name: the way RavenDb counters eventual consistency, if you ask it to, is by simply waiting. In a development or test environment, this probably won’t take long and won’t be an issue. When the project hasn’t been live for long and doesn’t have much data, it probably won’t take long either. That is exactly what makes it dangerous, because sooner or later, as a system grows in data volume and user traffic, using the “wait for non-stale” variants will cause major problems.

(It should be noted that RavenDb does offer an extensive configuration model, in which many of these behaviors, including the automatic indexing model I’ve described here, can be adjusted to suit your needs. RavenDb’s documentation also cautions strongly against using the “wait for non-stale” variants outside of integration tests.)
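The cost of waiting can be illustrated with another small sketch in plain Python. Again, this is a toy model under stated assumptions, not RavenDb's implementation: `ToyIndex`, `tick()` and `query_waiting_for_non_stale()` are hypothetical names, and one "tick" stands in for the time taken to index one document.

```python
from collections import deque


class ToyIndex:
    """Toy model: each call to tick() indexes one pending document."""

    def __init__(self):
        self.indexed = []
        self.pending = deque()

    def write(self, key):
        self.pending.append(key)

    def tick(self):
        if self.pending:
            self.indexed.append(self.pending.popleft())

    def query(self):
        # Answers immediately from whatever has been indexed so far.
        return list(self.indexed)

    def query_waiting_for_non_stale(self, max_ticks):
        # Waits (here: counts ticks) until the backlog is empty
        # or the waiting budget runs out.
        waited = 0
        while self.pending and waited < max_ticks:
            self.tick()
            waited += 1
        if self.pending:
            raise TimeoutError(f"index still stale after {waited} ticks")
        return list(self.indexed), waited


def ticks_waited(backlog_size, max_ticks=1_000_000):
    """Builds an index with the given backlog and reports how long a query waits."""
    index = ToyIndex()
    for i in range(backlog_size):
        index.write(f"orders/{i}")
    _, waited = index.query_waiting_for_non_stale(max_ticks)
    return waited


small = ticks_waited(10)      # test environment: tiny backlog
large = ticks_waited(50_000)  # busy live system: large backlog
print(small, large)           # 10 50000
```

In this model the wait grows in direct proportion to the indexing backlog, which is why the option feels harmless on a small data set and becomes a problem later. With a tight budget, such as `ticks_waited(10, max_ticks=5)`, the query fails with a `TimeoutError` instead of returning.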

So what can you do about it? Embrace the paradigms of your tools and technology! In this example, the problem arises because people are thinking in the patterns of another technology. Modelling data in a relational SQL database is very different from modelling it in a non-relational document store such as RavenDb. One pattern that works nicely in document stores is to treat transaction boundaries as documents, which gives the data a very different shape from the well-established entity modelling that is the de facto approach in SQL databases. Querying across multiple documents in a document store can be done, but it is not the database’s main strength. If you instead design for loading by key, and shape the data for reading at the point of writing, the user experience will be much better. Eventual consistency can seem tricky at first, but if you know and understand that it is there, that’s a good start. Thinking in terms of the new technology will allow you to get the best from it, rather than trying to turn it into something it’s not!
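As a rough illustration of the difference in shape, the sketch below contrasts the two approaches using plain Python dictionaries. Everything here is hypothetical and chosen for illustration: the order data, the key format and the function names are assumptions, and no real database is involved.

```python
# Entity-style shape: one record per entity, reassembled at read time.
order_lines = [
    {"order_id": "orders/1", "sku": "HAMMER", "qty": 1, "unit_price_pence": 1250},
    {"order_id": "orders/1", "sku": "SCREWS", "qty": 200, "unit_price_pence": 2},
    {"order_id": "orders/2", "sku": "NAILS", "qty": 50, "unit_price_pence": 1},
]


def order_total_entity_style(order_id):
    # Reading requires a search across many records (a query).
    return sum(
        line["qty"] * line["unit_price_pence"]
        for line in order_lines
        if line["order_id"] == order_id
    )


# Document-style shape: the whole transaction is one document under one key.
documents = {}


def store_order(key, customer_name, lines):
    # Shape the data at write time so that reads need no further work.
    documents[key] = {
        "customer_name": customer_name,
        "lines": lines,
        "total_pence": sum(l["qty"] * l["unit_price_pence"] for l in lines),
    }


def load(key):
    # Reading is a single lookup by key, with no index involved.
    return documents[key]


store_order(
    "orders/1",
    "Ada",
    [
        {"sku": "HAMMER", "qty": 1, "unit_price_pence": 1250},
        {"sku": "SCREWS", "qty": 200, "unit_price_pence": 2},
    ],
)

print(order_total_entity_style("orders/1"))  # 1650
print(load("orders/1")["total_pence"])       # 1650
```

Both approaches produce the same total, but the document-style read never touches an index, so in a store where only index-backed queries are eventually consistent it sidesteps the staleness question for this use case.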