In message-driven systems, there are two types of message that can be used, with very different semantic meanings. These message types transcend specific technology choices, although some of the desirable traits are harder to achieve with some technology stacks. When designing for a microservices or service-oriented architecture, understanding the difference between the message types can lead to a far more robust and maintainable solution.
A Command message is a request to do something, usually named in the imperative voice (DoSomething). This means that:
- The sender knows that the destination service offers this functionality
- The sender chooses when to send the command
- The sender has to know the identity, and possibly the location of the destination
- The command is owned by the recipient, and each command must have exactly 1 destination
- Commands may be sent from anywhere
- The recipient is the authority on processing this command
- The recipient may reject the command
- The command may fail
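As a rough illustration of the properties listed above, here is a minimal in-process Python sketch. The names `SendWelcomeEmail`, `EmailService` and `CommandRejected` are hypothetical, chosen only for this example, and the direct method call stands in for whatever dispatch mechanism is really used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendWelcomeEmail:
    """A Command: named in the imperative and owned by exactly one recipient."""
    customer_id: str
    email_address: str


class CommandRejected(Exception):
    """Raised when the recipient refuses a command, carrying the reason."""


class EmailService:
    """Hypothetical recipient: the authority on processing SendWelcomeEmail."""

    def __init__(self):
        self.sent = []

    def handle(self, command):
        # The recipient may reject the command outright.
        if "@" not in command.email_address:
            raise CommandRejected("email_address is not a valid address")
        self.sent.append(command.email_address)


# The sender has to know which service owns the command, and chooses when to send it.
email_service = EmailService()
email_service.handle(SendWelcomeEmail("c-1", "ada@example.com"))

try:
    email_service.handle(SendWelcomeEmail("c-2", "not-an-address"))
    rejection = None
except CommandRejected as reason:
    rejection = str(reason)
```

Note how much the sender needs to know here: the recipient, the shape of the command, and how to react to a rejection.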
Commands are something we are taught very early on as software developers. Straightforward method calls are examples of commands, and all that changes when moving this paradigm into a distributed system is the mechanism of invocation. Commands have their place, but their use should be limited. One of the problems with commands is that the sender has to have quite a lot of knowledge about the command they are sending, and this leads to a leakage of knowledge from the recipient’s domain. In a robustly designed set of services, each service is logically independent, but the introduction of commands undermines this by introducing coupling. A more appropriate use of Commands is as part of the external-facing surface area, and as a mechanism for user interaction.
One example of a situation where Commands are applicable is when there is an infrastructure-heavy activity, and the orchestration of that infrastructure is made available by a domain-agnostic service whose sole responsibility is orchestration. The classic case for this is sending emails, where the infrastructure coordination is significant enough that it is worth keeping in one place, which then takes responsibility for things like retries and failed delivery, keeping the other services free of these concerns. Coordinating communications through an API gateway is another case where Commands can provide a buffer between logic and infrastructure.
One of the challenges of using commands is that they can never be fully asynchronous, in that the sender will normally want to know whether the command has been processed. If the recipient rejects a command, it is normally desirable to tell the sender why it was rejected. In a system built on queues, this doesn’t work particularly well, because the recipient should not know the origin of commands. My preferred approach is to use HTTP for commands, such that the HTTP request serves as a transactional send of the command, and the response code indicates the status of the request. The sender can then interpret the response code and make informed decisions about what to do next. A criticism of using HTTP for commands is that there is a temporal coupling: the recipient has to be running and able to receive commands, or there will be an error. My opinion is that the coupling doesn’t come from the dispatch mechanism, but from the logical and knowledge leaks of using a command in the first place, so if this is a problem, a command shouldn’t be used. Another approach, especially for long-running command processing, is to use HTTP but to return a 201 Created response with a “job id” to the sender in the first request, so that they can then poll for status.
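To make the response-code idea concrete, here is a small Python sketch that simulates the exchange without a real network. `OrderService`, its methods and the `interpret` helper are hypothetical names for this example; each method stands in for an HTTP endpoint and returns a status code plus a body.

```python
from http import HTTPStatus
import uuid


class OrderService:
    """Hypothetical recipient; each method stands in for an HTTP endpoint."""

    def __init__(self):
        self.jobs = {}

    def post_cancel_order(self, body):
        # Short command: the status code reports the outcome directly.
        if "order_id" not in body:
            return HTTPStatus.BAD_REQUEST, {"reason": "order_id is required"}
        return HTTPStatus.OK, {}

    def post_rebuild_invoices(self, body):
        # Long-running command: answer with a job id that the sender can poll.
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = "running"
        return HTTPStatus.CREATED, {"job_id": job_id}

    def get_job(self, job_id):
        if job_id not in self.jobs:
            return HTTPStatus.NOT_FOUND, {}
        return HTTPStatus.OK, {"status": self.jobs[job_id]}


def interpret(status, body):
    """Sender-side decision based purely on the response code."""
    if status == HTTPStatus.CREATED:
        return f"accepted, poll job {body['job_id']}"
    if 200 <= status < 300:
        return "processed"
    if 400 <= status < 500:
        return f"rejected: {body.get('reason', 'no reason given')}"
    return "failed, retry later"


service = OrderService()
ok = interpret(*service.post_cancel_order({"order_id": "o-42"}))
rejected = interpret(*service.post_cancel_order({}))

status, body = service.post_rebuild_invoices({})
service.jobs[body["job_id"]] = "done"  # pretend the background work has finished
poll_status, poll_body = service.get_job(body["job_id"])
```

The sender never needs a reply queue: the outcome, or a handle for finding it out later, comes back on the same request.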
If commands are being used, the key is not to make inappropriate promises to the sender by having the wrong interface. Offering a “Send” method for a command with no indication of whether the command was processed makes the implicit promise that the command will eventually be handled, but gives no true way of knowing whether this happened. Commands should not return data (that would make them queries), only a simple indication of status or a handle for obtaining that status. In the .NET framework, the Task object is a suitable response.
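A loose Python analogue of that interface is an awaitable `send` that resolves to a status and nothing else. This is only a sketch: `CommandResult` and `send` are hypothetical names, and `asyncio.sleep(0)` stands in for the real round trip.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandResult:
    """Status only: reports whether the command was handled, never domain data."""
    succeeded: bool
    reason: str = ""


async def send(command):
    """Hypothetical dispatcher; awaiting it tells the sender what happened."""
    await asyncio.sleep(0)  # stands in for the real network round trip
    if not command.get("order_id"):
        return CommandResult(False, "order_id is required")
    return CommandResult(True)


async def main():
    accepted = await send({"name": "CancelOrder", "order_id": "o-42"})
    refused = await send({"name": "CancelOrder"})
    return accepted, refused


accepted, refused = asyncio.run(main())
```

The point of the design is that the caller cannot ignore the outcome by accident: there is always something to await and inspect.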
Queries should also be very familiar for all developers; they are the equivalent of Commands that return data. All of the properties of Commands listed above are also true for Queries, but additionally Queries have a requirement to provide a response in a timely manner. Use of Queries between services is usually not appropriate, because it implies a leakage of domain knowledge and tight coupling; the point of services is to be autonomous and independent, and these principles are violated by using cross-service Queries. As with Commands, Queries should be used mainly as an external interface, and for user interaction.
It is again a fairly natural choice to use HTTP for queries, as it is well suited to providing data in the response. In the .NET framework, Task<TResponse> is a suitable type for modelling Query interactions, because it allows for a success/fail indication with reasons, and for returning data.
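For comparison, a Query can be sketched in Python as an awaitable that carries data as well as a success/fail indication, with a timeout to reflect the need for a timely answer. `QueryResult`, `get_order`, `ask`, the `ORDERS` dictionary and the timeout value are all assumptions made for this example.

```python
import asyncio
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class QueryResult(Generic[T]):
    """Success/fail indication with a reason, plus the returned data."""
    succeeded: bool
    data: Optional[T] = None
    reason: str = ""


ORDERS = {"o-42": {"status": "shipped"}}  # hypothetical data owned by the recipient


async def get_order(order_id, delay=0.0):
    await asyncio.sleep(delay)  # stands in for the HTTP round trip
    if order_id not in ORDERS:
        return QueryResult(False, reason="order not found")
    return QueryResult(True, data=ORDERS[order_id])


async def ask(order_id, delay=0.0, timeout=0.05):
    # A query that does not answer in time is treated as a failure.
    try:
        return await asyncio.wait_for(get_order(order_id, delay), timeout)
    except asyncio.TimeoutError:
        return QueryResult(False, reason="timed out")


found = asyncio.run(ask("o-42"))
missing = asyncio.run(ask("o-404"))
slow = asyncio.run(ask("o-42", delay=0.5))
```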
In my experience, Events are the harder paradigm for teams to understand and apply correctly. Things like the Reactive Manifesto are making the paradigm more common, but it is very different to most of the traditional teachings in software design and hence less familiar. For Events:
- Events describe something that has happened
- Events cannot semantically be rejected by recipients, as they describe a change that has happened
- Rejecting or failing to process an event results in a corrupted state that must be repaired
- Would-be recipients of an event need to subscribe
- The publisher does not logically know the identity of subscribers
- At runtime, dispatching events requires knowledge of the subscribers, but this is an infrastructure rather than logical concern
- The publisher does not know or care how subscribers handle an event
- Each event must have exactly 1 publisher
- Events may be subscribed to by multiple subscribers, and any service may subscribe to any Event
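The properties above can be sketched with a tiny in-memory bus in Python. `EventBus` and the two subscribers are hypothetical and only illustrate the shape: subscription records live in the infrastructure, and the publisher never names a recipient.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerPlacedOrder:
    """An Event: past tense, describing something that has already happened."""
    order_id: str
    customer_id: str
    total_pence: int


class EventBus:
    """Infrastructure: the only place that knows who is subscribed."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        # Handlers return nothing: there is no channel for rejecting an event.
        for handler in self._subscribers[type(event)]:
            handler(event)


bus = EventBus()
invoiced, shipped = [], []

# Any number of services may subscribe, each handling the event in its own way.
bus.subscribe(CustomerPlacedOrder, lambda event: invoiced.append(event.order_id))
bus.subscribe(CustomerPlacedOrder, lambda event: shipped.append(event.order_id))

# The publishing service publishes without knowing who is listening.
bus.publish(CustomerPlacedOrder("o-42", "c-1", 1999))
```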
As Events describe something that has happened in the past, the key thing (that I know I’ve repeated!) is that recipients can choose how they process an Event, but they cannot reject it. Events are normally named in the past tense (SomethingHappened), and it is normally possible and desirable to make these names map neatly to the business domain and vocabulary. One anti-pattern commonly applied by those new to designing message-based systems is to produce events that describe data changes: EntityCreated, EntityUpdated, EntityDeleted. There are almost always better names for these things, such as CustomerPlacedOrder, CustomerAmendedShippingAddress and CustomerCancelledOrder. Events should contain enough information to describe accurately and completely what has happened, without a need to back-query. The anti-pattern here would be EntityCreated(entityId), where a subscriber cannot do anything with that message other than call back to ask for more information. The exception to this can be where there is a lot of information being exchanged, at which point it may be an indication that the boundaries between services are incorrect, and one service is not able to be independent enough.
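The difference between a thin data-change event and a self-describing business event can be sketched in Python as follows. The field names and the `ShippingLabels` subscriber are hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityUpdated:
    """Anti-pattern: says only that some data changed, forcing a back-query."""
    entity_id: str


@dataclass(frozen=True)
class CustomerAmendedShippingAddress:
    """Named after the business fact, carrying everything needed to act on it."""
    customer_id: str
    order_id: str
    new_address_lines: tuple
    new_postcode: str


class ShippingLabels:
    """Hypothetical subscriber that keeps its own copy of the data it needs."""

    def __init__(self):
        self.address_by_order = {}

    def on_address_amended(self, event):
        # No call back to the publisher is needed to handle this event.
        self.address_by_order[event.order_id] = (
            *event.new_address_lines,
            event.new_postcode,
        )


labels = ShippingLabels()
labels.on_address_amended(
    CustomerAmendedShippingAddress("c-1", "o-42", ("1 High Street", "Leeds"), "LS1 1AA")
)
```

A subscriber receiving `EntityUpdated("o-42")` would have had to query the publisher before it could do anything useful.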
It is important as the designer of an event-driven architecture to refrain from designing Events “to go to the X service”. This can be a sign that the Event is actually a passive-aggressive Command – it suggests that the sender knows about the recipient, and desires a particular response. Events should by their nature be multicast, with any number of interested parties subscribing. The publisher should be designed to be agnostic of this.
A nice way to implement Events is as a stream – this way the stream can be read by any number of consumers, who each maintain their own pointer. The great thing about this is that new services can be spun up, and simply run through the stream from the start to catch up, and events can be replayed by any combination of services by rewinding their pointer. This also means that the publisher is completely independent from subscribers, and no subscription records are needed, as the publisher simply publishes to 1 feed, and multiple consumers read from it.
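A minimal in-memory Python sketch of the stream model might look like this. `EventStream` and `Consumer` are hypothetical names, and a real implementation would persist both the log and each consumer's pointer.

```python
class EventStream:
    """Append-only log; the publisher writes here and knows nothing else."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, position):
        return self._events[position:]


class Consumer:
    """Each consumer owns its own pointer into the stream."""

    def __init__(self, stream):
        self.stream = stream
        self.position = 0
        self.seen = []

    def catch_up(self):
        for event in self.stream.read_from(self.position):
            self.seen.append(event)
            self.position += 1

    def rewind(self, position=0):
        self.position = position


stream = EventStream()
billing = Consumer(stream)

stream.append("CustomerPlacedOrder:o-1")
billing.catch_up()
stream.append("CustomerCancelledOrder:o-1")

# A service spun up later starts at the beginning and runs through everything.
reporting = Consumer(stream)
reporting.catch_up()
billing.catch_up()

# Replay: rewinding moves only this consumer's pointer; the stream is untouched.
billing.seen.clear()
billing.rewind()
billing.catch_up()
```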
The more traditional model for implementing events is via queues and bus/broker systems. One way this can work is that subscribers send a “Subscribe” message, which contains a destination to publish future messages to, which the publisher keeps a record of. Another way is via broker systems that support topics, where the broker takes care of managing delivery. These approaches work perfectly well, but once messages have been consumed, they cannot easily be replayed.
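The “Subscribe” message variant can be sketched in Python with plain in-memory queues. `Publisher` and the message shape are assumptions for illustration; a real system would use a broker or queueing product instead of `deque`.

```python
from collections import deque


class Publisher:
    """Keeps the subscription records itself, as in the Subscribe-message model."""

    def __init__(self):
        self.destinations = []

    def handle_subscribe(self, message):
        # The Subscribe message carries the queue to deliver future events to.
        self.destinations.append(message["destination"])

    def publish(self, event):
        for queue in self.destinations:
            queue.append(event)


publisher = Publisher()

billing_queue = deque()
publisher.handle_subscribe({"destination": billing_queue})
publisher.publish("CustomerPlacedOrder:o-1")

# A subscriber that arrives later receives nothing published before it subscribed.
late_queue = deque()
publisher.handle_subscribe({"destination": late_queue})

# Consuming removes the message from the queue, so it cannot simply be re-read.
consumed = billing_queue.popleft()
```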