In this post we’re looking at idempotency – the ability of a system to handle sets of related messages irrespective of the sequence in which they arrive and to ignore duplicate messages. Idempotency is important because it allows systems to operate at speed and cater for the high message volumes processed by post-trade systems.
In many of the conversations we have with clients, whether we’re talking from a technology or a deployment perspective, a key concern is: What happens if there’s a failure or a break? How are transactions recovered? In particular, clients are keen to know whether, in that recovery process, we can ensure that no messages are duplicated or omitted across the post-trade system’s operation.
So how do we handle that issue? The business context concerns high value transactions being processed through the post-trade systems environment, so data integrity is paramount. Unreliable data values create fundamental problems. So two main questions arise. The first is: How do you ensure that the data being consumed is not corrupted or in an incorrect or unusable form? The second is: How do you build an infrastructure that is scalable yet maintains the highest degree of data integrity?
Breaking the transaction down to ‘state changes’
It is helpful to view an end-to-end transaction consisting of many stages as a sequence of events which result in “state changes”. The full range of state changes, and the events which trigger them, is pre-defined in the workflow. One example of an event resulting in a state change might be that a transaction has been proposed for removal from a netting set. A subsequent state change may be that the proposal has been accepted by the counterparty and the netting set has been amended. State changes take place at various stages on the path from the point of origination of the transaction to its point of completion. The main sequential path of the transaction process can also ‘fork’ into one or more sub-processes before rejoining the main transaction path. For instance, netting – itself made up of a number of stages – might form a sub-process of the main transaction process.
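To make the idea concrete, here is a minimal sketch (the states, field names and identifiers are illustrative, not our actual schema) of how a state-change event might be represented and grouped under its parent transaction:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical workflow states, for illustration only.
class State(Enum):
    PROPOSED_FOR_NETTING_REMOVAL = auto()
    REMOVAL_ACCEPTED = auto()
    NETTING_SET_AMENDED = auto()

@dataclass(frozen=True)
class StateChangeEvent:
    transaction_id: str   # the overarching transaction, e.g. "T1"
    event_id: str         # the individual event, e.g. "E1"
    new_state: State      # the state this event moves the transaction into

# A simplified lineage: the sequence of state changes belonging to one transaction.
lineage = [
    StateChangeEvent("T1", "E1", State.PROPOSED_FOR_NETTING_REMOVAL),
    StateChangeEvent("T1", "E2", State.REMOVAL_ACCEPTED),
    StateChangeEvent("T1", "E3", State.NETTING_SET_AMENDED),
]
```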
Message pipes
The architecture that we have designed at Baton Systems ensures that events enter the system through discrete, dedicated pipes. Maintaining separate pipes increases speed and efficiency, reduces risk and provides a clearer picture of data flows, allowing for scalability and faster issue resolution. We deploy a mix of the two leading pipe platforms, MQ (message queue) and Kafka. Pipes transmit various message types: trade-matching events, for instance, could be defined as event type 1 (E1), payment events as E2, and so on. The pipes are configured to handle very high throughputs and peaks in volume – you don’t want any breaks in the message flow.
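As a simplified sketch of the ‘separate pipes’ idea (the topic names, broker address and payloads below are hypothetical, and Kafka stands in for whichever pipe platform carries a given event type), each event type could be published onto its own pipe:

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical mapping: one discrete pipe (topic/queue) per event type.
TOPICS = {
    "E1": "trade-matching-events",
    "E2": "payment-events",
}

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish(event_type: str, payload: dict) -> None:
    """Send an event onto the pipe dedicated to its type."""
    producer.send(TOPICS[event_type], value=payload)

publish("E1", {"transaction_id": "T1", "event_id": "E1", "state": "MATCHED"})
publish("E2", {"transaction_id": "T1", "event_id": "E2", "state": "PAID"})
producer.flush()
```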
Guaranteed delivery vs. Total Order Broadcast
One of the ways in which pipes can be configured is for ‘guaranteed delivery’. This means that the complete set of events is guaranteed to be delivered at least once. However, the order or sequence in which they arrive may change in the event of failures and retries. So, for instance, event E1, as above, may arrive after E2. With guaranteed delivery, duplication of messages is also possible. The implication is that a consumer database configured to accept messages on a guaranteed-delivery basis must be able to make sense of out-of-order events and de-duplicate the data by matching event hashes.
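Here is a minimal sketch of that de-duplication step (the hashing scheme and the in-memory set are illustrative; a production consumer would persist this state durably):

```python
import hashlib
import json

seen_hashes = set()  # stand-in for a durable record of processed events

def event_hash(event: dict) -> str:
    """Hash the event's canonical JSON form so a resend produces the same hash."""
    canonical = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def consume(event: dict) -> bool:
    """Apply an event delivered at least once; return False if it is a duplicate."""
    h = event_hash(event)
    if h in seen_hashes:
        return False          # already processed: ignore the resend
    seen_hashes.add(h)
    # ... apply the resulting state change to the data store here ...
    return True
```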
The alternative approach is sometimes known as ‘total order broadcast’, where the order of messages is preserved. Naturally, if the order is preserved the messages are easier to process on arrival, but this approach results in the message flow being ‘throttled’, slowing the arrival of messages very significantly. So the trade-off for configuring the system so that data arrives in sequence is a reduction in speed.
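In Kafka terms, for example, order can be preserved for related messages by keying them onto the same partition; the sketch below (topic, key and values are hypothetical) shows the idea and also why it throttles throughput, since every event for that key is forced through a single partition in sequence:

```python
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Keying every event by its transaction id sends all of T1's events to the same
# partition, preserving their order, but also serialising them, which caps how
# fast they can flow.
producer.send("post-trade-events", key=b"T1", value=b'{"event_id": "E1"}')
producer.send("post-trade-events", key=b"T1", value=b'{"event_id": "E2"}')
producer.flush()
```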
The power of idempotent systems
This is where the construct of idempotency comes in. Idempotent systems can comprehend a related set of messages (e.g. all those that relate to a single transaction) even if they are sent multiple times, i.e. as duplicates.
A key aspect of idempotent systems is recording metadata about messages. This, together with the workflow, allows the consumer system to comprehend the data set for an entire transaction. So each state change that results from an event such as E1 is recorded independently in the data store and grouped under the overarching transaction, T1. The system also records when a given event has been seen, which allows it to recognise and filter out subsequent duplicates, as the sketch below illustrates.
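This is a minimal sketch of that consumer-side bookkeeping (in-memory dictionaries stand in for the real data store, and the field names follow the E1/T1 convention above):

```python
# Events grouped under their overarching transaction: T1 -> {E1: event, ...}
transactions: dict[str, dict[str, dict]] = {}
# Record of which (transaction, event) pairs have already been seen.
seen_events: set[tuple[str, str]] = set()

def apply_event(event: dict) -> bool:
    """Record a state change exactly once; ignore any replay of the same event."""
    key = (event["transaction_id"], event["event_id"])
    if key in seen_events:
        return False  # duplicate: filter it out
    seen_events.add(key)
    transactions.setdefault(event["transaction_id"], {})[event["event_id"]] = event
    return True

# Out-of-order and duplicated delivery is harmless:
apply_event({"transaction_id": "T1", "event_id": "E2", "state": "PAID"})
apply_event({"transaction_id": "T1", "event_id": "E1", "state": "MATCHED"})
apply_event({"transaction_id": "T1", "event_id": "E1", "state": "MATCHED"})  # ignored
```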
We can summarise the key benefits of an idempotent system using MQ pipes as fast throughput and the ability to process messages received on a ‘guaranteed delivery’ basis, that is, non-sequentially and including duplicates. Idempotent systems tend to be very efficient at handling massive message volumes while maintaining data integrity. To give you an idea of the volumes, our record number of trades processed per second was 10,000.
During the transaction process there may be a disturbance to the data flow, resulting in a break caused by a failure of the producer, the consumer or the pipe itself. Idempotent systems recover well from these breaks – they are inherently designed to be resilient to disruption of the message order and to the replaying of events. Banks depending upon older technology which is not idempotent are likely to encounter issues arising from frequent delays in messaging.
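Continuing the sketch above, recovery after a break can simply replay the pipe from an earlier point: because the consumer filters out anything it has already seen, the replay leaves the data store unchanged.

```python
# After a break, the producer or the pipe may replay events that were already delivered.
replayed = [
    {"transaction_id": "T1", "event_id": "E1", "state": "MATCHED"},
    {"transaction_id": "T1", "event_id": "E2", "state": "PAID"},
]

newly_applied = sum(apply_event(e) for e in replayed)
print(newly_applied)  # 0: every replayed event was recognised as a duplicate
```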
Why design for idempotency?
When we’re designing our high-throughput systems, the scalability of message pipes is a fundamental consideration. If you think about designing a system handling transactions with multiple stages from first principles, it’s important to differentiate the data behind single events – effectively ledger entries – from the collected data comprising the entire lineage of a transaction, e.g. a PvP or a margin movement.
This approach to system design – a set of ledger entries forming an entire lineage governed by a smart workflow – is based upon the Domain Model. The approach supports extensibility – for instance, parties can collaborate to incorporate customised contract terms into the workflow. It also supports flexibility. If, for example, an event producer such as a CCP – a Central Counterparty Clearing House – adds data fields to a form which are irrelevant to the transaction, the consumer system simply ignores this new data and continues to consume the required data following the workflow, as in the sketch below.
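A small sketch of that flexibility (the required fields and the extra CCP field below are hypothetical): the consumer takes only the fields the workflow defines and ignores anything else the producer adds.

```python
# Hypothetical set of fields the workflow requires for this event type.
REQUIRED_FIELDS = {"transaction_id", "event_id", "state"}

def extract_required(message: dict) -> dict:
    """Keep only the fields the workflow needs; drop producer-specific extras."""
    return {k: v for k, v in message.items() if k in REQUIRED_FIELDS}

# A CCP adds a field the workflow does not know about; consumption continues regardless.
incoming = {
    "transaction_id": "T1",
    "event_id": "E1",
    "state": "MATCHED",
    "ccp_internal_ref": "ABC-123",  # irrelevant to this transaction's workflow
}

print(extract_required(incoming))
# {'transaction_id': 'T1', 'event_id': 'E1', 'state': 'MATCHED'}
```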
I hope this has provided a clear understanding of how idempotent systems are delivering the fast and robust technology required for the new generation of post-trade systems.