Architecting Critical Payment Systems for Multi-Region Resiliency

via aws.amazon.com

Many critical systems on the AWS Cloud use active-active patterns for high availability (HA) across multiple AWS Availability Zones (AZs) and active-passive patterns for disaster recovery (DR) across multiple AWS Regions. Some financial services customers extend the active-active approach across multiple Regions to achieve a near-zero Recovery Time Objective (RTO). Because network latency grows with the distance between Regions, maintaining strong data consistency across Regions is difficult. Consensus algorithms such as Raft and Paxos, and atomic commit protocols such as Two-Phase Commit, can keep data consistent, but the cross-Region round trips they require on each transaction do not meet the performance and latency requirements of payment businesses.
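To make the latency argument concrete, the following back-of-the-envelope sketch computes a physical lower bound on cross-Region round trips. It assumes light in optical fiber covers roughly 200 km per millisecond (about two thirds of its speed in vacuum) and uses hypothetical fiber distances; real routes are longer and add switching and queuing delay, so actual numbers are higher.

```python
# Lower bound on cross-Region latency from propagation delay alone.
# Assumption: light in optical fiber travels about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(fiber_km: float) -> float:
    """Minimum round-trip time over a fiber path of the given one-way length."""
    return 2 * fiber_km / FIBER_KM_PER_MS


def two_phase_commit_floor_ms(fiber_km: float) -> float:
    """Completing both Two-Phase Commit phases with a remote participant takes
    a prepare round trip and a commit round trip, so the floor is two RTTs."""
    return 2 * min_rtt_ms(fiber_km)


if __name__ == "__main__":
    # Hypothetical one-way fiber distances between two Regions.
    for km in (1_000, 4_000, 8_000):
        print(f"{km:>5} km: RTT >= {min_rtt_ms(km):.0f} ms, "
              f"2PC >= {two_phase_commit_floor_ms(km):.0f} ms")
```

Even before any processing time, a synchronous cross-Region commit over a few thousand kilometers adds tens of milliseconds to every transaction, which is the cost the design in this post avoids by keeping each transaction's steps within one Region.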

In this blog post, we explain the design of a critical payment system built on the ISO 20022 messaging standard and AWS Serverless capabilities. We show how to achieve exactly-once processing with an active-active multi-Region approach in which every step of a given transaction runs in the same Region. We also describe cross-Region failure detection combined with a self-healing mechanism that relies on ISO 20022 status codes.
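The ideas in the previous paragraph can be illustrated with a minimal, in-memory sketch. It is not the architecture from the original post; it assumes that the ISO 20022 EndToEndId serves as the idempotency key, that each payment records the Region that accepted it as its home Region, and that the status update and settlement happen atomically (a real system would need a transactional write for that). The class `RegionProcessor`, the function `find_stalled`, the Region names, and the timeout are hypothetical. The status values are a subset of the ISO 20022 transaction status codes: ACTC (accepted technical validation), ACSP (accepted settlement in process), PDNG (pending), ACSC (accepted settlement completed), and RJCT (rejected).

```python
from collections.abc import Iterable
from dataclasses import dataclass

# Subset of ISO 20022 transaction status codes used by this sketch.
FINAL = {"ACSC", "RJCT"}              # settlement completed, rejected
IN_FLIGHT = {"ACTC", "ACSP", "PDNG"}  # technically accepted, settling, pending


@dataclass
class Payment:
    end_to_end_id: str  # ISO 20022 EndToEndId, used here as the idempotency key
    home_region: str    # Region that accepted the payment; all its steps run there
    status: str
    updated_at: float


class RegionProcessor:
    """Hypothetical per-Region processor; `ledger` stands in for a durable table."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.ledger: dict[str, Payment] = {}
        self.side_effects = 0  # counts settlements to show each runs at most once

    def accept(self, end_to_end_id: str, now: float) -> str:
        # A duplicate submission returns the stored status instead of a new record.
        if end_to_end_id not in self.ledger:
            self.ledger[end_to_end_id] = Payment(end_to_end_id, self.region, "ACTC", now)
        return self.ledger[end_to_end_id].status

    def settle(self, end_to_end_id: str, now: float) -> str:
        p = self.ledger[end_to_end_id]
        if p.home_region != self.region:
            raise ValueError("steps must run in the payment's home Region")
        if p.status in FINAL:
            return p.status  # already settled or rejected: re-delivery is a no-op
        self.side_effects += 1  # stand-in for the actual settlement call
        p.status, p.updated_at = "ACSC", now
        return p.status


def find_stalled(payments: Iterable[Payment], now: float, timeout: float) -> list[str]:
    """Detection step: could run in any Region against replicated status records."""
    return [p.end_to_end_id for p in payments
            if p.status in IN_FLIGHT and now - p.updated_at > timeout]


if __name__ == "__main__":
    east = RegionProcessor("us-east-1")
    east.accept("E2E-001", now=0.0)
    east.accept("E2E-001", now=1.0)  # duplicate submission, no new record
    stalled = find_stalled(east.ledger.values(), now=120.0, timeout=60.0)
    for e2e in stalled:
        east.settle(e2e, now=120.0)  # self-healing re-drive in the home Region
    east.settle("E2E-001", now=121.0)  # late duplicate trigger, no second settlement
    print(stalled, east.ledger["E2E-001"].status, east.side_effects)
```

Running the module prints `['E2E-001'] ACSC 1`: the stalled payment is detected from its non-final status code and re-driven in its home Region, while the duplicate submission and the late duplicate trigger do not produce a second settlement. Separating detection (which only reads status codes) from repair (which only runs in the home Region) is one way to keep all steps of a transaction in a single Region while still letting another Region notice the stall.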