
How Barclays uses Apache Kafka in production

Factor House
May 11th, 2026

Most banks that adopt Apache Kafka keep it in the cloud or on commodity Linux hardware. Barclays does something less common: the bank runs separate Confluent Kafka deployments on Amazon EKS and on IBM Z-Linux, bringing the streaming layer into direct contact with mainframe workloads rather than bridging them through a connector from outside. That architectural choice sits alongside a Solace PubSub+ messaging layer that has been in place since 2009, creating a layered messaging estate that reflects the complexity of a global bank operating across front, middle, and back office.

Barclays is listed on the official Apache Kafka Powered By page with the description "Barclays utilizes Kafka for streaming and analytical information." That is the bank's only public characterisation of its Kafka usage. The rest of what is known comes from official job postings, vendor case studies, and industry award citations, all of which point to Confluent Kafka as the chosen distribution and to an engineering model built around SRE principles and Infrastructure as Code.

Company overview

Barclays is a British multinational bank headquartered in London, operating across retail banking, corporate banking, and investment banking. The bank serves tens of millions of customers and processes millions of financial transactions daily. Its investment banking arm, Barclays Investment Bank, operates across equities, fixed income, foreign exchange, and commodities.

In 2025, Barclays won the Databricks Data Intelligence Financial Services Industry Award for adopting "a scalable and unified data platform that supports global trade analytics, complex data warehousing use cases and real-time insights," with the platform delivering "real-time streaming capabilities, native machine learning, generative AI capabilities and comprehensive governance."

Key milestones in the bank's messaging and streaming history:

  • 2009: Barclays deploys Solace PubSub+ as its enterprise-wide high-speed messaging platform, integrating applications across front, middle, and back office.
  • 2025: Barclays wins the Databricks Data Intelligence Financial Services Industry Award for its unified data platform with real-time streaming capabilities.
  • April 2026: Barclays posts simultaneous open roles for Confluent Kafka engineers across three distinct specialisms: cluster management, AWS deployment, and IBM Z-Linux deployment.

Barclays' Kafka use cases

Barclays' public description of its Kafka usage is deliberately broad: streaming and analytical information. The bank has not published engineering blog posts that map specific products or teams to Kafka pipelines, so the breakdown here is limited to what can be inferred from official sources.

The job postings describe engineering teams responsible for "event-driven architectures (EDA) using industry-standard messaging and streaming platforms," with ownership of "Kafka-centric systems" framed as a platform engineering concern rather than a product-specific one. The phrasing suggests Kafka is positioned as shared infrastructure serving multiple consuming teams, with a central platform team responsible for cluster operations, capacity planning, and governance.

In Barclays' post-trade technology estate, a microservices-based settlement platform handles cash settlement processing. A Camunda case study attributed to Shakir Ahmed (Director of Operations Technology and Strategy) and Larisa Kvetnoy (MD Post Trade Technology) describes this system as using a message bus for distributed microservices communication, targeting 500,000 or more daily settlement processes and handling T+1 regulatory settlement requirements. The case study does not explicitly name Kafka as the message bus; it is included here as context for the broader event-driven architecture at the bank.

Barclays' Kafka architecture

The most distinctive aspect of Barclays' Kafka setup is the split between two separate deployment environments, each with its own engineering specialism.

AWS cluster

Barclays runs Confluent Kafka on Amazon EKS. The cluster topology supports multi-region AWS deployments in both active-active and active-passive configurations. Engineers are required to design "multi-region AWS architectures, including active-active and active-passive deployments, with clear failover, disaster recovery, and data consistency strategies." Kafka workloads run as containerised applications on Kubernetes, using StatefulSets, persistent volumes, services, and ingress controllers.

The active-active configuration is the more demanding of the two patterns: both regional clusters serve traffic simultaneously, which requires consistent offset management across regions, careful handling of replication lag, and a defined approach to data consistency for consumers that may be reading from either cluster. Barclays' job requirements call for explicit expertise in this area. The specific replication mechanism (MirrorMaker 2, Confluent Replicator, or an alternative) is not named in available sources.
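
The sources do not name the replication mechanism, so the following is an illustrative sketch only: if MirrorMaker 2 were the replication layer, a consumer group failing over to the surviving region would typically translate its committed offsets through MM2's checkpoint topics rather than reprocessing from the earliest offset. Cluster alias, group name, and bootstrap address below are hypothetical.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class RegionalFailover {

    public static void main(String[] args) throws Exception {
        // Hypothetical bootstrap address for the surviving (secondary) region.
        String secondaryBootstrap = "kafka.eu-west-2.example.internal:9092";

        // RemoteClusterUtils reads MirrorMaker 2's checkpoint topic on the target
        // cluster and maps the group's committed offsets from the source cluster
        // ("primary" is the assumed source-cluster alias) into target offsets.
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", secondaryBootstrap);

        Map<TopicPartition, OffsetAndMetadata> translated =
            RemoteClusterUtils.translateOffsets(
                props, "primary", "settlements-consumer", Duration.ofSeconds(30));

        // Seek a consumer in the secondary region to the translated positions so it
        // resumes close to where the primary-region group left off.
        Map<String, Object> consumerProps = new HashMap<>(props);
        consumerProps.put("group.id", "settlements-consumer");
        consumerProps.put("key.deserializer", ByteArrayDeserializer.class.getName());
        consumerProps.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.assign(translated.keySet());
            translated.forEach((tp, offset) -> consumer.seek(tp, offset.offset()));
            // ... normal poll loop resumes from here
        }
    }
}
```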

IBM Z-Linux cluster

A separate Confluent Kafka deployment runs on IBM Z-Linux. This is a less common pattern. The typical approach for mainframe-to-Kafka integration is to run Kafka on separate infrastructure and bridge the gap using Kafka Connect with an IBM MQ connector, IBM Data Gate for Confluent, or IBM's Open Enterprise SDK for Apache Kafka. Barclays instead co-locates Confluent Kafka brokers on the IBM Z platform itself, running on the Linux partition of the mainframe.

The rationale for this approach is not documented in public sources, but the engineering case is straightforward: applications running natively on IBM Z can produce and consume Kafka events without traversing a network boundary to reach a separate Kafka cluster. For latency-sensitive workloads on mainframe systems, removing that hop is meaningful. The Z-Linux role requires understanding of "hybrid platform interoperability within regulated enterprise environments" and compliance with Barclays' UK standards.

Coexisting messaging layer

Solace PubSub+ has been Barclays' enterprise-wide high-speed messaging platform since 2009. The bank consolidated its messaging needs onto the Solace platform to integrate applications across front, middle, and back office, simplifying infrastructure and reducing licensing, datacenter, development, and support costs. The relationship between the Solace layer and the Confluent Kafka deployments is not described in any available source. Both appear to be active parts of Barclays' messaging estate.

Producer and consumer architecture

Barclays' Kafka engineering standards cover both sides of the producer-consumer boundary. On the producer side, engineers are expected to understand event ordering, partitioning strategy, replay capability, and exactly-once vs at-least-once semantics. On the consumer side, offset management and schema evolution are listed as required competencies. Confluent Schema Registry is part of the deployment, with schema evolution identified as an area requiring hands-on expertise. The schema encoding format (Avro, Protobuf, or JSON Schema) is not specified in available sources.
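
As a minimal illustration of those producer-side concerns (not Barclays' code; topic names, keys, and config values are hypothetical), a Java producer that keys events for per-key ordering and enables idempotence for safe retries might look like this:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SettlementEventProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical bootstrap address, not taken from any Barclays source.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // At-least-once with strong per-key ordering: idempotence deduplicates
        // broker-side retries, acks=all waits for the full in-sync replica set,
        // and max.in.flight <= 5 preserves ordering when retries occur.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");
        // Exactly-once pipelines would additionally set transactional.id and wrap
        // sends in beginTransaction()/commitTransaction().

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by settlement ID pins every event for that settlement to one
            // partition, so consumers observe its lifecycle in order.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("settlement-events", "S-12345", "{\"status\":\"MATCHED\"}");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace(); // hand off to real error handling in practice
                }
            });
        }
    }
}
```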

Kafka Connect ecosystem

Kafka Connect is in use across the Confluent Kafka deployments. The specific source and sink connectors in production are not named in any public source.
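
For illustration only: connectors on a Kafka Connect cluster are registered through the Connect REST API with a JSON config. The worker URL and connector class below are hypothetical placeholders, not anything Barclays has published.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {

    public static void main(String[] args) throws Exception {
        // Hypothetical connector config; the connector.class is illustrative only.
        String body = """
            {
              "name": "settlements-sink",
              "config": {
                "connector.class": "io.example.HypotheticalSinkConnector",
                "tasks.max": "2",
                "topics": "settlement-events"
              }
            }
            """;

        // POST /connectors is the standard Connect REST endpoint for creating a connector.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://connect.internal.example:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```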

Special techniques and engineering innovations

Confluent Kafka on IBM Z-Linux. Running Kafka brokers natively on IBM Z-Linux is not a common deployment pattern. It requires a build of the JVM and Confluent runtime that targets the s390x architecture, and it places the streaming layer inside the mainframe's security and compliance boundary rather than outside it. For a regulated institution like Barclays, keeping data within a known, auditable infrastructure footprint can simplify governance. The approach also avoids the latency introduced by external connectors.

KRaft mode. The IBM Z-Linux job posting references both ZooKeeper and KRaft mode, indicating Barclays is running or actively evaluating Confluent Kafka in KRaft (ZooKeeper-free) mode. KRaft became generally available in Kafka 3.3 and removes the external ZooKeeper dependency, simplifying cluster operations and reducing the number of processes to monitor and manage. Whether both the AWS and Z-Linux clusters are in KRaft mode is not stated in available sources.
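
As a small illustration of what KRaft changes operationally (an assumption-laden sketch, not Barclays' tooling): from Kafka 3.3 the Admin API can describe the Raft metadata quorum that replaces ZooKeeper, which is one way an operations team can check controller health and metadata replication lag.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class QuorumCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical bootstrap address for illustration.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // describeMetadataQuorum() (Kafka 3.3+) reports the Raft controller
            // quorum that replaces ZooKeeper in KRaft mode: the current leader
            // plus each voter's replicated offset, which exposes metadata lag.
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
            System.out.println("Controller leader id: " + quorum.leaderId());
            quorum.voters().forEach(v ->
                System.out.println("Voter " + v.replicaId() + " log end offset " + v.logEndOffset()));
        }
    }
}
```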

Multi-region active-active on AWS. An active-active multi-region Kafka topology is more operationally complex than an active-passive warm-standby. Barclays' requirements for this pattern specify explicit expertise in "failover, disaster recovery, and data consistency strategies," suggesting the architecture is designed to stay up and serving traffic in the event of a regional AWS outage, not merely to recover to a secondary region after a failover event.

Operating Kafka at scale

Deployment model. The AWS cluster is deployed on Amazon EKS (self-managed Confluent Kafka on Kubernetes). The IBM Z-Linux cluster runs on IBM Z hardware, managed separately. Both deployments use Confluent Kafka rather than open-source Apache Kafka.

Infrastructure as Code. Cluster provisioning uses Terraform and AWS CloudFormation. The job posting describes this as enabling "repeatable, auditable, and scalable cloud provisioning." Ansible handles configuration management and automation across the streaming platforms.

CI/CD and DevSecOps. GitLab is the source control and pipeline platform. Engineers apply DevSecOps practices, integrating security controls into the CI/CD pipeline rather than treating them as a separate gate.

SRE practices. Barclays applies "BUK Service First and SRE principles" to its Kafka platform. Engineers are accountable for platform stability, performance tuning, capacity planning, and incident response. DORA metrics, covering Deployment Frequency, Lead Time for Change, Change Failure Rate, and Mean Time to Recovery, are the stated mechanism for tracking delivery and operational health.

Testing. The engineering standard for Kafka services includes contract testing with PACT, unit testing with JUnit, integration and performance testing with JMeter, and mutation testing. Contract testing between producers and consumers is worth noting specifically: it catches schema or payload incompatibilities before deployment rather than at runtime, which reduces the risk of consumer failures caused by upstream producer changes.
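
Barclays' actual test setup is not public, but a message-level contract test with pact-jvm's JUnit 5 support typically looks something like the sketch below; the event name, fields, and service names are hypothetical. The consumer records the payload shape it depends on, and the resulting pact file is later verified against the producing service in CI.

```java
import java.util.List;
import au.com.dius.pact.consumer.MessagePactBuilder;
import au.com.dius.pact.consumer.dsl.PactDslJsonBody;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.consumer.junit5.ProviderType;
import au.com.dius.pact.core.model.PactSpecVersion;
import au.com.dius.pact.core.model.annotations.Pact;
import au.com.dius.pact.core.model.messaging.Message;
import au.com.dius.pact.core.model.messaging.MessagePact;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import static org.junit.jupiter.api.Assertions.assertTrue;

// Consumer-side contract test with hypothetical event and service names.
@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "settlement-producer",
             providerType = ProviderType.ASYNCH,
             pactVersion = PactSpecVersion.V3)
class SettlementEventContractTest {

    @Pact(consumer = "settlement-consumer")
    MessagePact settlementMatchedEvent(MessagePactBuilder builder) {
        // Declares the minimum payload shape this consumer relies on.
        return builder
            .expectsToReceive("a settlement matched event")
            .withContent(new PactDslJsonBody()
                .stringType("settlementId", "S-12345")
                .stringType("status", "MATCHED")
                .numberType("amount", 1000.50))
            .toPact();
    }

    @Test
    void consumerCanHandleSettlementMatched(List<Message> messages) {
        // In a real test, deserialize the payload with the consumer's own parser
        // and assert on the fields it actually uses.
        String payload = messages.get(0).contentsAsString();
        assertTrue(payload.contains("settlementId"));
    }
}
```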

Monitoring and alerting. The specific monitoring stack (metrics exporters, dashboards, alerting tools) is not named in any available source. The SRE accountability model implies alert-based on-call rotation, but the tooling is not described publicly.

Full tech stack

| Category | Tools | Notes |
| --- | --- | --- |
| Message broker | Confluent Kafka | Two separate deployments: AWS/EKS and IBM Z-Linux |
| Enterprise messaging | Solace PubSub+ | Enterprise-wide high-speed messaging for front, middle, and back office; in place since 2009 |
| Schema registry | Confluent Schema Registry | Schema evolution required; encoding format not specified in public sources |
| Connectors | Kafka Connect | Specific source and sink connectors not named publicly |
| Container orchestration | Amazon EKS (Kubernetes) | StatefulSets, persistent volumes, ingress controllers; used for AWS Kafka deployment |
| Infrastructure as Code | Terraform, AWS CloudFormation | Repeatable, auditable cloud provisioning |
| Configuration management | Ansible | Automation across cloud and streaming platforms |
| CI/CD | GitLab | Source control and DevSecOps pipelines |
| Containerisation | Docker | Local development environment |
| Workflow orchestration | Camunda Platform, Camunda 8 | Post-trade settlement processing and T+1 regulatory compliance |
| Data platform | Databricks | Unified platform for trade analytics, warehousing, ML, and generative AI |
| Testing | PACT, JUnit, JMeter | Contract, unit, and performance testing for Kafka services; mutation testing also applied |
| Mainframe platform | IBM Z-Linux (zLinux) | Host environment for IBM Z-resident Confluent Kafka deployment |

Key contributors

| Name | Title | Contribution |
| --- | --- | --- |
| Shakir Ahmed | Director of Operations Technology and Strategy | Named in the Camunda case study discussing Barclays' microservices-based post-trade settlement platform and T+1 compliance programme |
| Larisa Kvetnoy | MD Post Trade Technology | Named alongside Shakir Ahmed in the Camunda case study; responsible for Barclays' post-trade technology estate |

Key takeaways for your own Kafka implementation

Running Kafka where your data lives reduces integration complexity. Barclays' decision to deploy Confluent Kafka on IBM Z-Linux rather than bridging the mainframe through an external connector keeps the streaming layer inside the same infrastructure boundary as the workloads it serves. If you have latency-sensitive applications on a mainframe or another non-commodity platform, consider whether bringing the broker closer is viable before adding a connector hop.

Multi-region active-active is a design commitment, not just a configuration choice. Barclays' AWS deployment explicitly covers active-active and active-passive patterns as separate engineering concerns. An active-active topology requires upfront decisions about consumer group management, replication, and data consistency that cannot be retrofitted easily. If you need this level of resilience, design for it from the start rather than treating it as a later scaling step.

Treat Kafka platform operations as an SRE discipline. Barclays applies formal SRE practices to its Kafka clusters, including DORA metrics, capacity planning accountabilities, and incident management runbooks. Kafka is not a fire-and-forget system at scale; defining the same operational standards you apply to application services is a practical way to maintain reliability as the platform grows.

Contract testing between producers and consumers is worth the overhead. Including PACT-based contract testing in the Kafka engineering standard catches schema incompatibilities before deployment, which matters more as the number of independent producer and consumer teams grows. The cost of running contract tests in CI is lower than the cost of debugging a consumer failure caused by an upstream schema change in production.

Multiple messaging layers can coexist if their roles are distinct. Barclays runs both Solace PubSub+ and Confluent Kafka as active parts of its messaging estate. That is not a transitional state; PubSub+ has been in place since 2009. If you are evaluating whether to consolidate onto a single messaging technology, Barclays' architecture suggests that retaining a low-latency broker for specific front-office use cases alongside a general-purpose streaming platform may be a deliberate long-term choice rather than a migration that never completed.

Sources and further reading

Primary sources:

Try Kpow for your Kafka clusters:

If you manage Confluent Kafka clusters and want deeper visibility into consumer lag, topic throughput, and schema registry state, Kpow connects to any Kafka cluster in minutes. You can trial it free for 30 days and deploy via Docker, Helm, or JAR.