How Adidas uses Apache Kafka in production

Factor House
May 23rd, 2026

Adidas runs Apache Kafka at the centre of two distinct platform layers: a business activity monitoring pipeline introduced around 2015, and an internal observability platform called HOLMES that scaled to 100 billion messages per day during the shift to digital commerce in 2020. The engineering decisions behind those platforms — custom GoLang Kafka tooling to avoid JVM overhead, ksqlDB user-defined functions for field-level data masking, and an AsyncAPI-driven GitOps pipeline for self-service topic provisioning — are documented in detail across the company's engineering blog and Confluent conference talks.

Company overview

Adidas is a global sportswear manufacturer and retailer operating e-commerce, retail, and supply chain systems across multiple regions. Its platform engineering team maintains the streaming infrastructure that underpins real-time business monitoring, observability, and data integration across those systems.

Kafka adoption at Adidas began around 2015 with the Business Activity Monitoring 2.0 project. By mid-2018 that platform was ingesting events from 29 source systems across 74 topics at 6 million messages per day. The more significant inflection came in 2020, when Adidas's shift toward direct digital commerce drove the HOLMES observability platform to 100 billion messages per day. That is roughly 16,700 times the mid-2018 monitoring volume, reached about two years later, and it required architectural decisions well beyond standard Kafka configurations.

The most recent publicly documented development, presented at Current London 2025, is a self-service Kafka platform that replaced manual central-team provisioning with a GitOps pipeline backed by a custom domain-specific language and AsyncAPI specifications, cutting provisioning time from days to seconds.

Key Kafka milestones:

  • ~2015: Business Activity Monitoring 2.0 initiated; Apache Kafka selected as the streaming backbone
  • April 2018: Adidas attends Kafka Summit London alongside Apple, Audi, IBM, BBC, and ING
  • June 2018: Iñaki Alzorriz publishes "From Monitoring to Data Streaming" on adidoescode; platform at 74 topics, 29 sources, 6 million messages per day
  • 2020: HOLMES observability platform scales to 100 billion messages per day during Adidas's digital commerce expansion
  • May 2021: Jose Manuel Cristobal presents "Navigating the Observability Storm with Kafka" at Kafka Summit Europe 2021
  • July 2021: Adil Houmadi publishes article on extending ksqlDB with custom UDFs for regional data masking
  • February 2023: Gabriel Barreras documents Kafka Connect event sourcing pitfalls and solutions on adidoescode
  • 2025: Jose Manuel Cristobal and Guillermo Lagunas present the self-service Kafka platform at Current London 2025

Adidas's Kafka use cases

Business activity monitoring

Kafka serves as the event transport layer for Adidas's Business Activity Monitoring 2.0 platform, carrying events from 29 source systems into a complex event processing engine for real-time analytics and reporting. The platform implements a fan-out pub-sub pattern for data extraction, stateful event processing, and data replication across internal teams. Iñaki Alzorriz described this architecture in a June 2018 post on the adidoescode Medium publication.
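The fan-out pattern rests on Kafka's consumer-group model: each group keeps its own offset into the same log, so every downstream team reads every event independently. The following is a minimal in-memory sketch of that idea in plain Python, with no Kafka client involved. The group names and event fields are hypothetical and are not taken from the Adidas post.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for one topic partition: an append-only list of events."""

    def __init__(self):
        self.events = []
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, event):
        self.events.append(event)

    def poll(self, group):
        """Return the events this group has not seen yet and advance its offset."""
        start = self.offsets[group]
        batch = self.events[start:]
        self.offsets[group] = len(self.events)
        return batch


log = TopicLog()
log.publish({"source": "orders", "status": "created"})
log.publish({"source": "orders", "status": "shipped"})

# Two hypothetical consumer groups read the same events independently.
analytics_batch = log.poll("cep-engine")
replication_batch = log.poll("replication")
```

Because offsets are tracked per group, adding another downstream consumer does not change what the existing ones receive.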

Observability and SRE (HOLMES)

HOLMES is Adidas's internal observability system. Kafka is the streaming backbone that ingests all infrastructure logs and metrics from Kubernetes-based services, enabling problem detection, root cause analysis, and predictive alerting across e-commerce systems. At peak adoption in 2020, the platform processed 100 billion messages per day. Jose Manuel Cristobal covered the full architecture in a talk at Kafka Summit Europe 2021.

Event sourcing via Kafka Connect

Adidas uses Kafka Connect with a JDBC Source Connector to capture inserts and updates from Oracle 19c (deployed on AWS RDS) and transform them into Kafka events for downstream consumers. Gabriel Barreras documented the implementation and its pitfalls in a February 2023 adidoescode post.

Regional data streaming with ksqlDB

ksqlDB handles the splitting of a main Kafka topic into regional sub-topics, filtering data by region and masking sensitive fields using a custom user-defined function. Adil Houmadi described this pattern in a July 2021 adidoescode post.

Self-service Kafka platform

The current state of the platform lets Kafka stakeholders manage topics, permissions, schemas, and connectors directly, without involving the central platform team. A GitOps pipeline backed by a custom DSL and AsyncAPI specifications handles all asset provisioning. Guillermo Lagunas and Jose Manuel Cristobal presented the details in "From Days to Seconds: Adidas' Journey to Scalable Kafka Self-Service" at Current London 2025.

Scale and throughput
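
The daily volumes quoted in this article are easier to compare on a per-second footing. The short calculation below converts the two published figures into average rates. These are averages only; the sources cited here do not state peak rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

bam_2018_per_day = 6_000_000            # Business Activity Monitoring, mid-2018
holmes_2020_per_day = 100_000_000_000   # HOLMES, 2020

bam_per_second = bam_2018_per_day / SECONDS_PER_DAY
holmes_per_second = holmes_2020_per_day / SECONDS_PER_DAY
ratio = holmes_2020_per_day / bam_2018_per_day

print(f"BAM 2018:    ~{bam_per_second:,.0f} messages/s on average")
print(f"HOLMES 2020: ~{holmes_per_second:,.0f} messages/s on average")
print(f"Ratio:       ~{ratio:,.0f}x")
```

That works out to about 69 messages per second for the 2018 monitoring platform against roughly 1.16 million messages per second for HOLMES.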

Adidas's Kafka architecture

HOLMES observability platform

HOLMES is structured around four layers, deployed on Kubernetes in a single region across 3 availability zones.

The ingestion layer collects Prometheus metrics per Kubernetes namespace using a Kubernetes Operator deployed with Helm Charts. A custom GoLang application called Prom2Kafka uses the Prometheus Remote Write protocol with protobuf data models to push those metrics into Kafka. Log collection is handled by Fluentd, which is auto-deployed to Kubernetes namespaces using annotations, requiring no manual configuration per service.

The streaming layer is Apache Kafka, deployed open-source with SSL and mutual TLS for all client connections. The cluster is multitenant, with ACLs and connection quotas enforcing isolation between tenants.

The storage layer pairs one custom GoLang tool with off-the-shelf log shippers. KafkaToPromMetrics consumes from Kafka and writes metrics into Victoria Metrics tenants, while Filebeat and Logstash handle log forwarding with suppression capabilities into OpenDistro (an Apache 2.0-licensed distribution of Elasticsearch). Kafka Streams suppressors filter non-compliant logs and high-rate metrics before they reach storage, keeping ingest volume under control at 100 billion messages per day.

The consumption layer provides Grafana for metrics dashboarding and alerting, and Kibana for log analysis with multi-tenancy support.

Self-service Kafka platform

The current platform is described as vendor-agnostic and non-opinionated, meaning it does not depend on Confluent-specific managed services. Teams manage topics, permissions, schemas, and connectors through a GitOps pipeline using a custom DSL specification. AsyncAPI specifications serve as both the documentation standard and the authoritative source of truth for resource provisioning. The data catalogue is built on top of these AsyncAPI specs, allowing teams to discover available topics, message schemas, and ownership metadata without consulting the platform team.

Producer architecture

Metrics are ingested via Prom2Kafka using the Prometheus Remote Write protocol and protobuf serialisation. For application event pipelines, Avro is the documented serialisation format with Schema Registry enforcement. Spring Boot (Java) is used in Kafka consumer and producer services for application-layer event streaming.

Consumer architecture

For HOLMES, KafkaToPromMetrics is a GoLang Kafka consumer writing metrics from Kafka into Victoria Metrics. The GoLang client was chosen specifically to avoid the JVM overhead that would be significant at 100 billion messages per day. ksqlDB consumers handle regional topic splitting with a single persistent query.

Stream processing

Kafka Streams is used for stateful stream processing in HOLMES, specifically for suppressing high-rate metrics and filtering non-compliant log events before they reach the storage layer. ksqlDB handles SQL-based stream transformations, including region filtering and field-level hashing via custom UDFs.

Kafka Connect ecosystem

Kafka Connect with a JDBC Source Connector pulls inserts and updates from Oracle 19c on AWS RDS. The source connector polls the database on a configurable interval using timestamp-based watermarking.
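
A connector of this kind is typically registered as a JSON document with the Kafka Connect REST API. The sketch below builds such a document in Python. The property names are standard JDBC Source Connector settings, but every value (connector name, URL, table, column, intervals) is a placeholder and not Adidas's actual configuration.

```python
import json

# Hypothetical connector definition; all values are placeholders.
connector = {
    "name": "oracle-orders-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:oracle:thin:@//db.example.internal:1521/ORCL",
        "mode": "timestamp",                    # watermark on a timestamp column
        "timestamp.column.name": "UPDATED_AT",  # column compared to the watermark
        "table.whitelist": "ORDERS",
        "topic.prefix": "oracle.",
        "poll.interval.ms": "5000",             # how often the table is queried
        "timestamp.delay.interval.ms": "10000", # only read rows older than this
    },
}

print(json.dumps(connector, indent=2))
```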

Special techniques and engineering innovations

Custom GoLang Kafka tooling to avoid JVM overhead

At 100 billion messages per day, Adidas chose to build two bespoke GoLang applications rather than use standard JVM-based Kafka clients. Prom2Kafka handles ingestion from Prometheus into Kafka via the Remote Write protocol and protobuf, and KafkaToPromMetrics consumes from Kafka and writes into Victoria Metrics. Both were written specifically to avoid the memory footprint and latency characteristics of JVM-based clients at that throughput. Jose Manuel Cristobal documented this decision in the Kafka Summit Europe 2021 slide deck.

ksqlDB user-defined functions for field-level data masking

Rather than building a separate pipeline stage for PII handling, Adidas extended ksqlDB with a custom Java UDF that applies SHA-256 hashing to sensitive fields. A single persistent query filters events by region and applies the UDF to produce separate regional topics with masked data. Adil Houmadi documented the implementation, including the Java class structure and deployment steps, in the July 2021 adidoescode post.
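
The logic that the persistent query and UDF combine can be sketched outside ksqlDB. The original UDF is a Java class; the Python version below only illustrates the same two steps, region filtering and SHA-256 hashing of selected fields. The event shape and field names are assumptions made for the example.

```python
import hashlib


def mask(value: str) -> str:
    """Return the SHA-256 hex digest of a field value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def to_regional_event(event: dict, region: str, sensitive_fields: tuple):
    """Return a masked copy of the event if it belongs to the region, else None."""
    if event.get("region") != region:
        return None
    return {k: mask(v) if k in sensitive_fields else v for k, v in event.items()}


# Hypothetical event; only the "email" field is treated as sensitive here.
event = {"order_id": "A-1001", "region": "EMEA", "email": "runner@example.com"}
masked = to_regional_event(event, "EMEA", ("email",))
```

The original event is left untouched, and events for other regions return `None`, which mirrors a filter that routes each record to at most one regional topic.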

AsyncAPI as infrastructure config, not just documentation

The self-service platform uses AsyncAPI specifications as the source of truth for provisioning Kafka resources including topics, schemas, connectors, and ACLs. This goes beyond the typical documentation use case for AsyncAPI: the specifications are consumed directly by a GitOps pipeline that provisions resources on commit. The result is that teams create and modify Kafka infrastructure through pull requests against AsyncAPI specs rather than through tickets to a central team. Guillermo Lagunas and Jose Manuel Cristobal presented this architecture at Current London 2025.
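
One way to picture a spec-driven pipeline is as a diff between what the specification declares and what the cluster currently holds. The sketch below is an illustration under assumptions, not Adidas's pipeline or DSL: the spec is a heavily trimmed, hypothetical document whose layout loosely follows AsyncAPI 2.x Kafka channel bindings, and the "existing" cluster state is hard-coded.

```python
# Hypothetical, trimmed AsyncAPI-style document expressed as a Python dict.
spec = {
    "asyncapi": "2.6.0",
    "info": {"title": "orders-service", "version": "1.0.0"},
    "channels": {
        "orders.created": {"bindings": {"kafka": {"partitions": 6, "replicas": 3}}},
        "orders.shipped": {"bindings": {"kafka": {"partitions": 3, "replicas": 3}}},
    },
}


def desired_topics(spec: dict) -> dict:
    """Map each channel name to the settings declared in its Kafka binding."""
    topics = {}
    for name, channel in spec.get("channels", {}).items():
        topics[name] = dict(channel.get("bindings", {}).get("kafka", {}))
    return topics


def plan(desired: dict, existing: dict) -> dict:
    """Compare declared topics with the cluster's current state."""
    return {
        "create": sorted(set(desired) - set(existing)),
        "update": sorted(
            t for t in desired if t in existing and desired[t] != existing[t]
        ),
        "unmanaged": sorted(set(existing) - set(desired)),
    }


# Assumed current cluster state, hard-coded for the example.
existing = {
    "orders.created": {"partitions": 3, "replicas": 3},
    "legacy.events": {"partitions": 1, "replicas": 3},
}
changes = plan(desired_topics(spec), existing)
```

In a GitOps setup, a plan like this would be computed when a pull request is merged and then applied by the pipeline, so the repository history doubles as the audit trail.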

Kafka Streams suppressors for ingest cost control

HOLMES uses Kafka Streams suppressors to filter events before they reach Victoria Metrics and OpenDistro. Non-compliant logs and metrics producing at unusually high rates are suppressed at the streaming layer rather than at the storage layer, which limits the storage write volume at 100 billion messages per day without requiring changes to the producing services.
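
The sources do not spell out the suppression rules, so the following is only a sketch of one plausible rule: drop events from any source that exceeds a fixed count within a time window. It is plain Python, not the Kafka Streams API, and the threshold, window size, and event tuple layout are all assumptions.

```python
from collections import defaultdict


def suppress_high_rate(events, max_per_window, window_seconds=60):
    """Yield events, dropping those from a source that has already emitted
    max_per_window events in the current fixed window.

    Each event is a (timestamp_seconds, source, payload) tuple.
    """
    # (source, window index) -> events seen so far; never pruned in this sketch.
    counts = defaultdict(int)
    for ts, source, payload in events:
        window = int(ts // window_seconds)
        counts[(source, window)] += 1
        if counts[(source, window)] <= max_per_window:
            yield ts, source, payload


# Hypothetical traffic: one noisy source and one quiet one in the same minute.
events = [(t, "noisy-service", "m") for t in range(10)]
events += [(t, "quiet-service", "m") for t in (1, 30)]
kept = list(suppress_high_rate(events, max_per_window=3))
```

Here the noisy source is capped at three events for the window while the quiet source passes through untouched, which is the property that keeps storage writes bounded without touching the producers.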

Operating Kafka at scale

Deployment model: Self-managed open-source Kafka on Kubernetes, deployed in a single region across 3 availability zones. The HOLMES cluster uses a multitenant configuration with SSL and mTLS for all client connections. Adidas does not document use of a managed Kafka service for these workloads.

Security and multi-tenancy: All Kafka clusters enforce authentication and encryption via SSL and mutual TLS. Quotas and ACLs provide isolation between tenants within multitenant clusters. The Adidas API Guidelines document TLS requirements and ACL patterns as platform standards for all Kafka topics.

GitOps-driven lifecycle management: Topics, permissions, schemas, and connectors are managed declaratively through a GitOps pipeline backed by AsyncAPI specifications. Provisioning changes are introduced as pull requests; the pipeline applies them automatically on merge. This eliminates the need for the central platform team to manually provision resources and makes the state of Kafka infrastructure version-controlled and auditable.

SLA tracking and adoption KPIs: After moving to the self-service model, Adidas tracks resolution time SLAs and adoption KPIs to measure operational improvement. The published outcome is a reduction from days to seconds for provisioning tasks, which the team uses as an ongoing measure of platform health.

Prometheus and Helm Chart deployment: The Prometheus collection layer in HOLMES is deployed via Kubernetes Operator with Helm Charts. Fluentd is deployed automatically using Kubernetes namespace annotations, keeping the observability ingest layer fully infrastructure-as-code without requiring per-service configuration.

Data catalogue via AsyncAPI: Adidas's Platform and Engineering team maintains a data catalogue built on AsyncAPI specifications so teams can discover available topics, message schemas, field definitions, headers, and ownership metadata. This replaces informal knowledge of what topics exist and who owns them with a structured, searchable registry backed by the same specs that provision the infrastructure.

Challenges and how they solved them

Central provisioning became a bottleneck as adoption grew

As Kafka adoption scaled across Adidas, the central platform team was responsible for manually creating topics, assigning permissions, registering schemas, and configuring connectors for every team that needed access. The process took days per request and was a source of delays and configuration errors.

Adidas built a vendor-agnostic self-service platform using a custom DSL and AsyncAPI specifications as the provisioning layer, fed through a GitOps pipeline. Individual teams now manage their own Kafka resources through pull requests. Provisioning time dropped from days to seconds. Guillermo Lagunas and Jose Manuel Cristobal presented the before and after at Current London 2025.

Race condition in Kafka Connect JDBC Source Connector causing silent data loss

When concurrent database transactions overlapped with the JDBC Source Connector's query window, rows that were stamped before a poll but committed after it were silently skipped. By the time those rows became visible, the connector's watermark had already advanced past their timestamps, so they were never ingested.

The immediate fix was tuning timestamp.delay.interval.ms to introduce a buffer period, ensuring that pending transactions complete before the next poll cycle advances the watermark. Gabriel Barreras documented the root cause and the configuration fix in the February 2023 adidoescode post, and also noted that for use cases requiring consistent low-latency change data capture, Debezium or a CDC-native database approach is preferable to JDBC Source Connectors.
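
The race and the effect of the delay can be reproduced with a small simulation. This is a simplified model of timestamp-mode polling, not the connector's code: time is measured in arbitrary ticks, `ts` stands for the timestamp column value, and `commit_at` is a made-up field marking when the row becomes visible to queries.

```python
def poll(rows, watermark, now, delay=0):
    """Emulate one timestamp-mode poll: return committed rows with
    watermark < ts < now - delay, plus the updated watermark."""
    upper = now - delay
    batch = [
        r for r in rows
        if r["commit_at"] <= now and watermark < r["ts"] < upper
    ]
    new_watermark = max((r["ts"] for r in batch), default=watermark)
    return batch, new_watermark


def run(rows, poll_times, delay):
    """Run a series of polls and return the ids of all rows that were read."""
    watermark, seen = 0, []
    for now in poll_times:
        batch, watermark = poll(rows, watermark, now, delay)
        seen += [r["id"] for r in batch]
    return seen


rows = [
    {"id": "slow", "ts": 10, "commit_at": 16},  # stamped early, commits late
    {"id": "fast", "ts": 12, "commit_at": 13},  # stamped later, commits first
]
poll_times = [15, 20, 25, 30]

without_delay = run(rows, poll_times, delay=0)  # "slow" is lost
with_delay = run(rows, poll_times, delay=8)     # both rows are read
```

Without a delay, the first poll reads "fast" and moves the watermark to 12, so "slow" (stamped at 10) can never match a later query. With a delay longer than the slowest transaction, the upper bound of each query trails behind in-flight work and both rows are picked up.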

Scaling observability ingest to 100 billion messages per day

The shift to digital commerce during 2020 required HOLMES to ingest an order of magnitude more traffic than it was originally designed for, without a proportional increase in infrastructure cost.

Adidas handled this through a combination of architectural choices: GoLang Kafka clients for the ingestion and storage layers to avoid JVM overhead, Kafka Streams suppressors to reduce the volume of events reaching storage, and Victoria Metrics as a cost-effective time-series backend. The result was a platform that reached 100 billion messages per day without requiring a full re-architecture.

Full tech stack

| Category | Tools | Notes |
| --- | --- | --- |
| Message broker | Apache Kafka | Open-source deployment, self-managed on Kubernetes |
| Stream processing | Kafka Streams | Stateful processing; used for log and metrics suppression in HOLMES |
| Stream processing | ksqlDB | SQL-based transformations; regional topic splitting and PII masking |
| Kafka client (custom) | Prom2Kafka (GoLang) | Prometheus Remote Write ingestion into Kafka using protobuf |
| Kafka client (custom) | KafkaToPromMetrics (GoLang) | Kafka consumer writing metrics from Kafka into Victoria Metrics |
| Connectors | Kafka Connect (JDBC Source) | Database event sourcing from Oracle 19c (AWS RDS) |
| Schema management | Schema Registry | Schema management and enforcement for Kafka topics |
| Serialisation | Apache Avro | Message serialisation format used with Schema Registry |
| Log collection | Fluentd | Auto-deployed in Kubernetes via namespace annotations |
| Metrics collection | Prometheus | Per-namespace metrics collection; deployed via Kubernetes Operator and Helm Charts |
| Metrics storage | Victoria Metrics | Time-series metrics storage backend; described as fast, cost-effective, scalable |
| Log forwarding | Filebeat + Logstash | Lightweight log forwarding with suppression into OpenDistro |
| Log storage | OpenDistro (Elasticsearch) | Apache 2.0 Elasticsearch build for log indexing and search |
| Dashboards and alerting | Grafana | Metrics dashboarding and alerting on top of Victoria Metrics |
| Log analysis | Kibana | Multi-tenant log analysis and visualisation |
| Orchestration | Kubernetes | Container orchestration for HOLMES and the data streaming platform |
| Infrastructure config | AsyncAPI | Documentation standard and GitOps provisioning DSL for all Kafka resources |
| Source database | Oracle 19c (AWS RDS) | Source for Kafka Connect JDBC event sourcing |
| Application framework | Spring Boot (Java) | Used in Kafka consumer and producer services |

Key contributors

  • Iñaki Alzorriz (Director of Platform Engineering, Adidas): Authored the 2018 adidoescode post describing the Kafka-based data streaming initiative and Business Activity Monitoring 2.0. Source: adidoescode, June 2018.
  • Jose Manuel Cristobal (Senior Platform Engineer / Director Platform Engineering, Adidas): Presented "Navigating the Observability Storm with Kafka" at Kafka Summit Europe 2021; co-presented "From Days to Seconds" at Current London 2025. Sources: Kafka Summit Europe 2021; Current London 2025.
  • Guillermo Lagunas (Platform Engineering, Adidas): Co-presented the self-service Kafka platform at Current London 2025. Source: Current London 2025.
  • Gabriel Barreras (Platform Engineering, Adidas): Documented Kafka Connect JDBC Source Connector race conditions and solutions. Source: adidoescode, February 2023.
  • Adil Houmadi (Platform Engineering, Adidas): Documented ksqlDB UDF extension for regional data masking. Source: adidoescode, July 2021.

Key takeaways for your own Kafka implementation

  • Choose your Kafka client language based on throughput requirements. At 100 billion messages per day, Adidas built its ingestion and storage clients in GoLang rather than on the JVM to reduce memory footprint and latency. If you are running high-throughput observability or telemetry pipelines, the client runtime overhead is worth evaluating early.
  • ksqlDB user-defined functions let you extend SQL-based pipelines without a separate processing stage. Adidas used a custom Java UDF to apply SHA-256 hashing within a persistent ksqlDB query, handling both regional routing and PII masking in a single step. If your stream processing requirements push beyond what built-in ksqlDB functions cover, UDFs are worth considering before adding a separate processing tier.
  • AsyncAPI can serve as infrastructure config, not just documentation. Adidas uses AsyncAPI specifications as the source of truth for a GitOps pipeline that provisions topics, schemas, connectors, and ACLs directly. If you are managing Kafka resources manually or via tickets, treating API specifications as executable configuration is an approach that scales better as the number of topics and teams grows.
  • JDBC Source Connectors require careful watermark configuration for correctness. The timestamp-based polling model in the Kafka Connect JDBC Source Connector can miss records when concurrent transactions span query boundaries. Setting timestamp.delay.interval.ms to buffer the watermark advance is a necessary tuning step; for strict correctness requirements, a CDC-native connector such as Debezium is worth evaluating instead.
  • Suppression at the streaming layer is more cost-effective than suppression at storage. Adidas used Kafka Streams suppressors to filter high-rate metrics and non-compliant logs before they reached Victoria Metrics and OpenDistro (Elasticsearch). This limits storage write volume without requiring changes to producing services and avoids paying ingestion costs for data you will filter out anyway.

Sources and further reading

Primary sources

  1. Iñaki Alzorriz, "From Monitoring to Data Streaming — Data Streaming Initiative in Adidas" (June 2018)
  2. Jose Manuel Cristobal, "Navigating the Observability Storm with Kafka" — Kafka Summit Europe 2021
  3. Jose Manuel Cristobal, "Navigating the Observability Storm with Kafka" — slide deck
  4. Guillermo Lagunas and Jose Manuel Cristobal, "From Days to Seconds: Adidas' Journey to Scalable Kafka Self-Service" — Current London 2025
  5. Gabriel Barreras, "Event Sourcing with Kafka Connect: Inconsistency, Pitfalls & Solutions" (February 2023)
  6. Adil Houmadi, "Extending ksqlDB Built-in Capability" (July 2021)
  7. Adidas Platform & Engineering, Adidas API Guidelines — Kafka Asynchronous Guidelines
