
How Adidas uses Apache Kafka in production
Adidas runs Apache Kafka at the centre of two distinct platform layers: a business activity monitoring pipeline introduced around 2015, and an internal observability platform called HOLMES that scaled to 100 billion messages per day during the shift to digital commerce in 2020. The engineering decisions behind those platforms — custom GoLang Kafka tooling to avoid JVM overhead, ksqlDB user-defined functions for field-level data masking, and an AsyncAPI-driven GitOps pipeline for self-service topic provisioning — are documented in detail across the company's engineering blog and Confluent conference talks.
Company overview
Adidas is a global sportswear manufacturer and retailer operating e-commerce, retail, and supply chain systems across multiple regions. Its platform engineering team maintains the streaming infrastructure that underpins real-time business monitoring, observability, and data integration across those systems.
Kafka adoption at Adidas began around 2015 with the Business Activity Monitoring 2.0 project. By mid-2018 that platform was ingesting events from 29 source systems across 74 topics at 6 million messages per day. The more significant inflection came in 2020, when Adidas's shift toward direct digital commerce drove the HOLMES observability platform to 100 billion messages per day. That is roughly 16,700 times the daily volume BAM 2.0 reported two years earlier, on a separate platform, and it required architectural decisions well beyond standard Kafka configurations.
The most recent publicly documented development, presented at Current London 2025, is a self-service Kafka platform that replaced manual central-team provisioning with a GitOps pipeline backed by a custom domain-specific language and AsyncAPI specifications, cutting provisioning time from days to seconds.
Key Kafka milestones:
- ~2015: Business Activity Monitoring 2.0 initiated; Apache Kafka selected as the streaming backbone
- April 2018: Adidas attends Kafka Summit London alongside Apple, Audi, IBM, BBC, and ING
- June 2018: Iñaki Alzorriz publishes "From Monitoring to Data Streaming" on adidoescode; platform at 74 topics, 29 sources, 6 million messages per day
- 2020: HOLMES observability platform scales to 100 billion messages per day during Adidas's digital commerce expansion
- May 2021: Jose Manuel Cristobal presents "Navigating the Observability Storm with Kafka" at Kafka Summit Europe 2021
- July 2021: Adil Houmadi publishes article on extending ksqlDB with custom UDFs for regional data masking
- February 2023: Gabriel Barreras documents Kafka Connect event sourcing pitfalls and solutions on adidoescode
- 2025: Jose Manuel Cristobal and Guillermo Lagunas present the self-service Kafka platform at Current London 2025
Adidas's Kafka use cases
Business activity monitoring
Kafka serves as the event transport layer for Adidas's Business Activity Monitoring 2.0 platform, carrying events from 29 source systems into a complex event processing engine for real-time analytics and reporting. The platform implements a fan-out pub-sub pattern for data extraction, stateful event processing, and data replication across internal teams. Iñaki Alzorriz described this architecture in a June 2018 post on the adidoescode Medium publication.
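The fan-out pattern described above can be illustrated with a minimal in-memory sketch. It is not Adidas's code: the `TopicLog` class, the group names, and the event fields are all hypothetical, and a Python list stands in for a single Kafka partition. The point it shows is that each consumer group keeps its own offset, so every group receives the full event stream independently.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for one Kafka topic partition (an append-only log)."""

    def __init__(self):
        self.records = []
        # Next offset to read, tracked separately per consumer group.
        self.offsets = defaultdict(int)

    def publish(self, event):
        self.records.append(event)

    def poll(self, group):
        """Return the events this group has not read yet and advance its offset."""
        start = self.offsets[group]
        batch = self.records[start:]
        self.offsets[group] = len(self.records)
        return batch


log = TopicLog()
log.publish({"source": "orders", "status": "created"})
log.publish({"source": "orders", "status": "shipped"})

# Two hypothetical downstream teams read the same events independently.
analytics_batch = log.poll("analytics")
replication_batch = log.poll("replication")
```

Both groups get the same two events, and a second poll by either group returns nothing until a new event is published.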
Observability and SRE (HOLMES)
HOLMES is Adidas's internal observability system. Kafka is the streaming backbone that ingests all infrastructure logs and metrics from Kubernetes-based services, enabling problem detection, root cause analysis, and predictive alerting across e-commerce systems. At peak adoption in 2020, the platform processed 100 billion messages per day. Jose Manuel Cristobal covered the full architecture in a talk at Kafka Summit Europe 2021.
Event sourcing via Kafka Connect
Adidas uses Kafka Connect with a JDBC Source Connector to capture inserts and updates from Oracle 19c (deployed on AWS RDS) and transform them into Kafka events for downstream consumers. Gabriel Barreras documented the implementation and its pitfalls in a February 2023 adidoescode post.
Regional data streaming with ksqlDB
ksqlDB handles the splitting of a main Kafka topic into regional sub-topics, filtering data by region and masking sensitive fields using a custom user-defined function. Adil Houmadi described this pattern in a July 2021 adidoescode post.
Self-service Kafka platform
The current state of the platform, presented at Current London 2025, enables Kafka stakeholders to directly manage topics, permissions, schemas, and connectors without involving the central platform team. A GitOps pipeline backed by a custom DSL and AsyncAPI specifications handles all asset provisioning. Guillermo Lagunas and Jose Manuel Cristobal presented the details in "From Days to Seconds: Adidas' Journey to Scalable Kafka Self-Service" at Current London 2025.
Scale and throughput
- 100 billion messages per day processed by HOLMES at peak adoption in 2020 (Jose Manuel Cristobal, Kafka Summit Europe 2021)
- 6 million messages per day, 74 topics, 29 source systems at the BAM 2.0 pilot stage in mid-2018 (Iñaki Alzorriz, 2018)
- Single region, 3 availability zones for the HOLMES Kafka deployment, with a multitenant configuration serving 2 tenants at the time of the 2021 talk (Jose Manuel Cristobal, 2021)
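To put the daily figures in more familiar units, the short calculation below converts them to average per-second rates. It assumes traffic is spread evenly across the day, which real workloads are not, so peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60

holmes_per_day = 100_000_000_000  # HOLMES, 2020
bam_per_day = 6_000_000           # BAM 2.0, mid-2018

holmes_per_second = holmes_per_day / SECONDS_PER_DAY
bam_per_second = bam_per_day / SECONDS_PER_DAY
ratio = holmes_per_day / bam_per_day

print(f"HOLMES average: {holmes_per_second:,.0f} messages/s")  # 1,157,407
print(f"BAM 2.0 average: {bam_per_second:,.1f} messages/s")    # 69.4
print(f"Ratio: {ratio:,.0f}x")                                 # 16,667x
```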
Adidas's Kafka architecture
HOLMES observability platform
HOLMES is structured around four layers, deployed on Kubernetes in a single region across 3 availability zones.
The ingestion layer collects Prometheus metrics per Kubernetes namespace using a Kubernetes Operator deployed with Helm Charts. A custom GoLang application called Prom2Kafka uses the Prometheus Remote Write protocol with protobuf data models to push those metrics into Kafka. Log collection is handled by Fluentd, which is auto-deployed to Kubernetes namespaces using annotations, requiring no manual configuration per service.
The streaming layer is Apache Kafka, deployed open-source with SSL and mutual TLS for all client connections. The cluster is multitenant, with ACLs and connection quotas enforcing isolation between tenants.
The storage layer splits by signal type. For metrics, a custom GoLang tool called KafkaToPromMetrics consumes from Kafka and writes into Victoria Metrics tenants. For logs, Filebeat and Logstash handle forwarding with suppression capabilities into OpenDistro (an Apache 2.0 build of Elasticsearch). Kafka Streams suppressors filter non-compliant logs and high-rate metrics before they reach storage, which keeps ingest volume under control at 100 billion messages per day.
The consumption layer provides Grafana for metrics dashboarding and alerting, and Kibana for log analysis with multi-tenancy support.
Self-service Kafka platform
The current platform is described as vendor-agnostic and non-opinionated, meaning it does not depend on Confluent-specific managed services. Teams manage topics, permissions, schemas, and connectors through a GitOps pipeline using a custom DSL specification. AsyncAPI specifications serve as both the documentation standard and the authoritative source of truth for resource provisioning. The data catalogue is built on top of these AsyncAPI specs, allowing teams to discover available topics, message schemas, and ownership metadata without consulting the platform team.
Producer architecture
Metrics are ingested via Prom2Kafka using the Prometheus Remote Write protocol and protobuf serialisation. For application event pipelines, Avro is the documented serialisation format with Schema Registry enforcement. Spring Boot (Java) is used in Kafka consumer and producer services for application-layer event streaming.
Consumer architecture
For HOLMES, KafkaToPromMetrics is a GoLang Kafka consumer writing metrics from Kafka into Victoria Metrics. The GoLang client was chosen specifically to avoid the JVM overhead that would be significant at 100 billion messages per day. ksqlDB consumers handle regional topic splitting with a single persistent query.
Stream processing
Kafka Streams is used for stateful stream processing in HOLMES, specifically for suppressing high-rate metrics and filtering non-compliant log events before they reach the storage layer. ksqlDB handles SQL-based stream transformations, including region filtering and field-level hashing via custom UDFs.
Kafka Connect ecosystem
Kafka Connect with a JDBC Source Connector pulls inserts and updates from Oracle 19c on AWS RDS. The source connector polls the database on a configurable interval using timestamp-based watermarking.
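For orientation, a timestamp-mode JDBC source connector definition might look like the sketch below, built as the JSON body that the Kafka Connect REST API accepts when creating a connector. The property names are standard Confluent JDBC Source Connector settings; the connector name, connection URL, table, column, and interval values are placeholders and not Adidas's configuration.

```python
import json

connector = {
    "name": "oracle-orders-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        # Placeholder URL; a real deployment would also supply credentials.
        "connection.url": "jdbc:oracle:thin:@//db.example.internal:1521/ORCL",
        # Timestamp mode: each poll selects rows newer than the stored watermark.
        "mode": "timestamp",
        "timestamp.column.name": "UPDATED_AT",  # hypothetical column
        "table.whitelist": "ORDERS",            # hypothetical table
        "topic.prefix": "oracle.",
        # How often the connector queries the database.
        "poll.interval.ms": "5000",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```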
Special techniques and engineering innovations
Custom GoLang Kafka tooling to avoid JVM overhead
At 100 billion messages per day, Adidas chose to build two bespoke GoLang applications rather than use standard JVM-based Kafka clients. Prom2Kafka handles ingestion from Prometheus into Kafka via the Remote Write protocol and protobuf, and KafkaToPromMetrics consumes from Kafka and writes into Victoria Metrics. Both were written specifically to avoid the memory footprint and latency characteristics of JVM-based clients at that throughput. Jose Manuel Cristobal documented this decision in the Kafka Summit Europe 2021 slide deck.
ksqlDB user-defined functions for field-level data masking
Rather than building a separate pipeline stage for PII handling, Adidas extended ksqlDB with a custom Java UDF that applies SHA-256 hashing to sensitive fields. A single persistent query filters events by region and applies the UDF to produce separate regional topics with masked data. Adil Houmadi documented the implementation, including the Java class structure and deployment steps, in the July 2021 adidoescode post.
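The actual UDF is a Java class running inside ksqlDB. The sketch below only models the logic in plain Python so the behaviour is easy to see: pick a regional topic from the event and replace sensitive fields with their SHA-256 digest. The field names, the topic naming scheme, and the `route_and_mask` helper are assumptions for illustration.

```python
import hashlib

SENSITIVE_FIELDS = ("email",)  # hypothetical list of fields to mask


def mask(value: str) -> str:
    """Return the hex SHA-256 digest of a string value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def route_and_mask(event: dict) -> tuple[str, dict]:
    """Choose a regional topic and hash the sensitive fields of one event."""
    topic = f"orders.{event['region'].lower()}"  # hypothetical topic naming
    masked = {
        key: mask(value) if key in SENSITIVE_FIELDS else value
        for key, value in event.items()
    }
    return topic, masked


topic, masked = route_and_mask(
    {"region": "EU", "order_id": "42", "email": "a@example.com"}
)
```

Because SHA-256 is deterministic, the same input always yields the same digest, so downstream consumers can still join or group on a masked field. An unsalted hash of a low-entropy value can be reversed by guessing inputs, which is worth weighing when applying this pattern to personal data.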
AsyncAPI as infrastructure config, not just documentation
The self-service platform uses AsyncAPI specifications as the source of truth for provisioning Kafka resources including topics, schemas, connectors, and ACLs. This goes beyond the typical documentation use case for AsyncAPI: the specifications are consumed directly by a GitOps pipeline that provisions resources on commit. The result is that teams create and modify Kafka infrastructure through pull requests against AsyncAPI specs rather than through tickets to a central team. Guillermo Lagunas and Jose Manuel Cristobal presented this architecture at Current London 2025.
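As a rough illustration of reading a spec as configuration, the sketch below derives a desired topic list from an AsyncAPI-style document held as a Python dictionary. The `partitions` and `replicas` keys follow the AsyncAPI Kafka channel bindings as I understand them; the channel names, the defaults, and the `desired_topics` helper are hypothetical, and Adidas's custom DSL is not public in enough detail to reproduce here.

```python
spec = {
    "asyncapi": "2.6.0",
    "info": {"title": "orders-events", "version": "1.0.0"},
    "channels": {
        "orders.created": {
            "bindings": {"kafka": {"partitions": 12, "replicas": 3}},
        },
        "orders.shipped": {
            "bindings": {"kafka": {"partitions": 6, "replicas": 3}},
        },
    },
}


def desired_topics(spec: dict) -> dict:
    """Map each channel in the spec to the topic settings it asks for."""
    topics = {}
    for name, channel in spec.get("channels", {}).items():
        kafka = channel.get("bindings", {}).get("kafka", {})
        topics[name] = {
            # Defaults are assumptions for channels without Kafka bindings.
            "partitions": kafka.get("partitions", 1),
            "replication_factor": kafka.get("replicas", 1),
        }
    return topics


topics = desired_topics(spec)
```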
Kafka Streams suppressors for ingest cost control
HOLMES uses Kafka Streams suppressors to filter events before they reach Victoria Metrics and OpenDistro. Non-compliant logs and metrics producing at unusually high rates are suppressed at the streaming layer rather than at the storage layer, which limits the storage write volume at 100 billion messages per day without requiring changes to the producing services.
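The talk does not publish the suppression rules themselves, so the following is only a plain Python model of one plausible rule: cap the number of events each key may emit per fixed time window and drop the rest. It does not use Kafka Streams, and the window size, limit, and function name are assumptions.

```python
from collections import defaultdict


def suppress_high_rate(events, window_s, max_per_window):
    """Keep at most max_per_window events per key in each fixed window.

    events is an iterable of (timestamp_seconds, key) pairs in arrival order.
    """
    counts = defaultdict(int)
    kept = []
    for ts, key in events:
        window = int(ts // window_s)  # index of the fixed window this event falls in
        counts[(key, window)] += 1
        if counts[(key, window)] <= max_per_window:
            kept.append((ts, key))
    return kept


events = [(0, "a"), (1, "a"), (2, "a"), (3, "b"), (61, "a")]
kept = suppress_high_rate(events, window_s=60, max_per_window=2)
# kept == [(0, "a"), (1, "a"), (3, "b"), (61, "a")]
```

The third event for key "a" is dropped because it exceeds the limit within the first 60-second window; the event at second 61 falls into a new window and passes.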
Operating Kafka at scale
Deployment model: Self-managed open-source Kafka on Kubernetes, deployed in a single region across 3 availability zones. The HOLMES cluster uses a multitenant configuration with SSL and mTLS for all client connections. Adidas does not document use of a managed Kafka service for these workloads.
Security and multi-tenancy: All Kafka clusters enforce authentication and encryption via SSL and mutual TLS. Quotas and ACLs provide isolation between tenants within multitenant clusters. The Adidas API Guidelines document TLS requirements and ACL patterns as platform standards for all Kafka topics.
GitOps-driven lifecycle management: Topics, permissions, schemas, and connectors are managed declaratively through a GitOps pipeline backed by AsyncAPI specifications. Provisioning changes are introduced as pull requests; the pipeline applies them automatically on merge. This eliminates the need for the central platform team to manually provision resources and makes the state of Kafka infrastructure version-controlled and auditable.
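The reconcile step of such a pipeline can be sketched as a diff between the declared state in Git and the live state of the cluster. This is a generic illustration rather than the Adidas pipeline: the `plan_changes` function, the topic names, and the choice to report undeclared topics instead of deleting them are all assumptions.

```python
def plan_changes(desired: dict, actual: dict) -> dict:
    """Compare declared topics against live topics and list the needed actions."""
    declared, live = set(desired), set(actual)
    return {
        "create": sorted(declared - live),
        "update": sorted(t for t in declared & live if desired[t] != actual[t]),
        # Topics that exist but are not declared are reported, not deleted.
        "unmanaged": sorted(live - declared),
    }


desired = {"orders.created": {"partitions": 12}, "orders.shipped": {"partitions": 6}}
actual = {"orders.created": {"partitions": 6}, "legacy.events": {"partitions": 3}}

plan = plan_changes(desired, actual)
# plan == {"create": ["orders.shipped"],
#          "update": ["orders.created"],
#          "unmanaged": ["legacy.events"]}
```

Running the same function on every merge makes the operation idempotent: once the cluster matches the repository, the plan is empty.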
SLA tracking and adoption KPIs: After moving to the self-service model, Adidas tracks resolution time SLAs and adoption KPIs to measure operational improvement. The published outcome is a reduction from days to seconds for provisioning tasks, which the team uses as an ongoing measure of platform health.
Prometheus and Helm Chart deployment: The Prometheus collection layer in HOLMES is deployed via Kubernetes Operator with Helm Charts. Fluentd is deployed automatically using Kubernetes namespace annotations, keeping the observability ingest layer fully infrastructure-as-code without requiring per-service configuration.
Data catalogue via AsyncAPI: Adidas's Platform and Engineering team maintains a data catalogue built on AsyncAPI specifications so teams can discover available topics, message schemas, field definitions, headers, and ownership metadata. This replaces informal knowledge of what topics exist and who owns them with a structured, searchable registry backed by the same specs that provision the infrastructure.
Challenges and how they solved them
Central provisioning became a bottleneck as adoption grew
As Kafka adoption scaled across Adidas, the central platform team was responsible for manually creating topics, assigning permissions, registering schemas, and configuring connectors for every team that needed access. The process took days per request and was a source of delays and configuration errors.
Adidas built a vendor-agnostic self-service platform using a custom DSL and AsyncAPI specifications as the provisioning layer, fed through a GitOps pipeline. Individual teams now manage their own Kafka resources through pull requests. Provisioning time dropped from days to seconds. Guillermo Lagunas and Jose Manuel Cristobal presented the before and after at Current London 2025.
Race condition in Kafka Connect JDBC Source Connector causing silent data loss
When concurrent database transactions overlapped with the JDBC Source Connector's query window, records whose commit timestamps fell between the connector's query executions were silently skipped. The connector's watermark advanced past those records, and they were never re-ingested.
The immediate fix was tuning timestamp.delay.interval.ms to introduce a buffer period, ensuring that pending transactions complete before the next poll cycle advances the watermark. Gabriel Barreras documented the root cause and the configuration fix in the February 2023 adidoescode post, and also noted that for use cases requiring consistent low-latency change data capture, Debezium or a CDC-native database approach is preferable to JDBC Source Connectors.
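The race can be reproduced with a small simulation. It is a simplified model under stated assumptions, not the connector's real code: each row carries a timestamp column value (`ts`) and the time at which its transaction became visible (`commit_at`), and each poll selects visible rows with `watermark < ts < now - delay`, then moves the watermark to the newest `ts` it returned. Row identifiers and times are made up.

```python
def run_connector(rows, poll_times, delay):
    """Simulate timestamp-mode polling and return row ids in delivery order."""
    watermark = float("-inf")
    delivered = []
    for now in poll_times:
        upper = now - delay  # rows newer than this are left for a later poll
        batch = sorted(
            (
                r for r in rows
                if r["commit_at"] <= now and watermark < r["ts"] < upper
            ),
            key=lambda r: r["ts"],
        )
        if batch:
            delivered.extend(r["id"] for r in batch)
            watermark = batch[-1]["ts"]  # advance past everything just read
    return delivered


rows = [
    {"id": "A", "ts": 10, "commit_at": 10},
    # B was stamped before A but its transaction committed later.
    {"id": "B", "ts": 9, "commit_at": 12},
]
polls = [11, 13, 16, 20]

without_delay = run_connector(rows, polls, delay=0)  # ["A"]: B is lost
with_delay = run_connector(rows, polls, delay=5)     # ["B", "A"]
```

Without a delay, the poll at time 11 reads A and moves the watermark to 10, so B (timestamp 9) can never match again once it becomes visible. With a delay of 5, no poll looks at timestamps newer than five units ago, which gives B's transaction time to commit before the watermark passes it. The delay only helps if it exceeds the longest transaction duration, which is why the source points to CDC tools for strict guarantees.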
Scaling observability ingest to 100 billion messages per day
The shift to digital commerce during 2020 required HOLMES to ingest an order of magnitude more traffic than it was originally designed for, without a proportional increase in infrastructure cost.
Adidas handled this through a combination of architectural choices: GoLang Kafka clients for the ingestion and storage layers to avoid JVM overhead, Kafka Streams suppressors to reduce the volume of events reaching storage, and Victoria Metrics as a cost-effective time-series backend. The result was a platform that reached 100 billion messages per day without requiring a full re-architecture.
Key contributors
- Iñaki Alzorriz (Director of Platform Engineering, Adidas): Authored the 2018 adidoescode post describing the Kafka-based data streaming initiative and Business Activity Monitoring 2.0. adidoescode, June 2018
- Jose Manuel Cristobal (Senior Platform Engineer / Director Platform Engineering, Adidas): Presented "Navigating the Observability Storm with Kafka" at Kafka Summit Europe 2021; co-presented "From Days to Seconds" at Current London 2025. Kafka Summit Europe 2021; Current London 2025
- Guillermo Lagunas (Platform Engineering, Adidas): Co-presented the self-service Kafka platform at Current London 2025. Current London 2025
- Gabriel Barreras (Platform Engineering, Adidas): Documented Kafka Connect JDBC Source Connector race conditions and solutions. adidoescode, February 2023
- Adil Houmadi (Platform Engineering, Adidas): Documented ksqlDB UDF extension for regional data masking. adidoescode, July 2021
Key takeaways for your own Kafka implementation
- Choose your Kafka client language based on throughput requirements. At 100 billion messages per day, Adidas built its ingestion and storage clients in GoLang rather than on the JVM to reduce memory footprint and latency. If you are running high-throughput observability or telemetry pipelines, the client runtime overhead is worth evaluating early.
- ksqlDB user-defined functions let you extend SQL-based pipelines without a separate processing stage. Adidas used a custom Java UDF to apply SHA-256 hashing within a persistent ksqlDB query, handling both regional routing and PII masking in a single step. If your stream processing requirements push beyond what built-in ksqlDB functions cover, UDFs are worth considering before adding a separate processing tier.
- AsyncAPI can serve as infrastructure config, not just documentation. Adidas uses AsyncAPI specifications as the source of truth for a GitOps pipeline that provisions topics, schemas, connectors, and ACLs directly. If you are managing Kafka resources manually or via tickets, treating API specifications as executable configuration is an approach that scales better as the number of topics and teams grows.
- JDBC Source Connectors require careful watermark configuration for correctness. The timestamp-based polling model in the Kafka Connect JDBC Source Connector can miss records when concurrent transactions span query boundaries. Setting timestamp.delay.interval.ms to buffer the watermark advance is a necessary tuning step; for strict correctness requirements, a CDC-native connector such as Debezium is worth evaluating instead.
- Suppression at the streaming layer is more cost-effective than suppression at storage. Adidas used Kafka Streams suppressors to filter high-rate metrics and non-compliant logs before they reached Victoria Metrics and Elasticsearch. This limits storage write volume without requiring changes to producing services and avoids paying ingestion costs for data you will filter out anyway.
Sources and further reading
Primary sources
- Iñaki Alzorriz, "From Monitoring to Data Streaming — Data Streaming Initiative in Adidas" (June 2018)
- Jose Manuel Cristobal, "Navigating the Observability Storm with Kafka" — Kafka Summit Europe 2021
- Jose Manuel Cristobal, "Navigating the Observability Storm with Kafka" — slide deck
- Guillermo Lagunas and Jose Manuel Cristobal, "From Days to Seconds: Adidas' Journey to Scalable Kafka Self-Service" — Current London 2025
- Gabriel Barreras, "Event Sourcing with Kafka Connect: Inconsistency, Pitfalls & Solutions" (February 2023)
- Adil Houmadi, "Extending ksqlDB Built-in Capability" (July 2021)
- Adidas Platform & Engineering, Adidas API Guidelines — Kafka Asynchronous Guidelines