
How Shopify uses Apache Kafka in production
Shopify processes 1.75 trillion Apache Kafka messages every month across a fleet of self-managed clusters running on Google Kubernetes Engine. At the centre of that infrastructure is a change data capture pipeline that streams mutations from more than 100 MySQL database shards at a sustained 65,000 records per second, peaking at 100,000 records per second on Black Friday. Kafka is how Shopify moves data between its sharded monolith, its real-time processing layer, and its data warehouse, and has been since 2014, a year after the company wrote and open-sourced its own Go Kafka client library because no suitable one existed.
Company overview
Shopify is a commerce platform that lets merchants of any size build and run online and physical retail operations. As of 2020, the platform had processed over $40 billion in total merchant sales across 175 countries, with a baseline of 2 million requests per minute scaling to 10 million at peak. The engineering organization builds primarily in Ruby on Rails and Go.
Shopify adopted Apache Kafka in 2014 as a central event bus for log aggregation and internal event collection. The initial deployment replaced a plain log-file batch system that introduced latency between event generation and availability in downstream dashboards and Hadoop storage. From 2014 to 2016, clusters were managed across data centre regions using Chef. In 2017, Shopify began migrating to Google Cloud Platform, completing the move to Kubernetes-managed Kafka by 2018. The change data capture system, which would become one of the most technically demanding parts of the Kafka estate, was rebuilt on Kafka Connect and Debezium in 2021.
Key milestones:
- 2013: Shopify open-sources Sarama, a Go client library for Apache Kafka 0.8, built internally to avoid a JVM dependency
- 2014: Kafka adopted as the internal data bus; Ruby on Rails producer pipeline built using SysV message queues and Go Sarama
- 2016: Multi-tenant Kafka clusters managed across data centre regions via Chef
- 2017: Cloud migration begins; flash sales Kafka architecture presented at Kafka Summit San Francisco
- 2018: Full migration to Kubernetes StatefulSets on GCP; Sam Obeid and Christopher Vollick present the work at Kafka Summit London
- 2020: Platform processes 1.75 trillion Kafka messages per month; CDC pipeline peaks at 100,000 records/sec during BFCM (Black Friday/Cyber Monday)
- 2021: Log-based CDC pipeline published; 400TB of CDC data stored in Kafka
- 2022: HybridSource pattern for Flink and Kafka archival documented; BFCM live map pipeline using Flink, Kafka, and SSE published
- 2024: BFCM data push reaches 12TB per minute; Kafka partition scaling identified as critical for analytics data freshness
Shopify's Kafka use cases
Event collection and log aggregation
The original use case from 2014. Kafka serves as the company-wide event bus for collecting events and aggregating log data from all internal systems and data centres. Data flows from producers into regional clusters, is mirrored to an aggregate cluster, and lands in the data warehouse as Apache Parquet. This pipeline replaced a batch log-file system that could not provide timely data for internal dashboards.
Change data capture
Shopify's core application is a sharded monolith backed by more than 100 independent MySQL database shards. The CDC pipeline streams binary log mutations from every shard into Kafka, presenting downstream consumers with a single compacted topic per logical table rather than requiring them to subscribe to 100+ shard-specific topics.
Before 2021, this was handled by Longboat, a query-based extraction tool that polled tables periodically. The limitation was that hard deletes were invisible: once a row was removed, Longboat could not capture the deletion. The replacement system uses Kafka Connect and Debezium to read MySQL binary logs directly, capturing every insert, update, and delete with a P99 latency of under 10 seconds from database write to Kafka availability.
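A rough sketch of what one per-shard connector configuration looks like, in Kafka Connect properties form. All names, hosts, and the topic regex below are illustrative, not Shopify's actual values; the `snapshot.locking.mode=none` option is the lock-free mode discussed later in this article.

```properties
# One Debezium MySQL connector per shard (all values illustrative)
name=cdc-shard-042
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=mysql-shard-042.internal
database.port=3306
database.user=debezium
database.password=********
database.server.id=1042
# Logical name that prefixes this shard's raw topic names
database.server.name=shard042
table.include.list=shopify.orders,shopify.checkouts
# Lock-free snapshotting: no global read lock during initial snapshot
snapshot.locking.mode=none
# RegexRouter folds per-shard topics into one intermediate topic per table
transforms=route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=shard[0-9]+.shopify.(.*)
transforms.route.replacement=cdc.intermediate.$1
```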
Real-time buyer signal pipeline (Shopify Inbox)
Shopify Inbox is the merchant-to-customer messaging product. A real-time pipeline combines two Kafka event types: Monorail events (Shopify's internal structured event abstraction) and CDC events. These feed into Apache Beam jobs running on Google Cloud Dataflow that score customer intent signals and inform the Inbox response suggestion system. At non-peak hours, the pipeline processes tens of thousands of cart and checkout events per second.
BFCM live map
During Black Friday and Cyber Monday, Shopify runs a live map showing real-time global sales. Apache Flink reads from Kafka topics, computes per-region aggregations, and publishes results back to Kafka topics. A Golang SSE server subscribes to those output topics and pushes data to web clients. The 2021 BFCM event ingested 323 billion rows of data with end-to-end visualization latency of 21 seconds and 100% uptime throughout the event.
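The SSE leg of that pipeline is straightforward to sketch in Go. The minimal server below is illustrative rather than Shopify's code: it assumes an `updates` channel fed by a Kafka consumer on the Flink output topic, and streams each aggregate to connected browsers as a server-sent event.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// updates carries per-region aggregates read from the Kafka output topic
// by a consumer loop (omitted here for brevity).
var updates = make(chan []byte, 64)

func sseHandler(w http.ResponseWriter, r *http.Request) {
	// Standard SSE headers: keep the connection open and uncached.
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	for {
		select {
		case <-r.Context().Done():
			return // client disconnected
		case msg := <-updates:
			// Each event is a single "data:" frame terminated by a blank line.
			fmt.Fprintf(w, "data: %s\n\n", msg)
			flusher.Flush()
		}
	}
}

func main() {
	http.HandleFunc("/live-map", sseHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A production server would fan each message out to per-client channels; the single shared channel keeps the sketch short.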
Elasticsearch multi-DC replication
Kafka is used to replicate Elasticsearch index updates across geographic regions, enabling a transition from active-passive to active-active multi-region search.
Microservice messaging
Ruby on Rails and Go services produce events through Monorail, Shopify's internal event abstraction layer. Monorail enforces schemas, supports schema versioning, and defines explicit producer-consumer contracts on top of raw Kafka topics.
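Monorail itself is internal and unpublished, but the contract it enforces can be pictured as a versioned envelope around every event. The types below are hypothetical illustrations of that idea, not Monorail's actual API:

```go
package monorail

import (
	"errors"
	"time"
)

// Envelope is a hypothetical schema-versioned wrapper of the kind a
// Monorail-style layer puts on top of raw Kafka topics.
type Envelope struct {
	Schema    string    `json:"schema"`     // e.g. "checkout_started"
	Version   int       `json:"version"`    // schema version the payload conforms to
	EmittedAt time.Time `json:"emitted_at"` // producer timestamp
	Payload   []byte    `json:"payload"`    // schema-validated event body
}

// Validate rejects events that do not name a registered schema version,
// enforcing the producer-consumer contract before anything reaches Kafka.
func Validate(e Envelope, registry map[string]int) error {
	max, ok := registry[e.Schema]
	if !ok {
		return errors.New("unknown schema: " + e.Schema)
	}
	if e.Version < 1 || e.Version > max {
		return errors.New("unsupported schema version")
	}
	return nil
}
```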
Scale and throughput
| Metric | Value | Source / context |
| --- | --- | --- |
| Monthly Kafka messages | 1.75 trillion | Shopify Engineering Blog, December 2020 |
| Events per day (all clusters) | Billions | Shopify Engineering Blog, November 2018 |
| CDC average throughput | 65,000 records/sec | CDC architecture post, March 2021 |
| CDC peak throughput (BFCM 2020) | 100,000 records/sec | CDC architecture post, March 2021 |
| CDC data stored in Kafka | 400TB+ | CDC architecture post, March 2021 |
| Debezium connectors | ~150 across 12 Kubernetes pods | CDC architecture post, March 2021 |
| Cart/checkout events per second (non-peak) | Tens of thousands | Shopify Inbox pipeline post, December 2021 |
| BFCM 2021 rows ingested | 323 billion | SSE data streaming post, November 2022 |
| BFCM 2024 data push | 12TB per minute | BFCM readiness post, November 2025 |
| Producer pipeline throughput (2014) | Thousands of events/sec; billions per week | Rails producer pipeline post, July 2014 |
Shopify's Kafka architecture
Cluster topology
Shopify runs multiple regional Kafka clusters, one per GCP region, and a single aggregate cluster. The aggregate cluster receives all data mirrored from the regional clusters via MirrorMaker and serves as the feed for the data warehouse. Clusters use 30-node configurations managed as Kubernetes StatefulSets with Persistent Volumes.
Kafka was previously deployed on VMs managed by Chef. The migration to Kubernetes, completed in 2018, was executed as a three-step process to avoid downtime: deploy the new cloud clusters, mirror the existing on-premises clusters into both the on-premises and cloud aggregate clusters simultaneously, then migrate clients from the data centre to the cloud clusters.
Producer architecture
Rails and Go applications do not connect directly to Kafka brokers. Instead, they write events to a local SysV message queue. A Go-based producer process reads from that queue and produces to Kafka using the Sarama client library. When Shopify began containerising services with Docker, namespace isolation broke the SysV IPC mechanism, so a TCP-to-SysV proxy layer was added on the host to allow containerised services to continue writing to the queue.
This indirection fully decouples application code from Kafka availability: the SysV queue is sized to hold two hours of events, so Kafka cluster maintenance or restarts have zero impact on producers.
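A condensed sketch of the producer side, using the Sarama client at its current IBM-maintained import path. A buffered Go channel stands in here for the host-local SysV queue (which, in Shopify's real pipeline, is fed by applications or the TCP proxy described later):

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	// Stand-in for the SysV queue: a large local buffer between app code
	// and the Kafka client, sized so broker maintenance never blocks apps.
	queue := make(chan []byte, 1<<16)

	cfg := sarama.NewConfig()
	cfg.Producer.Return.Errors = true

	producer, err := sarama.NewAsyncProducer([]string{"kafka-regional:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Log failed deliveries; the local buffer retains a backlog meanwhile.
	go func() {
		for perr := range producer.Errors() {
			log.Printf("delivery failed: %v", perr.Err)
		}
	}()

	// Drain the local queue into Kafka asynchronously.
	for msg := range queue {
		producer.Input() <- &sarama.ProducerMessage{
			Topic: "events",
			Value: sarama.ByteEncoder(msg),
		}
	}
}
```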
CDC pipeline architecture
The CDC pipeline for Shopify's sharded monolith follows a four-stage pattern:
- One Debezium MySQL connector per database shard reads from the MySQL binary log.
- A RegexRouter transform consolidates events from all shards into intermediate Kafka topics.
- A custom Kafka Streams application reads from the intermediate topics and demultiplexes records by table.
- Output: one compacted Kafka topic per logical table, partitioned by primary key.
Consumers subscribe to the per-table output topics and see a unified view of each table regardless of how many shards the underlying data spans. Confluent Schema Registry provides data discovery and dependency tracking across CDC topics.
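Shopify's demultiplexer is a Kafka Streams (Java) application; the Go sketch below shows the same step using Sarama instead, with hypothetical topic names and a placeholder for deriving the table name. Records from the intermediate topic are re-produced to a per-table topic, keyed by primary key so compaction retains the latest state per row.

```go
package main

import (
	"context"
	"log"

	"github.com/IBM/sarama"
)

// demux re-produces CDC records from an intermediate topic onto
// compacted per-table topics, keyed by primary key.
type demux struct{ producer sarama.AsyncProducer }

func (demux) Setup(sarama.ConsumerGroupSession) error   { return nil }
func (demux) Cleanup(sarama.ConsumerGroupSession) error { return nil }

func (d demux) ConsumeClaim(sess sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
	for msg := range claim.Messages() {
		table := tableOf(msg) // hypothetical: derive table name from the record
		d.producer.Input() <- &sarama.ProducerMessage{
			Topic: "cdc.table." + table,        // one compacted topic per table
			Key:   sarama.ByteEncoder(msg.Key), // primary key -> stable partition
			Value: sarama.ByteEncoder(msg.Value),
		}
		sess.MarkMessage(msg, "")
	}
	return nil
}

func tableOf(m *sarama.ConsumerMessage) string { return "orders" } // placeholder

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_8_0_0

	producer, err := sarama.NewAsyncProducer([]string{"kafka:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	go func() { // drain producer errors so the async producer never stalls
		for perr := range producer.Errors() {
			log.Printf("produce error: %v", perr.Err)
		}
	}()

	group, err := sarama.NewConsumerGroup([]string{"kafka:9092"}, "cdc-demux", cfg)
	if err != nil {
		log.Fatal(err)
	}
	for {
		if err := group.Consume(context.Background(), []string{"cdc.intermediate.all"}, demux{producer}); err != nil {
			log.Fatal(err)
		}
	}
}
```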
Stream processing
A large portion of Shopify's Flink applications use Kafka as both source and sink. The BFCM live map is one example: Flink reads from Kafka, computes aggregations, and publishes results back to Kafka. The Streaming Capabilities team also manages HybridSource pipelines, described in the special techniques section below.
Consumer architecture
Kafka topics have per-topic retention policies. Expired data is archived to Google Cloud Storage rather than deleted, enabling Flink applications to backfill from historical data using the HybridSource connector before switching to live Kafka consumption.
Special techniques and engineering innovations
SysV message queue producer decoupling
Shopify's producer pipeline places a SysV message queue between application code and the Kafka producer process. This pattern predates widespread use of producer buffering at the client level and gives Shopify a host-local buffer that survives Kafka cluster disruptions. The queue holds up to two hours of events, making cluster maintenance transparent to producers.
Compacted per-table CDC topics with cross-shard demultiplexing
Each logical database table is represented as a single compacted Kafka topic partitioned by primary key. Compaction keeps only the most recent record per key, giving consumers a current-state view without replaying full history. The complexity of 100+ underlying MySQL shards is absorbed entirely by the CDC pipeline: a RegexRouter transform consolidates shard-specific events into intermediate topics, and a custom Kafka Streams application repartitions records by table into the final output topics, as shown in the admin sketch below.
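Creating one of these output topics is mostly a matter of topic configuration; a sketch using Sarama's cluster admin API, with illustrative names and counts:

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_8_0_0

	admin, err := sarama.NewClusterAdmin([]string{"kafka:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer admin.Close()

	compact := "compact"
	// One compacted topic per logical table; compaction keeps the newest
	// record per primary key, giving consumers a current-state view.
	err = admin.CreateTopic("cdc.table.orders", &sarama.TopicDetail{
		NumPartitions:     16, // illustrative
		ReplicationFactor: 3,
		ConfigEntries: map[string]*string{
			"cleanup.policy": &compact,
		},
	}, false)
	if err != nil {
		log.Fatal(err)
	}
}
```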
GCS Serde for large records
CDC events from large database rows can exceed Kafka's default 1MB message size limit. Shopify implemented a custom Serde that detects oversized payloads, stores them in Google Cloud Storage, and writes a GCS pointer to the Kafka topic. Consumers use the same Serde to retrieve the payload from GCS transparently. Standard Kafka consumers remain compatible without modification.
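A sketch of what the write path of such a Serde could look like in Go, using the Google Cloud Storage client; the pointer format, bucket name, and threshold below are assumptions, not Shopify's actual encoding:

```go
package cdcserde

import (
	"context"
	"fmt"

	"cloud.google.com/go/storage"
	"github.com/google/uuid"
)

const maxInline = 1 << 20 // ~1MB: Kafka's default message size ceiling

// encode returns the payload unchanged when it fits in a Kafka message,
// or offloads it to GCS and returns a small pointer record instead.
func encode(ctx context.Context, gcs *storage.Client, bucket string, payload []byte) ([]byte, error) {
	if len(payload) < maxInline {
		return payload, nil
	}
	key := "cdc-overflow/" + uuid.NewString()
	w := gcs.Bucket(bucket).Object(key).NewWriter(ctx)
	if _, err := w.Write(payload); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	// Consumers using the matching decoder fetch the object transparently.
	return []byte(fmt.Sprintf(`{"gcs_pointer":"gs://%s/%s"}`, bucket, key)), nil
}
```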
HybridSource for Flink backfill
Kafka topics at Shopify have configured retention limits. When a topic expires data, that data transitions to GCS archives partitioned across thousands of splits. Flink applications that need to backfill historical data use the HybridSource connector to read from the GCS archive first, then seamlessly transition to the live Kafka topic. This removes the need for manual coordination between historical batch jobs and real-time stream processing.
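HybridSource is a Flink (Java) API; conceptually it is a bounded read over the archive followed by a switch to the unbounded live topic. A Go rendering of that handoff, with hypothetical bucket, prefix, and topic names:

```go
package backfill

import (
	"context"
	"io"

	"cloud.google.com/go/storage"
	"github.com/IBM/sarama"
	"google.golang.org/api/iterator"
)

// Run replays archived records from GCS first, then switches to the live
// Kafka topic: the same bounded-then-unbounded handoff that Flink's
// HybridSource expresses declaratively.
func Run(ctx context.Context, gcs *storage.Client, consumer sarama.Consumer, handle func([]byte)) error {
	// Phase 1: bounded read over the archive splits in GCS.
	bucket := gcs.Bucket("kafka-archive")
	it := bucket.Objects(ctx, &storage.Query{Prefix: "events/"})
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		r, err := bucket.Object(attrs.Name).NewReader(ctx)
		if err != nil {
			return err
		}
		data, err := io.ReadAll(r)
		r.Close()
		if err != nil {
			return err
		}
		handle(data) // in reality each split holds many records
	}

	// Phase 2: unbounded read from the live topic (single partition shown;
	// a real handoff would resume at the offset where the archive ends).
	pc, err := consumer.ConsumePartition("events", 0, sarama.OffsetOldest)
	if err != nil {
		return err
	}
	defer pc.Close()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case msg := <-pc.Messages():
			handle(msg.Value)
		}
	}
}
```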
Kubernetes rack-awareness and rolling restart safety
Kafka broker pods use inter-pod anti-affinity rules to prevent co-located replicas, and Kubernetes zone labels are mapped to Kafka's rack configuration so partition replicas are distributed across availability zones. Readiness probes gate rolling restarts, preventing more than one broker from going offline at a time during configuration changes or upgrades. Restarting multiple brokers concurrently can trigger terabytes of partition re-replication; the readiness-probe gating prevents that scenario, while anti-affinity keeps a single node failure from taking out multiple replicas of the same partition.
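In Kubernetes terms, those placement and restart rules look roughly like the StatefulSet fragment below; the labels, image, and probe values are illustrative, not Shopify's manifests:

```yaml
# Illustrative StatefulSet fragment, not Shopify's actual manifest.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 30
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule two brokers onto the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: kafka
              topologyKey: kubernetes.io/hostname
      containers:
        - name: kafka
          image: kafka:latest # placeholder
          # Readiness gates the StatefulSet rolling update, so only one
          # broker is ever offline at a time during restarts.
          readinessProbe:
            tcpSocket:
              port: 9092
            initialDelaySeconds: 30
          # broker.rack is set from the node's topology.kubernetes.io/zone
          # label (e.g. resolved by an init container), so Kafka spreads
          # partition replicas across availability zones.
```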
Open-source contribution to Debezium
During initial CDC deployment, full table snapshots held MySQL read locks for hours at a time, blocking writes on production databases. A Shopify engineer implemented a lock-free snapshot mode for Debezium's MySQL connector and contributed it upstream, making the mode available to all Debezium users.
Operating Kafka at scale
Deployment model: Self-managed on Google Kubernetes Engine. Brokers run as Kubernetes StatefulSets with dedicated Persistent Volumes. Kafka pods have resource-monitoring sidecars, node affinity rules to place brokers on dedicated nodes, and inter-pod anti-affinity to spread replicas.
Retention and archival: All Kafka topics have per-topic retention policies. Expired data is archived to Google Cloud Storage to support Flink HybridSource backfills rather than being discarded.
BFCM capacity planning (Game Days): Shopify runs annual chaos-engineering exercises called Game Days starting in spring, simulating 150% of the previous year's BFCM peak load across three GCP regions. Game Days have identified Kafka-specific capacity gaps: the analytics infrastructure required partition count increases to maintain data freshness during traffic spikes, and these increases are now part of the pre-BFCM preparation checklist.
CDC connector management: Approximately 150 Debezium connectors are managed across 12 Kubernetes pods, with one connector per MySQL shard to isolate failure domains. Schema evolution for CDC consumers was an active area of work as of 2021, with the team implementing governance through Confluent Schema Registry for dependency tracking.
Cloud migration protocol: The migration from on-premises VMs to GCP Kubernetes was executed in three phases: deploy cloud clusters, run MirrorMaker to replicate from on-premises to both aggregate clusters simultaneously, then migrate clients. No downtime was required during the transition.
Challenges and how they solved them
JVM dependency for the Kafka client
Kafka's early client libraries were Java and Scala only, and deploying a JVM on every Shopify server just to talk to Kafka was considered impractical. In 2013, engineer Evan Huus built Sarama, a Go client library for Apache Kafka, and Shopify open-sourced it under the MIT license. Sarama is now maintained by IBM and remains a widely used Go Kafka client.
Docker namespace isolation breaking SysV IPC
When Shopify containerised Ruby on Rails services, Docker's process namespace isolation broke the SysV message queue mechanism used to buffer events for Kafka. The producer process ran on the host, but containerised services could no longer write to the host's IPC namespace. The solution was a TCP-to-SysV proxy layer on the host that accepted messages over a TCP socket and wrote them into the SysV queue. Containerised services connect over TCP; the rest of the pipeline is unchanged.
CDC table snapshot locking
Debezium's initial snapshot mode for the MySQL connector held a global read lock for the duration of a full table snapshot. For Shopify's large tables, this meant hours of blocked writes on production databases. A Shopify engineer contributed a lock-free snapshot mode to the upstream Debezium project, reading table data without holding a lock. This resolved the blocking but introduced a different constraint: the lock-free mode cannot guarantee a consistent point-in-time snapshot while binlog events are also being consumed. Remaining limitations include tables too large to snapshot within practical timeframes.
Records exceeding the 1MB Kafka message size limit
Certain database rows, particularly those with large text or blob fields, produced CDC events that exceeded Kafka's default message size limit. Rather than raising the broker message size limit globally (which affects all topics), Shopify implemented a custom Serde that externalises oversized payloads to GCS and writes a pointer record to Kafka. Consumers using the same Serde retrieve the payload from GCS transparently.
100+ shard complexity for CDC consumers
Shopify's monolith spans more than 100 MySQL shards. A naive CDC design would require consumers to subscribe to shard-specific topics for every table they care about. Shopify's solution abstracts the sharding entirely: a RegexRouter transform and a custom Kafka Streams demultiplexer consolidate all shard events into unified per-table output topics. Consumers interact only with the per-table topics.
Schema evolution in CDC topics
Breaking changes to internal database schemas propagate to all consumers of the corresponding CDC topics. As of the March 2021 blog post, this was an active challenge with mitigation strategies under development. Confluent Schema Registry is used for dependency tracking.
Kafka partition count insufficient for BFCM analytics throughput
Game Days exercises revealed that analytics Kafka topics needed higher partition counts to sustain data freshness at BFCM scale. Partition count increases are now a standard item in the pre-BFCM preparation checklist.
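Raising a topic's partition count is an online operation against the cluster; a sketch with Sarama's admin client, using an illustrative topic name and target count:

```go
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_8_0_0

	admin, err := sarama.NewClusterAdmin([]string{"kafka:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer admin.Close()

	// Partitions can only grow. Note that adding partitions changes the
	// key-to-partition mapping for new writes, which matters for
	// compacted or key-ordered topics.
	if err := admin.CreatePartitions("analytics.events", 128, nil, false); err != nil {
		log.Fatal(err)
	}
}
```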
Full tech stack
- Messaging backbone: Apache Kafka (self-managed), MirrorMaker, Confluent Schema Registry
- Change data capture: Kafka Connect, Debezium MySQL connectors, Kafka Streams
- Stream processing: Apache Flink, Apache Beam on Google Cloud Dataflow
- Application languages: Ruby on Rails and Go (Sarama client)
- Infrastructure: Google Kubernetes Engine, Kubernetes StatefulSets with Persistent Volumes, Google Cloud Storage
- Data stores: MySQL (100+ shards), Elasticsearch, data warehouse storage as Apache Parquet
Key contributors
- Evan Huus: author of Sarama, Shopify's open-source Go Kafka client
- Simon Eskildsen: Ruby on Rails Kafka producer pipeline
- Sam Obeid and Christopher Vollick: Kafka on Kubernetes migration
- John Martin and Adam Bellemare: sharded-monolith CDC pipeline
- Ashay Pathak and Selina Li: Shopify Inbox real-time buyer signal pipeline
- Kevin Lam and Rafael Aguiar: Flink optimization and HybridSource work
- Bao Nguyen: BFCM live map SSE streaming
- Arbab Ahmed and Bruno Deszczynski: data platform scaling
- Kyle Petroski and Matthew Frail: BFCM readiness
Key takeaways for your own Kafka implementation
- Decouple producers from brokers at the host level. Shopify's SysV message queue layer means application code never holds a direct connection to Kafka. A two-hour local buffer absorbs cluster maintenance windows without producer-side changes. If your producers are tightly coupled to broker availability, consider an intermediary buffer before the producer client.
- Build a CDC abstraction that hides database sharding from consumers. A one-connector-per-shard approach with a RegexRouter and a Kafka Streams demultiplexer lets you present unified per-table topics to downstream consumers regardless of how many shards exist. This makes the shard count a deployment detail rather than an API contract.
- Compacted topics are a viable current-state store for CDC data. Shopify uses compacted per-table Kafka topics partitioned by primary key to give consumers the latest record per row without requiring them to replay full history. This is a practical alternative to maintaining a separate read database for certain consumer patterns.
- Plan for large records before they cause incidents. If your data includes variable-length text or binary fields, Kafka's default 1MB message size limit will eventually be hit. Shopify's GCS Serde externalises oversized payloads before they reach the broker, keeping the solution transparent to standard consumers and avoiding broker-level configuration changes that affect every topic.
- Treat BFCM-scale capacity planning as a year-round process. Shopify's Game Days exercises start in spring and run through autumn. The discovery that Kafka partition counts needed to increase for analytics freshness came from load testing, not from production incidents. If your traffic has a predictable peak season, model Kafka throughput requirements explicitly before the peak rather than scaling reactively.
Sources and further reading
- Running Apache Kafka on Kubernetes at Shopify — Sam Obeid, Shopify Engineering Blog, November 2018
- Kafka Producer Pipeline for Ruby on Rails — Simon Eskildsen, Shopify Engineering Blog, July 2014
- Capturing Every Change From Shopify's Sharded Monolith — John Martin and Adam Bellemare, Shopify Engineering Blog, March 2021
- Building a Real-time Buyer Signal Data Pipeline for Shopify Inbox — Ashay Pathak and Selina Li, Shopify Engineering Blog, December 2021
- 3 (More) Tips for Optimizing Apache Flink Applications — Kevin Lam and Rafael Aguiar, Shopify Engineering Blog, December 2022
- How to Reliably Scale Your Data Platform for High Volumes — Arbab Ahmed and Bruno Deszczynski, Shopify Engineering Blog, December 2020
- How we prepare Shopify for BFCM (2025) — Kyle Petroski and Matthew Frail, Shopify Engineering Blog, November 2025
- Using Server Sent Events to Simplify Real-time Streaming at Scale — Bao Nguyen, Shopify Engineering Blog, November 2022
- Shopify open-sources Sarama, a client for Kafka 0.8 written in Go — Evan Huus, Shopify Technology Blog, August 2013
- Shopify Flash-Sales and Apache Kafka — Kafka Summit San Francisco 2017
- Kafka Summit London 2018 speaker announcement — Christopher Vollick and Sam Obeid, Confluent, 2018
If you want visibility into your own Kafka estate, including consumer lag, partition health, and topic throughput, give Kpow a try with a free 30-day trial. You can connect it to any Kafka cluster in minutes and deploy it via Docker, Helm, or JAR.