
How Airbnb uses Apache Kafka in production
Airbnb runs Apache Kafka across at least six production systems simultaneously, from the analytics pipeline that ingests more than 35 billion events per day to the merge queue that serialises thousands of code changes across its engineering monorepos. Few companies have found this many distinct architectural roles for a single piece of infrastructure, which makes Airbnb's Kafka story worth examining in detail.
Kafka does not solve a single engineering problem at Airbnb. It provides the event backbone for real-time personalisation, change data capture, distributed materialised views, and durable write-ahead logging, and each of those roles carries different throughput, ordering, and latency requirements that Airbnb handles through separate cluster and pipeline designs.
Company overview
Airbnb operates a two-sided marketplace connecting hosts and guests across more than 220 countries. At that scale, the platform generates a continuous stream of user activity events: searches, listing views, booking requests, messages, and availability updates, each of which needs to reach multiple downstream systems in near real time.
Airbnb began building Kafka-backed streaming infrastructure around 2016-2017, when Youssef Francis and Jun He described their streaming logging pipeline at Kafka Summit New York 2017. The early architecture used Kafka as an event bus feeding Airstream, an internal Spark Streaming framework, into HBase and Hive. Over the following years the team expanded Kafka's role substantially: adding a CDC system (SpinalTap, open-sourced in 2019), migrating real-time personalisation from Spark to Flink, building a Kafka Streams-based merge queue (Evergreen, presented at Kafka Summit London 2022), and adopting Kafka as the write-ahead log in their distributed key-value store (Mussel).
Airbnb's Kafka use cases
Analytics event ingestion
The foundational use case is also the highest-volume one. Clients (mobile apps, web browsers) and online services publish logging events directly to Kafka. A Spark Streaming job built on Airstream, Airbnb's internal streaming framework, reads from Kafka continuously, writes events to HBase for deduplication, then dumps them hourly into Hive partitions for downstream ETL and analytics jobs. The Hive-based data ingestion framework processes more than 35 billion Kafka event messages and more than 1,000 tables per day.
Kafka brokers were migrated from on-premises infrastructure to VPC-hosted infrastructure, producing a 4x throughput improvement. To support capacity planning, the team tracks QPS per Kafka partition and adds partitions proactively as event volumes grow.
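Airstream itself is internal to Airbnb, but the overall shape of the ingestion job can be sketched with the open-source Spark Structured Streaming Kafka source. The broker address, topic name, and file sink below are illustrative assumptions, not Airbnb's actual configuration:

```java
// Minimal sketch of a Kafka-to-warehouse ingestion job using Spark Structured Streaming.
// Airbnb's Airstream framework is internal; broker, topic, and sink paths are assumptions.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;

public class LoggingIngestionJob {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().appName("logging-ingestion").getOrCreate();

    // Continuously read raw logging events from Kafka.
    Dataset<Row> events = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "kafka-broker:9092")   // assumption
        .option("subscribe", "logging-events")                     // assumption
        .option("startingOffsets", "latest")
        .load()
        .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp");

    // Each micro-batch is handed to a sink. Airbnb writes to HBase for deduplication and
    // lands hourly Hive partitions downstream; a Parquet file sink stands in here.
    events.writeStream()
        .trigger(Trigger.ProcessingTime("1 minute"))
        .format("parquet")
        .option("path", "/warehouse/logging_events")                // assumption
        .option("checkpointLocation", "/checkpoints/logging_events")
        .start()
        .awaitTermination();
  }
}
```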
Real-time personalisation via the User Signals Platform
Every user interaction on the Airbnb platform (a search query, a listing view, a booking) generates an event that flows into the User Signals Platform (USP). More than 100 Apache Flink jobs consume these Kafka events, transform them into structured "User Signals," write the results to a key-value store, and re-emit signals as Kafka events for downstream consumer jobs handling session engagement and user segmentation. The USP service layer handles 70,000 queries per second.
The decision to use Flink rather than Spark Streaming was deliberate and driven by a specific latency constraint: Spark's micro-batch model processed events in small batches rather than one at a time, which introduced delays that were incompatible with real-time personalisation requirements. Flink's event-by-event processing model met those latency requirements.
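Airbnb has not published the USP job code, but the consume-transform-emit shape maps onto Flink's Kafka connector roughly as follows. The topics, group ID, and transform are illustrative assumptions:

```java
// Minimal sketch of a Flink job that consumes Kafka events, derives a "signal",
// and re-emits it to Kafka. Topic names, group ID, and the transform are assumptions.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UserSignalJob {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka-broker:9092")          // assumption
        .setTopics("user-activity-events")                  // assumption
        .setGroupId("usp-listing-view-signal")               // assumption
        .setStartingOffsets(OffsetsInitializer.latest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

    DataStream<String> events =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "user-events");

    // Event-by-event transformation into a structured signal; no micro-batching involved.
    DataStream<String> signals = events.map(e -> "signal:" + e).returns(Types.STRING);

    // Re-emit derived signals for downstream consumer jobs (segmentation, session engagement).
    signals.sinkTo(KafkaSink.<String>builder()
        .setBootstrapServers("kafka-broker:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
            .setTopic("user-signals")                         // assumption
            .setValueSerializationSchema(new SimpleStringSchema())
            .build())
        .build());

    env.execute("user-signal-job");
  }
}
```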
Change data capture via SpinalTap
SpinalTap is Airbnb's open-source CDC service, available on GitHub at airbnb/SpinalTap. It detects data mutations in MySQL, DynamoDB, and proprietary in-house storage systems, then propagates them as standardised events to Kafka using Apache Thrift serialisation, which supports both the Ruby and Java consumers that read from those topics.
SpinalTap supplies several downstream systems from a single Kafka stream of mutations:
- Search indexing: mutations flow from Kafka into Elasticsearch for products like review search and inbox search.
- Cache invalidation: downstream services subscribe to mutation events and evict or update entries in Memcached and Redis without blocking the request path.
- Service signalling: dependent services subscribe to data changes in near real time; the Availability service monitoring Reservation changes is a concrete example.
SpinalTap provides at-least-once delivery with zero data-loss tolerance, per-record ordering (commit-order preservation), and epoch-based split-brain mitigation for high-availability deployments. A continuous validation pipeline runs in both pre-production and production environments.
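The cache-invalidation side of this is easy to sketch with a plain Kafka consumer. The topic name, the cache client, and the decoding step below are assumptions rather than SpinalTap's actual wire format; the important detail is that evictions are idempotent, which is what makes at-least-once delivery safe:

```java
// Minimal sketch of a cache-invalidation consumer reading SpinalTap-style mutation events.
// Topic name, the Thrift decoding step, and the cache client are illustrative assumptions.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CacheInvalidator {
  // Stand-in for a Memcached/Redis client; eviction is idempotent, so replayed
  // mutation events from an at-least-once pipeline are harmless.
  interface Cache { void evict(String key); }

  public static void main(String[] args) {
    Cache cache = key -> System.out.println("evict " + key);  // placeholder

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // assumption
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "listing-cache-invalidator");  // assumption
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

    try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(List.of("spinaltap.listings"));        // assumption
      while (true) {
        ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, byte[]> record : records) {
          // A real consumer decodes the Thrift-encoded mutation here; the record key is
          // assumed to carry the primary key of the mutated row.
          cache.evict("listing:" + record.key());
        }
        consumer.commitSync();
      }
    }
  }
}
```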
Distributed materialised views via Riverbed
Some of Airbnb's highest-traffic features require joining data across multiple distinct databases. Fetching that data at query time produces latency that is incompatible with user-facing response time requirements. Riverbed is Airbnb's Lambda-like framework that solves this by pre-computing and maintaining distributed materialised views.
The streaming component of Riverbed consumes CDC events via Kafka. Rather than partitioning those events by their source record ID, Riverbed repartitions them in Kafka by the materialised view document ID. This ensures that all CDC events affecting a given materialised view document are routed to the same consumer and processed serially, eliminating concurrent-write race conditions without requiring distributed locking.
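The repartitioning step amounts to re-keying each CDC event by the document it affects before writing it back to Kafka, so the default partitioner does the routing. A minimal sketch with a plain producer, where the topic name and key extraction are assumptions:

```java
// Minimal sketch of re-keying CDC events by materialised-view document ID so that all
// updates to one document land on the same partition. Topic and key extraction are assumptions.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DocumentRepartitioner {
  private final KafkaProducer<String, String> producer;

  DocumentRepartitioner() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // assumption
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    this.producer = new KafkaProducer<>(props);
  }

  // Called for every CDC event. The key is the materialised-view document ID, not the
  // source row's primary key, so Kafka's default partitioner routes all updates for a
  // given document to one partition and therefore one consumer.
  void forward(String documentId, String cdcEventJson) {
    producer.send(new ProducerRecord<>("riverbed-doc-updates", documentId, cdcEventJson)); // topic is an assumption
  }
}
```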
Riverbed currently powers 50+ materialised views, processes 2.4 billion events per day, and writes 350 million documents per day.
Write-ahead log in the Mussel key-value store
Mussel is Airbnb's distributed key-value store for derived data. Rather than writing directly to the backend database, every write is first persisted to Kafka, which acts as the durable write-ahead log. Downstream Replayer and Write Dispatcher components consume from Kafka and apply writes to the backend database in order. In Mussel V1, the Kafka topic used 1,024 partitions aligned with the shard structure, so each shard's data was processed by a single consumer in commit order.
Mussel V2, re-architected in 2024-2025, pairs a NewSQL backend with Kubernetes-native control and a stateless horizontally-scalable Dispatcher layer. Kafka continues to serve as the common durable log. The system sustains more than 100,000 streaming writes per second and supports tables exceeding 100 terabytes, with p99 read latencies under 25 milliseconds. During the V1-to-V2 migration, Kafka's role as a shared log allowed both versions to run simultaneously and enabled data replay for validation.
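The write-ahead-log pattern itself is simple to express with a producer: a write is acknowledged to the client only after Kafka has durably accepted it, and the backend applies it later. A minimal sketch consistent with the described design, where the topic name and key scheme are assumptions:

```java
// Minimal sketch of Kafka as a durable write-ahead log: the write is acknowledged only
// once Kafka has replicated it; a downstream replayer applies it to the backend store.
// Topic name and key scheme are assumptions.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WalWriter {
  private final KafkaProducer<String, String> producer;

  WalWriter() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // assumption
    props.put(ProducerConfig.ACKS_CONFIG, "all");               // durability before acknowledging
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    this.producer = new KafkaProducer<>(props);
  }

  // Keying by row key means all writes for a key share a partition, so the replayer
  // applies them in the order they were produced.
  void write(String rowKey, String valueJson) throws Exception {
    producer.send(new ProducerRecord<>("mussel-wal", rowKey, valueJson)).get(); // block until acked
  }
}
```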
Developer infrastructure: the Evergreen merge queue
Most of Airbnb's engineering work happens inside large monolithic repositories. With thousands of changes being merged every day, there is a meaningful probability that changes passing automated checks independently will fail when integrated with concurrent changes on the mainline. Evergreen is Airbnb's merge queue system that guarantees serializability of changes.
At its core, Evergreen uses an actor model built on Kafka Streams. A state machine applies a pure function on events and transforms them into actions executed by workers; workers may in turn produce additional events. The team selected Kafka Streams for its exactly-once processing guarantees (critical for a system where duplicate or missing merges would be costly), its built-in load balancing, and its minimal external dependencies. Janusz Kudelka and Joel Snyder from Airbnb's Developer Infrastructure team presented the architecture and its production learnings at Kafka Summit London 2022.
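Airbnb has not open-sourced Evergreen, but the essential shape, events folded through a pure state transition with exactly-once guarantees, can be sketched in Kafka Streams. Topic names, the state encoding, and the transition function below are assumptions:

```java
// Minimal sketch of an event-driven state machine on Kafka Streams with exactly-once
// processing. Topics, the state encoding, and the transition function are assumptions.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class MergeQueueApp {
  // Pure function: (current state, event) -> next state. Placeholder logic only.
  static String transition(String current, String event) {
    return event.equals("checks_passed") ? "READY_TO_MERGE" : current;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "merge-queue");          // assumption
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // assumption
    // Exactly-once: a merge must never be executed twice or skipped.
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

    StreamsBuilder builder = new StreamsBuilder();

    // Events are keyed by change ID; the aggregate holds each change's current state.
    KTable<String, String> state = builder
        .stream("merge-events", Consumed.with(Serdes.String(), Serdes.String()))
        .groupByKey()
        .aggregate(() -> "QUEUED",
                   (changeId, event, current) -> transition(current, event),
                   Materialized.with(Serdes.String(), Serdes.String()));

    // State changes are emitted as actions for workers, which may in turn produce new events.
    state.toStream().to("merge-actions", Produced.with(Serdes.String(), Serdes.String()));

    new KafkaStreams(builder.build(), props).start();
  }
}
```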
Scale & throughput
Airbnb operates multiple Kafka clusters segmented by function. Youssef Francis and Jun He referenced separate clusters for analytics, change data capture, and inter-service communication as early as their 2017 Kafka Summit talk, a pattern consistent with the distinct pipeline architectures described above.
Airbnb's Kafka architecture
Producer architecture
Producers span a wide range of application types. Mobile clients and web browsers publish logging events directly. Online services publish both logging events and mutation events (via SpinalTap's binlog-parsing source layer). SpinalTap's destination layer buffers outbound events in bounded in-memory queues and uses destination pools with thread-based partitioning to absorb traffic spikes without applying back-pressure upstream.
Serialisation varies by pipeline. SpinalTap uses Apache Thrift for cross-language support across Ruby and Java producers and consumers. The USP and Riverbed pipelines are Java-based.
Consumer architecture
Consumer group design reflects the isolation requirements of each system. The USP runs 100+ independent Flink jobs, each consuming from specific Kafka topics and maintaining its own consumer group state. Hot-standby Task Managers are provisioned to take over Kafka partition assignments immediately on failure, avoiding rebalancing delays.
For idempotency in the USP, the team made a deliberate storage-layer decision: rather than deduplicating events in the stream processor (which adds state complexity under at-least-once delivery), computed signals are written to an append-only key-value store where event timestamps serve as version keys. A later write for the same key with a higher timestamp wins, making the storage layer naturally idempotent.
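The idea is easiest to see in a toy version: treat the event timestamp as a version, append every write, and resolve to the highest version on read. The class below is an illustrative in-memory stand-in, not Airbnb's store:

```java
// Toy illustration of timestamp-versioned, last-write-wins storage. This in-memory class
// stands in for the append-only key-value store; it is not Airbnb's implementation.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class VersionedSignalStore {
  // key -> (event timestamp -> signal value); the highest timestamp wins on read.
  private final Map<String, ConcurrentSkipListMap<Long, String>> store = new ConcurrentHashMap<>();

  // Appends are idempotent: replaying the same (key, timestamp, value) changes nothing,
  // so at-least-once delivery from Kafka needs no deduplication in the stream processor.
  void put(String key, long eventTimestamp, String value) {
    store.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>()).put(eventTimestamp, value);
  }

  // Reads resolve to the value with the highest event timestamp seen so far.
  String get(String key) {
    ConcurrentSkipListMap<Long, String> versions = store.get(key);
    return versions == null || versions.isEmpty() ? null : versions.lastEntry().getValue();
  }
}
```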
Offset management in the logging ingestion pipeline is coupled to partition count monitoring. QPS per partition is tracked as the primary metric for determining when to add partitions.
Stream processing
Two stream processing systems run on top of Kafka.
Apache Flink powers the User Signals Platform. The Flink-based architecture was chosen specifically because Spark Streaming's micro-batch model introduced event delays that made it unsuitable for real-time personalisation. To lower the barrier to authoring Flink jobs, the USP team built a config-driven layer: engineers define signal transformations declaratively, and a setup script generates the corresponding Flink job configuration, batch backfill files, and monitoring alerts automatically.
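Airbnb describes this layer only at a high level. As a rough illustration, a declarative signal definition might reduce to something like the record below, which a setup tool would then expand into a generated Flink job, backfill spec, and alerts; every name and field here is hypothetical:

```java
// Hypothetical sketch of a declarative signal definition. The field names and the
// expansion step are illustrative assumptions, not Airbnb's actual config format.
public record SignalDefinition(
    String signalName,        // e.g. "recent_listing_views"
    String sourceTopic,       // Kafka topic carrying the raw user events
    String transformClass,    // transformation applied to each event
    String sinkTable,         // key-value store table the signal is written to
    String outputTopic        // Kafka topic where the derived signal is re-emitted
) {
  public static void main(String[] args) {
    SignalDefinition def = new SignalDefinition(
        "recent_listing_views", "user-activity-events",
        "ListingViewTransform", "user_signals", "user-signals");
    // A setup tool would turn this definition into a Flink job config,
    // batch backfill files, and monitoring alerts.
    System.out.println(def);
  }
}
```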
Kafka Streams powers Evergreen. The stateful actor model is implemented as a Kafka Streams topology, where the state machine and its transitions are expressed as stream-processing operations. Exactly-once processing semantics are a hard requirement: the merge queue must not execute a merge twice or skip one.
Kafka Connect ecosystem
SpinalTap functions as a custom producer framework rather than a standard Kafka Connect source connector: it parses MySQL binary logs and DynamoDB streams directly and writes standardised Thrift-encoded mutation events to Kafka. The downstream consumers (Elasticsearch indexers, cache invalidators, Hive exporters) are custom applications rather than Connect sink connectors. No use of standard Kafka Connect connectors is referenced in Airbnb's public engineering writing.
Special techniques & engineering innovations
Partition alignment in Mussel (ordering without coordination)
Mussel V1 used 1,024 Kafka partitions aligned with the 1,024-shard structure of the backend store. Because each partition maps to exactly one shard, all writes to a shard flow through a single Kafka partition and are applied by a single consumer in order. This gives per-shard write ordering without needing a separate coordination layer or distributed lock.
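The alignment can be sketched as nothing more than using the same hash for both decisions. The helper below is illustrative: the 1,024 figure comes from the write-up, while the hashing scheme is an assumption:

```java
// Illustrative sketch of shard/partition alignment: the same hash answers both
// "which shard owns this key?" and "which Kafka partition should this write go to?",
// so each shard's writes arrive in order on exactly one partition. Hashing is an assumption.
final class ShardRouting {
  static final int NUM_SHARDS = 1024;   // Mussel V1 used 1,024 shards and 1,024 partitions

  static int shardFor(String rowKey) {
    return Math.floorMod(rowKey.hashCode(), NUM_SHARDS);
  }

  // Producing with this explicit partition keeps Kafka routing and shard ownership in
  // lockstep, which is what removes the need for a separate coordination layer.
  static int partitionFor(String rowKey) {
    return shardFor(rowKey);
  }
}
```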
Partitioning by document ID in Riverbed (race condition prevention)
CDC events in Riverbed are repartitioned in Kafka by the materialised-view document ID rather than the source database record ID. Because a materialised view document can be affected by mutations across multiple upstream tables, naive partitioning by source record would route concurrent updates to the same document to different consumers, creating write races. Routing by document ID ensures serial processing per document, which eliminates the race without requiring distributed locking or explicit conflict resolution.
Balanced Kafka reader (decoupling Spark parallelism from partition count)
In Airbnb's early Spark Streaming logging pipeline, the degree of Spark task parallelism was directly tied to the number of Kafka partitions. Adding throughput capacity required repartitioning Kafka topics, which is an operationally disruptive change. Hao Wang's team built a balanced Kafka reader that allows Spark parallelism to be configured independently of partition count, enabling flexible capacity increases without forced repartitioning.
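The balanced reader is not open source, but the underlying idea, splitting one partition's offset range across several tasks, fits in a few lines. The split policy and numbers below are assumptions:

```java
// Illustrative sketch of the idea behind a "balanced" Kafka reader: split the offset
// range fetched from one partition into several sub-ranges, each handled by its own
// Spark task. The split policy is an assumption, not Airbnb's implementation.
import java.util.ArrayList;
import java.util.List;

final class OffsetRangeSplitter {
  // Splits [startOffset, endOffset) into at most `tasksPerPartition` contiguous ranges.
  static List<long[]> split(long startOffset, long endOffset, int tasksPerPartition) {
    long total = endOffset - startOffset;
    long step = Math.max(1, (total + tasksPerPartition - 1) / tasksPerPartition);
    List<long[]> ranges = new ArrayList<>();
    for (long s = startOffset; s < endOffset; s += step) {
      ranges.add(new long[] {s, Math.min(s + step, endOffset)});
    }
    return ranges;   // each range becomes one Spark task, independent of partition count
  }
}
```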
Kafka as a migration backbone in Mussel V2
During the multi-version migration from Mussel V1 to V2, Kafka's WAL role made it possible for both versions to operate simultaneously against the same event stream. The new Replayer and Write Dispatcher components could consume from the same Kafka log as V1, allowing engineers to validate V2 correctness by replaying production events against the new backend before cutting over. This reduced the risk of a hard migration cutover.
Operating Kafka at scale
Deployment model: Kafka is self-managed at Airbnb. The migration of broker infrastructure to VPC provided a 4x throughput improvement. No use of Amazon MSK or Confluent Cloud appears in any primary engineering sources.
Monitoring: Partition-level QPS is the primary capacity signal for the analytics ingestion pipeline. The USP generates per-job monitoring alerts automatically through the config-driven job setup. Mussel V2 provides namespace-level quotas and dashboards.
Upgrade and migration strategy: The Mussel V1-to-V2 migration used Kafka as the shared log to run both versions in parallel, supporting incremental validation and rollback. SpinalTap's continuous validation pipeline runs in pre-production alongside production, allowing schema and routing changes to be tested against real mutation streams before deployment.
Developer experience: SpinalTap's continuous validation pipeline, the USP's config-driven job generator, and Evergreen's Kafka Streams actor model all reflect a deliberate effort to abstract Kafka's operational complexity away from application engineers. The goal in each case is to expose a declarative or event-driven interface while the platform team owns the Kafka infrastructure underneath.
Challenges & how they solved them
Latency incompatibility between Spark Streaming and real-time personalisation
Problem: The User Signals Platform was initially built on Spark Streaming, which processes events in micro-batches rather than one at a time. This introduced processing delays that could not be reduced below a threshold determined by the micro-batch interval, making the system unsuitable for real-time personalisation features.
Root cause: A fundamental processing model constraint in Spark Streaming, not a configuration or scaling issue.
Solution: Migrated to Apache Flink, which processes events one-by-one. The USP team also built a config-driven layer to absorb the additional complexity Flink introduces for new engineers.
Outcome: The USP now processes more than 1 million events per second across 100+ Flink jobs, serving the personalisation service layer at 70,000 QPS.
Concurrent-write race conditions in Riverbed
Problem: Materialised view documents in Riverbed are computed from data spread across multiple upstream tables. When mutations in different upstream tables triggered concurrent updates to the same materialised view document, writes from two consumers could interleave and produce a corrupted result.
Root cause: The partition key used for Kafka routing did not correspond to the unit of write exclusion (the materialised view document ID).
Solution: Repartitioned CDC events in Kafka by materialised view document ID. All updates to a given document are now routed to a single consumer and processed serially.
Outcome: Race conditions eliminated without distributed locking or explicit conflict detection.
Kafka partition count coupling in Spark Streaming
Problem: Airbnb's Spark Streaming logging pipeline tied Spark task parallelism directly to Kafka partition count. Scaling throughput meant repartitioning Kafka topics, an operationally costly change that required coordinating producer and consumer restarts.
Root cause: The default Spark Kafka integration assigned one Spark task per Kafka partition.
Solution: Developed a custom balanced Kafka reader that maps Kafka partitions to Spark tasks independently, allowing Spark parallelism to be increased without changing partition count.
Outcome: The logging pipeline could grow event throughput without mandatory Kafka repartitioning operations.
One-record-at-a-time throughput in Evergreen
Problem: Evergreen's Kafka Streams state machine processed one record at a time, which became a throughput bottleneck as the volume of daily merge requests grew. Replaying records for debugging purposes was also difficult.
Root cause: Kafka Streams' processing model, combined with the actor pattern's sequential state transitions.
Status: Identified as a known production challenge. Janusz Kudelka and Joel Snyder described it as a learning from operating Evergreen at Kafka Summit London 2022. No specific resolution is described in available public sources.
Key takeaways for your own Kafka implementation
- Use partition key selection as an ordering contract, not just a routing decision. Mussel aligns partition count to shard count to guarantee per-shard ordering without coordination. Riverbed partitions by document ID rather than source record ID to make CDC processing race-free. In both cases, the partition key was chosen to match the unit of consistency required by the consuming system, not just to distribute load evenly.
- Micro-batch stream processing is incompatible with some latency targets. Airbnb's USP found that Spark Streaming's micro-batch model had a latency floor that could not be reduced to meet real-time personalisation requirements. If your use case has a latency constraint below what a micro-batch interval can provide, Flink's event-by-event processing model is worth the additional operational complexity.
- Kafka as a write-ahead log simplifies distributed migrations. Mussel V2's migration succeeded in part because Kafka already held a complete ordered log of writes. The new backend could consume the same stream independently, enabling parallel validation before cutover. If you are migrating a stateful system, having a durable Kafka WAL in place before the migration substantially reduces cutover risk.
- Decouple stream processor parallelism from Kafka partition count. Airbnb's balanced Kafka reader broke the coupling between Spark task count and partition count, allowing throughput to be increased without repartitioning. This is worth considering in any system where repartitioning is operationally costly, particularly where partition count affects ordering guarantees downstream.
- Abstract Kafka complexity with a declarative layer if your engineering organisation is large. The USP config-driven Flink job generator, SpinalTap's structured pipeline model, and Evergreen's actor abstraction all serve the same goal: letting application engineers define what they need from a streaming pipeline without needing to understand Kafka consumer groups, offset management, or Flink topology design. At Airbnb's scale, this separation of concerns is what allows Kafka to serve six distinct systems without six separate platform teams.
Sources and further reading
- Hao Wang, "Scaling Spark Streaming for Logging Event Ingestion," Airbnb Engineering Blog: https://medium.com/airbnb-engineering/scaling-spark-streaming-for-logging-event-ingestion-4a03141d135d
- Hao Wang, Kafka Summit NYC 2019 talk: https://videos.confluent.io/watch/b94DNHsNfzTt8apLDmDLHP
- Youssef Francis, Jun He, Kafka Summit NYC 2017: https://www.confluent.io/kafka-summit-nyc17/every-message-counts-kafka-foundation-highly-reliable-logging-airbnb/
- Kidai Kwon, "Building a User Signals Platform at Airbnb," Airbnb Engineering Blog: https://medium.com/airbnb-engineering/building-a-user-signals-platform-at-airbnb-b236078ec82b
- Jad Abi-Samra, Litao Deng, Zuofei Wang, "Capturing Data Evolution in a Service-Oriented Architecture," Airbnb Engineering Blog: https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f
- SpinalTap on GitHub: https://github.com/airbnb/SpinalTap
- Amre Shakim, "Riverbed: Optimizing Data Access at Airbnb's Scale," Airbnb Engineering Blog: https://medium.com/airbnb-engineering/riverbed-optimizing-data-access-at-airbnbs-scale-c37ecf6456d9
- Xiangmin Liang, Sivakumar Bhavanari, Amre Shakim, "Riverbed data hydration — Part 1," Airbnb Engineering Blog: https://medium.com/airbnb-engineering/riverbed-data-hydration-part-1-e7011d62d946
- InfoQ, "Distributed Materialized Views: How Airbnb's Riverbed Processes 2.4 Billion Daily Events": https://www.infoq.com/news/2023/10/airbnb-riverbed-introduction/
- Airbnb Engineering Blog, "Mussel — Airbnb's Key-Value Store for Derived Data": https://airbnb.tech/data/mussel-airbnbs-key-value-store-for-derived-data/
- InfoQ, "Airbnb's Mussel V2: Next-Gen Key Value Storage to Unify Streaming and Bulk Ingestion": https://www.infoq.com/news/2025/10/airbnb-nextgen-kv-storage-mussel/
- Janusz Kudelka, Joel Snyder, Kafka Summit London 2022: https://www.confluent.io/events/kafka-summit-london-2022/evergreen-building-airbnbs-merge-queue-with-kafka-streams/
Try Kpow with your own Kafka cluster: If you are running Kafka in production and want visibility into consumer lag, partition throughput, and topic health across multiple clusters, Kpow offers a free 30-day trial. You can connect it to any Kafka cluster in minutes and deploy via Docker, Helm, or JAR.