How Tencent uses Apache Kafka in production

Kafka

Factor House·June 2, 2026·15 min read

Tencent’s platform group processes 20 trillion messages per day across 700 Apache Kafka clusters, with individual applications generating burst traffic of up to 100 Gb/s. To operate at that volume, Tencent’s engineers built a federated cluster architecture that presents hundreds of physical Kafka clusters as a single logical deployment to producers and consumers, and contributed two Kafka improvement proposals back to the open-source community in the process.

The core engineering problem Kafka solves for Tencent is scale beyond what any single cluster can deliver, combined with the need to absorb overnight traffic spikes of up to 20x without requiring application-side changes.

Company overview

Tencent is a Chinese technology conglomerate operating social platforms (WeChat, QQ), digital media (Tencent Video, QQ Music), gaming, cloud infrastructure, and financial services. Its Platform and Content Group (PCG) alone spans internet, social, and content products serving hundreds of millions of daily active users.

The scale of data movement across these products is difficult to overstate. WeChat logs interactions across messaging, payments, and mini-programs. Tencent Video streams content and telemetry simultaneously. Real-time recommendation engines and trend detection systems depend on sub-second feature delivery across the same infrastructure.

Tencent began developing its own messaging infrastructure as early as 2013, when the Big Data team built TubeMQ, a Kafka-inspired internal system designed for high-throughput log collection. TubeMQ was open-sourced in 2019 and donated to the Apache Incubator, eventually becoming part of the Apache InLong project. For real-time data pipelines requiring broad ecosystem compatibility, Tencent PCG adopted Apache Kafka directly, and by 2020 had grown that deployment to 10 trillion messages per day. A year later, the figure had doubled.

Key milestones:

2013: Tencent Big Data team develops TubeMQ for internal messaging
2019: TubeMQ open-sourced and donated to the Apache Incubator
Pre-2020: Tencent PCG builds pproxy/cproxy proxy-layer federation for Kafka
2020-08: Kenway Chen, Kahn Chen, and George Shu publish a description of PCG’s federated architecture on the Confluent blog, disclosing 10 trillion messages/day across hundreds of clusters
2021-03 / 2021-06: George Shu submits KIP-694 (partition reduction) and KIP-693 (client-side circuit breaker) to the Apache Kafka community
2021-07: Kahn Chen and Liang Wang present at Kafka Summit APAC 2021, disclosing growth to 20 trillion messages/day and a move to a broker-layer master cluster federation
2023-12: Lu Shilin (CKafka kernel lead) publishes a detailed account of tiered storage at Tencent Cloud, covering local disk and COS hybrid retention

Tencent’s Kafka use cases

Tencent PCG uses Kafka across several distinct pipeline categories, each owned by different product teams.

Cross-region log ingestion: WeChat, QQ, QZone, Tencent Video, and news products route interaction and operational logs through Kafka for centralised ingestion and processing. The breadth of products means the platform must handle heterogeneous write patterns without per-team customisation.

Machine learning feature pipelines: Real-time recommendation engines and trend detection systems depend on Kafka for sub-second feature delivery. Latency requirements here are strict: stale features degrade recommendation quality directly.

Asynchronous microservice communication: Tencent PCG runs its internet, social, and content platforms as microservices communicating through Kafka topics. This decouples services that otherwise would require synchronous calls across product boundaries.

Real-time analytics: Business intelligence pipelines consume Kafka topics for real-time reporting across Tencent product lines, feeding dashboards and alerting systems.

Real-time data warehouse ingestion: Applications including Kandian and WeChat Video collect event data through Kafka into a downstream warehouse built on Hive, HBase, and HDFS, with Apache Flink handling stateful stream processing between the two layers.

Tiered cold/hot data storage (CKafka): Tencent Cloud’s managed Kafka service uses tiered storage to separate hot and cold data, offloading inactive log segments to Tencent Cloud Object Storage (COS) while keeping active data on local disk. This supports long-retention use cases without proportional disk cost growth.

Log collection and monitoring aggregation: CKafka’s product documentation describes compressed log collection and monitoring data aggregation as primary use cases for cloud customers, including integration with Tencent Cloud Log Service (CLS).

Scale and throughput

The scale figures below are drawn from public disclosures by named Tencent engineers at Kafka Summit APAC 2021 and the Confluent blog.

Messages per day: 20 trillion (July 2021)
Kafka clusters: 700 (July 2021)
Brokers (physical): ~500 (2020 figure, before master cluster design) (August 2020)
Maximum brokers per physical cluster: Up to 1,000 in the evolved federation design (July 2021)
Peak throughput (single product): 4 million messages/sec (64 GB/s) (August 2020)
Burst traffic (single application): 100 Gb/s (July 2021)
Maximum cluster bandwidth: 240 Gb/s at ~40% CPU utilisation (August 2020)
Target message loss rate: 0.01% end-to-end (August 2020)
Overnight spike magnitude: Up to 20x from product promotions and experiments (August 2020)

The jump from 10 trillion to 20 trillion messages per day between August 2020 and July 2021 gives a sense of the growth rate Tencent’s infrastructure team was tracking against. Static cluster sizing was not viable at this trajectory.

Tencent’s Kafka architecture

Tencent PCG’s Kafka architecture evolved through two distinct federation phases, both motivated by the need to exceed the scaling limits of individual clusters without requiring application changes.

Phase 1: proxy-layer federation (pproxy/cproxy)

The proxy-layer design presents multiple physical Kafka clusters as a single logical cluster to producers and consumers.

Producer proxy (pproxy) and consumer proxy (cproxy) each implement the Kafka broker protocol, so existing Kafka clients connect to them without modification. A lightweight name service maps client IDs to proxy broker collections. A custom controller service manages federation metadata: topic state across physical clusters, proxy node lifecycle, and partition composition for logical topics.

A logical topic with 8 partitions distributes those partitions across two physical clusters (4 partitions each). Clients interact only with logical topic metadata. When a physical cluster reaches capacity, a new cluster can be added to the federation in approximately 2 minutes. New clusters can be fully provisioned in 10 minutes. Metadata refresh latency runs at approximately 1 second.

The design supported up to 60 physical clusters per logical cluster, with 10 brokers per physical cluster.

Phase 2: broker-layer master cluster federation

The proxy layer imposed roughly 30% operational overhead. Every new Kafka API capability required interface modifications in both the pproxy and cproxy layers, creating code duplication and maintenance burden.

The replacement design moves federation logic into the broker layer. A master cluster aggregates metadata from multiple sub-clusters without running brokers of its own. Client SDKs fetch merged metadata directly from the master cluster, eliminating the proxy hop. Consumer groups and transactions are managed centrally in the master cluster. Redis stores configuration and offset data for disaster recovery.

This design used the KIP-500 role-splitting architecture (which removed ZooKeeper as a dependency) and reduced cluster failure recovery time from 30-60 minutes under standard Kafka to under 1 minute.

CKafka: Tencent Cloud’s managed Kafka service

CKafka is Tencent Cloud’s managed Kafka offering. Its control plane is isolated from the data plane: control operations go through the TencentCloud API SDK, while data operations use the standard Apache Kafka SDK. The service is 100% compatible with the Kafka API from version 0.9 through 2.8, supporting Kafka versions 2.4, 2.8, and 3.2.

CKafka deploys across availability zones with automated fault recovery and integrates with 15+ Tencent Cloud products, including TencentDB, Elasticsearch Service, COS, Tencent Kubernetes Engine (TKE), Oceanus (the managed Flink service), and CLS.

Producer architecture

Tencent’s pproxy layer handled producer-side batching transparently. In the master cluster design, producers connect directly to sub-cluster brokers using merged metadata from the master cluster. The 0.01% end-to-end message loss target shaped producer reliability configuration across the PCG platform.

Consumer architecture

Consumer group management moved to the master cluster in Phase 2, centralising offset tracking and group coordination. CKafka exposes offset reset and message querying by offset or time through the console, which covers the most common operational needs for cloud customers.

Stream processing

Apache Flink sits downstream of Kafka for stateful processing in Tencent’s real-time data warehouse pipelines. Kafka topics feed Flink jobs that produce output to Hive, HBase, and HDFS for downstream analytics.

Kafka Connect ecosystem

No public disclosure from Tencent covers their use of Kafka Connect connectors specifically. CKafka’s integration with TencentDB, CLS, and Oceanus is documented at the product level, but connector implementation details have not been published by named contributors.

Special techniques and engineering innovations

Logical-to-physical partition mapping

The pproxy/cproxy layer remaps logical partition numbers to physical cluster partitions transparently to clients. Capacity can be expanded without data rebalancing or application-side changes, and zero-downtime cluster decommission is supported: the target cluster is marked read-only, producer metadata is updated first to halt writes, and consumer metadata is updated after topics expire.

KIP-693: client-side circuit breaker for partition write errors

George Shu (General Manager, PCG Data Infrastructure and Platforms) submitted KIP-693 to the Apache Kafka community in 2021. The problem it addresses: disk failures and high disk utilisation in some partitions cause long write latency, which fills the shared producer buffer and degrades throughput for all other partitions on the same broker. KIP-693 proposes a configuration-driven circuit-breaking mechanism to mute failing partitions at the client level, isolating the degradation.

KIP-694: support for reducing topic partitions

KIP-694, also submitted by George Shu, addresses the inverse problem: Kafka supports adding partitions to a topic but not removing them. At Tencent’s scale, traffic spikes require rapidly expanding partition counts. After the spike, those partitions cannot be reclaimed, which steadily increases cluster metadata overhead. KIP-694 proposes lossless partition reduction as a native Kafka capability.

Tiered storage: local disk and COS hybrid (CKafka)

Lu Shilin (CKafka kernel lead) documented this architecture in December 2023. Hot data is retained on local cloud disks. Inactive log segments are uploaded asynchronously to Tencent COS (object storage). A “Local Retention” parameter controls how long uploaded segments persist locally; a separate “Retention” parameter governs the full data window.

To avoid I/O contention during upload and download, Tencent implemented parallel transfer with rate limiting. Preloading and prefetching load anticipated hot data into memory pools ahead of consumer reads. Memory management uses heap-off-heap ByteBuffer reuse to reduce GC pressure. Per-topic and per-cluster rollback capability is built in.

The motivation was straightforward: historical data access patterns were polluting the page cache, evicting hot data and degrading SLAs for active consumers.

Operating Kafka at scale

Deployment model: Tencent PCG runs a self-managed Kafka deployment across its private data centres. Tencent Cloud operates CKafka as a managed service for external cloud customers using a separate control plane.

Monitoring and observability: CKafka exposes multi-dimensional metrics at instance, topic, and consumer group granularity. A top-10 topic ranking by traffic is available in the console out of the box. One-click diagnostic inspection identifies cluster issues and risks. Tencent PCG built its own monitoring and automated management tooling internally to operate hundreds of clusters; the specific tooling stack has not been disclosed publicly by named contributors.

Cluster lifecycle management: Clusters can be initialised in 10 minutes and scaled by adding one physical cluster in 2 minutes (proxy-layer design). In the master cluster design, sub-clusters can be added without modifying the federation’s logical topology.

Partition rebalancing: CKafka performs automatic partition balancing during off-peak hours and supports automatic disk watermark adjustment to prevent business disruption during high-growth periods.

Security: ACL, SASL, and SSL authentication are console-configurable in CKafka. Integration with Tencent Cloud Access Management (CAM) and CloudAudit provides audit traceability for enterprise customers.

Developer experience: CKafka’s console supports offset reset and message querying by offset or time. Tiered storage provides granular rollback at topic or cluster level, which supports operational recovery scenarios without requiring full topic replay.

Challenges and how they solved them

20x overnight volume spikes

Product promotions and large-scale experiments routinely caused overnight traffic to spike 20x. Static cluster sizing was either wasteful during normal operation or insufficient during peaks.

Solution: the federated cluster design allows rapid provisioning of additional physical clusters (2 minutes to add one cluster) and decommissioning after peaks, without application-side changes. Clusters are treated as fungible capacity units rather than permanent infrastructure.

ZooKeeper-bounded partition count

Before Kafka 2.8, ZooKeeper imposed a practical ceiling well below one million partitions. For Tencent’s real-time data warehouse, which required a large number of topics with many partitions, this was a hard constraint.

Solution: the broker-layer federation design used the KIP-500 architecture (removing ZooKeeper) to scale beyond this ceiling. KIP-694 was submitted to the community to address the complementary problem of partition reduction after peaks.

5% annual disk failure rate causing 30-60 minute recovery

Standard Kafka’s disaster tolerance operates at the broker level. A 30-60 minute recovery window from cluster failure was incompatible with Tencent PCG’s SLA requirements.

Solution: the master cluster federation model reduced cluster failure recovery to under 1 minute. Recovery is handled at the federation coordination layer rather than requiring full cluster restart and leader election across hundreds of brokers.

30% operational overhead from the proxy layer

The pproxy/cproxy approach required continuous interface modifications to expose new Kafka API capabilities, creating code duplication across two proxy implementations.

Solution: the broker-layer master cluster model eliminated the proxy layer. SDK-level metadata merging replaced the proxy hop, and new Kafka API features became available without proxy modifications.

Keyed message ordering across federated clusters

Adding physical clusters to a federation could route messages with identical keys to different partitions, breaking ordering guarantees for key-partitioned topics.

Tencent’s approach here was pragmatic rather than architectural: they noted that keyed messages are used only occasionally in their pipelines, and they accepted the trade-off in favour of scalability. A complete technical solution for ordering across federated partitions was not described in their public disclosures.

Historical data access polluting page cache (CKafka)

Long-retention topics generated cold data reads that evicted hot data from cache, degrading SLAs for active consumers.

Solution: tiered storage offloads cold segments to COS asynchronously, keeping the page cache available for hot data. Prefetching loads anticipated consumer reads into memory pools before they are requested.

Full tech stack

Category	Tools	Notes
Message broker	Apache Kafka	Versions 2.4, 2.8, 3.2 supported in CKafka; PCG internal version not disclosed
Cluster federation (Phase 1)	pproxy / cproxy (custom)	Producer and consumer proxy implementing the Kafka broker protocol; routes to physical clusters
Cluster federation (Phase 2)	Master cluster (custom broker-layer design)	Aggregates sub-cluster metadata; central consumer group and transaction management
Configuration and offset storage	Redis	Used in master cluster federation for disaster recovery
Stream processing	Apache Flink	Stateful processing; consumes from Kafka for real-time data warehouse and lake ingestion
Remote storage (tiered)	Tencent Cloud Object Storage (COS)	Cold segment target in CKafka tiered storage architecture
Data warehouse	Hive / HBase / HDFS	Downstream storage receiving data via Kafka and Flink pipelines
Managed Kafka service	CKafka (TDMQ for CKafka)	Tencent Cloud’s managed offering; Kafka API-compatible (0.9-2.8)
Alternative messaging (non-Kafka)

Key contributors

Kenway Chen (Tech Lead and Engineering Manager, Tencent PCG): Co-author of the Confluent blog post describing PCG’s federated Kafka architecture (2020)
Kahn Chen (Software Architect, Tencent PCG): Co-author of the Confluent blog post; presented “Enhancing Apache Kafka for large-scale real-time data pipeline at Tencent” at Kafka Summit APAC 2021
George Shu (General Manager, PCG Data Infrastructure and Platforms): Co-author of the Confluent blog post; proposer of KIP-694 and KIP-693 to the Apache Kafka community
Liang Wang (Thirteen Wang) (Senior Engineer, Tencent): Co-presenter with Kahn Chen at Kafka Summit APAC 2021
Lu Shilin (鲁仕林) (Kernel Lead, Tencent Cloud CKafka): Author of “Kafka tiered storage practice and evolution at Tencent Cloud” (Juejin, December 2023)

Key takeaways for your own Kafka implementation

Treat individual clusters as bounded capacity units, not infinite resources. Tencent PCG found that growing individual clusters indefinitely was impractical. Their federation design treats each physical cluster as a bounded unit and routes logical topics across multiple clusters. If you are approaching partition or throughput ceilings, a federated or tiered architecture separates scaling concerns more cleanly than tuning a single cluster.
Separate the control plane from the data plane early. CKafka’s isolation of control operations (via the cloud API) from data operations (via the Kafka SDK) allowed Tencent Cloud to manage the managed service without exposing control-plane complexity to customers. For internal platforms, the same principle applies: provisioning, ACL management, and monitoring should not go through the same path as message production and consumption.
Model your traffic distribution before choosing partition counts. Tencent’s experience with keyed messages across federated partitions highlights that key-based ordering and federation do not compose cleanly. If your pipelines depend on key-partitioned ordering, design for that constraint at the federation boundary before scaling out clusters, rather than discovering the incompatibility under load.
Contribute upstream when you need features that standard Kafka does not provide. KIP-693 and KIP-694 both address real operational problems Tencent encountered at scale: partition write isolation and lossless partition reduction. Submitting KIPs rather than maintaining private forks keeps the internal codebase closer to upstream and benefits the broader community with solutions that have been tested against one of the world’s largest Kafka deployments.
Design tiered storage for access pattern isolation, not just cost reduction. Tencent Cloud’s CKafka tiered storage work shows that the primary motivation was cache isolation: cold data reads were evicting hot data and degrading SLAs. If you are evaluating Kafka tiered storage, measure its impact on page cache hit rates for active consumers as the primary success metric, not just storage cost savings.

Sources and further reading

Primary sources:

Kenway Chen, Kahn Chen, George Shu. “How Tencent PCG scales massive data pipelines with Apache Kafka.” Confluent Blog, 2020-08-03.
Kahn Chen, Liang Wang. “Enhancing Apache Kafka for large-scale real-time data pipeline at Tencent.” Kafka Summit APAC 2021, 2021-07-28.
Kahn Chen, Liang Wang. Kafka Summit APAC 2021 slide deck (SlideShare).
George Shu. KIP-694: Support Reducing Partitions for Topics. Apache Kafka KIP wiki, 2021-03.
George Shu. KIP-693: Client-side Circuit Breaker for Partition Write Errors. Apache Kafka KIP wiki, 2021-06.
Lu Shilin. “Kafka tiered storage practice and evolution at Tencent Cloud.” Juejin, 2023-12-19.
Tencent Cloud. “Comparison with Apache Kafka.” Tencent Cloud documentation.

If you want visibility into your own Kafka clusters, consumer lag, and topic throughput without building custom dashboards from scratch, Kpow gives you a 30-day free trial. You can connect it to any Kafka cluster in minutes and deploy via Docker, Helm, or JAR.