
How Salesforce uses Apache Kafka in production


Factor House
May 16th, 2026

Salesforce has published more detail about its Kafka operations than most companies its size, which makes its engineering blog an unusually complete record of what it takes to run Apache Kafka at genuine enterprise scale. By 2021, the company's fleet had grown to 100+ clusters, 2,500+ brokers, 50,000+ topics, 300,000+ partitions, 40+ PB of storage, and throughput exceeding 3 trillion events per day — a figure that had been doubling annually.

The engineering problem Kafka solves at Salesforce is not a single use case but a platform-level one: how do you provide a unified, reliable event transport across a multi-tenant SaaS estate that spans hundreds of global data centres, hundreds of thousands of customer orgs, and applications ranging from operational metrics to real-time AI?

Company overview

Salesforce is a cloud-based CRM and enterprise software company founded in 1999 and headquartered in San Francisco. It serves more than 150,000 businesses globally across its Sales Cloud, Service Cloud, Marketing Cloud, and Agentforce AI product lines.

Salesforce adopted Kafka in production no later than May 2014, when it hosted an Apache Kafka edition of the SF Logging Meetup at its headquarters. By June 2014, Rajasekar Elango, a lead developer on the Monitoring and Management Team, was presenting the company's early Kafka setup at a public Kafka Meetup: a five-broker cluster coordinated by a three-node ZooKeeper ensemble, secured with mutual SSL/TLS authentication and using Avro serialisation.

The trigger for adoption was operational observability: the DVA (Diagnostics, Visibility, and Analytics) team needed a scalable transport layer for metrics and logs across a growing global data centre footprint.

Date Event
May 2014 Apache Kafka in production use confirmed; SF Logging Meetup hosted at Salesforce HQ
June 2014 Rajasekar Elango presents SSL/TLS Kafka setup at public Kafka Meetup
Late 2015 Next-generation log pipeline (project Ajna / DeepSea) development begins
2016 Ajna fleet reaches 30+ clusters and hundreds of billions of events per day
December 2016 Log pipeline completeness measured at only ~25%; reliability programme begins
September 2017 Log pipeline reaches 7-nines completeness (99.99999%), sustained consistently
April 2018 Mirus (custom cross-cluster replication tool) fully replaces MirrorMaker in all production data centres
2021 Kafka Summit Americas: Lei Ye and Paul Davidson disclose fleet at 100+ clusters, 3+ trillion events/day
August 2024 Hyperforce perimeter telemetry pipeline published: Kafka + Spark Streaming + Druid, 60 TB/day
April 2025 Marketing Cloud 760-node zero-downtime migration into Ajna completed
July 2025 Agentforce AI agent audit pipeline published: 20 million model interactions/month over Kafka
May 2026 Conversational AI scaling post published: Kafka introduced for 30K+ concurrent conversations, targeting 100K

Salesforce's Kafka use cases

Salesforce's use cases cover the full breadth of what Kafka is typically used for in a large enterprise, from infrastructure telemetry to customer-facing products to AI pipelines.

Operational metrics and system health (project Ajna)

The DVA team built an internal Kafka-as-a-Service platform code-named Ajna to transport CPU metrics, machine reachability data, system logs, network flow data, application logs, JMX metrics, and custom database metrics across all global data centres to near-real-time dashboards. Ajna is the backbone from which most other Kafka use cases at Salesforce have grown.

Source: Nishant Gupta, Salesforce Engineering Blog

Log shipping pipeline

The Infrastructure group built a next-generation log pipeline on top of Ajna, routing logs from local per-data-centre Kafka clusters through cross-WAN replication to an aggregate cluster in the Secure Zone, and from there into DeepSea (an internal Hadoop/HDFS store) via a MapReduce consumer job called Kafka Camus. The target was 5-nines completeness; it eventually reached 7 nines.

Source: Sanjeev Sahu, Salesforce Engineering Blog

Platform Events and Change Data Capture

Salesforce's Platform Events and Change Data Capture products expose a time-ordered, immutable event stream to hundreds of thousands of multi-tenant customer orgs. Kafka's log-based storage model directly inspired the design: events are offset-based, durable, and replayable. Serialisation uses Apache Avro, with event definitions driven by Salesforce's metadata layer.

Source: Alexey Syomichev, Salesforce Engineering Blog
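
The properties that inspired the design (offset-based addressing, immutability, replay) can be shown in a few lines of plain Python. This is an illustrative sketch, not Salesforce's implementation, and the event payloads are invented:

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Minimal append-only event log: offset-based, immutable, replayable."""
    _events: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        """Append an event and return its offset; events are never mutated."""
        self._events.append(event)
        return len(self._events) - 1

    def replay(self, from_offset: int = 0):
        """Yield (offset, event) pairs from any subscriber-chosen offset."""
        for offset in range(from_offset, len(self._events)):
            yield offset, self._events[offset]

log = EventLog()
log.append({"type": "OrderCreated", "id": 1})
log.append({"type": "OrderShipped", "id": 1})
# A late subscriber can replay from any retained offset:
replayed = list(log.replay(from_offset=1))
```

The replay call is the key property: a consumer that crashes, or a new consumer arriving later, can re-read from a stored offset rather than requiring the producer to re-send.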

Real-time ML insights for Einstein Activity Capture

The Activity Platform ingests 100+ million customer interactions per day through Kafka. A chain of Kafka Streams jobs processes the stream: a Gatekeeper spam filter passes clean events to a series of extractors (TensorFlow-based, regex/keyword, Spark ML, and Einstein Conversation Insights), which feed a transformer and then a persistor that writes sales activity insights to Cassandra.

Source: Rohit Deshpande, Salesforce Engineering Blog
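
Kafka Streams is a Java library, so the production topology is Java code; the Python sketch below only mirrors the shape of the chain (gatekeeper, extractors, transformer, persistor), with trivial stand-in logic at each stage and invented event fields:

```python
# Each stage below mirrors one processor in the Kafka Streams topology.
def gatekeeper(events):
    """Spam filter: only clean events continue downstream."""
    return (e for e in events if not e.get("spam", False))

def extract(events):
    """Stand-in for the extractor stages (ML models, regex/keyword rules)."""
    for e in events:
        yield {**e, "mentions": [w for w in e["text"].split() if w.istitle()]}

def transform(events):
    """Shape extractor output into an insight record."""
    for e in events:
        yield {"user": e["user"], "insight": f"mentioned {len(e['mentions'])} names"}

def persist(events, sink):
    """Terminal stage: write insights to the store (Cassandra in the real pipeline)."""
    sink.extend(events)

interactions = [
    {"user": "a", "text": "Met Alice and Bob today", "spam": False},
    {"user": "b", "text": "WIN FREE MONEY", "spam": True},
]
store = []
persist(transform(extract(gatekeeper(interactions))), store)
```

Because each stage consumes and produces a stream, stages can be deployed, scaled, and upgraded independently, which is the property the Activity Platform team cited when choosing Kafka Streams over Storm.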

Service Cloud chatbot audit and debug

Apache Kafka on Heroku backs the chatbot event pipeline for Service Cloud, targeting tens of millions of debug events daily with up to 7-day retention. The pipeline applies five fault-tolerance patterns on top of Kafka, described further in the special techniques section below.

Source: Mark Holton, Salesforce Engineering Blog

Hyperforce perimeter telemetry

Salesforce's Hyperforce infrastructure generates 60 TB of raw perimeter telemetry daily from commercial CDNs. Spark Streaming normalises this data into protobuf format; Kafka then carries it to an Imply/Druid hypercube (18 dimensions, 13 measurements) serving real-time debugging dashboards for live-site incidents, latency analysis, and DDoS detection with sub-second query latency and 2-minute data freshness.

Source: Srinivas Ranganathan, Salesforce Engineering Blog

Conversational AI context storage for Agentforce

The Conversation Storage Service (CSS) uses Kafka for buffering, batching, and ordering conversational context that powers real-time AI systems — sentiment analysis, agent assist, and supervisor insights — across 50,000+ concurrent conversations at a peak ingestion rate of 30,000+ events per minute. The scaling target is 100,000 concurrent conversations.

Source: Ashima Kochar and Deepak Mali, Salesforce Engineering Blog

Agentforce AI agent audit trail

Kafka pub-sub ingestion handles the variable, spike-prone traffic from 20 million model interactions per month across 500+ enterprise customers, feeding a 30-day audit data retention pipeline backed by Salesforce Data Cloud and S3.

Source: Madhavi Kavathekar, Salesforce Engineering Blog

Cross-cluster replication

Salesforce replaced Apache MirrorMaker with its own open-source tool, Mirus, for cross-data-centre replication. Mirus is built on the Kafka Connect source connector API and has been in production since April 2018.

Source: Paul Davidson, Salesforce Engineering Blog

Scale and throughput

The fleet figures below are from the Kafka Summit Americas 2021 talk by Lei Ye and Paul Davidson, supplemented with per-system figures from individual engineering blog posts.

Metric Value Source
Total clusters 100+ Lei Ye and Paul Davidson, Kafka Summit Americas 2021
Total brokers 2,500+ (largest clusters: 250+ brokers each) Lei Ye and Paul Davidson, Kafka Summit Americas 2021
Topics 50,000+ Lei Ye and Paul Davidson, Kafka Summit Americas 2021
Partitions 300,000+ Lei Ye and Paul Davidson, Kafka Summit Americas 2021
Storage 40+ PB Lei Ye and Paul Davidson, Kafka Summit Americas 2021
Throughput 3+ trillion events/day (doubling annually) Lei Ye and Paul Davidson, Kafka Summit Americas 2021
Fleet availability Over 99.9% Lei Ye and Paul Davidson, Kafka Summit Americas 2021
Marketing Cloud cluster (pre-migration 2025) 760+ nodes, 12 clusters, 1 million messages/second, 15 TB/day Dheeraj Bansal and Ankit Jain, April 2025
Einstein Activity Capture 100+ million customer interactions/day Rohit Deshpande
Hyperforce telemetry 60 TB raw perimeter data/day Srinivas Ranganathan, August 2024
Agentforce AI 20 million model interactions/month, 2 billion predictions/month Madhavi Kavathekar, July 2025
Conversational AI (CSS) 30,000+ events/minute peak; scaling target of 100K concurrent conversations Ashima Kochar and Deepak Mali, May 2026
Chatbot pipeline Tens of millions of events/day, 7-day retention Mark Holton
Ajna fleet (2016 snapshot) 30+ clusters, ~300 Mbps aggregate throughput, hundreds of billions of events/day Nishant Gupta, July 2016

Salesforce's Kafka architecture

Ajna: Kafka-as-a-Service

Ajna is Salesforce's internal Kafka platform. Each production data centre runs an "Ajna Local" cluster for ingest. Data replicates over WAN to an "Ajna Aggregate" cluster in a Secure Zone (DMZ), from which consumers — Argus (time series), DeepSea (Hadoop/HDFS), and stream-processing applications — pull downstream. Before April 2018, replication used Apache MirrorMaker; since then it has used Mirus.

The platform runs on a hybrid deployment model: on-premises clusters orchestrated with Choria (self-healing automation) and public cloud clusters deployed on Kubernetes. Continuous automated patching is applied across the fleet.

Cross-site connectivity

Rather than using Kafka's advertised listeners over a VPN, Salesforce implemented a "Native Kafka Endpoint" using a standard load balancer fronting Envoy running in Layer 4 SNI-routing mode. This gives clients a stable single endpoint for cross-data-centre Kafka connections without VPN tunnels.

Source: Lei Ye and Paul Davidson, Kafka Summit Americas 2021

Topic design and schema management

Platform Events and Change Data Capture events use Apache Avro serialisation, with event definitions driven by Salesforce's metadata layer. For the Conversation Storage Service, messages are keyed by conversation ID, ensuring all events for a given conversation land in the same partition and preserve ordering for downstream AI systems.

Producer architecture

The Agentforce audit pipeline uses Kafka's pub-sub model to absorb bursty, business-hour traffic spikes before handing off to Data Cloud. Dynamic flow control mechanisms handle real-time traffic adjustment in that pipeline.

Consumer architecture

For the Conversation Storage Service, Salesforce applies curated consumer lag limits to maintain ordering guarantees for real-time AI workflows. Where consumer lag creates read-after-write consistency problems (at 50K concurrent conversations), an in-memory cache (VegaCache) bridges the gap rather than forcing changes to the Kafka consumer configuration.
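
VegaCache's implementation is not published; the sketch below shows one plausible shape of the pattern, with invented names and a hypothetical TTL. Writes go to Kafka (the durable, ordered source of truth) and to a short-TTL in-memory cache; reads prefer the cache, so a client never observes state older than its own write:

```python
import time

class ConversationStore:
    """Sketch of the VegaCache pattern: an in-memory cache bridges the window
    in which a write is durable in Kafka but not yet visible to the consumer."""

    def __init__(self, cache_ttl_s=30.0):
        self._cache = {}          # conversation_id -> (event, written_at)
        self._materialized = {}   # state maintained by the lagging Kafka consumer
        self._ttl = cache_ttl_s

    def write(self, conv_id, event, produce):
        produce(conv_id, event)   # Kafka stays the durable, ordered source of truth
        self._cache[conv_id] = (event, time.monotonic())

    def apply_consumed(self, conv_id, event):
        """Invoked by the Kafka consumer once the event becomes visible."""
        self._materialized[conv_id] = event

    def read_latest(self, conv_id):
        cached = self._cache.get(conv_id)
        if cached and time.monotonic() - cached[1] < self._ttl:
            return cached[0]      # fresh write the consumer has not caught up to
        return self._materialized.get(conv_id)

produced = []
store = ConversationStore()
store.write("conv-1", {"turn": 1, "text": "hello"},
            lambda k, v: produced.append((k, v)))
fresh = store.read_latest("conv-1")  # served from cache despite consumer lag
```

The design choice worth noting is that the cache only serves the freshness window; Kafka's ordering and durability guarantees are untouched, which is why this is lower-risk than tightening consumer lag SLOs on the cluster itself.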

Stream processing

The Einstein Activity Capture pipeline relies entirely on Kafka Streams for its ML processing chain. The application team chose Kafka Streams specifically because upgrades can be managed without coordinating with the infrastructure team — unlike the prior Storm-based pipeline.

Source: Rohit Deshpande, Salesforce Engineering Blog

Special techniques and engineering innovations

Mirus: custom cross-cluster replication

Mirus is Salesforce's open-source replacement for Apache MirrorMaker, built on the Kafka Connect source connector API. Each MirusSourceTask runs an independent KafkaConsumer/KafkaProducer pair for a subset of partitions, enabling higher throughput over internet links than MirrorMaker's single-producer model. A KafkaMonitor thread tracks partition changes dynamically, allowing configuration changes via the Connect REST API without a process restart. The tool includes custom JMX metrics for replication lag.

Source: Paul Davidson, engineering.salesforce.com and github.com/salesforce/mirus
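
The post and repository describe the task model at a high level; this offline sketch (with injected poll/send callables standing in for real Kafka clients) illustrates why independent consumer/producer pairs per partition subset scale better than MirrorMaker's single shared producer:

```python
class ReplicationTask:
    """Sketch of the Mirus task model: one independent consumer/producer pair
    per task, each owning a subset of source partitions. `poll` and `send`
    are stand-ins for the real KafkaConsumer/KafkaProducer clients."""

    def __init__(self, partitions, poll, send):
        self.partitions = set(partitions)
        self.poll = poll      # poll(partitions) -> [(partition, offset, record)]
        self.send = send      # send(partition, record) -> None
        self.committed = {}   # partition -> next offset to replicate

    def run_once(self):
        for partition, offset, record in self.poll(self.partitions):
            self.send(partition, record)
            self.committed[partition] = offset + 1

# A fake two-partition source topic; each task could own a disjoint subset
# and replicate with its own producer, so throughput scales with task count:
source = {0: ["a0", "a1"], 1: ["b0"]}

def poll(partitions):
    return [(p, i, r) for p in sorted(partitions) for i, r in enumerate(source[p])]

dest = []
task = ReplicationTask([0, 1], poll, lambda p, r: dest.append((p, r)))
task.run_once()
```

In the real tool these tasks are distributed across Kafka Connect workers, and the KafkaMonitor thread reassigns partition subsets as topics change, without restarting worker processes.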

Envoy as Layer 4 SNI-routing proxy

For cross-site Kafka connectivity, Salesforce runs Envoy in SNI-routing mode behind a standard load balancer, giving clients a stable single endpoint for cross-data-centre connections. This avoids the need to publish per-broker advertised listeners across WAN links.
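
Salesforce's actual configuration is not published, but a minimal Envoy setup in this style might look like the following sketch; the hostnames, port, and cluster names are invented. The tls_inspector listener filter reads the SNI value from the TLS ClientHello without terminating TLS, and each filter chain forwards matching connections at Layer 4 to a per-broker upstream cluster:

```yaml
static_resources:
  listeners:
    - name: kafka_sni_listener
      address:
        socket_address: { address: 0.0.0.0, port_value: 9093 }
      listener_filters:
        # Reads SNI from the ClientHello; TLS is not terminated here.
        - name: envoy.filters.listener.tls_inspector
      filter_chains:
        - filter_chain_match:
            server_names: ["broker-1.kafka.dc2.example.com"]
          filters:
            - name: envoy.filters.network.tcp_proxy
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
                stat_prefix: kafka_broker_1
                cluster: kafka_broker_1_dc2
```

Because routing happens on SNI rather than IP, clients only ever need the single load-balanced endpoint, while each broker's advertised listener can resolve to a distinct SNI name behind it.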

Conversation-level Kafka partitioning

For the Conversation Storage Service, messages are keyed by conversation ID, ensuring all events for a given conversation land in the same partition and preserve ordering for downstream AI systems that need to reason over conversational context in sequence.
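
The property relied on here is that Kafka's partitioner is a deterministic function of the message key. Kafka's default partitioner hashes the key bytes with murmur2; the sketch below uses a truncated md5 digest as a stable stand-in to show the property, not the algorithm Kafka actually uses, and the partition count is hypothetical:

```python
import hashlib

NUM_PARTITIONS = 12  # hypothetical; the real partition count is not published

def partition_for(conversation_id, num_partitions=NUM_PARTITIONS):
    """Deterministic key -> partition mapping (md5 stand-in for murmur2)."""
    digest = hashlib.md5(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event keyed by the same conversation ID lands in the same partition,
# so Kafka's per-partition ordering becomes per-conversation ordering:
events = [("conv-42", "user_msg"), ("conv-7", "user_msg"), ("conv-42", "bot_reply")]
placements = [partition_for(conv_id) for conv_id, _ in events]
```

The trade-off is that one very busy conversation cannot be spread across partitions; ordering is bought at the cost of per-key parallelism.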

Fault-tolerant chatbot pipeline patterns

The Service Cloud chatbot pipeline layers five fault-tolerance patterns on top of Kafka: a per-endpoint circuit breaker (OPEN/HALF_OPEN/CLOSED states), a bulkhead (concurrent request limits per endpoint), timeout-plus-retry (maximum 6 attempts over 16 hours), UUID-per-event idempotency with upsert semantics, and Kafka partition replication across availability zones.

Source: Mark Holton, Salesforce Engineering Blog
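
Of the five patterns, the circuit breaker carries the most state. A minimal per-endpoint sketch follows; the threshold and cooldown values are illustrative, not the values Salesforce uses:

```python
import time

class CircuitBreaker:
    """Per-endpoint circuit breaker sketch (CLOSED -> OPEN -> HALF_OPEN)."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0, now=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.now = now            # injectable clock, so the logic is testable
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow_request(self) -> bool:
        if self.state == "OPEN":
            if self.now() - self.opened_at >= self.cooldown_s:
                self.state = "HALF_OPEN"  # let a single probe request through
                return True
            return False
        return True  # CLOSED or HALF_OPEN

    def record_success(self):
        self.state, self.failures = "CLOSED", 0

    def record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", self.now()

# Drive the breaker with a fake clock to show the state transitions:
t = [0.0]
cb = CircuitBreaker(failure_threshold=2, cooldown_s=10.0, now=lambda: t[0])
cb.record_failure(); cb.record_failure()  # threshold hit -> OPEN
blocked = cb.allow_request()              # still inside the cooldown window
t[0] = 11.0
probe_allowed = cb.allow_request()        # cooldown elapsed -> HALF_OPEN probe
cb.record_success()                       # probe succeeded -> CLOSED
```

In a Kafka consumer, the breaker wraps each downstream endpoint call; when it is OPEN, the event is typically routed to a retry topic rather than blocking the partition.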

SSL contribution to open-source Kafka

Salesforce's DVA team contributed SSL support to the Kafka 0.8.2 series before it was part of the mainline project, and later advocated for per-topic throttling at the project level.

Source: Nishant Gupta, Salesforce Engineering Blog

4-byte embedded hash ID for deduplication

Because the log pipeline uses "at least once" delivery semantics, a 4-byte hashed unique ID is embedded in each log message so downstream consumers can detect and drop duplicates on replay.

Source: Sanjeev Sahu, Salesforce Engineering Blog
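
The post does not name the hash function, so the sketch below truncates a SHA-256 digest to 4 bytes as a stand-in. Note it hashes the raw payload, which also collapses genuinely identical log lines; the real pipeline presumably hashes record-unique fields (timestamp, host, sequence):

```python
import hashlib

def embed_id(message: bytes) -> bytes:
    """Frame a message with a 4-byte hash ID prefix (truncated SHA-256 here
    as a stand-in for whatever hash the pipeline actually uses)."""
    return hashlib.sha256(message).digest()[:4] + message

def dedupe(framed_messages):
    """At-least-once consumer: drop replays whose 4-byte ID was already seen."""
    seen, out = set(), []
    for framed in framed_messages:
        msg_id, body = framed[:4], framed[4:]
        if msg_id not in seen:
            seen.add(msg_id)
            out.append(body)
    return out

# A replayed batch: the third message is a redelivery of the first.
batch = [embed_id(b"log line 1"), embed_id(b"log line 2"), embed_id(b"log line 1")]
unique = dedupe(batch)
```

At 4 bytes the ID space is ~4 billion values, so the dedup window must be bounded (per batch or per time slice) to keep collision probability acceptable.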

Operating Kafka at scale

Deployment model: Hybrid. On-premises clusters use Choria for automated patching and self-healing. Public cloud clusters run on Kubernetes. Continuous automated patching applies across the entire fleet.

Monitoring and observability stack:

Tool Role
Cruise Control (LinkedIn open-source) Automated cluster workload rebalancing
Xinfra Monitor (LinkedIn open-source) End-to-end availability and latency checks via synthetic workloads
Argus (Salesforce open-source) OpenTSDB-based time series monitoring and alerting for Kafka and downstream systems
Splunk Log analysis for Kafka operations
Wrangler (Salesforce internal) Cross-cluster UI for topic management and fleet-wide visibility
Fidelity Assessment Service (internal) Tracks log pipeline completeness end-to-end; results surfaced in Einstein Analytics dashboards

Source: Lei Ye and Paul Davidson, Kafka Summit Americas 2021

Automated recovery: Salesforce built a Kafka Auto-Fix Operations Tool that detects offline or under-replicated partitions, frozen brokers, and disk failures. It identifies the most advanced replica, then runs leader election and partition reassignment automatically. The target mean time to recovery is approximately 2 minutes with a zero data loss guarantee.
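
The "most advanced replica" selection step reduces to a small function. The tool's internals are not published, so the data shape below (a map of broker name to last-known log-end offset, with None for an unreachable replica) is hypothetical:

```python
def pick_recovery_leader(replica_offsets):
    """Pick the most advanced live replica as the new leader.
    `replica_offsets` maps broker name -> last known log-end offset;
    None marks a replica that could not be reached."""
    live = {b: off for b, off in replica_offsets.items() if off is not None}
    if not live:
        raise RuntimeError("no live replica available for election")
    return max(live, key=live.get)

# broker-2 is down; broker-3 has replicated furthest, so it becomes leader:
leader = pick_recovery_leader({"broker-1": 4210, "broker-2": None, "broker-3": 4275})
```

Electing the replica with the highest offset is what makes the zero-data-loss target achievable for these failure modes: no committed record exists beyond the most advanced surviving replica.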

Fault injection testing: The infrastructure team uses a bespoke testing framework comprising a Test Coordinator (control plane web service), Load Generator (simulates producer/consumer traffic), Ops Tools (cluster configuration), a Fault Injection Framework built on Apache Trogdor and Kibosh, and a Result Analysis component (log and metric tracking). Test scenarios cover functional, performance, load, scale, upgrade/downgrade, and patching runs on deterministic infrastructure.

Migration monitoring: During the 2025 Marketing Cloud migration, the team ran 15 custom monitoring dashboards (cluster-level to node-level) with 24/7 automated alerting throughout.

Source: Dheeraj Bansal and Ankit Jain, Salesforce Engineering Blog

Challenges and how they solved them

MirrorMaker instability on WAN replication

Problem: Apache MirrorMaker 0.8.x was unstable and required a full process restart for any configuration change. Its single-producer model created throughput bottlenecks on internet-based WAN links between data centres.

Root cause: MirrorMaker's architecture — one process per cluster pair, one producer per process — could not scale to Salesforce's cross-data-centre replication volumes.

Solution: Salesforce built and open-sourced Mirus, a Kafka Connect-based tool with multiple independent consumer-producer pairs per worker process and dynamic REST API configuration.

Outcome: Mirus fully replaced MirrorMaker across all production data centres from April 2018.

Source: Paul Davidson, Salesforce Engineering Blog

Log pipeline completeness: from 25% to seven nines

Problem: Between December 2016 and September 2017, the log shipping pipeline could not reliably deliver logs to DeepSea. Completeness was as low as 25% at the start of the programme.

Root causes: Six distinct issues were identified: multi-threaded consumer logic couldn't handle dynamic volume changes; MapReduce fault tolerance was set too low; Logstash had a file-skipping bug; Logstash hosts could fall into a zombie state; MirrorMaker's ~1 MB batch size caused batch rejections; and "at least once" semantics produced duplicates.

Solutions: Multi-threaded consumers with dynamic volume adjustment; MapReduce fault tolerance raised to 50%; Logstash reconfigured to read from the beginning; a novel buffer replay for zombie states; MirrorMaker batch size cut from ~1 MB to 250 KB; a 4-byte embedded hash ID per log record for deduplication.

Outcome: Completeness reached 99.99999% (7 nines) by September 2017 and has been sustained consistently since.

Source: Sanjeev Sahu, Salesforce Engineering Blog

Marketing Cloud migration: 760 nodes, zero downtime

Problem: Marketing Cloud ran its own Kafka fleet — 760+ nodes, 12 clusters, 1 million messages/second — on CentOS 7 with a different Kafka version and authentication mechanism from the central Ajna platform (RHEL 9). Unifying it into Ajna required a zero-downtime migration under live production traffic.

Root cause: Accumulated divergence between Marketing Cloud's self-managed Kafka and the centralised Ajna system over time.

Solution: An automated orchestration pipeline with pre-, in-flight, and post-validation checks; disk-level checksum validation before and after each step; rack-aware replica placement across separate physical racks; phased node rollout with health checks; 15 custom monitoring dashboards; and synthetic data validation post-migration with baseline signature matching.

Outcome: Zero-downtime migration completed; Marketing Cloud Kafka unified into Ajna.

Source: Dheeraj Bansal and Ankit Jain, Salesforce Engineering Blog

Read-after-write consistency at 50K concurrent conversations

Problem: At 50,000 concurrent conversations, Kafka consumers lagged behind producers, causing AI workflows to read stale conversational context immediately after a write.

Root cause: Kafka's durability-vs-latency trade-off: freshly written messages were not yet visible to consumers by the time the AI system queried for the latest state.

Solution: VegaCache, an in-memory cache, was introduced to serve recent writes directly, bypassing the consumer lag for latency-sensitive reads while keeping Kafka as the durable ordered source of truth.

Outcome: Read-after-write consistency maintained at scale.

Source: Ashima Kochar and Deepak Mali, Salesforce Engineering Blog

Agentforce audit: bursty traffic and Data Cloud ingestion mismatch

Problem: AI agent traffic is highly bursty (business-hour spikes); Data Cloud's ingestion API expected customer-owned S3 buckets, but Agentforce needed Salesforce-controlled S3 buckets — a file-size and ownership mismatch that blocked the integration.

Root cause: Architectural mismatch between Data Cloud's design assumptions and Agentforce's multi-tenant security model.

Solution: Kafka's pub-sub model absorbs traffic spikes before handoff to Data Cloud; dynamic flow control mechanisms handle real-time traffic adjustment; iterative proof-of-concept work resolved the S3 ownership problem.

Outcome: The system supports 500+ enterprise customers with 20 million model interactions per month under a 30-day contractual audit retention window.

Source: Madhavi Kavathekar, Salesforce Engineering Blog

Full tech stack

Category Tools Notes
Message broker Apache Kafka Kafka 2.6 across main fleet (2021); early versions 0.8.2 and 0.9
Cross-cluster replication Mirus (Salesforce open-source), Apache MirrorMaker (legacy) Mirus replaced MirrorMaker in all data centres from April 2018; built on Kafka Connect source connector API
Schema registry / serialisation Apache Avro Used for Platform Events, Change Data Capture, and early monitoring pipeline
Stream processing Kafka Streams, Apache Spark Streaming Kafka Streams for Einstein Activity Capture ML chain; Spark Streaming for Hyperforce telemetry normalisation
Connectors Kafka Connect (Mirus source connector), Kafka Camus Camus is a MapReduce job consuming from Ajna Aggregate to DeepSea (HDFS)
Cluster coordination Apache ZooKeeper 60+ ZooKeeper nodes in Marketing Cloud fleet alone before 2025 migration
Proxy / networking Envoy Layer 4 SNI-routing proxy for cross-site Native Kafka Endpoint
Monitoring Cruise Control, Xinfra Monitor, Argus, Splunk, Wrangler, Datadog Vector Cruise Control for workload rebalancing; Argus is OpenTSDB-based; Wrangler is Salesforce's internal cross-cluster UI; Vector collects CDN telemetry
Deployment / infra Choria (on-premises), Kubernetes (public cloud) Choria provides self-healing and automated patching for on-premises clusters
ML inference TensorFlow, Spark ML Both used in Einstein Activity Capture extractor chain
Container orchestration (app layer) Nomad (migrated from DC/OS) Used by Einstein Activity Capture pipeline
Storage sinks Apache Hadoop/HDFS (DeepSea), Cassandra, S3 (AWS) DeepSea for log retention; Cassandra for Einstein insights; S3 for Agentforce audit and Mirus replication target
Real-time analytics Imply / Apache Druid 18-dimension, 13-measurement hypercube for Hyperforce telemetry; sub-second query latency
Caching VegaCache (Salesforce internal) In-memory cache for read-after-write consistency in Conversation Storage Service
Log processing Logstash Runs on each host, publishing to local Ajna Kafka cluster
Fault injection Trogdor, Kibosh Foundation for Salesforce's bespoke fault injection testing framework
Database PostgreSQL Configuration storage in Einstein pipeline; initial database for Conversation Storage Service before Kafka introduction

Key contributors

Name Role Contribution
Nishant Gupta Sr. Director of Engineering, DVA team Authored "Expanding Visibility with Apache Kafka"; presented Ajna platform publicly in July 2016
Rajasekar Elango Lead Developer, Monitoring and Management Team Presented Secure Kafka at Salesforce at Kafka Meetup, June 2014
Alexey Syomichev Engineer, Salesforce Authored "How Apache Kafka Inspired Our Platform Events Architecture"
Sanjeev Sahu Engineer, Infrastructure Authored "Our Journey to a Near Perfect Log Pipeline"
Rohit Deshpande Engineer, Activity Platform / Einstein Authored "Real-time Einstein Insights Using Kafka Streams"; contributed three Kafka Improvement Proposals (KIPs) to the Apache Kafka project
Paul Davidson Software Architect, Kafka Platform Original developer of Mirus; co-presented at Kafka Summit Americas 2021; authored "Open Sourcing Mirus"
Lei Ye Principal Software Engineer, Infrastructure Co-presented "Tales from the Four-Comma Club" at Kafka Summit Americas 2021
Mark Holton Engineer, Service Cloud Authored both chatbot pipeline blog posts
Dheeraj Bansal Principal Member of Technical Staff Led the Marketing Cloud 760-node Kafka migration (April 2025)
Ankit Jain Senior Director of Software Engineering, Platform Architecture and Distributed Systems Co-authored the Marketing Cloud migration post
Srinivas Ranganathan Director of Software Engineering Authored the Hyperforce perimeter telemetry pipeline post (August 2024)
Madhavi Kavathekar Director of Software Engineering Authored the Agentforce AI agent auditing systems post (July 2025)
Ashima Kochar Lead Software Engineer, Service Cloud Lead author on the conversational AI scaling post (May 2026)
Deepak Mali Engineer, Service Cloud Co-author on the conversational AI scaling post (May 2026)

Key takeaways for your own Kafka implementation

  • Build for replication from the start. Salesforce outgrew Apache MirrorMaker's single-producer model once WAN replication volumes increased. If cross-datacenter or cross-region replication is in your roadmap, evaluate whether your replication tool can handle dynamic configuration changes without a process restart and whether its throughput model scales with your partition count.
  • Log pipeline completeness requires end-to-end measurement. Salesforce's log pipeline sat at 25% completeness for months before a dedicated completeness-tracking service (the Fidelity Assessment Service) made the problem visible end-to-end. Instrumenting each stage independently is insufficient; you need a view that correlates events from ingest to sink.
  • "At least once" delivery requires an idempotency strategy. At Salesforce's scale, duplicates from at-least-once delivery are not hypothetical edge cases. Embedding a hashed unique ID per message and using upsert semantics at the consumer is a concrete, low-overhead approach that avoids the complexity of exactly-once transaction semantics in many pipeline architectures.
  • Consumer lag is a first-class consistency concern for stateful workloads. The Conversation Storage Service's read-after-write consistency problem is common when Kafka is used as a state transport for AI or event-sourcing workloads. An in-memory cache for recent writes, rather than tightening consumer lag SLOs on the Kafka cluster itself, can resolve the issue with less operational risk.
  • Invest in automated recovery tooling before you need it. Salesforce built its Kafka Auto-Fix Operations Tool to achieve a ~2-minute MTTR for partition and broker failures. At 2,500+ brokers, manual recovery runbooks cannot achieve this. Automating leader election and partition reassignment for predictable failure modes is worth the investment before the fleet reaches a size where manual intervention becomes the bottleneck.

Sources and further reading

# Source Author Date
1 Expanding Visibility with Apache Kafka Nishant Gupta 2016
2 How Apache Kafka Inspired Our Platform Events Architecture Alexey Syomichev 2018
3 Real-time Einstein Insights Using Kafka Streams Rohit Deshpande 2019
4 Our Journey to a Near Perfect Log Pipeline Sanjeev Sahu December 2017
5 Open Sourcing Mirus Paul Davidson October 2017
6 Building a Scalable Event Pipeline with Heroku and Salesforce Mark Holton 2020
7 Building a Fault-Tolerant Data Pipeline for Chatbots Mark Holton 2020
8 Inside Marketing Cloud's Kafka Cluster Upgrade: Migrating 760 Nodes with 1,000,000 Messages/Second Dheeraj Bansal, Ankit Jain April 2025
9 Architecting AI Agent Auditing Systems in Agentforce Madhavi Kavathekar July 2025
10 Scaling AI-Driven Conversations from 10K to 100K While Maintaining Real-Time Consistency Ashima Kochar, Deepak Mali May 2026
11 Unlocking Real-Time Insights: Engineering the Hyperforce Experience Srinivas Ranganathan August 2024
12 Tales from the Four-Comma Club — Managing Kafka as a Service at Salesforce Lei Ye, Paul Davidson Kafka Summit Americas 2021
13 Enabling Real-Time Scenarios at Scale Using Kafka Nishant Gupta July 2016
14 Secure Kafka at Salesforce.com Rajasekar Elango June 2014
15 Mirus — GitHub Paul Davidson (primary maintainer) Last release June 2021

If you are running Kafka at scale and want visibility into consumer lag, partition health, and broker performance across your clusters, Kpow is a Kafka management and monitoring tool built for platform and data engineering teams. You can try it free for 30 days and connect it to any Kafka cluster in minutes via Docker, Helm, or JAR.