Apache Kafka 4.3.0: A guide for platform engineers

Factor House
May 23rd, 2026

Kafka 4.3.0 was released on 22 May 2026 with 25 KIPs and 600+ commits since 4.2.0. If you manage Kafka in production, the release notes cover what changed in full. This article focuses on which of those changes are likely to affect your upgrade window, your monitoring setup, and the operational issues that have been quietly causing friction.

This is primarily an operator's release. The headline features are broker cordoning, partition size metrics, share group tuning, classic protocol deprecation, and a set of tiered storage improvements. These changes address the operational experience of running Kafka at scale rather than developer-facing functionality — the kind of work that accumulates in Jira tickets like "decommission broker without manual coordination" and "surface retention headroom per partition in monitoring."

This article walks through the operationally significant changes, calls out the deprecations that deserve a calendar entry, and closes with a checklist for teams preparing an upgrade.

Broker and log directory cordoning — KIP-1066

The new cordoned.log.dirs configuration marks a log directory as off-limits for new partition replica placement. The broker stays up; existing replicas continue serving reads and writes. New partition assignments route around the cordoned directory. Kafka's 4.3 operations documentation covers the full decommissioning workflow.

Before 4.3, draining a broker or a specific disk required careful manual coordination: partition reassignment scripts, watching ISR (in-sync replica) lag, and accepting more operational risk than most teams wanted because the tooling made it genuinely awkward. Cordoning does not solve every decommissioning problem, but it addresses the most friction-heavy part: stopping the cluster from making the situation worse while you work.

The workflow now looks like:

  1. Cordon the log directory (or every log directory on the broker) via cordoned.log.dirs
  2. Let the cluster route new placements elsewhere; no new replicas land on that target
  3. Drain existing replicas at your own pace using partition reassignment
  4. Decommission once drained

This is particularly useful in cloud-native environments where disk replacement or node cycling is a routine event rather than an incident. Teams running Kubernetes-based Kafka deployments will find this materially reduces the operational surface area of a node drain.

Test this in staging first. Understand the interaction with your replica placement strategy and rack-awareness configuration before rolling it to production.
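Step 3 of the workflow, draining existing replicas, can be sketched as a small planner that emits the JSON accepted by the standard partition reassignment tool. This is a minimal sketch, not a recommended placement strategy: the function name, the sample assignment, and the broker ids are hypothetical, and it ignores rack awareness and disk capacity entirely.

```python
import json


def drain_plan(assignments, drained_broker, candidates):
    """Build a reassignment plan that moves replicas off one broker.

    assignments: list of {"topic", "partition", "replicas"} dicts.
    drained_broker: id of the (already cordoned) broker to empty.
    candidates: broker ids allowed to receive the moved replicas.
    """
    moves = []
    for i, a in enumerate(assignments):
        if drained_broker not in a["replicas"]:
            continue  # nothing to move for this partition
        # Brokers that do not already hold a replica of this partition.
        free = [b for b in candidates if b not in a["replicas"]]
        if not free:
            raise ValueError(f"no spare broker for {a['topic']}-{a['partition']}")
        # Rotate through the free brokers so replacements spread out.
        replacement = free[i % len(free)]
        replicas = [replacement if b == drained_broker else b for b in a["replicas"]]
        moves.append({"topic": a["topic"], "partition": a["partition"], "replicas": replicas})
    return {"version": 1, "partitions": moves}


# Hypothetical current assignment; broker 3 is the one being drained.
current = [
    {"topic": "orders", "partition": 0, "replicas": [1, 2, 3]},
    {"topic": "orders", "partition": 1, "replicas": [2, 3, 4]},
    {"topic": "orders", "partition": 2, "replicas": [3, 4, 1]},
]
plan = drain_plan(current, drained_broker=3, candidates=[1, 2, 4, 5])
print(json.dumps(plan, indent=2))
```

The output can be saved to a file and reviewed before it is handed to the reassignment tooling. Because the drained target is cordoned, nothing new should land on it while the plan executes.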

Partition size percentage metrics — KIP-1257

Retention headroom has always been one of the more tedious Kafka metrics to derive. The raw data exists (log size, retention bytes), but assembling per-partition headroom from JMX required either custom tooling or cobbled-together dashboards that most teams quietly abandoned.

KIP-1257 fixes this directly. New JMX metrics expose what percentage of maximum retention each topic-partition currently occupies. This gives you a first-class signal for:

  • Identifying partitions approaching retention thresholds before they cause data loss or unexpected compaction behaviour
  • Setting meaningful alerts on retention pressure without writing custom collectors
  • Spotting topics where retention config is badly misaligned with actual throughput

This metric belongs in every Kafka operations dashboard and is a straightforward addition once you upgrade.
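For context, the arithmetic that teams previously had to assemble by hand looks roughly like the sketch below. It is illustrative only: the function names, the sample figures, and the 80% threshold are assumptions, and the sketch does not use the actual KIP-1257 metric names.

```python
def retention_pct(log_size_bytes, retention_bytes):
    """Percentage of the size-based retention budget a partition occupies.

    Returns None when there is no size budget (retention.bytes=-1 means
    unlimited; any non-positive value is treated the same way here).
    """
    if retention_bytes is None or retention_bytes <= 0:
        return None
    return 100.0 * log_size_bytes / retention_bytes


def over_threshold(partitions, threshold_pct=80.0):
    """Return [(topic_partition, pct)] for partitions at or above the threshold."""
    hits = []
    for tp, (size, budget) in sorted(partitions.items()):
        pct = retention_pct(size, budget)
        if pct is not None and pct >= threshold_pct:
            hits.append((tp, pct))
    return hits


# Hypothetical sample: (log size in bytes, retention.bytes) per partition.
sample = {
    ("orders", 0): (750 * 1024 * 1024, 1024 ** 3),
    ("audit", 0): (10, -1),
    ("clicks", 3): (950_000_000, 1_000_000_000),
}
alerts = over_threshold(sample)
print(alerts)
```

With the new metrics, the division happens on the broker, so the alert rule reduces to a threshold comparison on a single value per partition.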

For teams who want these metrics surfaced alongside consumer lag, broker health, and throughput in a single view, Kpow by Factor House ingests JMX and Kafka-native metrics and will pick up these new partition size indicators without additional configuration.

Share groups: real operational handles — KIP-1240 + KIP-1263

Share groups (Kafka's queue-semantics consumer model, introduced in 4.0) arrive in 4.3 with the operational tuning options that early adopters have been waiting for.

KIP-1240 adds new broker-level and group-level configurations to control share group behaviour. The specifics depend on your use case, but the significance is structural: share groups now have the configuration surface area you would expect from a production-grade feature. Teams that adopted early on 4.0 or 4.1 and found themselves with insufficient levers should review the new configs in the 4.3 documentation.

KIP-1263 is quieter but arguably more impactful for clusters with high consumer group churn. The group coordinator's assignment logic is improved to avoid recomputing assignments when not necessary. If you have environments with frequent consumer restarts, scaling events, or high group cardinality, this directly reduces coordinator overhead. It is not the kind of change that appears in a benchmark, but it smooths out the coordinator's CPU profile in a busy cluster.

Where share groups stand: 4.3 is a meaningful maturation milestone. If your team is on 4.2 and has not started with share groups yet, this release is a good prompt to re-evaluate the roadmap. If you are on 4.0 or 4.1 and already running share groups, upgrade and take the new configs seriously. If you were waiting for the project to stabilise before committing, the pace of development across recent releases gives a reasonable basis for planning.

Classic rebalance protocol deprecation — KIP-1251 + KIP-1274

If your consumers are still using the classic rebalance protocol, 4.3 introduces Phase 1 of the deprecation process: warning only, with no removal yet. The direction is unambiguous.

When a consumer starts with the classic rebalance protocol, 4.3 logs a recommendation to migrate. This is the precursor to a formal deprecation signalled for a future release. The group.coordinator.rebalance.protocols broker configuration is also deprecated in this release, marked for removal in Kafka 5.0.

KIP-1251 is the complementary improvement: member epoch validation logic is tightened to reduce unnecessary fencing of group members. Concretely, this means fewer spurious rebalances triggered by epoch mismatches during consumer group stabilisation. If you have been seeing rebalance noise in stable groups, this is worth tracking after upgrading.

Action for operators: Audit your consumer configurations now. Identify any consumer groups still using the classic protocol and build a migration timeline. Note that the cooperative assignors available since 2.4 still run on the classic protocol; the migration target is the consumer rebalance protocol from KIP-848 (group.protocol=consumer on the client), generally available since 4.0. Teams that defer this work until Kafka 5.0 will have less time to address it comfortably.
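One way to start the audit is to scan the client configuration files you already keep in version control. The sketch below assumes consumer settings are stored as Java-style properties text and that a client which does not set group.protocol falls back to the classic protocol; the application names and file contents are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value properties text, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def uses_classic_protocol(props):
    # Assumption: an unset group.protocol means the client uses "classic".
    return props.get("group.protocol", "classic").lower() == "classic"


# Hypothetical consumer configs keyed by application name.
configs = {
    "billing-consumer": "group.id=billing\ngroup.protocol=consumer\n",
    "legacy-etl": (
        "group.id=etl\n"
        "partition.assignment.strategy="
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor\n"
    ),
    "audit-consumer": "group.id=audit\ngroup.protocol=classic\n",
}
to_migrate = sorted(
    name for name, text in configs.items()
    if uses_classic_protocol(parse_properties(text))
)
print(to_migrate)
```

A static scan like this only covers configuration you can see. Settings injected at runtime still need to be confirmed against the running groups.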

Tiered storage: safer and smarter — KIP-1023 + KIP-1208 + KIP-1235

Three tiered storage changes in 4.3: two operational and one correctness fix that deserves immediate attention.

KIP-1235: check this first if you run tiered storage. The default min.insync.replicas for the internal __remote_log_metadata topic did not match what you would expect for a durable internal topic. 4.3 introduces a dedicated remote.log.metadata.topic.min.isr configuration to set it explicitly. If you are running tiered storage today, check the min.insync.replicas currently applied to your __remote_log_metadata topic and align it with remote.log.metadata.topic.min.isr. This is a correctness issue rather than a tuning recommendation.
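A pre-upgrade sanity check on the broker properties might look like the sketch below. The rules it applies are general min-ISR guidance rather than anything specified by KIP-1235, the function name is hypothetical, and the fallback replication factor of 3 is an assumption to verify against your own configuration.

```python
RF_KEY = "remote.log.metadata.topic.replication.factor"
MIN_ISR_KEY = "remote.log.metadata.topic.min.isr"


def check_metadata_topic_min_isr(props):
    """Return a list of human-readable findings (empty when nothing stands out)."""
    # Assumption: replication factor falls back to 3 when not set.
    rf = int(props.get(RF_KEY, 3))
    raw = props.get(MIN_ISR_KEY)
    if raw is None:
        return [f"{MIN_ISR_KEY} is not set explicitly"]
    min_isr = int(raw)
    if min_isr < 1 or min_isr > rf:
        return [f"{MIN_ISR_KEY}={min_isr} is outside 1..{rf}"]
    if rf > 1 and min_isr == rf:
        return [f"{MIN_ISR_KEY}={min_isr} equals the replication factor; "
                "one unavailable replica blocks acks=all writes"]
    if rf > 1 and min_isr == 1:
        return [f"{MIN_ISR_KEY}=1 allows writes acknowledged by a single replica"]
    return []


print(check_metadata_topic_min_isr({RF_KEY: "3", MIN_ISR_KEY: "2"}))
print(check_metadata_topic_min_isr({RF_KEY: "3", MIN_ISR_KEY: "1"}))
```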

KIP-1023 introduces follower.fetch.last.tiered.offset.enable (default: false). When enabled, bootstrapping a new follower starts from the last tiered offset rather than the beginning of the local log. The practical effect: new brokers joining a cluster with tiered storage sync faster, and leader amplification during follower bootstrap is reduced. This is directly relevant to teams scaling clusters with large tiered storage configurations.
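To get a feel for the difference, the deliberately simplified model below counts how many offsets a new follower has to copy from the leader in each mode. It uses offsets as a stand-in for data volume, treats the log end offset as exclusive, and all names and numbers are hypothetical.

```python
def offsets_to_replicate(local_log_start, last_tiered_offset, log_end, from_tiered):
    """Number of offsets a bootstrapping follower copies from the leader.

    from_tiered=False: start at the beginning of the leader's local log.
    from_tiered=True: start just after the last offset already in remote storage.
    """
    start = last_tiered_offset + 1 if from_tiered else local_log_start
    # Never start before the data the leader still holds locally.
    start = max(start, local_log_start)
    return max(0, log_end - start)


# Hypothetical partition: most of the local log has already been tiered.
full = offsets_to_replicate(1_000_000, 9_000_000, 10_000_000, from_tiered=False)
tail = offsets_to_replicate(1_000_000, 9_000_000, 10_000_000, from_tiered=True)
print(full, tail)
```

The larger the share of the local log that has already been uploaded, the bigger the gap between the two numbers, which is why the setting matters most on clusters with long local retention.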

KIP-1208 adds the remote.log.metadata.admin. prefix for configuring the admin client used by tiered storage's RemoteLogMetadataManager. This is a configuration hygiene improvement that makes the namespace cleaner and the scoping more explicit.

Taken together, tiered storage in 4.3 is safer to run (KIP-1235), faster to scale (KIP-1023), and better configured (KIP-1208).

KRaft fetch size controls — KIP-1219

New configurations: controller.quorum.fetch.max.bytes and controller.quorum.fetch.snapshot.max.bytes.

These cap the amount of data retrieved by KRaft Fetch and FetchSnapshot requests respectively. Most teams will never need to change them from their defaults. If you run large clusters where the KRaft controller shares resources with broker workloads, or where snapshot fetch operations have caused controller memory spikes, these controls give you a lever that did not previously exist.

This is a niche but real pain point for teams running high-metadata-throughput clusters. If you have seen controller memory pressure during snapshot fetches, the relevance will be apparent.

OAuth client assertions — KIP-1258

Support for client assertion authentication in the client_credentials OAuth grant type.

This is a security posture improvement for teams using OAuth with enterprise identity providers (Okta, Azure Active Directory, and similar) that prefer or require signed client assertions over shared secrets. If your Kafka security implementation already uses client_credentials and you have been working around the absence of assertion support, 4.3 resolves it natively. If this does not describe your environment, it is unlikely to affect your upgrade decision.
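For readers unfamiliar with the mechanism, the token request that an assertion-based client sends differs from the shared-secret variant only in its form parameters, as defined in RFC 7523. The sketch below builds that request body; it does not show Kafka's own client configuration for KIP-1258, and the assertion string is a placeholder rather than a real signed JWT.

```python
from urllib.parse import parse_qs, urlencode

# Assertion type identifier defined by RFC 7523 for JWT client authentication.
JWT_BEARER = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def token_request_body(signed_assertion, scope=None):
    """Form-encoded body for a client_credentials request using a client assertion."""
    params = {
        "grant_type": "client_credentials",
        "client_assertion_type": JWT_BEARER,
        "client_assertion": signed_assertion,
    }
    if scope:
        params["scope"] = scope
    return urlencode(params)


# Placeholder value; a real client would sign a JWT with its private key.
body = token_request_body("header.payload.signature", scope="kafka")
print(body)
```

Note that no client_secret appears in the request: the identity provider verifies the signature on the assertion instead.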

Kafka Streams: what changed for stream processors

Not a Streams-heavy release, but several targeted improvements are worth cataloguing for teams running Streams in production.

KIP-1259: Automatic local state cleanup on startup. The new state.cleanup.dir.max.age.ms configuration tells Streams to delete state directories that have not been modified within the specified duration on startup. This addresses a genuine hygiene problem in containerised Streams deployments where stale state accumulates across restarts and pod cycling. Previously, cleaning this required external intervention or custom initialisation logic. If you run Streams in Kubernetes or any environment with ephemeral containers, add this to your config review.
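Before enabling the setting, it can be useful to see which directories a given max age would catch. The dry-run sketch below applies the same "not modified within the duration" rule described above, but it is not the Streams implementation: the function name and the directory names in the demo are illustrative, and it only lists candidates without deleting anything.

```python
import os
import tempfile
import time


def stale_state_dirs(state_root, max_age_ms, now_ms=None):
    """Names of subdirectories whose mtime is older than max_age_ms."""
    if now_ms is None:
        now_ms = time.time() * 1000
    stale = []
    with os.scandir(state_root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_dir():
                continue
            age_ms = now_ms - entry.stat().st_mtime * 1000
            if age_ms > max_age_ms:
                stale.append(entry.name)
    return stale


# Demo against a throwaway directory: one fresh entry, one aged ten days.
with tempfile.TemporaryDirectory() as root:
    for name in ("0_0", "0_1"):
        os.mkdir(os.path.join(root, name))
    ten_days_ago = time.time() - 10 * 24 * 3600
    os.utime(os.path.join(root, "0_0"), (ten_days_ago, ten_days_ago))
    found = stale_state_dirs(root, max_age_ms=7 * 24 * 3600 * 1000)
    print(found)
```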

KIP-1271 + KIP-1285: Headers in state stores. Record headers can now be stored in and retrieved from state stores, exposed through both the Processor API (KIP-1271) and the DSL (KIP-1285). Teams using headers to carry routing metadata, tracing context, or correlation IDs will find this closes a meaningful gap — previously those headers had to be stripped or handled separately before writing to state.

KIP-1270: ProcessingExceptionHandler for GlobalKTable threads. Exception handling now extends to GlobalKTable threads via the processing.exception.handler.global.enabled configuration. Before 4.3, exceptions in the global thread could cause opaque failures that were difficult to surface cleanly. This is a reliability improvement for any topology that uses global state.

KIP-1035: StateStore changelog offset management. Adds methods to the StateStore API for changelog offset management. This is an internal change primarily relevant to custom StateStore implementations. If you maintain a custom state store, review the updated API.

KIP-1247: Bytes utility class promoted to public API. The Bytes class is now part of the public API and will appear in the Javadoc. Minor, but useful for teams that have been importing it from internal packages.

KIP-1250: In-memory state store size metrics. New metrics tracking the number of keys in in-memory state stores. If your Streams topologies use in-memory stores and you have limited visibility into store growth, this is a monitoring addition worth wiring up.

KIP-1244: streams-scala module deprecated. If your team uses the Scala DSL for Kafka Streams, 4.3 is the start of the deprecation clock. Marked for removal in Kafka 5.0. Plan migration now rather than at the 5.0 deadline.

Kafka Connect

KIP-1239: RemoteClusterUtils.translateOffsets() now accepts multiple consumer groups in a single call. For teams running MirrorMaker 2 across clusters with many groups, this is a meaningful quality-of-life improvement that reduces the number of round trips required for offset translation workflows.

KIP-1273: A new ConnectPlugin interface standardises methods across all Kafka Connect plugin types. Teams building or maintaining custom connectors should review this — it establishes a common contract that all plugin types now implement.

KIP-1280: MirrorMaker metric names are being updated. The new metric.names.formats configuration on MirrorSourceConnector and MirrorCheckpointConnector lets you opt into the new metric names. Existing metric names are deprecated and will be removed in Kafka 5.0. If you have dashboards or alerts built on MirrorMaker metrics, schedule the migration before 5.0.

Deprecation roundup

Everything deprecated in 4.3.0 in one place. Set calendar reminders. Kafka 5.0 removals tend to arrive faster than teams expect.

| What | KIP | Removal target |
| --- | --- | --- |
| streams-scala module | KIP-1244 | Kafka 5.0 |
| group.coordinator.rebalance.protocols broker config | KIP-1237 | Kafka 5.0 |
| Existing MirrorMaker metric names | KIP-1280 | Kafka 5.0 |
| Classic consumer rebalance protocol | KIP-1274 (Phase 1, warning only) | Future release |

The classic rebalance protocol deprecation is Phase 1 only in 4.3: no removal yet, just logging. The 5.0 items are the ones that need concrete action.

Upgrade readiness checklist

A practical checklist for teams preparing a 4.3.0 upgrade. Work through this before scheduling your maintenance window.

  • [ ] Read the official 4.3.0 upgrade notes — specifically any version-specific upgrade requirements
  • [ ] Audit consumer configurations — identify any groups still configured with the classic rebalance protocol and build a migration timeline
  • [ ] If running tiered storage: check the min.insync.replicas currently applied to your __remote_log_metadata topic and align it with the new remote.log.metadata.topic.min.isr config
  • [ ] If tiered storage with large follower bootstrap latency is a concern: evaluate follower.fetch.last.tiered.offset.enable
  • [ ] Update MirrorMaker dashboards and alerts to account for new metric names via metric.names.formats — plan migration before 5.0
  • [ ] Identify any streams-scala usage and scope a migration plan if applicable
  • [ ] Wire up new partition size percentage metrics in your monitoring stack
  • [ ] Test the broker and log directory cordoning workflow in staging before relying on it in production decommissioning
  • [ ] Review share group configurations and new tuning options in 4.3 if you are running or evaluating share groups
  • [ ] Remove any references to the deprecated group.coordinator.rebalance.protocols broker config from your configuration management
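The last checklist item lends itself to a quick scan of the broker properties held in configuration management. This is a minimal sketch: the lookup table contains only the broker config named in this article, and the function name and sample file contents are hypothetical.

```python
# Deprecated broker configs to look for; extend as needed.
DEPRECATED_IN_4_3 = {
    "group.coordinator.rebalance.protocols": "removal targeted for Kafka 5.0",
}


def find_deprecated(config_text):
    """Return [(line_number, key)] for active lines that set a deprecated key."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and commented-out settings
        key = stripped.partition("=")[0].strip()
        if key in DEPRECATED_IN_4_3:
            hits.append((lineno, key))
    return hits


# Hypothetical server.properties content.
sample = "\n".join([
    "node.id=1",
    "# group.coordinator.rebalance.protocols=classic",
    "group.coordinator.rebalance.protocols=classic,consumer",
    "log.dirs=/var/lib/kafka/data",
])
hits = find_deprecated(sample)
print(hits)
```

Running a check like this in CI keeps the deprecated key from creeping back in through copied templates.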

The operator signal

Kafka 4.3.0 continues a pattern visible across the 4.x series. The feature set is not about new consumer semantics or Streams DSL additions — it is about the operational experience of running Kafka at scale. Broker cordoning, retention headroom metrics, coordinator performance, tiered storage correctness, KRaft fetch controls: these are the kinds of improvements that accumulate into a platform that operations engineers can rely on.

Teams evaluating Kafka adoption or upgrade timing should note that each 4.x release has improved the operational story, and 4.3.0 is no exception.

Keeping up with a release cadence like this is easier with good cluster visibility. If you are upgrading to 4.3 and want to make the most of the new metrics and operational improvements, Kpow by Factor House gives you an observability layer that covers partition size headroom, consumer group health, tiered storage state, and broker-level metrics in a single interface. You can try it for yourself with a free 30-day trial.

Resources