The new unified Factor House Community License works with both Kpow Community Edition and Flex Community Edition, so you only need one license to unlock both products. This makes it even simpler to explore modern data streaming tools, create proof-of-concepts, and evaluate our products.
What's changing
Previously, we issued separate community licenses for Kpow and Flex, with different tiers for individuals and organizations. Now there is a single Community License that unlocks both products.
What's new:
One license for both products
Three environments for everyone - whether you're an individual developer or part of a team, you get three non-production installations per product
Simplified management - access and renew your licenses through our new self-service portal at account.factorhouse.io
Our commitment to the engineering community
Since first launching Kpow CE at Current '22, thousands of engineers have used our community licenses to learn Kafka and Flink without jumping through enterprise procurement hoops. This unified license keeps that same philosophy: high-quality tools that are free for non-production use.
The Factor House Community License is free for individuals and organizations to use in non-production environments. It's perfect for:
New users: Head to account.factorhouse.io to grab your free Community license. You'll receive instant access via magic link authentication.
Existing users: Your legacy Kpow and Flex Community licenses will continue to work and are now visible in the portal. When your license renews (after 12 months), consider switching to the unified model for easier management.
What's included
Both Kpow CE and Flex CE include most enterprise features, optimized for learning and testing. This includes Kafka and Flink monitoring and management, fast multi-topic search, and Schema Registry and Kafka Connect support.
License duration: 12 months, renewable annually
Installations: Up to 3 per product (Kpow CE: 1 Kafka cluster + 1 Schema Registry + 1 Connect cluster per installation; Flex CE: 1 Flink cluster per installation)
Support: Self-service via Factor House Community Slack, documentation, and release notes
Deployment: Docker, Docker Compose or Kubernetes
Ready for production? Start a 30-day free trial of our Enterprise editions directly from the portal to unlock RBAC, Kafka Streams monitoring, custom SerDes, and dedicated support.
What about legacy licenses?
If you're currently using a Kpow Individual, Kpow Organization, or Flex Community license, nothing changes immediately. Your existing licenses will continue to work with their respective products and are now accessible in the portal. When your license expires at the end of its 12-month term, you can easily switch to the new unified license for simpler management.
Release 95.1: A unified experience across product, web, docs and licensing
95.1 delivers a cohesive experience across Factor House products, licensing, and brand. This release introduces our new license portal, refreshed company-wide branding, a unified Community License for Kpow and Flex, and a series of performance, accessibility, and schema-related improvements.
Upgrading to 95.1
If you are using Kpow with a Google Managed Service for Apache Kafka (Google MSAK) cluster, you will now need to use either kpow-java17-gcp-standalone.jar or the 95.1-temurin-ubi tag of the factorhouse/kpow Docker image.
New Factor House brand: unified look across web, product, and docs
We've refreshed the Factor House brand across our website, documentation, the new license portal, and products to reflect where we are today: a company trusted by engineers running some of the world's most demanding data pipelines. Following our seed funding earlier this year, we've been scaling the team and product offerings to match the quality and value we deliver to enterprise engineers. The new brand brings our external presence in line with what we've built. You'll see updated logos in Kpow and Flex, refreshed styling across docs and the license portal, and a completely redesigned website with clearer navigation and information architecture. Your workflows stay exactly the same, and the result is better consistency across all touchpoints, making it easier for new users to evaluate our tools and for existing users to find what they need.
New license portal: self-service access for all users
We've rolled out our new license portal at account.factorhouse.io to streamline license management for everyone. New users can instantly grab a Community or Trial license with just their email address, and existing users will see their migrated licenses when they log in. The portal lets you manage multiple licenses from one account through a clean, modern interface with magic link authentication, whether you are upgrading from Community to a Trial, renewing your annual Community License, or requesting a trial extension. For installation and configuration guidance, check our Kpow and Flex docs.
We've consolidated our Community licensing into a single unified license that works with both Kpow Community Edition and Flex Community Edition. Your Community license allows you to run Kpow and Flex in up to three non-production environments each, making it easier to learn, test, and build with Kafka and Flink. The new license streamlines management, providing a single key for both products and annual renewal via the license portal. It's a good fit for exploring projects like Factor House Local or building your own data pipelines. Existing legacy licenses will continue to work and will also be accessible in the license portal.
This release brings in a number of performance improvements to Kpow, Flex and Factor Platform. The work to compute and materialize views and insights about your Kafka or Flink resources has now been decreased by an order of magnitude. For our top-end customers we have observed a 70% performance increase in Kpow’s materialization.
Data Inspect enhancements
Confluent Data Rules support: Data Inspect now supports Confluent Schema Registry Data Rules, including CEL, CEL_FIELD, and JSONata rule types. If you're using Data Contracts in Confluent Cloud, Data Inspect now accurately identifies rule failures and lets you filter them with kJQ.
Support for Avro Primitive Types: We’ve added support for Avro schemas that consist of a plain primitive type, including string, number, and boolean.
Schema Registry & navigation improvements
General Schema Registry improvements (from 94.6): In 94.6, we introduced improvements to Schema Registry performance and updated the observation engine. This release continues that work, with additional refinements based on real-world usage.
Karapace compatibility fix: We identified and fixed a regression in the new observation engine that affected Karapace users.
Redpanda Schema Registry note: The new observation engine is not compatible with Redpanda’s Schema Registry. Customers using Redpanda should set `OBSERVATION_VERSION=1` until full support is available.
Navigation improvements: Filters on the Schema Overview pages now persist when navigating into a subject and back.
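For Redpanda users, the setting in the note above is an ordinary environment variable. A minimal sketch, assuming Kpow is started from Docker with an env file; the file name kpow.env, the image tag, and the commented docker run line are illustrative assumptions rather than part of the release notes:

```shell
# Hypothetical env file passed to the Kpow container; adjust the path to your setup.
ENV_FILE="${ENV_FILE:-kpow.env}"
touch "$ENV_FILE"

# Drop any existing OBSERVATION_VERSION line, then set it to 1 as the note recommends.
grep -v '^OBSERVATION_VERSION=' "$ENV_FILE" > "$ENV_FILE.tmp" || true
echo 'OBSERVATION_VERSION=1' >> "$ENV_FILE.tmp"
mv "$ENV_FILE.tmp" "$ENV_FILE"

# Illustrative only: start Kpow with the updated env file.
# docker run --env-file "$ENV_FILE" -p 3000:3000 factorhouse/kpow:latest
```

Rewriting the file rather than appending keeps the setting from being declared twice if the snippet is run more than once.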
Chart accessibility & UX improvements
This release brings a meaningful accessibility improvement to Kpow & Flex: Keyboard navigation for line charts. Users can now focus a line chart and use the left and right arrow keys to view data point tooltips. We plan to expand accessibility for charts to include bar charts and tree maps in the near future, bringing us closer to full WCAG 2.1 Level AA compliance as reported in our Voluntary Product Accessibility Template (VPAT).
We’ve also improved the UX of comparing adjacent line charts: Each series is now consistently coloured across different line charts on a page, making it easier to identify trends across a series, e.g., a particular topic’s producer write/s vs. consumer read/s.
These changes benefit everyone: developers using assistive technology, teams with accessibility requirements, and anyone who prefers keyboard navigation. Accessibility isn't an afterthought; it's a baseline expectation for enterprise-grade tooling, and we're committed to leading by example in the Kafka and Flink ecosystem.
Streamline your Kpow deployment on Amazon EKS with our guide, fully integrated with the AWS Marketplace. We use eksctl to automate IAM Roles for Service Accounts (IRSA), providing a secure integration for Kpow's licensing and metering. This allows your instance to handle license validation via AWS License Manager and report usage for hourly subscriptions, enabling a production-ready deployment with minimal configuration.
This guide provides a comprehensive walkthrough for deploying Kpow, a powerful toolkit for Apache Kafka, onto an Amazon EKS (Elastic Kubernetes Service) cluster. We will cover the entire process from start to finish, including provisioning the necessary AWS infrastructure, deploying a Kafka cluster using the Strimzi operator, and finally, installing Kpow using a subscription from the AWS Marketplace.
The guide demonstrates how to set up both Kpow Annual and Kpow Hourly products, highlighting the specific integration points with AWS services like IAM for service accounts, ECR for container images, and the AWS License Manager for the annual subscription. By the end of this tutorial, you will have a fully functional environment running Kpow on EKS, ready to monitor and manage your Kafka cluster.
The source code and configuration files used in this guide can be found in the features/eks-deployment folder of this GitHub repository.
About Factor House
Factor House is a leader in real-time data tooling, empowering engineers with innovative solutions for Apache Kafka® and Apache Flink®.
Our flagship product, Kpow for Apache Kafka, is the market-leading enterprise solution for Kafka management and monitoring.
VPC: A Virtual Private Cloud (VPC) that has both public and private subnets is required.
IAM Permissions: A user with the necessary IAM permissions to create an EKS cluster with a service account.
Kpow Subscription:
A subscription to a Kpow product through the AWS Marketplace is required. After subscribing, you will receive access to the necessary components and deployment instructions.
The specifics of accessing the container images and Helm chart depend on the chosen Kpow product:
Kpow Annual product:
Subscribing to the annual product provides access to the ECR (Elastic Container Registry) image and the corresponding Helm chart.
Kpow Hourly product:
For the hourly product, access to the ECR image will be provided and deployment utilizes the public Factor House Helm repository for installation.
Deploy an EKS cluster
We will use eksctl to provision an Amazon EKS cluster. The configuration for the cluster is defined in the manifests/eks/cluster.eksctl.yaml file within the repository.
Before creating the cluster, you must open this file and replace the placeholder values for <VPC-ID>, <PRIVATE-SUBNET-ID-*>, and <PUBLIC-SUBNET-ID-*> with your actual VPC and subnet IDs.
⚠️ The provided configuration assumes the EKS cluster will be deployed in the us-east-1 region. If you intend to use a different region, you must update the metadata.region field and ensure the availability zone keys under vpc.subnets (e.g., us-east-1a, us-east-1b) match the availability zones of the subnets in your chosen region.
The cluster.eksctl.yaml file defines the following:
Cluster Metadata: A cluster named fh-eks-cluster in the us-east-1 region.
VPC: Specifies an existing VPC and its public/private subnets where the cluster resources will be deployed.
IAM with OIDC: Enables the IAM OIDC provider, which allows Kubernetes service accounts to be associated with IAM roles. This is crucial for granting AWS permissions to your pods.
Service Accounts:
kpow-annual: Creates a service account for the Kpow Annual product. It attaches the AWSLicenseManagerConsumptionPolicy, allowing Kpow to validate its license with the AWS License Manager service.
kpow-hourly: Creates a service account for the Kpow Hourly product. It attaches the AWSMarketplaceMeteringRegisterUsage policy, which is required for reporting usage metrics to the AWS Marketplace.
Node Group: Defines a managed node group named ng-dev with t3.medium instances. The worker nodes will be placed in the private subnets (privateNetworking: true).
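To make the description above concrete, here is a rough sketch of what such an eksctl configuration could look like. It is reconstructed from the bullet points rather than copied from the repository, so treat it as illustrative: the output file name, the numbered subnet placeholders, the second availability zone, and desiredCapacity are assumptions, and the policy ARNs should be checked against your AWS account before use.

```shell
# Writes an illustrative eksctl ClusterConfig to a separate sketch file.
# Placeholders in angle brackets must be replaced with real IDs.
cat > cluster.eksctl.sketch.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: fh-eks-cluster
  region: us-east-1
vpc:
  id: "<VPC-ID>"
  subnets:
    private:
      us-east-1a: { id: "<PRIVATE-SUBNET-ID-1>" }
      us-east-1b: { id: "<PRIVATE-SUBNET-ID-2>" }
    public:
      us-east-1a: { id: "<PUBLIC-SUBNET-ID-1>" }
      us-east-1b: { id: "<PUBLIC-SUBNET-ID-2>" }
iam:
  withOIDC: true            # lets service accounts assume IAM roles
  serviceAccounts:
    - metadata:
        name: kpow-annual
        namespace: factorhouse
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/service-role/AWSLicenseManagerConsumptionPolicy"
    - metadata:
        name: kpow-hourly
        namespace: factorhouse
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/AWSMarketplaceMeteringRegisterUsage"
managedNodeGroups:
  - name: ng-dev
    instanceType: t3.medium
    desiredCapacity: 2      # assumed node count
    privateNetworking: true # place worker nodes in the private subnets
EOF
```

Each service account maps to exactly one Marketplace product, which keeps the license and metering permissions separate.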
Once you have updated the YAML file with your networking details, run the following command to create the cluster. This process can take 15-20 minutes to complete.
eksctl create cluster -f cluster.eksctl.yaml
Once the cluster is created, eksctl automatically updates your kubeconfig file (usually located at ~/.kube/config) with the new cluster's connection details. This allows you to start interacting with your cluster immediately using kubectl.
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# ip-192-168-...-21.ec2.internal Ready <none> 2m15s v1.32.9-eks-113cf36
# ...
Launch a Kafka cluster
With the EKS cluster running, we will now launch an Apache Kafka cluster into it. We will use the Strimzi Kafka operator, which simplifies the process of running Kafka on Kubernetes.
Install the Strimzi operator
First, create a dedicated namespace for the Kafka cluster.
kubectl create namespace kafka
Next, download the Strimzi operator installation YAML. The repository already contains the file manifests/kafka/strimzi-cluster-operator-0.45.1.yaml, but the following commands show how it was downloaded and modified for this guide.
## Define the Strimzi version and download URL
STRIMZI_VERSION="0.45.1"
DOWNLOAD_URL=https://github.com/strimzi/strimzi-kafka-operator/releases/download/$STRIMZI_VERSION/strimzi-cluster-operator-$STRIMZI_VERSION.yaml
## Download the operator manifest
curl -L -o manifests/kafka/strimzi-cluster-operator-$STRIMZI_VERSION.yaml ${DOWNLOAD_URL}
## Modify the manifest to install the operator in the 'kafka' namespace
sed -i 's/namespace: .*/namespace: kafka/' manifests/kafka/strimzi-cluster-operator-$STRIMZI_VERSION.yaml
Now, apply the manifest to install the Strimzi operator in your EKS cluster.
The configuration for our Kafka cluster is defined in manifests/kafka/kafka-cluster.yaml. It describes a simple, single-node cluster suitable for development, using ephemeral storage, meaning data will be lost if the pods restart.
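The two steps described above, installing the operator and then creating the Kafka cluster, could be sketched as follows. This is a minimal sketch rather than the guide's original commands: the kafka_ns wrapper and the KUBECTL override are hypothetical conveniences, and kubectl create is suggested for the operator manifest on the assumption that its CRDs are large, as in Strimzi's own quick start.

```shell
# Hypothetical wrapper that runs a kubectl subcommand against the kafka namespace.
# Set KUBECTL=echo to print the arguments instead of contacting a cluster.
kafka_ns() {
  "${KUBECTL:-kubectl}" "$@" -n kafka
}
```

From the repository root, the operator would be installed with kafka_ns create -f manifests/kafka/strimzi-cluster-operator-0.45.1.yaml, followed by kafka_ns apply -f manifests/kafka/kafka-cluster.yaml to create the cluster.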
After a few minutes, all the necessary pods and services for Kafka will be running. You can verify this by listing all resources in the kafka namespace.
kubectl get all -n kafka -o name
The output should show the pods for Strimzi, Kafka, and ZooKeeper, along with the associated services. The most important service for connecting applications is the Kafka bootstrap service.
Now that the EKS and Kafka clusters are running, we can deploy Kpow. This guide covers the deployment of both Kpow Annual and Kpow Hourly products. Both deployments will use a common set of configurations for connecting to Kafka and setting up authentication/authorization.
First, ensure you have a namespace for Kpow. The eksctl command we ran earlier already created the service accounts in the factorhouse namespace, so we will use that. If you hadn't created it, you would run kubectl create namespace factorhouse.
Create ConfigMaps
We will use two Kubernetes ConfigMaps to manage Kpow's configuration. This approach separates the core configuration from the Helm deployment values.
kpow-config-files: This ConfigMap holds file-based configurations, including RBAC policies, JAAS configuration, and user properties for authentication.
kpow-config: This ConfigMap provides environment variables to the Kpow container, such as the Kafka bootstrap address and settings to enable our authentication provider.
The contents of these files can be found in the repository at manifests/kpow/config-files.yaml and manifests/kpow/config.yaml.
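Creating both ConfigMaps is a single apply of the two manifests. A minimal sketch, assuming it is run from the repository root and that the manifests either omit a namespace or already target factorhouse; the apply_kpow_config wrapper and the KUBECTL override are hypothetical:

```shell
# Hypothetical wrapper; set KUBECTL=echo to print the arguments instead of
# contacting a cluster.
apply_kpow_config() {
  "${KUBECTL:-kubectl}" apply -n factorhouse \
    -f manifests/kpow/config-files.yaml \
    -f manifests/kpow/config.yaml
}
```

After calling apply_kpow_config, both ConfigMaps should appear when listing the ConfigMaps in the factorhouse namespace.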
kubectl get configmap -n factorhouse
# NAME DATA AGE
# kpow-config 5 ...
# kpow-config-files 3 ...
Deploy Kpow Annual
Download the Helm chart
The Helm chart for Kpow Annual is in a private Amazon ECR repository. First, authenticate your Helm client.
# Enable Helm's experimental support for OCI registries
export HELM_EXPERIMENTAL_OCI=1
# Log in to the AWS Marketplace ECR registry
aws ecr get-login-password \
--region us-east-1 | helm registry login \
--username AWS \
--password-stdin 709825985650.dkr.ecr.us-east-1.amazonaws.com
Next, pull and extract the chart.
# Create a directory, pull the chart, and extract it
mkdir -p awsmp-chart && cd awsmp-chart
# Pull the latest version of the Helm chart from ECR (add --version <x.x.x> to specify a version)
helm pull oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/factor-house/kpow-aws-annual
tar xf $(pwd)/* && find $(pwd) -maxdepth 1 -type f -delete
cd ..
Launch Kpow Annual
Now, install Kpow using Helm. We will reference the service account kpow-annual that was created during the EKS cluster setup, which has the required IAM policy for license management.
Note: The CPU and memory values are intentionally set low for this guide. For production environments, check the official documentation for recommended capacity.
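The install step might look like the sketch below. Several details are assumptions and should be checked against the chart and repository: the release name kpow-annual (inferred from the instance label used in the verification step), the extracted chart path awsmp-chart/kpow-aws-annual, a values file at values/eks-annual.yaml, and the serviceAccount.create and serviceAccount.name chart values. The wrapper function and HELM override are hypothetical.

```shell
# Hypothetical wrapper; set HELM=echo to preview the command without installing.
install_kpow_annual() {
  # Reuse the pre-created kpow-annual service account instead of letting the
  # chart create one, so the pod inherits the License Manager IAM policy.
  "${HELM:-helm}" install kpow-annual ./awsmp-chart/kpow-aws-annual \
    --namespace factorhouse \
    --set serviceAccount.create=false \
    --set serviceAccount.name=kpow-annual \
    --values values/eks-annual.yaml
}
```

Pointing the release at the existing service account is the important part: without it, the pod would run under a service account that has no IAM role attached.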
Verify and access Kpow Annual
Check that the Kpow pod is running successfully.
kubectl get all -l app.kubernetes.io/instance=kpow-annual -n factorhouse
# NAME READY STATUS RESTARTS AGE
# pod/kpow-annual-kpow-aws-annual-c6bc849fb-zw5ww 0/1 Running 0 46s
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/kpow-annual-kpow-aws-annual ClusterIP 10.100.220.114 <none> 3000/TCP 47s
# ...
To access the UI, forward the service port to your local machine.
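A port-forward for the service listed above can be sketched like this. The kpow_port_forward helper and the KUBECTL override are hypothetical; the service name and port 3000 come from the kubectl output shown earlier.

```shell
# Hypothetical helper: forward a local port to port 3000 of a Kpow service.
# Set KUBECTL=echo to print the arguments instead of opening a tunnel.
kpow_port_forward() {
  service="$1"
  local_port="${2:-3000}"   # defaults to the same port locally
  "${KUBECTL:-kubectl}" port-forward "service/${service}" "${local_port}:3000" -n factorhouse
}
```

Calling kpow_port_forward kpow-annual-kpow-aws-annual keeps running in the foreground, and the UI should then be reachable at http://localhost:3000. Passing a second argument such as 3001 selects a different local port when another instance already occupies 3000.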
The Helm values are defined in values/eks-hourly.yaml.
# values/eks-hourly.yaml
env:
  ENVIRONMENT_NAME: "Kafka from Kpow Hourly"
envFromConfigMap: "kpow-config"
volumeMounts:
  # ... (volume configuration is the same as annual)
volumes:
  # ...
resources:
  # ...
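With the values file in place, the hourly install uses the public Factor House Helm repository. The sketch below rests on assumptions that should be verified: the repository URL https://charts.factorhouse.io, the chart name kpow-aws-hourly, the release name kpow-hourly (inferred from the instance label used in the verification step), and the serviceAccount.* chart values. The wrapper and HELM override are hypothetical.

```shell
# Hypothetical wrapper; set HELM=echo to preview the commands without installing.
install_kpow_hourly() {
  # Register the public chart repository, refresh the index, then install.
  "${HELM:-helm}" repo add factorhouse https://charts.factorhouse.io &&
  "${HELM:-helm}" repo update &&
  "${HELM:-helm}" install kpow-hourly factorhouse/kpow-aws-hourly \
    --namespace factorhouse \
    --set serviceAccount.create=false \
    --set serviceAccount.name=kpow-hourly \
    --values values/eks-hourly.yaml
}
```

The kpow-hourly service account is reused so that the pod carries the Marketplace metering policy attached during cluster creation.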
Verify and access Kpow Hourly
Check that the Kpow pod is running.
kubectl get all -l app.kubernetes.io/instance=kpow-hourly -n factorhouse
# NAME READY STATUS RESTARTS AGE
# pod/kpow-hourly-kpow-aws-hourly-68869b6cb9-x9prf 0/1 Running 0 83s
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/kpow-hourly-kpow-aws-hourly ClusterIP 10.100.221.36 <none> 3000/TCP 85s
# ...
To access the UI, forward the service port to a different local port (e.g., 3001) to avoid conflicts.
In this guide, we have successfully deployed a complete, production-ready environment for monitoring Apache Kafka on AWS. By leveraging eksctl, we provisioned a robust EKS cluster with correctly configured IAM roles for service accounts, a critical step for secure integration with AWS services. We then deployed a Kafka cluster using the Strimzi operator, demonstrating the power of Kubernetes operators in simplifying complex stateful applications.
Finally, we walked through the deployment of both Kpow Annual and Kpow Hourly from the AWS Marketplace. This showcased the flexibility of Kpow's subscription models and their seamless integration with AWS for licensing and metering. You are now equipped with the knowledge to set up, configure, and manage Kpow on EKS, unlocking powerful insights and operational control over your Kafka ecosystem.
This article dives into the various ways you can delete records in Kafka
Overview
Have you ever wondered how to effectively delete records in a Kafka topic? Well, there are actually several ways to do it, each with their own implications and granularity.
In this article, we'll explore these different approaches in detail, from the complete deletion of a topic to the more granular erasure of individual records. Understanding these methods is essential for anyone working with Kafka, as it can have significant implications for data retention, storage, and processing. By the end of this article, you'll have a better understanding of the different methods available for record deletion in Kafka, and how to choose the best approach for your specific use case.
About Kpow
This article uses Kpow for Apache Kafka as a companion to demonstrate how you can delete records in Kafka.
Kpow is a powerful tool that makes it easy to manage and monitor Kafka clusters, and its intuitive user interface simplifies the process of deleting records.
1. Deleting Topics
The most blunt and impactful way of deleting records on a Kafka cluster is by deleting the topic that contains the records.
While this can be an effective way to remove all data associated with a topic, it's important to note that this action is permanent and irreversible. Once a topic is deleted, all data will no longer be available, and any running applications that depend on this topic will likely throw exceptions. If topic auto-create is enabled on the broker, the topic could even get created again with the default topic configuration, potentially causing data loss or other issues.
Despite these risks, there may be cases where deleting a Kafka topic is necessary, such as when the topic is no longer needed or contains sensitive data that must be removed. Much like dropping a table in a traditional relational database, it's important to proceed with caution and have a clear understanding of the potential impacts before deleting a topic.
Deleting topics is simple in Kpow!
Navigate to Topic -> Details in the UI and select the topic you wish to delete.
The result of deleting topics in Kpow (like all other actions) gets persisted to Kpow's audit log for data governance. Kpow also provides a Slack webhook integration to notify a channel when the deletion of a topic has been performed.
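For comparison, the same operation outside Kpow can be done with the kafka-topics.sh tool that ships with Apache Kafka. This is a minimal sketch: the delete_topic helper, the KAFKA_TOPICS and BOOTSTRAP overrides, and the localhost:9092 default are hypothetical, and it assumes the broker allows topic deletion.

```shell
# Hypothetical helper around Kafka's own CLI.
# Set KAFKA_TOPICS=echo to print the arguments instead of deleting anything.
delete_topic() {
  "${KAFKA_TOPICS:-kafka-topics.sh}" \
    --bootstrap-server "${BOOTSTRAP:-localhost:9092}" \
    --delete --topic "$1"
}
```

For example, delete_topic orders would remove a topic named orders. Unlike the Kpow workflow, nothing is recorded in an audit log, so the irreversibility caveats above apply with even more force.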
2. Truncating Records
Truncating records is another method for deleting data from a Kafka topic, specifically a range of records from a topic partition. Truncation removes all records before a specified offset for a given topic partition. This can be useful when you want to remove a specific range of records without deleting the entire topic.
How topic partitions work in Kafka
In Kafka, all topic partitions have a start and end offset.
The start offset is the offset of the very first record on the topic partition. A fresh topic partition will have a start offset of 0. However, because of topic retention, cleanup policies, or even truncation, the start offset could be any value over time.
Similarly, the end offset is always the offset of the last record on a topic partition. The end offset keeps growing as producers write more records to a topic.
One thing to note: producing a single record may not result in a simple increment of the end offset. For example, transactional producers write additional metadata records when committing.
Viewing the start and end offsets inside Kpow is easy! Simply navigate to the topic partitions table in Topic -> Details and select the start and end offset columns.
An example of truncation
Consider a topic partition with 6 records. The start offset is 0 and the end offset is 5.
If we make a request to truncate a topic partition before offset 3, all records highlighted in gray will be deleted.
After we have performed this action the new start offset will be 3 and the end offset will remain as 5.
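The example above can be expressed as the offsets file that Kafka's kafka-delete-records.sh tool accepts, where "offset": 3 means "delete everything before offset 3". This is a sketch under assumptions: the topic name orders, the file name, and the broker address in the commented command are hypothetical.

```shell
# Hypothetical topic name; partition 0 is truncated before offset 3.
TOPIC="${TOPIC:-orders}"
cat > truncate-before-3.json <<EOF
{
  "version": 1,
  "partitions": [
    { "topic": "$TOPIC", "partition": 0, "offset": 3 }
  ]
}
EOF

# Simple model of the example: offsets 0..5 exist, only offsets >= 3 survive.
seq 0 5 | awk '$1 >= 3'   # prints 3, 4 and 5, one per line

# With a running cluster, the file would be passed to Kafka's CLI:
# kafka-delete-records.sh --bootstrap-server localhost:9092 --offset-json-file truncate-before-3.json
```

The surviving offsets start at 3, which matches the new start offset described above, while the end offset is unchanged.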
Truncating records in Kpow
Kpow provides a convenient way to truncate topics with its intuitive UI.
To truncate a topic in Kpow, simply follow these steps:
Navigate to the topic you want to truncate in the UI.
Select the partitions you want to truncate.
Choose to truncate by either the last observed end offset or by group offset.
Click "Truncate" to delete the specified range of records from the topic.
By default, Kpow populates the last observed end offset of each partition in the form. This will delete all records up to and including the specified offset.
Alternatively, you can choose to truncate by group offset, which deletes all records a consumer group has consumed. This has the advantage of not impacting the correctness/behavior of the consumer group, by only deleting records it has read.
It's important to note that truncating a topic is a destructive action and requires careful consideration. If multiple consumers are reading from the topic, truncating by group offset could impact the other consumers.
Implications of truncating a topic
Truncating a topic in Kafka is a less intrusive way of deleting records than deleting the topic entirely. This is because the topic configuration, including the number of partitions and replicas, remains unchanged. Additionally, you have more granular control over which records get deleted.
However, it's important to note that truncating a topic is a destructive action that requires careful consideration. In particular, truncating a topic can cause data loss and may impact the behavior of any consumers reading from the affected partitions.
As a best practice, it's generally recommended to rely on the semantics of how you configure a topic to manage topic growth, rather than resorting to truncation. For example, you can use the retention.ms configuration parameter to automatically age out data after a certain period of time, or configure a cleanup policy to remove old or irrelevant data. This blog post covers how these retention policies work in Kafka. How these get configured will depend on the use case of your topic.
That said, there are still valid reasons to truncate a topic on a running Kafka cluster. For instance, you may want to reset a topic to a specific state for testing or debugging purposes, or you may have encountered a production issue that requires you to delete a range of records from a topic. In these cases, truncation can be a useful tool.
If you do decide to truncate a topic, it's important to be aware of the potential impacts on your Kafka cluster and consumers. For example, truncating a topic may cause consumers to experience data gaps or inconsistencies. As a best practice, you should always test truncation in a non-production environment before running it in a production context.
3. Tombstoning Records
The final and most granular way of deleting a record in Kafka is via tombstoning. Tombstoning deletes an individual record based on its key.
How tombstoning works in Kafka
Tombstoning works by producing a record with a null value and the key of the record that needs to be deleted to a topic. Note: null in this case means the record carries no value at all, rather than a serialized representation of null. For example, producing the value null with a JSON serializer will not have the same effect.
Tombstoning allows you to delete individual records from a topic without affecting the rest of the data in the topic.
Note: tombstoning will only work when the topic has been configured with a cleanup.policy of compact or compact,delete.
Compacted topics
Compacted topics in Kafka ensure that only the latest record per message key is retained within the log of data for a single topic partition. This policy is useful for implementing key/value stores or aggregated views where only the most recent state is needed.
For example, a KTable that holds the latest count of Covid-19 cases by country, where each record is keyed by the country, would benefit from a compacted topic.
It is important to note that compaction does not happen automatically and how often it happens depends on your topic and broker configuration. Therefore, deletion does not occur automatically after a tombstone record is produced.
This blog post goes into finer details about the different broker/topic configuration that can have an impact on when compaction happens.
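The effect of compaction on a single partition can be sketched in a few lines of Python. This is a simplified model, not Kafka's log cleaner: it compacts the whole log in one pass and drops tombstones straight away, whereas a real broker compacts in the background and keeps tombstones around for a while (governed by delete.retention.ms) so consumers can observe them. The country keys and counts are made-up sample data.

```python
from typing import Dict, List, Optional, Tuple

# (key, value); a value of None marks a tombstone.
Record = Tuple[str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep the latest record per key; drop keys whose latest record is a tombstone."""
    latest: Dict[str, Optional[str]] = {}
    for key, value in log:
        # Remove and re-insert so the survivor sits at the position of its latest write.
        latest.pop(key, None)
        latest[key] = value
    return [(key, value) for key, value in latest.items() if value is not None]


log = [
    ("AU", "10"),
    ("NZ", "3"),
    ("AU", "12"),  # supersedes the earlier AU record
    ("NZ", None),  # tombstone for NZ
]

print(compact(log))  # [('AU', '12')]
```

Only the most recent AU record survives, and NZ disappears entirely because its latest record is a tombstone.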
Producing tombstone messages in Kpow
First, we can ensure that compaction has been enabled on our topic by navigating to the Topic Configuration table and selecting our topic and the config value cleanup.policy.
If cleanup.policy hasn't been correctly set, we can click the pencil icon to edit the topic configuration and set it to compact,delete.
Next, navigate to Kpow's Data Produce UI and select None for the value serializer while specifying the key you wish to delete.
Done! You have successfully produced a tombstone message!
Querying for data to be deleted
We can use Kpow to query for data we want to tombstone on a topic.
For example, consider a topic that contains the following data:
Let's say we want to query for all records that have expired. We could write a kJQ query like so:
.value.expires | from-date < now
kJQ is Kpow's powerful query language for searching data on a Kafka topic. It is our implementation of the jq language with added features built specifically for Kafka.
The above query parses the expires field as an ISO 8601 date time and checks if it's before the current date time (now). now resolves to the current date and time when the query executes.
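For readers who don't know jq syntax, the same filter logic can be sketched in plain Python. This is an illustration of what the query checks, not how kJQ is implemented. The records and their keys are hypothetical; only the value.expires field comes from the query above.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical sample records with an ISO 8601 expires field.
records: List[Dict] = [
    {"key": "session-1", "value": {"expires": "2020-01-01T00:00:00+00:00"}},
    {"key": "session-2", "value": {"expires": "2999-01-01T00:00:00+00:00"}},
]


def expired(records: List[Dict], now: datetime) -> List[Dict]:
    """Return the records whose value.expires timestamp is earlier than now."""
    return [
        record
        for record in records
        if datetime.fromisoformat(record["value"]["expires"]) < now
    ]


# now is resolved once, at the time the filter runs.
now = datetime.now(timezone.utc)
matches = expired(records, now)
print([record["key"] for record in matches])  # ['session-1']
```

The timestamps use an explicit +00:00 offset because older Python versions do not accept a trailing Z in datetime.fromisoformat.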
After executing this query in Kpow, we can see a list of results that match our filtered query. These are the expired records!
We can now click the 'Produce results' button and produce these records back to the topic as tombstones, by selecting the value serializer as None.
Done! We have managed to delete a collection of records based on a query filter.
Conclusion
In this article we have demonstrated the various ways you can delete records in Kafka using Kpow.
You should now have a better understanding of how deletion works, the implications of each method, and when each one is applicable.
Kpow v90.6 introduces a new Dark Mode UI, improved intellisense, and configurable persistence settings.
Dark Mode
Start the new year right with the sleek new Dark Mode UI available in Kpow v90.6!
Improved Intellisense
From kJQ filters to Schema Editing, text entry input in Kpow has upgraded intellisense for JSON and EDN data.
Persistence Mode
Powered by two internal Kafka Streams applications, Kpow stores data in the first cluster in your configuration (we call this the Primary Cluster). This storage takes the form of several internal topics that are tuned to retain only a small amount of data.
In addition, an audit log topic is persisted permanently for data governance purposes.
These internal topics provide considerable feature support to Kpow, but there are circumstances in which you might want to turn them off.
Kpow v90.6 introduces a new PERSISTENCE_MODE environment variable that provides the following options to tune data storage:
Persistence Mode: Full (Default)
PERSISTENCE_MODE="full"
full is the current persistence behaviour of Kpow and utilizes the full set of internal topics.
This is the default behaviour of Kpow when no configuration is set.
Persistence Mode: Audit
PERSISTENCE_MODE="audit"
audit is a new persistence mode where the only internal topic that is created is the audit log.
This mode considerably reduces the amount of data written to Kafka, while retaining a full data governance trail.
When this mode is activated, certain features of Kpow run in a modified manner:
Metrics charts are not re-hydrated on a Kpow restart (normally they hydrate from an internal changelog).
Activity metrics (e.g. 'this topic was written to 3 minutes ago') are not persisted/maintained through a Kpow restart.
Kpow Streams Agent integration is disabled.
Persistence Mode: None
PERSISTENCE_MODE="none"
none is a new persistence mode where zero data is written to Kafka.
This mode ensures that no internal topics are created and no data is written by Kpow to your Kafka cluster.
When this mode is activated, certain features of Kpow run in a modified manner:
Metrics charts are not re-hydrated on a Kpow restart (normally they hydrate from an internal changelog).
Activity metrics (e.g. 'this topic was written to 3 minutes ago') are not persisted/maintained through a Kpow restart.
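The three modes can be summarised as a small lookup, which may be handy as a pre-flight check in a deployment script. The sketch below is not part of Kpow: PersistencePlan and persistence_plan are hypothetical names, and the exact-match validation is an assumption about how strictly the value should be checked. Only the variable name, its three values, and the default come from the notes above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PersistencePlan:
    internal_topics: bool  # internal topics other than the audit log
    audit_log: bool        # the audit log topic


PLANS = {
    "full": PersistencePlan(internal_topics=True, audit_log=True),
    "audit": PersistencePlan(internal_topics=False, audit_log=True),
    "none": PersistencePlan(internal_topics=False, audit_log=False),
}


def persistence_plan(env: Mapping[str, str] = os.environ) -> PersistencePlan:
    """Look up what would be written to Kafka for the configured mode."""
    mode = env.get("PERSISTENCE_MODE", "full")  # full applies when nothing is set
    if mode not in PLANS:
        raise ValueError(f"Unknown PERSISTENCE_MODE: {mode!r}")
    return PLANS[mode]


print(persistence_plan({"PERSISTENCE_MODE": "audit"}))
# PersistencePlan(internal_topics=False, audit_log=True)
```

Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.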
[MELBOURNE, AUS] Apache Kafka and Apache Flink Meetup, 27 November
Melbourne, we’re making it a double feature. Workshop by day, meetup by night - same location, each with valuable content for data and software engineers, or those working with Data Streaming technologies. Build the backbone your apps deserve, then roll straight into the evening meetup.
[SYDNEY, AUS] Apache Kafka and Apache Flink Meetup, 26 November
Sydney, we’re making it a double feature. Workshop by day, meetup by night - same location, each with valuable content for data and software engineers, or those working with Data Streaming technologies. Build the backbone your apps deserve, then roll straight into the evening meetup.
We’re building more than products, we’re building a community. Whether you're getting started or pushing the limits of what's possible with Kafka and Flink, we invite you to connect, share, and learn with others.