The new unified Factor House Community License works with both Kpow Community Edition and Flex Community Edition, so you only need one license to unlock both products. This makes it even simpler to explore modern data streaming tools, create proof-of-concepts, and evaluate our products.
What's changing
Previously, we issued separate community licenses for Kpow and Flex, with different tiers for individuals and organizations. Now there's a single Community License that unlocks both products.
What's new:
One license for both products
Three environments for everyone - whether you're an individual developer or part of a team, you get three non-production installations per product
Simplified management - access and renew your licenses through our new self-service portal at account.factorhouse.io
Our commitment to the engineering community
Since first launching Kpow CE at Current '22, thousands of engineers have used our community licenses to learn Kafka and Flink without jumping through enterprise procurement hoops. This unified license keeps that same philosophy: high-quality tools that are free for non-production use.
The Factor House Community License is free for individuals and organizations to use in non-production environments. It's perfect for:
New users: Head to account.factorhouse.io to grab your free Community license. You'll receive instant access via magic link authentication.
Existing users: Your legacy Kpow and Flex Community licenses will continue to work and are now visible in the portal. When your license renews (after 12 months), consider switching to the unified model for easier management.
What's included
Both Kpow CE and Flex CE include most enterprise features, optimized for learning and testing: Kafka and Flink monitoring and management, fast multi-topic search, and Schema Registry and Kafka Connect support.
License duration: 12 months, renewable annually
Installations: Up to 3 per product (Kpow CE: 1 Kafka cluster + 1 Schema Registry + 1 Connect cluster per installation; Flex CE: 1 Flink cluster per installation)
Support: Self-service via Factor House Community Slack, documentation, and release notes
Deployment: Docker, Docker Compose or Kubernetes
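As an illustration of the Docker Compose option, a minimal sketch might look like the following. The image tag and the `kpow.env` file name are assumptions; see the Kpow docs for the exact license and connection variables.

```yaml
# docker-compose.yml (sketch -- image tag and env file name are assumptions)
services:
  kpow:
    image: factorhouse/kpow-ce:latest   # Community Edition image; check Docker Hub for current tags
    ports:
      - "3000:3000"                     # Kpow UI
    env_file:
      - ./kpow.env                      # license details plus Kafka connection settings
```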
Ready for production? Start a 30-day free trial of our Enterprise editions directly from the portal to unlock RBAC, Kafka Streams monitoring, custom SerDes, and dedicated support.
What about legacy licenses?
If you're currently using a Kpow Individual, Kpow Organization, or Flex Community license, nothing changes immediately. Your existing licenses will continue to work with their respective products and are now accessible in the portal. When your license expires at the end of its 12-month term, you can easily switch to the new unified license for simpler management.
Release 95.1: A unified experience across product, web, docs and licensing
95.1 delivers a cohesive experience across Factor House products, licensing, and brand. This release introduces our new license portal, refreshed company-wide branding, a unified Community License for Kpow and Flex, and a series of performance, accessibility, and schema-related improvements.
Upgrading to 95.1
If you are using Kpow with a Google Managed Service for Apache Kafka (Google MSAK) cluster, you will now need to use either kpow-java17-gcp-standalone.jar or the 95.1-temurin-ubi tag of the factorhouse/kpow Docker image.
New Factor House brand: unified look across web, product, and docs
We've refreshed the Factor House brand across our website, documentation, the new license portal, and products to reflect where we are today: a company trusted by engineers running some of the world's most demanding data pipelines. Following our seed funding earlier this year, we've been scaling the team and product offerings to match the quality and value we deliver to enterprise engineers. The new brand brings our external presence in line with what we've built. You'll see updated logos in Kpow and Flex, refreshed styling across docs and the license portal, and a completely redesigned website with clearer navigation and information architecture. Your workflows stay exactly the same, and the result is better consistency across all touchpoints, making it easier for new users to evaluate our tools and for existing users to find what they need.
New license portal: self-service access for all users
We've rolled out our new license portal at account.factorhouse.io to streamline license management for everyone. New users can instantly grab a Community or Trial license with just their email address, and existing users will see their migrated licenses when they log in. The portal lets you manage multiple licenses from one account through a clean, modern interface with magic link authentication, whether that's upgrading from Community to a Trial, renewing your annual Community License, or requesting a trial extension. For installation and configuration guidance, check our Kpow and Flex docs.
We've consolidated our Community licensing into a single unified license that works with both Kpow Community Edition and Flex Community Edition. Your Community license allows you to run Kpow and Flex in up to three non-production environments each, making it easier to learn, test, and build with Kafka and Flink. The new license streamlines management, providing a single key for both products and annual renewal via the license portal. It's perfect for exploring projects like Factor House Local or building your own data pipelines. Existing legacy licenses will continue to work and will also be accessible in the license portal.
This release brings a number of performance improvements to Kpow, Flex, and Factor Platform. The time taken to compute and materialize views and insights about your Kafka or Flink resources has decreased by an order of magnitude, and for our largest customers we have observed a 70% performance improvement in Kpow's materialization.
Data Inspect enhancements
Confluent Data Rules support: Data inspect now supports Confluent Schema Registry Data Rules, including CEL, CEL_FIELD, and JSONata rule types. If you're using Data Contracts in Confluent Cloud, Data Inspect now accurately identifies rule failures and lets you filter them with kJQ.
Support for Avro Primitive Types: We’ve added support for Avro schemas that consist of a plain primitive type, including string, number, and boolean.
Schema Registry & navigation improvements
General Schema Registry improvements (from 94.6): In 94.6, we introduced improvements to Schema Registry performance and updated the observation engine. This release continues that work, with additional refinements based on real-world usage.
Karapace compatibility fix: We identified and fixed a regression in the new observation engine that affected Karapace users.
Redpanda Schema Registry note: The new observation engine is not compatible with Redpanda’s Schema Registry. Customers using Redpanda should set `OBSERVATION_VERSION=1` until full support is available.
Navigation improvements: Filters on the Schema Overview pages now persist when navigating into a subject and back.
Chart accessibility & UX improvements
This release brings a meaningful accessibility improvement to Kpow & Flex: Keyboard navigation for line charts. Users can now focus a line chart and use the left and right arrow keys to view data point tooltips. We plan to expand accessibility for charts to include bar charts and tree maps in the near future, bringing us closer to full WCAG 2.1 Level AA compliance as reported in our Voluntary Product Accessibility Template (VPAT).
We’ve also improved the UX of comparing adjacent line charts: Each series is now consistently coloured across different line charts on a page, making it easier to identify trends across a series, e.g., a particular topic’s producer write/s vs. consumer read/s.
These changes benefit everyone: developers using assistive technology, teams with accessibility requirements, and anyone who prefers keyboard navigation. Accessibility isn't an afterthought, it's a baseline expectation for enterprise-grade tooling, and we're committed to leading by example in the Kafka and Flink ecosystem.
Streamline your Kpow deployment on Amazon EKS with our guide, fully integrated with the AWS Marketplace. We use eksctl to automate IAM Roles for Service Accounts (IRSA), providing a secure integration for Kpow's licensing and metering. This allows your instance to handle license validation via AWS License Manager and report usage for hourly subscriptions, enabling a production-ready deployment with minimal configuration.
This guide provides a comprehensive walkthrough for deploying Kpow, a powerful toolkit for Apache Kafka, onto an Amazon EKS (Elastic Kubernetes Service) cluster. We will cover the entire process from start to finish, including provisioning the necessary AWS infrastructure, deploying a Kafka cluster using the Strimzi operator, and finally, installing Kpow using a subscription from the AWS Marketplace.
The guide demonstrates how to set up both Kpow Annual and Kpow Hourly products, highlighting the specific integration points with AWS services like IAM for service accounts, ECR for container images, and the AWS License Manager for the annual subscription. By the end of this tutorial, you will have a fully functional environment running Kpow on EKS, ready to monitor and manage your Kafka cluster.
The source code and configuration files used in this guide can be found in the features/eks-deployment folder of this GitHub repository.
About Factor House
Factor House is a leader in real-time data tooling, empowering engineers with innovative solutions for Apache Kafka® and Apache Flink®.
Our flagship product, Kpow for Apache Kafka, is the market-leading enterprise solution for Kafka management and monitoring.
VPC: A Virtual Private Cloud (VPC) that has both public and private subnets is required.
IAM Permissions: A user with the necessary IAM permissions to create an EKS cluster with a service account.
Kpow Subscription:
A subscription to a Kpow product through the AWS Marketplace is required. After subscribing, you will receive access to the necessary components and deployment instructions.
The specifics of accessing the container images and Helm chart depend on the chosen Kpow product:
Kpow Annual product:
Subscribing to the annual product provides access to the ECR (Elastic Container Registry) image and the corresponding Helm chart.
Kpow Hourly product:
For the hourly product, access to the ECR image is provided, and deployment uses the public Factor House Helm repository for installation.
Deploy an EKS cluster
We will use eksctl to provision an Amazon EKS cluster. The configuration for the cluster is defined in the manifests/eks/cluster.eksctl.yaml file within the repository.
Before creating the cluster, you must open this file and replace the placeholder values for <VPC-ID>, <PRIVATE-SUBNET-ID-*>, and <PUBLIC-SUBNET-ID-*> with your actual VPC and subnet IDs.
⚠️ The provided configuration assumes the EKS cluster will be deployed in the us-east-1 region. If you intend to use a different region, you must update the metadata.region field and ensure the availability zone keys under vpc.subnets (e.g., us-east-1a, us-east-1b) match the availability zones of the subnets in your chosen region.
The cluster.eksctl.yaml file defines the following:
Cluster Metadata: A cluster named fh-eks-cluster in the us-east-1 region.
VPC: Specifies an existing VPC and its public/private subnets where the cluster resources will be deployed.
IAM with OIDC: Enables the IAM OIDC provider, which allows Kubernetes service accounts to be associated with IAM roles. This is crucial for granting AWS permissions to your pods.
Service Accounts:
kpow-annual: Creates a service account for the Kpow Annual product. It attaches the AWSLicenseManagerConsumptionPolicy, allowing Kpow to validate its license with the AWS License Manager service.
kpow-hourly: Creates a service account for the Kpow Hourly product. It attaches the AWSMarketplaceMeteringRegisterUsage policy, which is required for reporting usage metrics to the AWS Marketplace.
Node Group: Defines a managed node group named ng-dev with t3.medium instances. The worker nodes will be placed in the private subnets (privateNetworking: true).
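Based on the description above, the file might look like this sketch. The subnet layout, node count, and policy ARNs are assumptions reconstructed from the bullet points; verify them against the actual file in the repository.

```yaml
# cluster.eksctl.yaml (sketch -- verify field values and policy ARNs against the repository)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: fh-eks-cluster
  region: us-east-1
vpc:
  id: <VPC-ID>
  subnets:
    private:
      us-east-1a: { id: <PRIVATE-SUBNET-ID-1> }
      us-east-1b: { id: <PRIVATE-SUBNET-ID-2> }
    public:
      us-east-1a: { id: <PUBLIC-SUBNET-ID-1> }
      us-east-1b: { id: <PUBLIC-SUBNET-ID-2> }
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: kpow-annual
        namespace: factorhouse
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/service-role/AWSLicenseManagerConsumptionPolicy
    - metadata:
        name: kpow-hourly
        namespace: factorhouse
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AWSMarketplaceMeteringRegisterUsage
managedNodeGroups:
  - name: ng-dev
    instanceType: t3.medium
    desiredCapacity: 2   # node count is an assumption
    privateNetworking: true
```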
Once you have updated the YAML file with your networking details, run the following command to create the cluster. This process can take 15-20 minutes to complete.
eksctl create cluster -f cluster.eksctl.yaml
Once the cluster is created, eksctl automatically updates your kubeconfig file (usually located at ~/.kube/config) with the new cluster's connection details. This allows you to start interacting with your cluster immediately using kubectl.
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# ip-192-168-...-21.ec2.internal Ready <none> 2m15s v1.32.9-eks-113cf36
# ...
Launch a Kafka cluster
With the EKS cluster running, we will now launch an Apache Kafka cluster into it. We will use the Strimzi Kafka operator, which simplifies the process of running Kafka on Kubernetes.
Install the Strimzi operator
First, create a dedicated namespace for the Kafka cluster.
kubectl create namespace kafka
Next, download the Strimzi operator installation YAML. The repository already contains the file manifests/kafka/strimzi-cluster-operator-0.45.1.yaml, but the following commands show how it was downloaded and modified for this guide.
## Define the Strimzi version and download URL
STRIMZI_VERSION="0.45.1"
DOWNLOAD_URL=https://github.com/strimzi/strimzi-kafka-operator/releases/download/$STRIMZI_VERSION/strimzi-cluster-operator-$STRIMZI_VERSION.yaml
## Download the operator manifest
curl -L -o manifests/kafka/strimzi-cluster-operator-$STRIMZI_VERSION.yaml ${DOWNLOAD_URL}
## Modify the manifest to install the operator in the 'kafka' namespace
sed -i 's/namespace: .*/namespace: kafka/' manifests/kafka/strimzi-cluster-operator-$STRIMZI_VERSION.yaml
Now, apply the manifest to install the Strimzi operator in your EKS cluster.
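Applying the namespace-adjusted manifest is a single command (the file name matches the version downloaded above):

```shell
kubectl apply -f manifests/kafka/strimzi-cluster-operator-0.45.1.yaml -n kafka
```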
The configuration for our Kafka cluster is defined in manifests/kafka/kafka-cluster.yaml. It describes a simple, single-node cluster suitable for development, using ephemeral storage, meaning data will be lost if the pods restart.
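A sketch of what such a single-node, ephemeral-storage Kafka resource typically looks like with Strimzi is shown below. Only the single-node/ephemeral shape is taken from the guide; the cluster name and the exact listener and config values are assumptions, so check the file in the repository.

```yaml
# Sketch of manifests/kafka/kafka-cluster.yaml (names and values are assumptions)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: kafka
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      # single-node cluster, so internal topics cannot be replicated
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
    storage:
      type: ephemeral   # data is lost if the pod restarts
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
```

Apply it with `kubectl apply -f manifests/kafka/kafka-cluster.yaml -n kafka`.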
After a few minutes, all the necessary pods and services for Kafka will be running. You can verify this by listing all resources in the kafka namespace.
kubectl get all -n kafka -o name
The output should look similar to this, showing the pods for Strimzi, Kafka, Zookeeper, and the associated services. The most important service for connecting applications is the Kafka bootstrap service.
Now that the EKS and Kafka clusters are running, we can deploy Kpow. This guide covers the deployment of both Kpow Annual and Kpow Hourly products. Both deployments will use a common set of configurations for connecting to Kafka and setting up authentication/authorization.
First, ensure you have a namespace for Kpow. The eksctl command we ran earlier already created the service accounts in the factorhouse namespace, so we will use that. If you hadn't created it, you would run kubectl create namespace factorhouse.
Create ConfigMaps
We will use two Kubernetes ConfigMaps to manage Kpow's configuration. This approach separates the core configuration from the Helm deployment values.
kpow-config-files: This ConfigMap holds file-based configurations, including RBAC policies, JAAS configuration, and user properties for authentication.
kpow-config: This ConfigMap provides environment variables to the Kpow container, such as the Kafka bootstrap address and settings to enable our authentication provider.
The contents of these files can be found in the repository at manifests/kpow/config-files.yaml and manifests/kpow/config.yaml.
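Create both ConfigMaps by applying those manifests (assuming the namespace is not already set inside the files):

```shell
kubectl apply -f manifests/kpow/config-files.yaml -n factorhouse
kubectl apply -f manifests/kpow/config.yaml -n factorhouse
```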
kubectl get configmap -n factorhouse
# NAME DATA AGE
# kpow-config 5 ...
# kpow-config-files 3 ...
Deploy Kpow Annual
Download the Helm chart
The Helm chart for Kpow Annual is in a private Amazon ECR repository. First, authenticate your Helm client.
# Enable Helm's experimental support for OCI registries
export HELM_EXPERIMENTAL_OCI=1
# Log in to the AWS Marketplace ECR registry
aws ecr get-login-password \
--region us-east-1 | helm registry login \
--username AWS \
--password-stdin 709825985650.dkr.ecr.us-east-1.amazonaws.com
Next, pull and extract the chart.
# Create a directory, pull the chart, and extract it
mkdir -p awsmp-chart && cd awsmp-chart
# Pull the latest version of the Helm chart from ECR (add --version <x.x.x> to specify a version)
helm pull oci://709825985650.dkr.ecr.us-east-1.amazonaws.com/factor-house/kpow-aws-annual
tar xf $(pwd)/* && find $(pwd) -maxdepth 1 -type f -delete
cd ..
Launch Kpow Annual
Now, install Kpow using Helm. We will reference the service account kpow-annual that was created during the EKS cluster setup, which has the required IAM policy for license management.
Note: The CPU and memory values are intentionally set low for this guide. For production environments, check the official documentation for recommended capacity.
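A sketch of the install command follows. The extracted chart directory name, the `serviceAccount.*` value keys, and the `values/eks-annual.yaml` file name are assumptions; check the extracted chart and the repository for the actual paths and value names.

```shell
# Install Kpow Annual, reusing the kpow-annual service account created by eksctl
# (chart path, values file, and value keys are assumptions)
helm install kpow-annual ./awsmp-chart/kpow-aws-annual \
  --namespace factorhouse \
  --set serviceAccount.create=false \
  --set serviceAccount.name=kpow-annual \
  -f values/eks-annual.yaml
```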
Verify and access Kpow Annual
Check that the Kpow pod is running successfully.
kubectl get all -l app.kubernetes.io/instance=kpow-annual -n factorhouse
# NAME READY STATUS RESTARTS AGE
# pod/kpow-annual-kpow-aws-annual-c6bc849fb-zw5ww 0/1 Running 0 46s
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/kpow-annual-kpow-aws-annual ClusterIP 10.100.220.114 <none> 3000/TCP 47s
# ...
To access the UI, forward the service port to your local machine.
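For example, using the service name from the output above:

```shell
kubectl port-forward svc/kpow-annual-kpow-aws-annual 3000:3000 -n factorhouse
```

Kpow will then be available at http://localhost:3000.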
The Helm values are defined in values/eks-hourly.yaml.
# values/eks-hourly.yaml
env:
  ENVIRONMENT_NAME: "Kafka from Kpow Hourly"
envFromConfigMap: "kpow-config"
volumeMounts:
  # ... (volume configuration is the same as annual)
volumes:
  # ...
resources:
  # ...
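Since the hourly product installs from the public Factor House Helm repository, the install might look like the following sketch. The repository URL and chart name are assumptions; verify them against the Factor House Helm documentation.

```shell
# Repository URL and chart name are assumptions -- verify before use
helm repo add factorhouse https://charts.factorhouse.io
helm repo update
helm install kpow-hourly factorhouse/kpow-aws-hourly \
  --namespace factorhouse \
  -f values/eks-hourly.yaml
```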
Verify and access Kpow Hourly
Check that the Kpow pod is running.
kubectl get all -l app.kubernetes.io/instance=kpow-hourly -n factorhouse
# NAME READY STATUS RESTARTS AGE
# pod/kpow-hourly-kpow-aws-hourly-68869b6cb9-x9prf 0/1 Running 0 83s
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/kpow-hourly-kpow-aws-hourly ClusterIP 10.100.221.36 <none> 3000/TCP 85s
# ...
To access the UI, forward the service port to a different local port (e.g., 3001) to avoid conflicts.
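For example, using the service name from the output above:

```shell
kubectl port-forward svc/kpow-hourly-kpow-aws-hourly 3001:3000 -n factorhouse
```

The hourly instance will then be available at http://localhost:3001.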
In this guide, we have successfully deployed a complete, fully functional environment for monitoring Apache Kafka on AWS. By leveraging eksctl, we provisioned a robust EKS cluster with correctly configured IAM roles for service accounts, a critical step for secure integration with AWS services. We then deployed a Kafka cluster using the Strimzi operator, demonstrating the power of Kubernetes operators in simplifying complex stateful applications.
Finally, we walked through the deployment of both Kpow Annual and Kpow Hourly from the AWS Marketplace. This showcased the flexibility of Kpow's subscription models and their seamless integration with AWS for licensing and metering. You are now equipped with the knowledge to set up, configure, and manage Kpow on EKS, unlocking powerful insights and operational control over your Kafka ecosystem.
Kpow v78 provides a major overhaul of our user interface and the brand new ability to search multiple topics at the same time in Data Inspect.
Multi-Topic Search
Klang in the kREPL already provides the ability to search multiple topics in a single query (and then play with that data in the kREPL), but that interface is programmatic and quite formidable - so we've introduced multi-topic search in our simplified Data Inspect UI.
This means you can now query multiple topics and filter the results with kQL all in a single query.
User Interface Update
This release includes a major backlog item to simplify the overall user experience when using Kpow. That means changes to our navigation: removing the sub-menu options and placing them in a horizontal tab menu within the page, breaking long pages into shorter ones, and simplifying forms like Data Inspect.
The functionality of Kpow remains the same; we just do more with less. This effort lays the groundwork for two future releases that will add many new features to Kpow.
Multi-Tenancy
Our next feature release will let admin users restrict the visibility of Kafka resources by user role. In effect, you will be able to reduce your users' experience of Kafka down to only the topics, groups, and clusters that interest them.
Multi-Tenancy introduces a new Kpow-Admin role - users who have the ability to view all resources, simulate tenancies, escalate privileges, and so on.
Kafka Alerting with Kpow, Prometheus and Alertmanager
This article covers setting up alerting with Kpow using Prometheus and Alertmanager.
Introduction
Kpow was built from our own need to monitor Kafka clusters and related resources (eg, Streams, Connect and Schema Registries).
Through Kpow's user interface we can detect and even predict potential problems with Kafka such as:
Replicas that have gone out of sync
Consumer group assignments that are lagging above a certain threshold
Topic growth that will exceed a quota
How can we alert teams as soon as these problems occur?
Kpow does not provide its own alerting functionality but instead integrates with Prometheus for a modern alerting solution.
Why don't we natively support alerting?
We believe a dedicated product like Prometheus is better suited to alerting than an individual product in most cases. Most organizations have alerting needs beyond Kafka, and managing alerting from a centralized service such as Prometheus makes more sense.
Don't use Prometheus?
Fear not: almost every major observability tool on the market today supports Prometheus metrics. For example, Grafana Cloud supports Prometheus alerts out of the box.
This article will demonstrate how to set up Kpow with Prometheus + AlertManager, alongside example configuration to help you start defining your alerts when things go wrong with your Kafka cluster.
Architecture
Here is the basic architecture of alerting with Prometheus:
Alerts are defined in Prometheus configuration. Prometheus pulls metrics from all client applications (including Kpow). If any alert condition is met, Prometheus pushes the alert to the AlertManager service, which manages alerts through its pipeline of silencing, inhibition, grouping, and sending out notifications. In other words, AlertManager takes care of deduplicating, grouping, and routing alerts to the correct integration, such as Slack, email, or Opsgenie.
Kpow Metrics
The unique thing about Kpow as a product is that we calculate our own telemetry about your Kafka Cluster and related resources.
This has a ton of advantages:
No dependency on Kafka's own JMX metrics - This allows frictionless installation and configuration.
From our observations of your Kafka cluster we calculate a wide range of Kafka metrics, including group and topic offset deltas.
This same pattern applies to other supported resources such as Kafka Connect, Kafka Streams and Schema Registry metrics.
Environment Setup
We provide a docker-compose.yml configuration that starts up Kpow, a 3-node Kafka cluster, and Prometheus + AlertManager. This can be found in the kpow-local repository on GitHub. Instructions on how to start a 30-day trial can be found in the repository if you are new to Kpow.
git clone https://github.com/factorhouse/kpow-local.git
cd kpow-local
vi local.env # add your LICENSE details, see kpow-local README.md
docker-compose up
Once the Docker Compose environment is running:
Alertmanager's web UI will be reachable on port 9001
Prometheus' web UI will be reachable on port 9090
Kpow's web UI will be reachable on port 3000
The remainder of this tutorial is based on the Docker Compose environment.
Prometheus Configuration
A single instance of Kpow can observe and monitor multiple Kafka clusters and related resources! This makes Kpow a great aggregator for your entire Kafka deployment across multiple environments as a single Prometheus endpoint served by Kpow can provide metrics about all your Kafka resources.
When Kpow starts up, it logs the various Prometheus endpoints available:
--* Prometheus Egress:
 * GET /metrics/v1 - All metrics
 * GET /offsets/v1 - All topic offsets
 * GET /offsets/v1/topic/[topic-name] - All topic offsets for specific topic, all clusters
 * GET /streams/v1 - All Kafka Streams metrics
 * GET /streams/v1/group/[group-name] - All Kafka Streams metrics for specific group, all clusters
 * GET /metrics/v1/cluster/sb2i_wfxSa-LaD0srBaMiA - Metrics for cluster Dev01
 * GET /offsets/v1/cluster/sb2i_wfxSa-LaD0srBaMiA - Offsets for cluster Dev01
 * GET /streams/v1/cluster/sb2i_wfxSa-LaD0srBaMiA - Kafka Streams metrics for cluster Dev01
 * GET /metrics/v1/connect/sb2i_wfxSa-LaD0srBaMiA - Metrics for connect instance sb2i_wfxSa-LaD0srBaMiA (cluster sb2i_wfxSa-LaD0srBaMiA)
 * GET /metrics/v1/cluster/lkc-jyojm - Metrics for cluster Uat01
 * GET /offsets/v1/cluster/lkc-jyojm - Offsets for cluster Uat01
 * GET /streams/v1/cluster/lkc-jyojm - Kafka Streams metrics for cluster Uat01
 * GET /metrics/v1/schema/a2f06a916672d71d675f - Metrics for schema registry instance a2f06a916672d71d675f (cluster lkc-jyojm)
 * GET /metrics/v1/cluster/CuxsifYVRhSRX6iLTbANWQ - Metrics for cluster Prod1
 * GET /offsets/v1/cluster/CuxsifYVRhSRX6iLTbANWQ - Offsets for cluster Prod1
 * GET /streams/v1/cluster/CuxsifYVRhSRX6iLTbANWQ - Kafka Streams metrics for cluster Prod1
This allows Prometheus to only consume a subset of metrics (eg, metrics about a specific consumer group or resource).
To have Prometheus pull all metrics, add this entry to your scrape_configs:
Note: you will need to provide a reachable target. In this example, Kpow is reachable at http://kpow:3000.
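A minimal entry along those lines might look like this (the job name is arbitrary; metrics_path /metrics/v1 pulls all metrics, as listed in the endpoints above):

```yaml
scrape_configs:
  - job_name: "kpow"
    metrics_path: /metrics/v1
    static_configs:
      - targets: ["kpow:3000"]
```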
Within your Prometheus config, you will need to specify the location of your rules file:
rule_files:
  - kpow-rules.yml
Our kpow-rules.yml file looks something like:
groups:
  - name: Kafka
    rules:
      # Example rules in section below
We have a single alert group called Kafka. The rules in this group are explained in the next section.
The sample kpow-rules.yml and alertmanager.yml config can be found here. In this example, Alertmanager sends all fired alerts to a Slack webhook.
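For readers who haven't configured Alertmanager's Slack integration before, a minimal alertmanager.yml might look like the sketch below; the receiver name, channel, and placeholder webhook URL are all assumptions, not the contents of the sample config linked above:

```yaml
route:
  receiver: slack-notifications  # route all fired alerts to the Slack receiver
receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ  # your Slack webhook URL
        channel: "#kafka-alerts"                               # assumed channel name
        title: "{{ .CommonAnnotations.summary }}"
        text: "{{ .CommonAnnotations.description }}"
```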
Kpow Metric Structure
A glossary of available Prometheus metrics from Kpow can be found here.
All Kpow metrics follow a similar labelling convention:
domain - the category of metric (for example cluster, connect, streams)
id - the unique identifier of the category (for example Kafka Cluster ID)
target - the identifier of the metric (for example consumer group, topic name etc)
For example, a metric with domain cluster, id 6Qw4099nSuuILkCkWC_aNw (the Kafka cluster labelled Trade Book Staging), and target tx_partner_group4 relates to that consumer group on that cluster.
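As an illustration of this convention, an exposition line for such a metric might look roughly like the following; the metric name, the label label, and the sample value are assumptions for illustration only:

```yaml
# illustrative Prometheus exposition line (assumed label set and value)
# group_offset_lag{domain="cluster", id="6Qw4099nSuuILkCkWC_aNw", label="Trade Book Staging", target="tx_partner_group4"} 1250
```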
Prometheus Rules
The remainder of this section will provide example Prometheus rules for common alerting scenarios.
Alerting when a Consumer Group is unhealthy
```yaml
- alert: UnhealthyConsumer
  expr: group_state == 0 or group_state == 1 or group_state == 2
  for: 5m
  annotations:
    summary: "Consumer {{ $labels.target }} is unhealthy"
    description: "The Consumer Group {{ $labels.target }} has gone into {{ $labels.state }} for cluster {{ $labels.id }}"
```
Here, the group_state metric from Kpow is exposed as a gauge whose value is the ordinal value of the ConsumerGroupState enum. The expr tests whether group_state has entered the DEAD, EMPTY, or UNKNOWN state for any consumer group.
The for clause causes Prometheus to wait for a given duration (in this case, 5 minutes) between first encountering a new expression output vector element and counting an alert as firing for that element.
The annotations section then provides a human-readable alert description that identifies which consumer group has entered an unhealthy state. The group_state metric has a state label containing the human-readable value of the state (e.g., STABLE).
Alerting when a Kafka Connect task is unhealthy
Similar to our consumer group configuration, we can alert when we detect a connector task has gone into an ERROR state.
```yaml
- alert: UnhealthyConnectorTask
  expr: connect_connector_task_state != 1
  for: 5m
  annotations:
    summary: "Connect task {{ $labels.target }} is unhealthy"
    description: "The Connector task {{ $labels.target }} has entered an unhealthy state for cluster {{ $labels.id }}"
- alert: UnhealthyConnector
  expr: connect_connector_state != 1
  for: 5m
  annotations:
    summary: "Connector {{ $labels.target }} is unhealthy"
    description: "The Connector {{ $labels.target }} has entered an unhealthy state for cluster {{ $labels.id }}"
```
Here we have configured two alerts: one if an individual connector task enters an error state, and one if the connector itself enters an error state. A value of 1 represents the RUNNING state.
Alerting when a consumer group is lagging above a threshold
In this example, Prometheus will fire an alert if any consumer group's lag exceeds 5000 messages for more than 5 minutes.
We can configure a similar alert on host_offset_lag to monitor individual lagging hosts, or on broker_offset_lag to monitor lag behind brokers.
```yaml
- alert: LaggingConsumerGroup
  expr: group_offset_lag > 5000
  for: 5m
  annotations:
    summary: "Consumer group {{ $labels.target }} is lagging"
    description: "Consumer group {{ $labels.target }} is lagging for cluster {{ $labels.id }}"
```
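The host-level variant mentioned above might be sketched as follows, assuming host_offset_lag carries the same target and id labels as group_offset_lag (an assumption, since the host metric's label set is not shown here):

```yaml
- alert: LaggingConsumerHost
  expr: host_offset_lag > 5000   # assumed metric shape mirroring group_offset_lag
  for: 5m
  annotations:
    summary: "Consumer host {{ $labels.target }} is lagging"
    description: "Consumer host {{ $labels.target }} is lagging for cluster {{ $labels.id }}"
```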
Alerting when the Kpow instance is down
```yaml
- alert: KpowDown
  expr: up{job="kpow"} == 0
  for: 1m
  annotations:
    summary: "Kpow is down"
    description: "Kpow instance {{ $labels.instance }} has been down for more than 1 minute."
```

Note that up is a metric generated by Prometheus itself for each scrape target, labelled with the instance and job from the scrape config.
Conclusion
This article demonstrates how you can build out a modern alerting system with Kpow and Prometheus.
Source code for the configuration, including a demo docker-compose.yml of the setup, can be found here.
Prometheus metrics are the de facto industry standard, meaning similar integrations are possible with services such as Grafana Cloud or New Relic. All of these services provide equally compelling alerting solutions.
What's even more exciting for us is Amazon's Managed Service for Prometheus which is currently in feature preview. This service looks to make Prometheus monitoring of containerized applications at scale much easier.
While Prometheus metrics are what we expose for data egress with Kpow, please get in touch if you would like alternative metric egress formats in Kpow such as WebHooks or even a JMX connection - we'd love to know your use case!
Manage, Monitor and Learn Apache Kafka with Kpow by Factor House.
We know how easy Apache Kafka® can be with the right tools. We built Kpow to make the developer experience with Kafka simple and enjoyable, and to save businesses time and money while growing their Kafka expertise. A single Docker container or JAR file that installs in minutes, Kpow's unique Kafka UI gives you instant visibility of your clusters and immediate access to your data.
Kpow is compatible with Apache Kafka 1.0+, Red Hat AMQ Streams, Amazon MSK, Instaclustr, Aiven, Vectorized, Azure Event Hubs, Confluent Platform, and Confluent Cloud.
Start with a free 30-day trial and solve your Kafka issues within minutes.
Release 77: Improved Reverse Proxy X-Forwarded-For Support

This minor release provides improved support for Jetty User Authentication (PropertyFile, JDBC, LDAP) when operating Kpow behind a reverse-proxy.

The new environment variable HTTP_FORWARDED ensures that redirects to the login page maintain the correct scheme (e.g. HTTPS) when SSL termination is happening at the reverse-proxy.
[MELBOURNE, AUS] Apache Kafka and Apache Flink Meetup, 27 November
Melbourne, we’re making it a double feature. Workshop by day, meetup by night - same location, each with valuable content for data and software engineers, or those working with Data Streaming technologies. Build the backbone your apps deserve, then roll straight into the evening meetup.
[SYDNEY, AUS] Apache Kafka and Apache Flink Meetup, 26 November
Sydney, we’re making it a double feature. Workshop by day, meetup by night - same location, each with valuable content for data and software engineers, or those working with Data Streaming technologies. Build the backbone your apps deserve, then roll straight into the evening meetup.
We’re building more than products, we’re building a community. Whether you're getting started or pushing the limits of what's possible with Kafka and Flink, we invite you to connect, share, and learn with others.