
Things that go bump in the night: Kafka operational issues and how to survive them
Most Kafka content focuses on how things should work. This session is about what actually happens in production.
Chad Harris, Solutions Architect at Factor House, draws on years of experience operating Kafka across organisations from early-stage startups to some of Silicon Valley's largest companies. In this session, Chad walks through real-world operational failures, from subtle misconfigurations to full-scale incidents, and the debugging workflows that help when the system stops behaving the way you expect.
You will come away with a clearer picture of which signals matter, how to approach Kafka observability in practice, and mitigation strategies you can apply immediately.
What the session covers
- Common Kafka failure patterns and how they manifest in production
- Debugging workflows for when logs are not telling the whole story
- Observability signals that matter, and those that do not
- Practical mitigation strategies drawn from real incidents
This is a technical session aimed at platform engineers, data engineers, and anyone responsible for running Kafka in production. War stories included.