On large consumer clusters (64 consumers or so), it is very hard to understand from the logs what the whole cluster is doing during a rebalance. The logs are too verbose, and with so many consumers it is hard to match causes to effects.
Let's build a tool to solve the problem for us. The idea is to write a log parser that takes GCP logs (or any logs), looks for arroyo-specific events such as initialization, crashes, partitions assigned, partitions revoked, and commits, and produces a unified timeline (in whatever format) with higher-level events like:
- shows barriers across consumers from when a rebalance starts to when the consumer group is stable again, so that we do not have to match partition assigned and revoked messages across consumers by hand
- shows when each consumer starts processing and committing data after getting partitions assigned
- shows processing vs. stuck time on a timeline across consumers
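To make the idea concrete, here is a rough sketch of the parsing and barrier steps. Everything in it is an assumption for illustration: the line layout (`<ISO timestamp> <consumer id> <message>`), the message patterns (the real arroyo log wording may differ), and the "quiet period" heuristic used to decide that the group is stable again.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical message patterns; the real arroyo log wording may differ.
PATTERNS = {
    "init": re.compile(r"consumer initialized", re.I),
    "crash": re.compile(r"crash|traceback", re.I),
    "assigned": re.compile(r"partitions? assigned", re.I),
    "revoked": re.compile(r"partitions? revoked", re.I),
    "commit": re.compile(r"commit", re.I),
}

# Assumed line layout: "<ISO-8601 timestamp> <consumer id> <message>"
LINE = re.compile(r"^(\S+)\s+(\S+)\s+(.*)$")


@dataclass(frozen=True)
class Event:
    ts: datetime
    consumer: str
    kind: str


def parse(lines: Iterable[str]) -> Iterator[Event]:
    """Yield one Event per recognised line; skip everything else."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        ts_raw, consumer, msg = m.groups()
        try:
            ts = datetime.fromisoformat(ts_raw)
        except ValueError:
            continue  # first token is not a timestamp we understand
        for kind, pattern in PATTERNS.items():
            if pattern.search(msg):
                yield Event(ts, consumer, kind)
                break


def rebalance_windows(
    events: Iterable[Event], quiet_seconds: float = 30.0
) -> List[Tuple[datetime, datetime]]:
    """Group assigned/revoked events from all consumers into windows.

    Heuristic (an assumption): a rebalance is over once no consumer has
    logged an assignment or revocation for `quiet_seconds`.
    """
    windows: List[Tuple[datetime, datetime]] = []
    start: Optional[datetime] = None
    last: Optional[datetime] = None
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.kind not in ("assigned", "revoked"):
            continue
        if start is not None and (ev.ts - last).total_seconds() > quiet_seconds:
            windows.append((start, last))
            start = None
        if start is None:
            start = ev.ts
        last = ev.ts
    if start is not None:
        windows.append((start, last))
    return windows
```

Each `(start, end)` pair is one barrier on the unified timeline. The same sorted event stream could also be used to derive "first commit after assignment" per consumer, and gaps between commits as a rough proxy for stuck time.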
Make it extensible for the owner of the consumer, so that snuba, for example, can add custom events.
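One possible shape for the extension point is a small registry that maps event names to patterns, so a consumer owner can add their own alongside the built-in ones. This is only a sketch: `EventRegistry`, the event names, and both patterns below are hypothetical, not an existing arroyo or snuba API.

```python
import re
from typing import Dict, Optional, Pattern


class EventRegistry:
    """Maps event names to regexes; consumer owners can register more."""

    def __init__(self) -> None:
        self._patterns: Dict[str, Pattern[str]] = {}

    def register(self, name: str, pattern: str) -> None:
        # Later registrations with the same name replace earlier ones.
        self._patterns[name] = re.compile(pattern)

    def classify(self, message: str) -> Optional[str]:
        # First registered pattern that matches wins.
        for name, pattern in self._patterns.items():
            if pattern.search(message):
                return name
        return None


registry = EventRegistry()
# Built-in style event (pattern wording is an assumption).
registry.register("partitions_assigned", r"[Pp]artitions assigned")
# Custom event a consumer owner might add (hypothetical).
registry.register("snuba_batch_flushed", r"flushed batch")
```

With this in place, the parser would call `registry.classify(message)` for every log line instead of using a hard-coded pattern table.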
Enterprise edition: generate a Mermaid sequence diagram from the logs.
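A minimal sketch of what the Mermaid output could look like, assuming events have already been reduced to `(seconds since start, consumer id, event kind)` tuples. The tuple layout and the `to_mermaid` name are made up for illustration. Consumers are given generated aliases (`p0`, `p1`, ...) so that arbitrary consumer ids do not have to be valid Mermaid identifiers.

```python
from typing import Dict, Iterable, List, Tuple

# (seconds since the first event, consumer id, event kind): assumed layout
TimelineEvent = Tuple[float, str, str]


def to_mermaid(events: Iterable[TimelineEvent]) -> str:
    """Render events as a Mermaid sequence diagram, one note per event."""
    ordered: List[TimelineEvent] = sorted(events)
    aliases: Dict[str, str] = {}
    for _, consumer, _ in ordered:
        if consumer not in aliases:
            aliases[consumer] = f"p{len(aliases)}"

    lines = ["sequenceDiagram"]
    for consumer, alias in aliases.items():
        lines.append(f"    participant {alias} as {consumer}")
    for offset, consumer, kind in ordered:
        lines.append(f"    Note over {aliases[consumer]}: +{offset:.1f}s {kind}")
    return "\n".join(lines)


diagram = to_mermaid(
    [(0.0, "c1", "revoked"), (5.0, "c2", "assigned"), (6.0, "c1", "assigned")]
)
print(diagram)
```

Notes are used rather than arrows because these log events belong to a single consumer; arrows would only make sense if interactions with the broker or group coordinator were modelled as well.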