
Build a tool in Arroyo to parse logs #461

@fpacifici

On large consumer clusters (64 consumers or so), it is very hard to understand from the logs what the whole cluster is doing during a rebalance. The logs are too verbose, and with so many consumers it is hard to trace causes and effects.

Let's make a tool that solves the problem for us. The idea is to write a log parser that takes GCP logs (or any logs), looks for Arroyo-specific events such as initialization, crashes, partitions assigned, partitions revoked, and commits, and produces a unified timeline (in whatever format) with higher-level events. It should show:

  • Barriers across consumers marking when a rebalance starts and when the consumer group is stable again, so that we do not have to match partition assigned and revoked messages across consumers by hand.
  • When each consumer starts processing and committing data after getting partitions assigned.
  • Processing vs. stuck time on a timeline across consumers.
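
To make the first two points concrete, here is a minimal sketch of what the parsing and aggregation could look like. Everything in it is an assumption for illustration: the input line shape (`<ISO timestamp> <consumer> <message>`), the message patterns in `PATTERNS`, and the names `Event`, `parse_events`, `rebalance_windows`, and `time_to_first_commit`. Real Arroyo and GCP log lines will likely need different patterns. The rebalance barrier is a heuristic: a window opens at the first revocation and closes once every consumer that revoked has been assigned partitions again.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical message patterns; the real Arroyo log messages may differ.
PATTERNS = {
    "assigned": re.compile(r"partitions assigned", re.I),
    "revoked": re.compile(r"partitions revoked", re.I),
    "commit": re.compile(r"commit", re.I),
}


@dataclass(frozen=True)
class Event:
    ts: datetime
    consumer: str
    kind: str


def parse_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield an Event for each line that matches a known pattern.

    Assumes lines look like "<ISO timestamp> <consumer> <message>".
    Lines that do not fit are skipped.
    """
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) != 3:
            continue
        raw_ts, consumer, message = parts
        try:
            ts = datetime.fromisoformat(raw_ts)
        except ValueError:
            continue
        for kind, pattern in PATTERNS.items():
            if pattern.search(message):
                yield Event(ts, consumer, kind)
                break


def rebalance_windows(events: Iterable[Event]) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) pairs: first revocation to last matching assignment."""
    windows: List[Tuple[datetime, datetime]] = []
    pending: Set[str] = set()  # consumers that revoked and are not yet reassigned
    start = None
    for e in sorted(events, key=lambda ev: ev.ts):
        if e.kind == "revoked":
            if start is None:
                start = e.ts
            pending.add(e.consumer)
        elif e.kind == "assigned":
            pending.discard(e.consumer)
            if start is not None and not pending:
                windows.append((start, e.ts))
                start = None
    return windows


def time_to_first_commit(events: Iterable[Event]) -> Dict[str, timedelta]:
    """Per consumer, the delay between the latest assignment and the next commit."""
    assigned_at: Dict[str, datetime] = {}
    delays: Dict[str, timedelta] = {}
    for e in sorted(events, key=lambda ev: ev.ts):
        if e.kind == "assigned":
            assigned_at[e.consumer] = e.ts
        elif e.kind == "commit" and e.consumer in assigned_at:
            delays[e.consumer] = e.ts - assigned_at.pop(e.consumer)
    return delays
```

A consumer that revokes and never gets partitions back (for example because it crashed) leaves the window open under this heuristic, which is arguably what we want to see, but a real tool would need to report it explicitly.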

Make it extensible for consumer owners, so that Snuba, for example, can add its own custom events.
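
One possible shape for the extension point is a small registry that maps event names to regular expressions, which the consumer owner can add to. This is only a sketch: `EventRegistry`, its methods, and both example patterns (including the Snuba one) are hypothetical and not taken from Arroyo or Snuba.

```python
import re
from typing import Dict, Optional, Pattern


class EventRegistry:
    """Maps event kinds to the regex that recognizes them in a log message."""

    def __init__(self) -> None:
        self._patterns: Dict[str, Pattern[str]] = {}

    def register(self, kind: str, pattern: str) -> None:
        # Refuse silent overrides so a custom event cannot shadow a built-in one.
        if kind in self._patterns:
            raise ValueError(f"event kind already registered: {kind}")
        self._patterns[kind] = re.compile(pattern)

    def classify(self, message: str) -> Optional[str]:
        # Patterns are tried in registration order; the first match wins.
        for kind, pattern in self._patterns.items():
            if pattern.search(message):
                return kind
        return None


registry = EventRegistry()
# Built-in event (hypothetical message text).
registry.register("partitions_assigned", r"(?i)partitions assigned")
# Custom event added by the consumer owner (hypothetical message text).
registry.register("snuba_batch_flushed", r"(?i)flushed batch")
```

With this design the parser only depends on `classify`, so owners add events without touching the parser itself.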

Enterprise edition: generate a mermaid sequence diagram out of the log.
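
For the diagram, a rough sketch of the rendering step could look like the following. It assumes events have already been reduced to `(seconds since start, consumer, kind)` tuples, which is a made-up shape for illustration, and it emits one Mermaid `Note over` per event. The function name `to_mermaid` is hypothetical as well.

```python
from typing import Iterable, List, Tuple


def to_mermaid(events: Iterable[Tuple[float, str, str]]) -> str:
    """Render (offset_seconds, consumer, kind) tuples as a Mermaid sequence diagram."""
    ordered = sorted(events, key=lambda ev: ev[0])
    lines: List[str] = ["sequenceDiagram"]
    # Declare participants in order of first appearance.
    seen: List[str] = []
    for _, consumer, _ in ordered:
        if consumer not in seen:
            seen.append(consumer)
            lines.append(f"    participant {consumer}")
    # One note per event, labeled with its offset from the start of the log.
    for offset, consumer, kind in ordered:
        lines.append(f"    Note over {consumer}: t+{offset:g}s {kind}")
    return "\n".join(lines)


print(to_mermaid([(0.0, "c1", "revoked"), (3.0, "c1", "assigned"), (5.0, "c2", "assigned")]))
```

Consumer names are used directly as participant identifiers here, so names containing spaces or punctuation would need to be sanitized first.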
