
Add Pressure Stall Information (PSI) metrics #2995

@alpineQ

Description


Area(s)

area:system

Propose new conventions

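Linux Pressure Stall Information (PSI) reports how much time tasks spend stalled waiting for CPU, memory, and I/O, giving direct insight into resource pressure on each subsystem. PSI is available on Linux kernel 4.20 and later and is exposed through the files under /proc/pressure/.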

This proposal adds 20 new metric instruments:

  • CPU: system.psi.cpu.some.* (avg10, avg60, avg300, total)
  • Memory: system.psi.memory.some.* and system.psi.memory.full.* (avg10, avg60, avg300, total for each)
  • I/O: system.psi.io.some.* and system.psi.io.full.* (avg10, avg60, avg300, total for each)
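As a quick check of the count above, the sketch below enumerates the proposed names from the naming pattern system.psi.<resource>.<kind>.<field>. The helper name `psi_metric_names` and the `PSI_KINDS`/`PSI_FIELDS` tables are hypothetical, introduced only for illustration; they are not part of any existing OpenTelemetry API.

```python
# Hypothetical helper: enumerate the 20 PSI metric names proposed above.
# CPU contributes only "some"; memory and I/O contribute "some" and "full".
PSI_KINDS = {
    "cpu": ("some",),
    "memory": ("some", "full"),
    "io": ("some", "full"),
}
PSI_FIELDS = ("avg10", "avg60", "avg300", "total")


def psi_metric_names() -> list[str]:
    """Return every system.psi.<resource>.<kind>.<field> name."""
    return [
        f"system.psi.{resource}.{kind}.{field}"
        for resource, kinds in PSI_KINDS.items()
        for kind in kinds
        for field in PSI_FIELDS
    ]


if __name__ == "__main__":
    names = psi_metric_names()
    print(len(names))  # 20
    print(names[0])    # system.psi.cpu.some.avg10
```

The total works out as 1 x 4 for CPU plus 2 x 4 each for memory and I/O, i.e. 4 + 8 + 8 = 20.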

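Here, "some" is the share of time in which at least one task is stalled on the resource, and "full" is the share of time in which all non-idle tasks are stalled on it simultaneously. The avg10, avg60, and avg300 metrics are the percentage of time stalled, averaged over 10-, 60-, and 300-second windows, while the total metrics track cumulative stall time in microseconds.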
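To make the field semantics concrete, here is a minimal parsing sketch, assuming the line format documented for /proc/pressure/* (`some avg10=... avg60=... avg300=... total=...`, plus a `full` line where applicable). The function name `parse_psi` and its `kinds` filter are hypothetical; the filter exists because newer kernels may also print a `full` line in /proc/pressure/cpu, which the proposal does not include.

```python
def parse_psi(resource: str, text: str, kinds=("some", "full")) -> dict[str, float]:
    """Map the contents of /proc/pressure/<resource> to proposed metric names.

    Each input line looks like:
        some avg10=0.12 avg60=0.05 avg300=0.01 total=123456
    avg* values are percentages of wall-clock time; total is in microseconds.
    Lines whose kind ("some"/"full") is not in `kinds` are ignored.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] not in kinds:
            continue
        kind = parts[0]
        for field in parts[1:]:
            key, _, value = field.partition("=")
            # total is an integer microsecond counter; the averages are floats.
            metrics[f"system.psi.{resource}.{kind}.{key}"] = (
                int(value) if key == "total" else float(value)
            )
    return metrics


sample = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=7890\n"
)
mem = parse_psi("memory", sample)
print(mem["system.psi.memory.some.avg10"])  # 0.12
print(mem["system.psi.memory.full.total"])  # 7890
```

For CPU, calling `parse_psi("cpu", text, kinds=("some",))` keeps only the four system.psi.cpu.some.* values listed above.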

When running on Linux with PSI support, PSI metrics can be collected automatically alongside the existing host metrics. On non-Linux systems, or when PSI is unavailable, the implementation degrades gracefully: no PSI metrics are emitted and the other host metrics are unaffected.
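The graceful-degradation behaviour could look roughly like the sketch below: each resource file is read independently, and a missing or unreadable file simply contributes no metrics. This is a standalone illustration, not the code from the proposed PRs; `collect_psi`, `PSI_ROOT`, and `RESOURCES` are hypothetical names, and the `root` parameter is only there so the logic can be exercised against a test directory.

```python
from pathlib import Path

PSI_ROOT = Path("/proc/pressure")
# Kinds collected per resource, matching the proposed metric list.
RESOURCES = {"cpu": ("some",), "memory": ("some", "full"), "io": ("some", "full")}


def collect_psi(root: Path = PSI_ROOT) -> dict[str, float]:
    """Read whichever PSI files exist; return {} when PSI is unavailable."""
    metrics: dict[str, float] = {}
    for resource, kinds in RESOURCES.items():
        try:
            text = (root / resource).read_text()
        except OSError:
            # Missing or unreadable file (non-Linux, older kernel, PSI disabled):
            # skip this resource instead of failing the whole collection.
            continue
        for line in text.splitlines():
            parts = line.split()
            if not parts or parts[0] not in kinds:
                continue
            for field in parts[1:]:
                key, _, value = field.partition("=")
                metrics[f"system.psi.{resource}.{parts[0]}.{key}"] = float(value)
    return metrics


if __name__ == "__main__":
    # On a system without PSI this prints 0; no exception is raised.
    print(len(collect_psi()))
```

Handling each file separately means a partially available PSI setup still yields whatever resources can be read.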

Additional Context

OpenTelemetry integration details

Related issues exist in:

I am also proposing 2 PRs to address these issues:

