Describe the bug
The OOM metric pod_container_status_terminated_reason_oom_killed from Container Insights enhanced observability can't be found in the CloudWatch console even though there is a confirmed OOMKilled error for pods in our EKS cluster. It is on the list of observable metrics found here.
Steps to reproduce
- Use the default EKS add-on for CloudWatch Observability with no changes to the config (version numbers below).
- Simulate a pod OOM error (a minimal sketch is included after this list).
- Observe that the metric pod_container_status_terminated_reason_oom_killed cannot be found in CloudWatch Container Insights.
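For reference, one minimal way to trigger an OOMKilled pod (the pod name and stress image below are illustrative, not part of the original report):

```sh
# Launch a pod whose memory limit is lower than what it tries to allocate,
# so the kernel OOM killer terminates the container.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: oom-test            # hypothetical name for this sketch
spec:
  restartPolicy: Never
  containers:
  - name: memory-hog
    image: polinux/stress   # any image that can allocate memory works
    command: ["stress", "--vm", "1", "--vm-bytes", "256M", "--vm-hang", "0"]
    resources:
      limits:
        memory: "64Mi"      # well below the 256M the workload allocates
EOF

# Confirm the termination reason reported by the kubelet
kubectl get pod oom-test \
  -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'
# prints: OOMKilled
```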
What did you expect to see?
pod_container_status_terminated_reason_oom_killed can be viewed and graphed in the CloudWatch Container Insights console, so pods with OOMKilled errors can be monitored.
What did you see instead?
The metric can't be found even when an OOMKilled status is occurring for pods.
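As a sanity check, the metric's absence can also be confirmed from the CLI (the region and exact invocation here are illustrative, not from the original report):

```sh
# Enhanced observability metrics are published to the ContainerInsights namespace.
# If the agent were publishing the metric, it would appear in the output;
# in the reported scenario the Metrics list comes back empty.
aws cloudwatch list-metrics \
  --namespace ContainerInsights \
  --metric-name pod_container_status_terminated_reason_oom_killed \
  --region us-east-1
```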
What version did you use?
- CloudWatch agent: AmazonCloudWatchAgent CWAgent/1.300054.0b1074 (go1.23.7; linux; amd64)
- EKS: v1.30
- EKS CloudWatch add-on version: v3.6.0-eksbuild.2
What config did you use?
config.txt
Environment
AL2
Additional context
In the CloudWatch agent logs, there is the line:
manager.go:306] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
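That message is emitted by the embedded cAdvisor, which reads kernel OOM events from /dev/kmsg; the error suggests the device is not visible inside the agent container. Whether this is the direct cause of the missing metric is unconfirmed, but it can be checked quickly (the namespace and label selector below assume the add-on defaults and may need adjusting for your cluster):

```sh
# Find an agent pod created by the add-on and look for /dev/kmsg inside it.
AGENT_POD=$(kubectl get pods -n amazon-cloudwatch \
  -l app.kubernetes.io/name=cloudwatch-agent -o name | head -n 1)
kubectl exec -n amazon-cloudwatch "$AGENT_POD" -- ls -l /dev/kmsg
# "No such file or directory" here matches the agent log above and would
# mean the host device is not mounted into the container.
```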