Bug description:
Fluent Bit v4.1.1 with the S3 output starts idle (≈0% CPU), but after ~10 minutes or more the process pins the CPU (~100% of its cgroup CPU allocation) and stays there indefinitely, even with no input traffic and even when the S3 settings point to a fake endpoint.
When the same configuration runs on Fluent Bit ≤ v4.0.10, CPU stays near 0% indefinitely. This appears to be a regression introduced in v4.1.1.
Steps to reproduce the problem:
- Save this minimal config as `fluent-bit.yaml`:
```yaml
service:
  flush: 2
  log_level: info

pipeline:
  inputs:
    - name: tail
      path: /tmp/*.log

  outputs:
    - name: s3
      match_regex: "*"
      retry_limit: "no_limit"
      workers: 0
      bucket: YOUR_BUCKET_NAME
      region: us-west-1
      endpoint: https://aws.com
```
- Run it via Docker:
```shell
docker run -d --name fb-s3-bug \
  --cpus="1" --memory="256m" --memory-swap="256m" \
  -v "$PWD/fluent-bit.yaml:/fluent-bit/etc/fluent-bit.yaml:ro" \
  -v "$PWD/flb-storage:/var/flb-storage" \
  fluent/fluent-bit:4.1.1 -c /fluent-bit/etc/fluent-bit.yaml
```
- Monitor CPU:
```shell
docker stats fb-s3-bug
```
- For the first ~10–20 minutes, CPU stays at ≈0–1%.
- CPU then rises to ~100% of the container limit and stays there until the container is restarted.
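To measure exactly when the spike starts, rather than eyeballing `docker stats`, a small polling script can record the elapsed time at which CPU first crosses a threshold. This is a sketch, not part of Fluent Bit: the helper names `cpu_above` and `watch_cpu`, the 50% threshold, and the 15-second interval are illustrative assumptions. Elapsed time is counted from when the script starts, so it should be launched right after `docker run`.

```shell
#!/usr/bin/env bash
# Polls `docker stats` and reports how long after this script started the
# container's CPU usage first exceeded a threshold (hypothetical helper).
set -euo pipefail

# Succeeds (exit 0) if the CPU value in $1 (e.g. "97.53%") exceeds $2 percent.
cpu_above() {
  local cpu="${1%\%}" threshold="$2"
  awk -v c="$cpu" -v t="$threshold" 'BEGIN { exit !(c + 0 > t + 0) }'
}

# Usage: watch_cpu CONTAINER [THRESHOLD_PERCENT] [INTERVAL_SECONDS]
watch_cpu() {
  local container="$1" threshold="${2:-50}" interval="${3:-15}"
  local start cpu elapsed
  start=$(date +%s)
  while true; do
    # One snapshot of the container's CPU percentage, e.g. "0.12%".
    cpu=$(docker stats --no-stream --format '{{.CPUPerc}}' "$container")
    elapsed=$(( $(date +%s) - start ))
    printf 'elapsed=%ss cpu=%s\n' "$elapsed" "$cpu"
    if cpu_above "$cpu" "$threshold"; then
      echo "CPU exceeded ${threshold}% after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
  done
}

# Only start polling when a container name is given on the command line.
if [[ $# -ge 1 ]]; then
  watch_cpu "$@"
fi
```

For example, saved as `watch_cpu.sh`, running `./watch_cpu.sh fb-s3-bug 50 15` prints one line per sample and exits once the threshold is crossed.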
Expected behavior
When there is no input and nothing to upload, Fluent Bit should remain idle (~0–1% CPU).
The S3 output's periodic upload timer should not enter a busy loop.
Additional context
- The issue reproduces with both fake and real S3 endpoints.
- In most runs, the spike begins EXACTLY 10 minutes after Fluent Bit starts.
- In production, the S3 output on v4.1.1 drives CPU to ~100% across 30+ pods after 10–20 minutes of idling.
- The same config on Fluent Bit 4.0.10 and earlier stays idle indefinitely, which strongly suggests a regression introduced in the 4.1.x series.
- Other outputs, such as OpenTelemetry, work fine and do not cause any CPU issues.
- As a workaround, I had to downgrade Fluent Bit to an earlier version.
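The 10-minute onset matches the documented default of the S3 output's `upload_timeout` (10m), so one unconfirmed hypothesis worth checking is that the busy loop begins when the first upload timer fires. A minimal sketch that writes a copy of the reproduction config with an explicit, shorter `upload_timeout`; the helper name `write_variant_config` and the 2m value are illustrative assumptions, and every other setting is unchanged from the reproduction config.

```shell
#!/usr/bin/env bash
# Writes a variant of the reproduction config with an explicit upload_timeout.
# Hypothesis (unconfirmed): the CPU spike tracks the S3 upload_timeout.
set -euo pipefail

# Usage: write_variant_config TIMEOUT OUTPUT_FILE   (hypothetical helper)
write_variant_config() {
  local timeout="$1" out="$2"
  cat > "$out" <<EOF
service:
  flush: 2
  log_level: info

pipeline:
  inputs:
    - name: tail
      path: /tmp/*.log

  outputs:
    - name: s3
      match_regex: "*"
      retry_limit: "no_limit"
      workers: 0
      bucket: YOUR_BUCKET_NAME
      region: us-west-1
      endpoint: https://aws.com
      upload_timeout: ${timeout}
EOF
}

# Only write the file when both arguments are given on the command line.
if [[ $# -eq 2 ]]; then
  write_variant_config "$1" "$2"
fi
```

For example, `./write_variant.sh 2m fluent-bit.yaml`, then recreate the container with the same `docker run` command. If the spike moves to roughly 2 minutes, that would point at the upload timer; if it still appears at 10 minutes, the hypothesis is probably wrong.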