Skip to content

Conversation

@AndersonQ
Copy link
Member

@AndersonQ AndersonQ commented Sep 23, 2025

DO NOT MERGE BEFORE 9.2 IS RELEASED

Proposed commit message

[kubernetes] adds beta support for rotated logs

Enables the collection of rotated log files, including GZIP-compressed files, for Kubernetes container logs.

This enhancement introduces new configuration options to control the ingestion of rotated logs, including specifying paths for the rotated logs and enabling GZIP decompression.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • [ ] I have verified that Kibana version constraints are current according to guidelines.
  • [ ] I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

How to test this PR locally

  • install this integration to an ES stack
cd packages/kubernetes
elastic-package install -v
  • start a kind cluster with the follwing config
node-cfg.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: KubeletConfiguration
        containerLogMaxSize: "10Mi"
        containerLogMaxFiles: 10
kind create -v cluster --config ./node-cfg.yaml
  • run a flog pod, wait the logs to be generated.
flog.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: flog-log-generator
spec:
  template:
    spec:
      containers:
        - name: flog
          image: mingrammer/flog
#          too small "-d" won't give kubelet time to rotate the files
          args: ["-t", "stdout", "-f", "json", "-n", "185000", "-d", "1ms"]
      restartPolicy: Never
  backoffLimit: 4
kubectl apply -f flog.yaml
  • add the kubernets integration enabling to ingest rotated logs. Ideally only enable container logs to avoid errors as kind is really minimal and not all metrics are available
  • install a 9.2.0-SNAPSHOT elastic-agent using the instructions for deploying it on k8s
  • ensure all 185000 were ingested and the ingested files include the rotated plain text and gzip files

Related issues

Screenshots

Screenshot from 2025-09-30 15-07-07

@AndersonQ AndersonQ self-assigned this Sep 23, 2025
@AndersonQ AndersonQ added Integration:kubernetes Kubernetes Team:Elastic-Agent-Data-Plane Agent Data Plane team [elastic/elastic-agent-data-plane] labels Sep 23, 2025
@AndersonQ AndersonQ force-pushed the 14238-kubernetes-gzip-support branch from 736eb8a to 2cbfd60 Compare September 23, 2025 16:57
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

@AndersonQ AndersonQ force-pushed the 14238-kubernetes-gzip-support branch from 2cbfd60 to d64bdda Compare September 26, 2025 15:04
@andrewkroh andrewkroh added the documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. label Sep 26, 2025
@AndersonQ AndersonQ force-pushed the 14238-kubernetes-gzip-support branch 3 times, most recently from 8bc127f to 2132f29 Compare September 29, 2025 08:00
@elastic-sonarqube
Copy link

Quality Gate failed Quality Gate failed

Failed conditions
2.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@AndersonQ AndersonQ changed the title [kubernetes] adds beta support for rotated logs [kubernetes] adds beta support for rotated logs (including GZIP) Sep 29, 2025
@AndersonQ AndersonQ marked this pull request as ready for review September 29, 2025 08:48
@AndersonQ AndersonQ requested a review from a team as a code owner September 29, 2025 08:49
@elasticmachine
Copy link

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@andrewkroh andrewkroh added the Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services] label Sep 29, 2025
title: Kubernetes container rotated log path
multi: true
default:
- /var/log/pods/${kubernetes.namespace}_${kubernetes.pod.name}_${kubernetes.pod.uid}/${kubernetes.container.name}/*.log.*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it suspicious that for the paths our default is /var/log/containers/*${kubernetes.container.id}.lo but for the rotated_logs_paths it's a completely different /var/log/pods/${kubernetes.namespace}_${kubernetes.pod.name}_${kubernetes.pod.uid}/${kubernetes.container.name}/*.log.*.

I have a couple of questions here:

  1. Why do we even need the new rotated_logs_paths parameters? Why can't we just add a second line to the paths default?
  2. How did we determine that the rotated container logs are stored in /var/log/pods/${kubernetes.namespace}_${kubernetes.pod.name}_${kubernetes.pod.uid}/${kubernetes.container.name}/*.log.*? Can we put a documentation reference as a comment near this default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we even need the new rotated_logs_paths parameters?

I added some context on the issue, but the tl;dr is that the paths serve different purposes. The /var/log/containers/ is deprecated and is just a collection of symlinks pointing to the active log file. To get the rotated logs, we have to target the source directory, /var/log/pods/, directly.

As for why this is a new parameter, it was the cleanest and safest approach for a few reasons:

Isolating a beta feature: Since support for rotated/gzipped logs is new, it's safer to keep it separate.

Preventing data re-ingestion: Changing the existing default paths could cause users to re-ingest logs, which we must avoid.

Simplicity/limitations: With a different path, it can be added or not depending on rotated_logs toggle, without interfering with the default to active log files.

How did we determine that the rotated container logs are stored in /var/log/pods/${kubernetes.namespace}${kubernetes.pod.name}${kubernetes.pod.uid}/${kubernetes.container.name}/.log.?

A mix of docs and reading k8s code. Unfortunately I did not find authoritative and detailed docs on this path, just the logs are in /var/log/pods/. I also explain on the issue and link the docs/source code from where I got this path. I'll add some comment here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndersonQ thanks for the clarification!

My concern is our future plan for this feature. How are we going to take it out of beta? Would this extra parameter rotated_logs_paths stay?

While it's still in Beta it might be totally fine to hard-code /var/log/pods/${kubernetes.namespace}${kubernetes.pod.name}${kubernetes.pod.uid}/${kubernetes.container.name}/.log as a part of paths in the template if rotated_logs is switched to true without adding a new rotated_logs_paths parameter. How common is it to need to configure rotated_logs_paths?

If we add this new parameter rotated_logs_paths, we commit to having it in the future too.

I'd like to see @cmacknz's take on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is our future plan for this feature. How are we going to take it out of beta? Would this extra parameter rotated_logs_paths stay?

No, we would make this the default, we just have to do it without data duplication in this case.

I think my overall preference would be to have a feature toggle for something like "ingest rotated gzip logs (beta)" that just makes all the changes necessary without exposing the individual pieces that are changed.

I think the toggles now are technically correct but way to technical for a lot of users to potentially understand. The integration is supposed to abstract away complexity and this just putting the underlying Kubernetes log structure in everyones view directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference this is what filelog does:

filelog:
  include:
    - /var/log/pods/*/*/*.log
  exclude:
    # Exclude logs from all containers named otel-collector
    - /var/log/pods/*/otel-collector/*.log
  start_at: end
  include_file_path: true
  include_file_name: false
  operators:
    # parse container logs
    - type: container
      id: container-parser

You will note they do not template an input per pod UID and they have an obvious way to exclude their own logs.

This something we should also do we are blocked by requiring data re-ingestion to make changes like this. We need some way to correct our logging configurations and change them at will without re-ingesting every log in a cluster. Several improvements are stuck behind this problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to use a glob for /var/log/pod/. Also because k8s docs don't really document either pattern, the one for /var/logs/container/ and /var/log/pod/.

Because the input is templated using the container ID in the input ID, for each container we have one input, which currently works well with the paths matching a single file. However, if we add a glob in the paths, then for each container we will have one input instance ingesting files from all containers, that's gonna generate massive data duplication.

For reference this is what filelog does:

It's interesting that they set start_at: end, their default is to consider every existing file as fully ingested and only ingest new data. That helps a lot to avoid data duplication, but do not ingest old data like we do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This something we should also do we are blocked by requiring data re-ingestion to make changes like this. We need some way to correct our logging configurations and change them at will without re-ingesting every log in a cluster. Several improvements are stuck behind this problem.

We already can migrate state from one Filestream input to another. The current implementation is very conservative and requires us to set the ID from the old input we're getting the state from, however we can loose this requirement and just have input look up any state in Filestream's registry, regardless of the input it belongs to.

The more we discuss about it, the more I believe we need to re-think how we store and scope file states in Filestream.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more we discuss about it, the more I believe we need to re-think how we store and scope file states in Filestream.

Oh yeah

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to use a glob for /var/log/pod/. Also because k8s docs don't really document either pattern, the one for /var/logs/container/ and /var/log/pod/.

Because the input is templated using the container ID in the input ID, for each container we have one input, which currently works well with the paths matching a single file. However, if we add a glob in the paths, then for each container we will have one input instance ingesting files from all containers, that's gonna generate massive data duplication.

Oh, that's a good pint. Lat me think about it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine. The variables are substituted all together. In a simplified way:

the variables of an input are only substituted if all variables are returned by the provider
the k8s provider returns all the variables in a set with a single value for each key. In other words, for a pod with 2 containers there would be 2 sets like:

namespace: default
pod: 
  uid: 1
  name: pod-1
container:
  id: cont-1
  name cont-name-1

namespace: default
pod:
  uid: 1
  name: pod-1
container:
  id: cont-2
  name cont-name-2

that way, there is no risk of creating a combination that mixes 2 pods or containers from the same pod as each set of values are all valid and belonging to the same hierarchy

@AndersonQ
Copy link
Member Author

@rdner, @belimawr I also updated the screenshot in the PR description. please re-review at your earliest convenience

@AndersonQ AndersonQ requested review from belimawr and rdner September 30, 2025 13:14
- name: rotated_logs
type: bool
title: Use Symlinks
title: Include rotated log files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to have GZIP in the name to explain why you'd want to do this.

@AndersonQ AndersonQ requested review from a team as code owners October 9, 2025 08:41
@AndersonQ AndersonQ force-pushed the 14238-kubernetes-gzip-support branch from 8a0d219 to 37b750d Compare October 9, 2025 08:50
@AndersonQ AndersonQ removed request for a team October 9, 2025 08:51
@AndersonQ
Copy link
Member Author

@cmacknz, @rdner is it ok now?

@rdner
Copy link
Member

rdner commented Oct 21, 2025

@AndersonQ I don't think this comment from @cmacknz is addressed, is it?

My initial question remains unanswered: why do we need the new configuration parameter rotated_logs_paths when we can just add values in paths based on one feature flag/toggle rotated_logs?

If I understood correctly, @cmacknz and I are aligned on this. We should minimize the amount of new configuration fields we introduce, especially when we know this parameter won't be necessary in the future. I believe it should be just one toggle.

@AndersonQ
Copy link
Member Author

@AndersonQ I don't think this comment from @cmacknz is addressed, is it?

@rdner, I thought my explanation here addressed it.

My initial question remains unanswered: why do we need the new configuration parameter rotated_logs_paths when we can just add values in paths based on one feature flag/toggle rotated_logs?

If I understood correctly, @cmacknz and I are aligned on this. We should minimize the amount of new configuration fields we introduce, especially when we know this parameter won't be necessary in the future. I believe it should be just one toggle.

it's about allowing users to configure it if they have another path for it. I can definitely go with just a toggle, but then it won't be configurable.
I don't know how often it's needed, I'd guess pretty much as much as one needs to change the default path for logs. That minus the amount of users which change it to collect directly from /var/log/pod

Also, if for some reason the users have the logs in a custom location, other than /var/log/pod, I doubt the rotated logs would be there, thus, they need to configure it.

It's a sane default, just as we have for the active log. I don't see why we should not allow it to be configurable as well.

@AndersonQ
Copy link
Member Author

I'm happy to remove it if you 2 want it removed. However I'd not be surprised if we get requests to make it configurable.

@rdner
Copy link
Member

rdner commented Oct 21, 2025

@AndersonQ It's not about what we two want, it's about what makes sense and how we're going to present this beta feature now and take it out of beta in the future.

I see what we do here in this PR as temporary – to let our customers try it out. When we figure out how to migrate from one path to another without data duplication, we just make the path we toggle now by the rotated_logs: true a new default.

What are you going to do with the rotated_logs_paths then?

@AndersonQ
Copy link
Member Author

how we're going to present this beta feature now and take it out of beta in the future.

I see what we do here in this PR as temporary – to let our customers try it out. When we figure out how to migrate from one path to another without data duplication, we just make the path we toggle now by the rotated_logs: true a new default.

What are you going to do with the rotated_logs_paths then?

If, once we can add new paths and we migrate the default log path from the integration to /var/log/pods, it might not be necessary and we can remove it if we don't want to allow users to modify it anymore.

There is at least the windows case, the path for windows is different, so we might always need it just as we allow to modify the active logs path.
Or if we move to even removing the toggle to ingest rotated logs, again, we remove the toggle and the rotated_logs_paths.

For now, I believe we should let it be configurable. It's possible to change it later, so I don't see it as a problem.

@rdner, @cmacknz, what do you think?

@rdner
Copy link
Member

rdner commented Oct 22, 2025

@AndersonQ the problem is that in this PR we allow to configure the same Filebeat configuration parameter paths with 2 integration configuration parameters: paths and rotated_logs_paths. Furthermore, we now allow to configure rotated_logs_paths, our users will start to rely on it and then we remove it. This is a breaking change.

I'd rather avoid introducing something that is unnecessary and would lead to a breaking change later.

If we don't support the Windows case in Beta and document it – it's fine. The current objective is to give early access to the GZip ingestion, not to solve the entire use-case.

I'm going to leave the final decision to @cmacknz since it looks like my argumentation alone is not sufficient here.

@cmacknz
Copy link
Member

cmacknz commented Oct 22, 2025

When I first thought of this toggle approach I was only considering the gzip_experimental: true part which we expect to eventually become the default, so we could remove the toggle with no consequences. The additional need to change the paths complicates this quite a bit.

I also see that we can solve having people enable this purely via a documentation, for example if a user adds gzip_experimental: true in the {{custom}} block and changes the paths appropriately then we don't have to make any changes here. So my preference would be to do that.

We can change approach from this PR into instead making a PR to the beats+elastic agent documentation (because the same change works for standalone not just this integration) and include the same instructions in the relevant changelogs.

Then we don't have to worry about trying to remove these temporary settings later.

@AndersonQ
Copy link
Member Author

We can change approach from this PR into instead making a PR to the beats+elastic agent documentation (because the same change works for standalone not just this integration) and include the same instructions in the relevant changelogs.

Then we don't have to worry about trying to remove these temporary settings later.

ok then.

@cmacknz, do you have a particular agent docs in mind or you meant the k8s integration docs? On top of my head I don't know which agent docs you're referring to.

Also I believe we can update the integrations doc to either mention it or to point to the agent's doc. Thus, so far I believe the following docs can be updated to mention ingesting GZIP files:

do you have anything else in mind?

@AndersonQ AndersonQ marked this pull request as draft October 23, 2025 07:22
@cmacknz
Copy link
Member

cmacknz commented Oct 24, 2025

Also I believe we can update the integrations doc to either mention it or to point to the agent's doc.

Yes if we can document this in one single place, perhaps the filestream docs, and then just reference the new section everywhere that would be best.

The only places I see in addition we would want to mention this are:

@elasticmachine
Copy link

💚 Build Succeeded

History

cc @AndersonQ

@AndersonQ
Copy link
Member Author

as per #15441 (comment) it'll be just a doc change. So to better track it and keep the discussions focused, I'll open another PR and use #15792 to track the task

@AndersonQ AndersonQ closed this Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. Integration:kubernetes Kubernetes Team:Elastic-Agent-Data-Plane Agent Data Plane team [elastic/elastic-agent-data-plane] Team:obs-ds-hosted-services Observability Hosted Services team [elastic/obs-ds-hosted-services]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[kubernetes]: Add GZIP support for logs ingested by filestream

6 participants