
Increased "hubble observe" CPU usage after upgrade from 1.15.15 to 1.15.16 or newer #1727


Description

@tporeba

I was planning to upgrade the cilium images I use to run hubble observe, and I noticed that newer images use significantly more CPU to do the same work.

I started from quay.io/cilium/cilium:v1.15.5, and switching to 1.15.16 or newer (1.17.x, 1.18.x) causes a ~5x increase in CPU usage in my setup.

I have 10 pods organized as a DaemonSet, each running 3 hubble observe containers, each fetching a subset of the logs. Together these processes produce ~30 MB/min of logs for the whole system.

# log for business traffic
# 20 exclusions total
hubble observe --follow --print-node-name --time-format RFC3339Milli  \
  --not --namespace kube-system \
  --not --namespace A \
  --not --namespace B \
  --not --namespace C 
...

# log for technical traffic
# 20 inclusions total
hubble observe --follow --print-node-name --time-format RFC3339Milli \
     --namespace kube-system \
     --namespace A \
     --namespace B \
     --namespace C
...

# log for dropped traffic
hubble observe --follow --print-node-name --time-format RFC3339Milli \
          --type drop --type l7 --verdict DROPPED --not --to-ip ff02::/16 

I tested several cilium versions:

  • 1.15.5, 1.15.6, 1.15.12, 1.15.14, 1.15.15 --> these behave normally; my Grafana shows that the whole DaemonSet uses < 1 CPU in total
  • 1.15.16, 1.15.19, 1.16.16, 1.17.6, 1.17.9, 1.18.3 --> for these I see increased CPU usage of almost 5 CPUs in total.
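
For a quick cross-check outside Grafana, the per-container usage can also be read directly with kubectl top; the namespace and label selector below are only placeholders for my setup:

# per-container CPU/memory for the log-collector pods (label is illustrative)
kubectl -n logs top pods -l app=hubble-logs --containers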

Here is a screenshot from Grafana after I downgraded back to 1.15.15:

[Grafana screenshot: CPU usage drop after downgrade]

This is from the standard Grafana dashboard Kubernetes / Compute Resources / Namespace (Workloads), and the plotted metric is more or less:

sum(
  node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace="logs"}
* on(namespace,pod)
  group_left(workload, workload_type) namespace_workload_pod:kube_pod_owner:relabel{namespace="logs"}
) by (workload, workload_type)
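
To isolate just the hubble observe containers rather than the whole namespace, a plainer query along these lines works as a sanity check; the container name regex is only an example and depends on how the containers are named in the DaemonSet:

# raw per-container CPU rate from cAdvisor, restricted to the log-collector containers
sum by (pod, container) (
  irate(container_cpu_usage_seconds_total{namespace="logs", container=~"hubble.*"}[5m])
)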

Here is also the hubble version from inside the containers for the two closest image tags:

v1.15.15 >  hubble version
hubble v1.17.1@HEAD-0d65c11 compiled with go1.23.6 on linux/amd64

v1.15.16 > root@os-workernode06:/home/cilium# hubble version
hubble v1.17.2@HEAD-aba36c0 compiled with go1.23.7 on linux/amd64
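
As an additional sanity check, the CPU time accumulated by the hubble processes themselves can be compared inside a pod before and after the upgrade; the pod/container names below are placeholders, and this assumes a procps-style ps is available in the image:

# cumulative CPU time (TIME column) per hubble process
kubectl -n logs exec <pod> -c <container> -- ps -C hubble -o pid,etime,time,%cpu,args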

Is this an intended effect or a bug?
Is the new hubble version doing something more that requires more CPU?
