Description
Which image of the operator are you using? 1.14.0
Where do you run it - cloud or metal? Vanilla K8s / bare metal
Are you running Postgres Operator in production? Yes
Type of issue? Bug report
Problem Description
We're experiencing a resource leak where EndpointSlices for -repl services are not being cleaned up after the parent Service is deleted. This has triggered quota alerts in our namespace.
Observed Behavior
When a PostgreSQL cluster is deleted, the operator properly deletes the replica service (*-repl)
However, the associated EndpointSlices remain in the namespace as orphaned resources
This only affects -repl services; other services (-config, master endpoint) clean up properly
The EndpointSlices have proper ownerReferences pointing to the Service:
- apiVersion: v1
  blockOwnerDeletion: true
  controller: true
  kind: Service
  name: endpoints-keys-test-repl
  uid: 4d904c2d-cfff-4ee1-bd50-3215dc16de82
Reproduction Steps
Create a Spilo PostgreSQL cluster
Delete the cluster
Check for remaining EndpointSlices:
kubectl get endpointslices -n <namespace> | grep db-repl
Orphaned EndpointSlices will remain despite parent Services being deleted
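To confirm that the surviving slices still reference the deleted Service, they can be selected via the standard kubernetes.io/service-name label and their ownerReferences printed (namespace and cluster name below are placeholders):

kubectl get endpointslices -n <namespace> -l kubernetes.io/service-name=<cluster-name>-repl \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.metadata.ownerReferences}{"\n"}{end}'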
Expected Behavior
EndpointSlices should be automatically cleaned up by Kubernetes garbage collector when the parent Service is deleted (via ownerReferences).
Investigation Findings
The operator does not create an Endpoints object for the -repl service during cluster creation, only the Service itself.
However, at cluster deletion time it still attempts to delete this object.
From operator logs during deletion:
time="2025-10-30T11:26:09Z" level=debug msg="deleting replica endpoint" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3
time="2025-10-30T11:26:09Z" level=info msg="replica endpoint \"service-keys/endpoints-keys-test-repl\" has been deleted" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3
time="2025-10-30T11:26:09Z" level=debug msg="deleting replica service" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3
time="2025-10-30T11:26:09Z" level=info msg="replica service \"service-keys/endpoints-keys-test-repl\" has been deleted" cluster-name=service-keys/endpoints-keys-test pkg=cluster worker=3
From Kubernetes API logs, the garbage collector attempts to patch the EndpointSlice but doesn't delete it:
Oct 30, 2025 @ 11:26:09.840 /apis/discovery.k8s.io/v1/namespaces/service-keys/endpointslices/endpoints-keys-test-repl-9p7pf
ResponseComplete patch
system:serviceaccount:kube-system:generic-garbage-collector
Hypothesis
It appears the operator may be deleting the Service with the orphan propagation policy instead of background/foreground. That would explain why the garbage collector only patches the EndpointSlices (stripping their ownerReferences to orphan them) and never deletes them.
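A rough way to test this hypothesis outside the operator is to delete a replica Service by hand with different cascade modes and watch the slices (kubectl >= 1.20 flags; names are taken from the logs above, and the Service has to exist before each run, so it may need to be recreated or the operator paused in between):

# Orphan cascade: expected to reproduce the leak
kubectl delete service endpoints-keys-test-repl -n service-keys --cascade=orphan
kubectl get endpointslices -n service-keys -l kubernetes.io/service-name=endpoints-keys-test-repl

# Default background cascade: the GC should remove the slices shortly afterwards
kubectl delete service endpoints-keys-test-repl -n service-keys --cascade=background
kubectl get endpointslices -n service-keys -l kubernetes.io/service-name=endpoints-keys-test-repl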
Impact
Resource quota exhaustion in namespaces with many demo/test databases
Manual cleanup required periodically (a cleanup sketch is included below)
Affects all -repl services across multiple clusters
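As an interim workaround, a small sketch along these lines could remove any EndpointSlice in a namespace whose owning Service (per the standard kubernetes.io/service-name label) no longer exists; this is an assumption-laden helper, not operator code, so review it before running:

#!/usr/bin/env bash
# Interim cleanup sketch: delete EndpointSlices whose owning Service is gone.
# Assumes bash + kubectl and relies on the kubernetes.io/service-name label.
NS="${1:?usage: $0 <namespace>}"
kubectl get endpointslices -n "$NS" \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.labels.kubernetes\.io/service-name}{"\n"}{end}' \
| while read -r slice svc; do
    [ -z "$svc" ] && continue                          # skip slices without the label
    if ! kubectl get service "$svc" -n "$NS" >/dev/null 2>&1; then
      kubectl delete endpointslice "$slice" -n "$NS"   # owning Service is gone: orphaned slice
    fi
  done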