Describe the bug
There is a bug that causes metadata inconsistencies during rollouts due to fragile pod update logic. When ephemeral metadata is being updated, a failure to update any single pod (because the pod was deleted, or because a concurrent modification caused a conflict) halts the process for all remaining pods. This leaves many pods with outdated metadata, which directly impacts metric accuracy. The problem is reproducible in high-replica scenarios by deleting pods while the system is applying the stable metadata. The goal is for all running pods to consistently reflect the stable metadata post-rollout.
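For illustration only, here is a minimal Go sketch of the failure pattern being described; the function name, label handling, and client wiring are my assumptions, not the project's actual code:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// syncPodMetadata (hypothetical name) illustrates the fragile pattern: the
// first failed update (e.g. NotFound after a deletion, or Conflict from a
// concurrent write) aborts the loop, leaving later pods untouched.
func syncPodMetadata(ctx context.Context, c kubernetes.Interface, pods []corev1.Pod, stable map[string]string) error {
	for i := range pods {
		pod := pods[i].DeepCopy()
		if pod.Labels == nil {
			pod.Labels = map[string]string{}
		}
		for k, v := range stable {
			pod.Labels[k] = v
		}
		if _, err := c.CoreV1().Pods(pod.Namespace).Update(ctx, pod, metav1.UpdateOptions{}); err != nil {
			return err // one bad pod stops the sync for every remaining pod
		}
	}
	return nil
}
```

Because the loop returns on the first error, a single deleted pod shadows every pod that would have been processed after it.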
To Reproduce
1. Create a Rollout resource configured with a high number of replicas.
2. Update the Rollout's specification to trigger a progressive update.
3. Wait until the canary ReplicaSet is fully scaled (100%) and its definition includes the stable metadata.
4. Observe as the system starts applying the stable metadata to the individual pods managed by the canary ReplicaSet.
5. While this metadata update is in progress, manually delete some pods that still belong to the canary ReplicaSet but have not yet received the stable metadata (a scripted version of this step is sketched after this list).
6. Notice that a single failure (attempting to update a deleted pod) disrupts the process, causing many other running pods to keep their outdated (canary) metadata instead of being updated to the stable metadata.
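A rough client-go sketch of step 5, under stated assumptions: the `default` namespace, the `app=demo` selector, and the `role` label are placeholders for whatever namespace, pod selector, and ephemeral-metadata labels your Rollout actually uses:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	c := kubernetes.NewForConfigOrDie(cfg)

	// "default", "app=demo", and "role" are assumptions; substitute the
	// namespace, selector, and labels of your own Rollout.
	pods, err := c.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "app=demo",
	})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		if p.Labels["role"] != "stable" { // pod not yet relabeled to stable
			if err := c.CoreV1().Pods(p.Namespace).Delete(context.TODO(), p.Name, metav1.DeleteOptions{}); err != nil {
				fmt.Println("delete failed:", p.Name, err)
				continue
			}
			fmt.Println("deleted:", p.Name)
		}
	}
}
```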
Expected behavior
All running pods should have the stable metadata once the rollout completes; a failure to update one pod should not prevent the remaining pods from being updated.
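One possible shape for the expected behavior, sketched under assumptions (the function name and error-handling policy are mine, not the project's actual fix): attempt every pod, treat a pod deleted mid-sync as already done, retry conflicts via client-go's retry.RetryOnConflict, and aggregate whatever errors remain instead of aborting on the first one:

```go
package sketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	utilerrors "k8s.io/apimachinery/pkg/util/errors"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// syncPodMetadataResilient (hypothetical name) applies the stable labels to
// every pod, tolerating per-pod failures instead of aborting the whole sync.
func syncPodMetadataResilient(ctx context.Context, c kubernetes.Interface, pods []corev1.Pod, stable map[string]string) error {
	var errs []error
	for i := range pods {
		pod := pods[i]
		err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
			// Re-read the pod on each attempt so a Conflict retry works
			// against the latest resourceVersion.
			latest, err := c.CoreV1().Pods(pod.Namespace).Get(ctx, pod.Name, metav1.GetOptions{})
			if err != nil {
				return err
			}
			if latest.Labels == nil {
				latest.Labels = map[string]string{}
			}
			for k, v := range stable {
				latest.Labels[k] = v
			}
			_, err = c.CoreV1().Pods(latest.Namespace).Update(ctx, latest, metav1.UpdateOptions{})
			return err
		})
		switch {
		case err == nil:
		case apierrors.IsNotFound(err):
			// Pod was deleted mid-sync: nothing left to update, keep going.
		default:
			errs = append(errs, fmt.Errorf("pod %s/%s: %w", pod.Namespace, pod.Name, err))
		}
	}
	// Nil when errs is empty; otherwise one aggregated error after every
	// pod has been attempted.
	return utilerrors.NewAggregate(errs)
}
```

With this shape, a deleted pod costs one skipped update rather than the whole sync, and the caller still sees an aggregated error it can use to requeue.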
Version
1.8.2
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.