Description
/kind bug
1. What kops version are you running? The command kops version will display
this information.
Client version: 1.33.0 (git-v1.33.0)
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Server Version: v1.33.4
3. What cloud provider are you using?
OpenStack
4. What commands did you run? What is the simplest way to reproduce this issue?
kops rolling-update cluster --validate-count 2 --bastion-interval 2m --instance-group bastions,master-az1,master-az2,master-az3,nodes-az1,nodes-az2,nodes-az3 --validation-timeout 20m --yes
5. What happened after the commands executed?
The cluster update ran while the load balancer API was unstable. Load balancer deregistration failed for every node, but the rolling update continued to the next instance group anyway.
6. What did you expect to happen?
The rolling update should abort after the first failed node load balancer deregistration instead of proceeding to the next instance group.
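For illustration, here is a minimal sketch of the abort-on-first-failure behavior this report expects. The function names (rollInstanceGroup, rollAll) are hypothetical placeholders, not the actual kops code paths:

```go
package main

import "fmt"

// rollInstanceGroup stands in for kops rolling one instance group; here it
// fails for any group whose load balancer deregistration fails.
// (Hypothetical helper, not real kops API.)
func rollInstanceGroup(name string, deregisterOK bool) error {
	if !deregisterOK {
		return fmt.Errorf("failed to roll InstanceGroup %q: failed to deregister instance from load balancers", name)
	}
	return nil
}

// rollAll aborts on the first failure instead of moving on to the next
// instance group, which is the behavior requested in this report.
func rollAll(deregisterOK map[string]bool, order []string) error {
	for _, g := range order {
		if err := rollInstanceGroup(g, deregisterOK[g]); err != nil {
			return fmt.Errorf("aborting rolling update: %w", err)
		}
	}
	return nil
}

func main() {
	order := []string{"nodes-az1", "nodes-az2", "nodes-az3"}
	ok := map[string]bool{"nodes-az1": false, "nodes-az2": true, "nodes-az3": true}
	// With abort-on-first-failure, nodes-az2 and nodes-az3 are never touched
	// once nodes-az1 fails to deregister.
	if err := rollAll(ok, order); err != nil {
		fmt.Println(err)
	}
}
```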
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else we need to know?
Kops command output:
SDK 2025/08/20 05:15:33 WARN Response has no supported checksum. Not validating response payload.
I0820 05:15:33.126370 227 create_kubecfg.go:151] unable to get user: user: Current requires cgo or $USER set in environment
I0820 05:15:36.234944 227 instancegroups.go:508] Validating the cluster.
NAME STATUS NEEDUPDATE READY MIN TARGET MAX NODES
bastions Ready 0 1 1 1 1 0
master-az1 Ready 0 1 1 1 1 1
master-az2 Ready 0 1 1 1 1 1
master-az3 Ready 0 1 1 1 1 1
nodes-az1 NeedsUpdate 1 0 1 1 1 1
nodes-az2 NeedsUpdate 1 0 1 1 1 1
nodes-az3 NeedsUpdate 1 0 1 1 1 1
I0820 05:15:38.235764 227 instancegroups.go:544] Cluster validated.
I0820 05:15:38.235793 227 instancegroups.go:342] Tainting 1 node in "nodes-az1" instancegroup.
I0820 05:15:38.274219 227 instancegroups.go:431] Draining the node: "nodes-az1-qfemgf".
I0820 05:15:40.390737 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:41.106014 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:42.291045 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:43.776610 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:46.092009 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:50.547771 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:59.236905 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:16:15.888793 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:16:49.254575 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:17:56.091020 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:20:13.179493 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:54.455283 227 loadbalancer.go:384] got error 409 retrying...
E0820 05:24:54.455360 227 rollingupdate.go:219] failed to roll InstanceGroup "nodes-az1": failed to drain node "nodes-az1-qfemgf": error deregistering instance "152db94f-fafa-471d-a219-d5d5f30fce25", node "nodes-az1-qfemgf": failed to deregister instance from load balancers: timed out waiting for the condition
I0820 05:24:54.455372 227 instancegroups.go:508] Validating the cluster.
I0820 05:24:56.627653 227 instancegroups.go:544] Cluster validated.
I0820 05:24:56.627685 227 instancegroups.go:342] Tainting 1 node in "nodes-az2" instancegroup.
I0820 05:24:56.668566 227 instancegroups.go:431] Draining the node: "nodes-az2-4xpav3".
I0820 05:24:57.730326 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:58.327227 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:58.807064 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:58.966839 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:59.681057 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:01.280384 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:02.136278 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:05.687178 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:06.581115 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:14.251399 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:15.305006 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:30.650995 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:32.402023 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:26:04.099553 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:26:07.273007 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:27:13.447574 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:27:16.637422 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:29:26.550514 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:29:31.983180 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:00.567880 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:06.194263 227 loadbalancer.go:384] got error 409 retrying...
E0820 05:34:06.194332 227 rollingupdate.go:219] failed to roll InstanceGroup "nodes-az2": failed to drain node "nodes-az2-4xpav3": error deregistering instance "e0a666cf-296f-4ca0-acb5-2371eab1b8a5", node "nodes-az2-4xpav3": failed to deregister instance from load balancers: timed out waiting for the condition
I0820 05:34:06.194347 227 instancegroups.go:508] Validating the cluster.
I0820 05:34:08.426196 227 instancegroups.go:544] Cluster validated.
I0820 05:34:08.426244 227 instancegroups.go:342] Tainting 1 node in "nodes-az3" instancegroup.
I0820 05:34:08.467404 227 instancegroups.go:431] Draining the node: "nodes-az3-zlclye".
I0820 05:34:09.576486 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:09.802331 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:10.417448 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:10.834617 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:11.273166 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:12.303061 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:13.178655 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:13.665537 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:17.468349 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:18.277489 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:26.212449 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:27.186908 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:43.705441 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:43.822280 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:35:16.946923 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:35:18.115243 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:36:22.425695 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:36:25.513390 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:38:36.309690 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:38:36.976682 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:43:04.507581 227 loadbalancer.go:384] got error 409 retrying...
I0820 05:43:10.285685 227 loadbalancer.go:384] got error 409 retrying...
E0820 05:43:10.285760 227 rollingupdate.go:219] failed to roll InstanceGroup "nodes-az3": failed to drain node "nodes-az3-zlclye": error deregistering instance "ccec663f-7e11-42a5-9969-1c6bae93eb34", node "nodes-az3-zlclye": failed to deregister instance from load balancers: timed out waiting for the condition
I0820 05:43:10.285776 227 rollingupdate.go:236] Completed rolling update for cluster "******.k8s.local" instance groups [bastions master-az1 master-az2 master-az3 nodes-az1 nodes-az2 nodes-az3]