Cluster update should abort after the first failed load balancer node deregistration #17649

@jauru

Description

/kind bug

1. What kops version are you running? The command kops version will display
this information.

Client version: 1.33.0 (git-v1.33.0)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Server Version: v1.33.4

3. What cloud provider are you using?
OpenStack

4. What commands did you run? What is the simplest way to reproduce this issue?
kops rolling-update cluster --validate-count 2 --bastion-interval 2m --instance-group bastions,master-az1,master-az2,master-az3,nodes-az1,nodes-az2,nodes-az3 --validation-timeout 20m --yes

5. What happened after the commands executed?
The cluster update ran while the load balancer API was unstable. Load balancer deregistration failed for every node, yet the rolling update continued to the next instance group instead of stopping.

6. What did you expect to happen?
The cluster update should abort after the first unsuccessful node load balancer deregistration.
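For illustration only, a minimal Go sketch of the expected control flow, assuming a simple per-instance-group loop; the InstanceGroup type and the rollInstanceGroup/rollingUpdate helpers are hypothetical names, not kops source:

package main

import (
	"errors"
	"fmt"
)

type InstanceGroup struct{ Name string }

// rollInstanceGroup stands in for draining and replacing the instances of one
// group; here it fails for the first group to show the expected control flow.
func rollInstanceGroup(ig InstanceGroup) error {
	if ig.Name == "nodes-az1" {
		return errors.New("failed to deregister instance from load balancers: timed out waiting for the condition")
	}
	return nil
}

// rollingUpdate returns on the first failure instead of continuing with the
// remaining instance groups.
func rollingUpdate(groups []InstanceGroup) error {
	for _, ig := range groups {
		if err := rollInstanceGroup(ig); err != nil {
			return fmt.Errorf("failed to roll InstanceGroup %q: %w; aborting rolling update", ig.Name, err)
		}
	}
	return nil
}

func main() {
	groups := []InstanceGroup{{"nodes-az1"}, {"nodes-az2"}, {"nodes-az3"}}
	if err := rollingUpdate(groups); err != nil {
		fmt.Println(err)
	}
}

With this behavior, the failure on nodes-az1 would end the run before nodes-az2 and nodes-az3 are touched, rather than draining them against a load balancer that is still rejecting deregistrations.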

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

Kops command output:

SDK 2025/08/20 05:15:33 WARN Response has no supported checksum. Not validating response payload.
I0820 05:15:33.126370     227 create_kubecfg.go:151] unable to get user: user: Current requires cgo or $USER set in environment
I0820 05:15:36.234944     227 instancegroups.go:508] Validating the cluster.
NAME		STATUS		NEEDUPDATE	READY	MIN	TARGET	MAX	NODES
bastions	Ready		0		1	1	1	1	0
master-az1	Ready		0		1	1	1	1	1
master-az2	Ready		0		1	1	1	1	1
master-az3	Ready		0		1	1	1	1	1
nodes-az1	NeedsUpdate	1		0	1	1	1	1
nodes-az2	NeedsUpdate	1		0	1	1	1	1
nodes-az3	NeedsUpdate	1		0	1	1	1	1
I0820 05:15:38.235764     227 instancegroups.go:544] Cluster validated.
I0820 05:15:38.235793     227 instancegroups.go:342] Tainting 1 node in "nodes-az1" instancegroup.
I0820 05:15:38.274219     227 instancegroups.go:431] Draining the node: "nodes-az1-qfemgf".
I0820 05:15:40.390737     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:41.106014     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:42.291045     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:43.776610     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:46.092009     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:50.547771     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:15:59.236905     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:16:15.888793     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:16:49.254575     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:17:56.091020     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:20:13.179493     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:54.455283     227 loadbalancer.go:384] got error 409 retrying...
E0820 05:24:54.455360     227 rollingupdate.go:219] failed to roll InstanceGroup "nodes-az1": failed to drain node "nodes-az1-qfemgf": error deregistering instance "152db94f-fafa-471d-a219-d5d5f30fce25", node "nodes-az1-qfemgf": failed to deregister instance from load balancers: timed out waiting for the condition
I0820 05:24:54.455372     227 instancegroups.go:508] Validating the cluster.
I0820 05:24:56.627653     227 instancegroups.go:544] Cluster validated.
I0820 05:24:56.627685     227 instancegroups.go:342] Tainting 1 node in "nodes-az2" instancegroup.
I0820 05:24:56.668566     227 instancegroups.go:431] Draining the node: "nodes-az2-4xpav3".
I0820 05:24:57.730326     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:58.327227     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:58.807064     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:58.966839     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:24:59.681057     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:01.280384     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:02.136278     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:05.687178     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:06.581115     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:14.251399     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:15.305006     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:30.650995     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:25:32.402023     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:26:04.099553     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:26:07.273007     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:27:13.447574     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:27:16.637422     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:29:26.550514     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:29:31.983180     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:00.567880     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:06.194263     227 loadbalancer.go:384] got error 409 retrying...
E0820 05:34:06.194332     227 rollingupdate.go:219] failed to roll InstanceGroup "nodes-az2": failed to drain node "nodes-az2-4xpav3": error deregistering instance "e0a666cf-296f-4ca0-acb5-2371eab1b8a5", node "nodes-az2-4xpav3": failed to deregister instance from load balancers: timed out waiting for the condition
I0820 05:34:06.194347     227 instancegroups.go:508] Validating the cluster.
I0820 05:34:08.426196     227 instancegroups.go:544] Cluster validated.
I0820 05:34:08.426244     227 instancegroups.go:342] Tainting 1 node in "nodes-az3" instancegroup.
I0820 05:34:08.467404     227 instancegroups.go:431] Draining the node: "nodes-az3-zlclye".
I0820 05:34:09.576486     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:09.802331     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:10.417448     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:10.834617     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:11.273166     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:12.303061     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:13.178655     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:13.665537     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:17.468349     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:18.277489     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:26.212449     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:27.186908     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:43.705441     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:34:43.822280     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:35:16.946923     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:35:18.115243     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:36:22.425695     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:36:25.513390     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:38:36.309690     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:38:36.976682     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:43:04.507581     227 loadbalancer.go:384] got error 409 retrying...
I0820 05:43:10.285685     227 loadbalancer.go:384] got error 409 retrying...
E0820 05:43:10.285760     227 rollingupdate.go:219] failed to roll InstanceGroup "nodes-az3": failed to drain node "nodes-az3-zlclye": error deregistering instance "ccec663f-7e11-42a5-9969-1c6bae93eb34", node "nodes-az3-zlclye": failed to deregister instance from load balancers: timed out waiting for the condition
I0820 05:43:10.285776     227 rollingupdate.go:236] Completed rolling update for cluster "******.k8s.local" instance groups [bastions master-az1 master-az2 master-az3 nodes-az1 nodes-az2 nodes-az3]
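For context on the retry pattern above (409 responses retried at roughly doubling intervals, ending in "timed out waiting for the condition"), this is consistent with the exponential-backoff helper in k8s.io/apimachinery/pkg/util/wait. A minimal sketch of that pattern, assuming such a retry loop is used; deregisterMember and the backoff parameters are illustrative placeholders, not the actual kops implementation:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// deregisterMember stands in for the OpenStack load balancer member delete
// call; in this sketch it always reports HTTP 409 (load balancer busy).
func deregisterMember() (int, error) {
	return 409, nil
}

func main() {
	backoff := wait.Backoff{
		Duration: 1 * time.Second, // initial retry delay
		Factor:   2.0,             // intervals roughly double, as in the log above
		Steps:    12,              // after which the whole operation gives up
	}
	err := wait.ExponentialBackoff(backoff, func() (done bool, err error) {
		status, err := deregisterMember()
		if err != nil {
			return false, err // hard failure: stop retrying
		}
		if status == 409 {
			fmt.Println("got error 409 retrying...")
			return false, nil // transient conflict: retry after the next backoff step
		}
		return true, nil // member deregistered
	})
	if err != nil {
		// wait.ErrWaitTimeout renders as "timed out waiting for the condition".
		fmt.Println("failed to deregister instance from load balancers:", err)
	}
}

In the run above, each instance group burned through its full backoff budget (roughly 9-10 minutes of 409 retries), the deregistration error was logged, and the rolling update still reported "Completed rolling update" at the end.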
