This repository was archived by the owner on Jul 30, 2021. It is now read-only.
Commit 2ab502e
checkpointer: add 5m grace period flag.
When nodes reboot, such as in the TestReboot e2e test case, the cluster
can take a while to stabilize because of the dependency chain between
the apiserver, flannel, the controller manager, and so on.
If the controller manager is in the middle of doing something (e.g.
rolling the apiserver) when a reboot occurs, we need to ensure that the
controller manager becomes healthy again. This requires keeping the
checkpointed apiserver up.
The downside is that this may run pods considerably longer than they
ought to. However, this is a failure recovery scenario, and running an
old pod is not a huge violation of k8s semantics (daemonsets strive for
1-at-a-time semantics but don't guarantee it).
This should alleviate the flakes observed in #824.
1 parent 506fed7 commit 2ab502e
1 file changed, +1 -0 lines changed. The single added line is at line 296 of the changed file.