This repository was archived by the owner on Jul 30, 2021. It is now read-only.

Commit 2ab502e

checkpointer: add 5m grace period flag.
When nodes reboot, as in the TestReboot e2e test case, the cluster can take a while to stabilize because of the dependency chain between the apiserver, flannel, the controller manager, and other components. If the controller manager was in the middle of an operation (e.g. rolling the apiserver) when the reboot occurred, we need to ensure that it becomes healthy again, which requires keeping the checkpointed apiserver up. The downside is that pods may run considerably longer than they ought to. However, this is a failure-recovery scenario, and running an old pod is not a major violation of k8s semantics (daemonsets strive for 1-at-a-time semantics but don't guarantee it). This should alleviate the flakes observed in #824.
1 parent 506fed7 commit 2ab502e

1 file changed: +1 -0 lines changed

pkg/asset/internal/templates.go

Lines changed: 1 addition & 0 deletions
@@ -293,6 +293,7 @@ spec:
   - /checkpoint
   - --lock-file=/var/run/lock/pod-checkpointer.lock
   - --kubeconfig=/etc/checkpointer/kubeconfig
+  - --checkpoint-grace-period=5m
   env:
   - name: NODE_NAME
     valueFrom:
