Commit cef6a0d

adrianreber, viktoriaas, aojea, and SergeyKanzhelev committed
Introduce WG Checkpoint Restore
Co-authored-by: Viktória Spišaková <[email protected]>
Co-authored-by: Antonio Ojea <[email protected]>
Co-authored-by: Sergey Kanzhelev <[email protected]>
Signed-off-by: Adrian Reber <[email protected]>
1 parent 910c1aa commit cef6a0d

10 files changed (+172, -0 lines)

OWNERS_ALIASES

Lines changed: 5 additions & 0 deletions
@@ -146,6 +146,11 @@ aliases:
     - kannon92
     - mwielgus
     - tenzen-y
+  wg-checkpoint-restore-leads:
+    - adrianreber
+    - haircommander
+    - rst0git
+    - viktoriaas
   wg-data-protection-leads:
     - xing-yang
     - yuxiangqian

liaisons.md

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ members will assume one of the departing members groups.
 | [WG AI Gateway](wg-ai-gateway/README.md) | Stephen Augustus (**[@justaugustus](https://github.com/justaugustus)**) |
 | [WG AI Integration](wg-ai-integration/README.md) | Paco Xu 徐俊杰 (**[@pacoxu](https://github.com/pacoxu)**) |
 | [WG Batch](wg-batch/README.md) | Antonio Ojea (**[@aojea](https://github.com/aojea)**) |
+| [WG Checkpoint Restore](wg-checkpoint-restore/README.md) | Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**) |
 | [WG Data Protection](wg-data-protection/README.md) | Patrick Ohly (**[@pohly](https://github.com/pohly)**) |
 | [WG Device Management](wg-device-management/README.md) | Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**) |
 | [WG etcd Operator](wg-etcd-operator/README.md) | Maciej Szulik (**[@soltysh](https://github.com/soltysh)**) |

sig-apps/README.md

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
 The following [working groups][working-group-definition] are sponsored by sig-apps:
 * [WG AI Integration](/wg-ai-integration)
 * [WG Batch](/wg-batch)
+* [WG Checkpoint Restore](/wg-checkpoint-restore)
 * [WG Data Protection](/wg-data-protection)
 * [WG Node Lifecycle](/wg-node-lifecycle)
 * [WG Serving](/wg-serving)

sig-auth/README.md

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
 
 The following [working groups][working-group-definition] are sponsored by sig-auth:
 * [WG AI Integration](/wg-ai-integration)
+* [WG Checkpoint Restore](/wg-checkpoint-restore)
 
 
 ## Subprojects

sig-list.md

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ When the need arises, a [new SIG can be created](sig-wg-lifecycle.md)
 |[AI Gateway](wg-ai-gateway/README.md)|[ai-gateway](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-gateway)|* Multicluster<br>* Network<br>|* [Keith Mattix](https://github.com/keithmattix), Microsoft<br>* [Flynn](https://github.com/kflynn), Buoyant<br>* [Kellen Swain](https://github.com/kfswain), Google<br>* [Nir Rozenbaum](https://github.com/nirrozenbaum), IBM<br>* [Shane Utt](https://github.com/shaneutt), Red Hat<br>* [Xunzhuo](https://github.com/xunzhuo), Tencent<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-gateway)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-gateway)|* WG AI Gateway Bi-Weekly Meeting (Earlier Option): [Mondays at 12PM UTC (bi-weekly)]()<br>* WG AI Gateway Bi-Weekly Meeting (Later Option): [Thursdays at 6PM UTC (bi-weekly)]()<br>
 |[AI Integration](wg-ai-integration/README.md)|[ai-integration](https://github.com/kubernetes/kubernetes/labels/wg%2Fai-integration)|* API Machinery<br>* Apps<br>* Architecture<br>* Auth<br>* CLI<br>|* [Arda Guclu](https://github.com/ardaguclu), Red Hat<br>* [Arush Sharma](https://github.com/rushmash91), Amazon<br>* [Zvonko Kaiser](https://github.com/zvonkok), NVIDIA<br>|* [Slack](https://kubernetes.slack.com/messages/wg-ai-integration)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-ai-integration)|* WG AI Integration Weekly Meeting ([calendar](https://calendar.google.com/calendar/embed?src=71ef14cc0995618018b12614c63ca482d667e2922ff5b94d9fb0cfd32d4efada%40group.calendar.google.com)): [Wednesdays at 10:00 PT (Pacific Time) (biweekly)](https://zoom.us/j/95637970280?pwd=3Ys5MQF5hKoeWDazUsMdgt5FiRxbSs.1)<br>
 |[Batch](wg-batch/README.md)|[batch](https://github.com/kubernetes/kubernetes/labels/wg%2Fbatch)|* Apps<br>* Autoscaling<br>* Node<br>* Scheduling<br>|* [Kevin Hannon](https://github.com/kannon92), Red Hat<br>* [Marcin Wielgus](https://github.com/mwielgus), Google<br>* [Yuki Iwai](https://github.com/tenzen-y), CyberAgent, Inc.<br>|* [Slack](https://kubernetes.slack.com/messages/wg-batch)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)|* Regular Meeting ([calendar](https://calendar.google.com/calendar/embed?src=8ulop9k0jfpuo0t7kp8d9ubtj4%40group.calendar.google.com)): [Thursdays (starting February 15th 2024)s at 3PM CET (Central European Time) (monthly)](https://zoom.us/j/98329676612?pwd=c0N2bVV1aTh2VzltckdXSitaZXBKQT09)<br>
+|[Checkpoint Restore](wg-checkpoint-restore/README.md)|[checkpoint-restore](https://github.com/kubernetes/kubernetes/labels/wg%2Fcheckpoint-restore)|* Apps<br>* Auth<br>* Node<br>* Scheduling<br>|* [Adrian Reber](https://github.com/adrianreber), Red Hat<br>* [Peter Hunt](https://github.com/haircommander), Red Hat<br>* [Radostin Stoyanov](https://github.com/rst0git), University of Oxford<br>* [Viktória Spišaková](https://github.com/viktoriaas), Masaryk University<br>|* [Slack](https://kubernetes.slack.com/messages/wg-checkpoint-restore)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-checkpoint-restore)|
 |[Data Protection](wg-data-protection/README.md)|[data-protection](https://github.com/kubernetes/kubernetes/labels/wg%2Fdata-protection)|* Apps<br>* Storage<br>|* [Xing Yang](https://github.com/xing-yang), VMware<br>* [Xiangqian Yu](https://github.com/yuxiangqian), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-data-protection)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-data-protection)|* Regular WG Meeting: [Wednesdays at 9:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/j/6933410772)<br>
 |[Device Management](wg-device-management/README.md)|[device-management](https://github.com/kubernetes/kubernetes/labels/wg%2Fdevice-management)|* Architecture<br>* Autoscaling<br>* Network<br>* Node<br>* Scheduling<br>|* [John Belamaric](https://github.com/johnbelamaric), Google<br>* [Kevin Klues](https://github.com/klueska), NVIDIA<br>* [Patrick Ohly](https://github.com/pohly), Intel<br>|* [Slack](https://kubernetes.slack.com/messages/wg-device-management)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-device-management)|* Regular WG Meeting (Asia/Europe): [Wednesdays at 9:00 CET (Central European Time) (biweekly)](https://zoom.us/j/97238699195?pwd=cy9IMm1ZeERtRlJ3VS8yWUxHUWIrQT09)<br>* Regular WG Meeting (Europe/America): [Tuesdays at 8:30 PT (Pacific Time) (biweekly)](https://zoom.us/j/97238699195?pwd=cy9IMm1ZeERtRlJ3VS8yWUxHUWIrQT09)<br>
 |[etcd Operator](wg-etcd-operator/README.md)|[etcd-operator](https://github.com/kubernetes/kubernetes/labels/wg%2Fetcd-operator)|* Cluster Lifecycle<br>* etcd<br>|* [Benjamin Wang](https://github.com/ahrtr), VMware<br>* [Ciprian Hacman](https://github.com/hakman), Microsoft<br>* [Josh Berkus](https://github.com/jberkus), Red Hat<br>* [James Blair](https://github.com/jmhbnz), Red Hat<br>* [Justin Santa Barbara](https://github.com/justinsb), Google<br>|* [Slack](https://kubernetes.slack.com/messages/wg-etcd-operator)<br>* [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-etcd-operator)|* Regular WG Meeting: [Tuesdays at 11:00 PT (Pacific Time) (bi-weekly)](https://zoom.us/my/cncfetcdproject)<br>

sig-node/README.md

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
 
 The following [working groups][working-group-definition] are sponsored by sig-node:
 * [WG Batch](/wg-batch)
+* [WG Checkpoint Restore](/wg-checkpoint-restore)
 * [WG Device Management](/wg-device-management)
 * [WG Node Lifecycle](/wg-node-lifecycle)
 * [WG Serving](/wg-serving)

sig-scheduling/README.md

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ subprojects, and resolve cross-subproject technical issues and decisions.
 
 The following [working groups][working-group-definition] are sponsored by sig-scheduling:
 * [WG Batch](/wg-batch)
+* [WG Checkpoint Restore](/wg-checkpoint-restore)
 * [WG Device Management](/wg-device-management)
 * [WG Node Lifecycle](/wg-node-lifecycle)
 * [WG Serving](/wg-serving)

sigs.yaml

Lines changed: 37 additions & 0 deletions
@@ -3648,6 +3648,43 @@ workinggroups:
       liaison:
         github: aojea
         name: Antonio Ojea
+  - dir: wg-checkpoint-restore
+    name: Checkpoint Restore
+    mission_statement: >
+      This working group aims to provide a central location for the community to discuss the integration of Checkpoint/Restore functionality into Kubernetes.
+
+    charter_link: charter.md
+    stakeholder_sigs:
+      - Apps
+      - Auth
+      - Node
+      - Scheduling
+    label: checkpoint-restore
+    leadership:
+      chairs:
+        - github: adrianreber
+          name: Adrian Reber
+          company: Red Hat
+
+        - github: haircommander
+          name: Peter Hunt
+          company: Red Hat
+
+        - github: rst0git
+          name: Radostin Stoyanov
+          company: University of Oxford
+
+        - github: viktoriaas
+          name: Viktória Spišaková
+          company: Masaryk University
+
+    meetings: []
+    contact:
+      slack: wg-checkpoint-restore
+      mailing_list: https://groups.google.com/a/kubernetes.io/g/wg-checkpoint-restore
+      liaison:
+        github: BenTheElder
+        name: Benjamin Elder
   - dir: wg-data-protection
     name: Data Protection
     mission_statement: >

wg-checkpoint-restore/README.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+<!---
+This is an autogenerated file!
+
+Please do not edit this file directly, but instead make changes to the
+sigs.yaml file in the project root.
+
+To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
+--->
+# Checkpoint Restore Working Group
+
+This working group aims to provide a central location for the community to discuss the integration of Checkpoint/Restore functionality into Kubernetes.
+
+The [charter](charter.md) defines the scope and governance of the Checkpoint Restore Working Group.
+
+## Stakeholder SIGs
+* [SIG Apps](/sig-apps)
+* [SIG Auth](/sig-auth)
+* [SIG Node](/sig-node)
+* [SIG Scheduling](/sig-scheduling)
+
+
+
+## Organizers
+
+* Adrian Reber (**[@adrianreber](https://github.com/adrianreber)**), Red Hat
+* Peter Hunt (**[@haircommander](https://github.com/haircommander)**), Red Hat
+* Radostin Stoyanov (**[@rst0git](https://github.com/rst0git)**), University of Oxford
+* Viktória Spišaková (**[@viktoriaas](https://github.com/viktoriaas)**), Masaryk University
+
+## Contact
+- Slack: [#wg-checkpoint-restore](https://kubernetes.slack.com/messages/wg-checkpoint-restore)
+- [Mailing list](https://groups.google.com/a/kubernetes.io/g/wg-checkpoint-restore)
+- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fcheckpoint-restore)
+- Steering Committee Liaison: Benjamin Elder (**[@BenTheElder](https://github.com/BenTheElder)**)
+<!-- BEGIN CUSTOM CONTENT -->
+
+<!-- END CUSTOM CONTENT -->

wg-checkpoint-restore/charter.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+
+# WG Checkpoint Restore Charter
+
+This charter adheres to the conventions described in the [Kubernetes Charter README] and uses
+the Roles and Organization Management outlined in [sig-governance].
+
+## Scope
+
+The Checkpoint/Restore Working Group aims to solve the problem of transparently
+checkpointing and restoring workloads in Kubernetes, a [functionality discussed
+for over five years][kep2008]. The group will deliver the design and
+implementation of Checkpoint/Restore functionality in Kubernetes, serving as a
+central hub for community information and discussion. This initiative addresses
+a wide range of problems, including fault tolerance, improved resource
+utilization, and accelerated application startup times.
+
+### In scope
+
+- Identify core Kubernetes checkpoint/restore use cases (e.g., live migration,
+  fault tolerance, debugging, snapshotting) and gather stakeholder requirements.
+- Investigate and propose Kubernetes APIs for checkpoint/restore operations.
+- Work with SIGs for the best integration of checkpoint/restore functionality
+  and APIs.
+- Provide guidance for developers on checkpoint-friendly app design and
+  recommendations for operators on feature management.
+- Work closely with relevant upstream projects (CRI-O, containerd, CRIU, gVisor)
+  for alignment and integration.
+- Revisit the existing implementations to find and remedy possible inefficiencies.
+  One example is the existing checkpoint archive format which has already been
+  identified as being a major source of slowdown.
+
+### Out of scope
+
+- Not focused on general OS-level checkpointing outside Kubernetes
+  pods/containers.
+- Will not dictate internal application checkpointing logic; focuses on
+  Kubernetes platform orchestration of container/pod state.
+
+## Stakeholders
+
+Stakeholders in this working group span multiple SIGs that own parts of the
+code in core Kubernetes components and addons.
+
+- SIG Node
+- SIG Scheduling
+- SIG Auth
+- SIG Apps
+
+## Deliverables
+
+The list of deliverables includes the following high-level features:
+
+- In the early stage, we mainly want to offer a well-defined location for the
+  community to find information, ask questions, and discuss the next steps of
+  enabling checkpoint and restore in Kubernetes.
+
+Later:
+
+- Ability to checkpoint and restore a container using kubectl
+- Ability to checkpoint and restore a pod using kubectl
+- Integration of container/pod checkpointing in scheduling decisions
+
+## Roles and Organization Management
+
+This WG adheres to the Roles and Organization Management outlined in [wg-governance]
+and opts in to updates and modifications to [wg-governance].
+
+[wg-governance]: /committee-steering/governance/wg-governance.md
+
+Additionally, the WG commits to:
+
+- maintain a solid communication line between the Kubernetes groups and the
+  wider CNCF community
+
+## Timelines and Disbanding
+
+As a first mandate, the WG will propose a draft roadmap and identify key tasks in the first quarter of operation.
+
+After that, the WG will facilitate collaboration among community members to explore possible APIs and draft proposals for their integration into Kubernetes, which will then be presented to the relevant SIGs.
+
+Achieving the aforementioned deliverables, also mentioned in the `In Scope`
+section, will allow us to decide when to disband this WG. There is no
+expectation that the Working Group will be converted into a SIG long term.
+
+[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md
+[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md
+[kep2008]: https://github.com/kubernetes/enhancements/issues/2008
