Copy-on-Write (COW) Dump Implementation for Process duplication #2813
base: criu-dev
The `criu cpuinfo check` command calls cpu_validate_cpuinfo(), which attempts to open the cpuinfo.img file using `open_image()`. If the image file is not found, `open_image()` returns an "empty image" object. As a result, `cpu_validate_cpuinfo()` tries to read from it and fails with the following error:
(00.002473) Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)
This patch adds a check for an empty image and an appropriate error message.
Signed-off-by: Radostin Stoyanov <[email protected]>
Fixes a clang compile-time error: "argument unused during compilation: '-c'". Signed-off-by: Andrei Vagin <[email protected]>
Use a shared first-error buffer so that the correct first error is returned over RPC. Fixes: checkpoint-restore#338 Signed-off-by: Ivan Pravdin <[email protected]>
Having CTL_FLAGS_IPC_EACCES_SKIP == (CTL_FLAGS_OPTIONAL | CTL_FLAGS_READ_EIO_SKIP) is probably not what we want, so let's make it a truly distinct flag. Fixes: 840735a ("ipc_sysctl: Prioritize restoring IPC variables using non usernsd approach") Signed-off-by: Pavel Tikhomirov <[email protected]>
Fixes: f38e588 ("net/sysctl: c/r ipv4/ping_group_range value") Signed-off-by: Pavel Tikhomirov <[email protected]>
We have the ability to skip a sysctl if there is no value, but we still pass n requests to sysctl_op, which is incorrect and can probably segfault on a NULL pointer access. Fix it by adding ri to count the non-skipped requests. To be on the safe side, add a check that ri == n on read, as we should not skip anything there. While at it, fix a bad error message prefix: s/unix/ipv4/. Also remove an excess has_iarg set, and reset sarg to NULL for the case where sysctl_op skipped it. Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Pavel Tikhomirov <[email protected]>
We dump sysctls from the criu user namespace but restore them from the restored user namespace, so group ID values must be mapped into the restored user namespace's GID space to be restored correctly. Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Pavel Tikhomirov <[email protected]>
net/unix/max_dgram_qlen could not be tuned from a non-root userns before:
v5.17-rc1~170^2~215 ("net: Enable max_dgram_qlen unix sysctl to be
configurable by non-init user namespaces")
Signed-off-by: Andrei Vagin <[email protected]>
Currently there is no option to checkpoint/restore programs that use ICMP sockets, such as `ping`. This patch adds support for them. Fixes checkpoint-restore#2557 Signed-off-by: समीर सिंह Sameer Singh <[email protected]>
Add ZDTM static tests for IP4/ICMP and IP6/ICMP socket feature. Signed-off-by: समीर सिंह Sameer Singh <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
For example, I have an /etc/hosts file in a workspace mounted from the host, and get the following messages:
(00.141008) 1: mnt-v2: Create plain mountpoint /tmp/.criu.mntns.K1biY1/mnt-0000000938 for 938
(00.141546) 1: mnt-v2: Mounting unsupported @938 (0)
(00.141887) 1: mnt-v2: Bind /tmp/agent/1-d8c746c6fda3a8b2/workspace/etc/hosts/ to /tmp/.criu.mntns.K1biY1/mnt-0000000938
(00.142179) 1: Error (criu/mount-v2.c:319): mnt-v2: Failed to open_tree /tmp/agent/1-d8c746c6fda3a8b2/workspace/etc/hosts/: Not a directory
(00.143774) Error (criu/cr-restore.c:2320): Restoring FAILED.
Signed-off-by: Chuan Qiu <[email protected]>
The test creates a file bindmount in the criu mntns and binds it into the test mntns. This external file bindmount is autodetected and restored via the "--external mnt[]" criu option. Note: the previous patch fixes the problem on this code path where file bindmount restore fails because of an excess "/" in the source path. Signed-off-by: Pavel Tikhomirov <[email protected]>
Currently the build scripts create the following symlink: criu-4.1/images/google/protobuf/descriptor.proto -> /usr/include/google/protobuf/descriptor.proto This symlink points to a system-wide absolute-path target. Also, this symlink ends up in the release tarball. The tarball may later be downloaded and unpacked by e.g. OS distributions. If unpacking is done using Python 3.14+, it will fail. This happens because Python 3.14 will switch the default behavior of extractall() from "fully trusting the content of archive" to "disallow common attack vectors while extracting the archive". With this new behavior, extractall() raises an exception when at least one file in the archive extracts or points to outside of the extraction directory (these are called path traversal attacks and zip slip attacks). Reported-by: Dmitrii Kuvaiskii <[email protected]> Signed-off-by: Radostin Stoyanov <[email protected]>
Commit 68f92b5 used `$$(Q)` instead of `$(Q)` in the Makefile target, which resulted in the following errors:
$(Q) echo "Generating descriptor.pb-c.c"
/bin/sh: 1: Q: not found
Generating descriptor.pb-c.c
$(Q) protoc --proto_path=/usr/include --proto_path=images/ --c_out=images/ /usr/include/google/protobuf/descriptor.proto
/bin/sh: 1: Q: not found
as well as:
$(Q) rm -rf images/google
/bin/sh: line 1: Q: command not found
Fix it.
Signed-off-by: Kir Kolyshkin <[email protected]>
Commit 68f92b5 removed the images/google/protobuf directory, so it is re-created during every build. This resulted in a weird behavior change. Previously, one could do something like this:
git clone $CRURL criu
(cd criu && sudo make install-criu)
rm -rf criu
This worked fine, including running rm -rf as a non-root user. No new directories were created under criu, so all directories were still owned by the original user. Since commit 68f92b5 the same sequence fails:
rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.c': Permission denied
rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.d': Permission denied
rm: cannot remove '/home/runner/criu/images/google/protobuf/descriptor.pb-c.h': Permission denied
A workaround is to keep an empty images/google/protobuf directory, which is what this commit does.
Signed-off-by: Kir Kolyshkin <[email protected]>
In general, we use "$(E)" instead of "$(Q) echo", but we also have a msg-gen macro which can be used here. Signed-off-by: Kir Kolyshkin <[email protected]>
After the CRIU process saves the parasite code for the target thread in the shared mmap, it must call __clear_cache before the target thread executes the code. Without this step, the target thread may not see the correct code to execute, which can result in a SIGILL signal. In the specific arm64 case, this is important so that the newly copied code is flushed from the d-cache to RAM and the target thread sees the new code. The change is based on commit 6be10a2 by @fu.lin and on input received from @adrianreber. [ avagin: tweak code comment ] Signed-off-by: Ignacio Moreno Gonzalez <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
See the previous commit for rationale and architecture-specific details. [ avagin: tweak code comment ] Signed-off-by: Ignacio Moreno Gonzalez <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
A kernel change (commit 12f147ddd6de, "do_change_type(): refuse to operate on unmounted/not ours mounts") modified how mount propagation properties can be changed. Previously, these properties could be changed from any mount namespace. Now, they can only be modified from the mount namespace where the target mount is actually mounted. This commit addresses this new restriction by ensuring that CRIU enters the correct mount namespace before attempting to restore mount propagation properties (MS_SLAVE or MS_SHARED) for a mount. Signed-off-by: Andrei Vagin <[email protected]>
Installing this package currently fails with the following message:
Package qemu is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'qemu' has no installation candidate
Signed-off-by: Radostin Stoyanov <[email protected]>
Signed-off-by: Radostin Stoyanov <[email protected]>
The tar command was failing with the following message:
$ tar cf criu.tar ../../../criu
tar: Removing leading `../../../' from member names
tar: ../../../criu/scripts/ci/criu.tar: archive cannot contain itself; not dumped
In addition, /vagrant no longer exists in the new Fedora images:
bash: line 1: cd: /vagrant: No such file or directory
Signed-off-by: Radostin Stoyanov <[email protected]>
Send large chunks to fill socket buffers. Signed-off-by: Andrei Vagin <[email protected]>
The arm64 tests are currently being executed on both actuated and GitHub runners. This change removes the actuated runner to avoid redundancy and streamline our CI process. Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Make should_dump_page() return an int to indicate failure, and return the useful data through the struct page_info passed as a pointer. Convert all call sites correspondingly. No functional changes intended, except fixing a bug in should_dump_page(): it could return -1 when pmc_fill() failed, which callers did not expect. Signed-off-by: Alexander Mikhalitsyn <[email protected]>
@asafpamzn There are too many patches in this pull request and it would be difficult for someone to comment on the changes. The following document provides more information on how to contribute to CRIU:
I believe Mike Rapoport (@rppt) might be able to provide some advice about the idea.
Creating a GitHub issue with more information about the use-case and why this functionality is important will help us to understand the proposed design.
There are multiple people in the community who can provide feedback. Mike is an MM maintainer for the Linux kernel and contributed many of the patches that enable post-copy migration with userfaultfd.
Let's start with a design doc.
Ack, working on a design doc.
Summary
I'm implementing a COW-based live migration feature for CRIU that uses userfaultfd write-protection to track memory modifications while the process continues running. The goal is to combine it with lazy-pages support to duplicate a process to a remote instance while minimizing downtime compared to traditional dump modes.
Overview
High-level flow
In https://github.com/asafpamzn/criu/blob/criu-cow/criu/cr-dump.c#L1720
A new parasite to do the job
https://github.com/asafpamzn/criu/blob/criu-dev/criu/cow-dump.c#L197C1-L198C1
https://github.com/asafpamzn/criu/blob/a59a151c1e2fb6edfe899ab940698c5a412f75b1/criu/pie/parasite.c#L963
Question: I want to dump small VMAs directly and write-protect only large VMAs. How can I do that? I don't fully understand how to combine the two kinds of VMAs, since they are all pushed to the same page image file.
Next, a new thread receives the page faults and transfers the process.
https://github.com/asafpamzn/criu/blob/criu-cow/criu/cr-dump.c#L1728
https://github.com/asafpamzn/criu/blob/a59a151c1e2fb6edfe899ab940698c5a412f75b1/criu/cow-dump.c#L423
https://github.com/asafpamzn/criu/blob/a59a151c1e2fb6edfe899ab940698c5a412f75b1/criu/cow-dump.c#L444
Wake up the source process
https://github.com/asafpamzn/criu/blob/a59a151c1e2fb6edfe899ab940698c5a412f75b1/criu/cow-dump.c#L414
I'm in the early stages of learning the code. I would be happy to get some guidance and advice.
Please let me know if this makes sense. I'm most concerned about how to combine the memory areas, as I want to write-protect only large VMAs.