@JackThomson2
Contributor

Description

Adds support for two virtio-balloon features: free page hinting and free page reporting.

TODO

Update documentation on the update balloon features
Update release notes

...

Reason

...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@JackThomson2 changed the title from "virtio-balloon: Add free page reporting reporting" to "virtio-balloon: Add free page reporting hinting" on Oct 24, 2025
@JackThomson2 JackThomson2 force-pushed the free_page_hinting_reporting branch 2 times, most recently from c96f7be to 8ef7916 Compare October 24, 2025 14:38
@codecov
codecov bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 83.51648% with 60 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.74%. Comparing base (30c04f0) to head (8dcf4fd).
⚠️ Report is 8 commits behind head on main.

Files with missing lines                                Patch %   Lines
src/vmm/src/rpc_interface.rs                            4.54%     21 Missing ⚠️
src/vmm/src/lib.rs                                      0.00%     19 Missing ⚠️
...rc/vmm/src/devices/virtio/balloon/event_handler.rs   60.86%    9 Missing ⚠️
src/vmm/src/devices/virtio/balloon/device.rs            95.95%    8 Missing ⚠️
src/vmm/src/devices/virtio/transport/pci/device.rs      0.00%     2 Missing ⚠️
src/vmm/src/devices/virtio/balloon/persist.rs           94.73%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5491      +/-   ##
==========================================
- Coverage   82.75%   82.74%   -0.01%     
==========================================
  Files         269      269              
  Lines       27798    28126     +328     
==========================================
+ Hits        23003    23274     +271     
- Misses       4795     4852      +57     
Flag Coverage Δ
5.10-m5n.metal 82.90% <83.51%> (-0.01%) ⬇️
5.10-m6a.metal 82.17% <83.51%> (+<0.01%) ⬆️
5.10-m6g.metal 79.61% <83.51%> (+0.04%) ⬆️
5.10-m6i.metal 82.90% <83.51%> (-0.01%) ⬇️
5.10-m7a.metal-48xl 82.17% <83.51%> (+0.01%) ⬆️
5.10-m7g.metal 79.61% <83.51%> (+0.03%) ⬆️
5.10-m7i.metal-24xl 82.87% <83.51%> (-0.01%) ⬇️
5.10-m7i.metal-48xl 82.87% <83.51%> (+<0.01%) ⬆️
5.10-m8g.metal-24xl 79.61% <83.51%> (+0.03%) ⬆️
5.10-m8g.metal-48xl 79.61% <83.51%> (+0.03%) ⬆️
6.1-m5n.metal 82.93% <83.51%> (-0.01%) ⬇️
6.1-m6a.metal 82.21% <83.51%> (+<0.01%) ⬆️
6.1-m6g.metal 79.61% <83.51%> (+0.04%) ⬆️
6.1-m6i.metal 82.93% <83.51%> (-0.01%) ⬇️
6.1-m7a.metal-48xl 82.19% <83.51%> (-0.01%) ⬇️
6.1-m7g.metal 79.60% <83.51%> (+0.04%) ⬆️
6.1-m7i.metal-24xl 82.94% <83.51%> (-0.01%) ⬇️
6.1-m7i.metal-48xl 82.94% <83.51%> (-0.01%) ⬇️
6.1-m8g.metal-24xl 79.60% <83.51%> (+0.03%) ⬆️
6.1-m8g.metal-48xl 79.61% <83.51%> (+0.04%) ⬆️

Free page reporting is a mechanism in which the guest notifies the
host of pages which are not currently in use. This feature can only be
configured at boot and, once enabled, reports continuously.

With free page reporting, Firecracker calls `madvise` with
`MADV_DONTNEED` on the reported ranges. This allows the host to free up
memory and reduces the RSS of the VM. With UFFD, this is delivered as a
`UFFD_EVENT_REMOVE` event after the `MADV_DONTNEED` call.

Signed-off-by: Jack Thomson <[email protected]>
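
To make the reporting flow in the commit above concrete: before a reported range can be handed to `madvise(MADV_DONTNEED)`, it has to be translated from guest physical addresses into the host mapping and bounds-checked. The following is a minimal sketch under the assumption of a single contiguous memory region; `GuestRegion` and `host_range` are hypothetical names used for illustration and are not the types used in this PR.

```rust
/// Hypothetical description of one guest memory region mapped into the VMM.
#[derive(Debug, Clone, Copy)]
pub struct GuestRegion {
    /// First guest physical address covered by the region.
    pub guest_base: u64,
    /// Host virtual address at which the region is mapped.
    pub host_base: u64,
    /// Size of the region in bytes.
    pub len: u64,
}

/// Translate a guest-reported range into a host (address, length) pair.
///
/// Returns `None` if the range is empty, overflows, or is not fully
/// contained in the region, so a malformed report is never acted on.
pub fn host_range(region: &GuestRegion, guest_addr: u64, len: u64) -> Option<(u64, u64)> {
    if len == 0 {
        return None;
    }
    // Offset of the range inside the region; fails if it starts below the base.
    let offset = guest_addr.checked_sub(region.guest_base)?;
    // End offset; fails on arithmetic overflow.
    let end = offset.checked_add(len)?;
    if end > region.len {
        return None;
    }
    // The resulting pair is what would be passed to madvise(MADV_DONTNEED).
    Some((region.host_base.checked_add(offset)?, len))
}
```

Using checked arithmetic throughout means a hostile or buggy guest cannot cause the host to discard memory outside the guest mapping.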
Free page hinting is a mechanism which allows the guest driver to report
ranges of free pages to the host device. A "hinting" run is triggered by
the device publishing a new command id in the config space. After the id
is updated, the driver hints unused ranges to the host. Once the driver
has exhausted all free ranges, it notifies the device that the run has
completed. The device can then issue another command allowing the guest
to reclaim these pages.

To support hinting in the Firecracker balloon device, we offer three
control points: the first starts a run, the second monitors its status,
and the third issues the command that allows the guest to reclaim pages.

Note that there is a potential condition in the Linux driver that would
allow a range to be reclaimed in an OOM scenario before we remove the
range.

Signed-off-by: Jack Thomson <[email protected]>
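
The hinting run described in the commit above can be modelled as a small state machine. This is a simplified sketch, not the device code from this PR: `HintingModel` and its methods are hypothetical, the initial config value is an assumption, and every range is tagged with the command id here purely to show stale hints being rejected. The reserved ids `0` (stop) and `1` (done) follow the Linux `virtio_balloon` uapi header.

```rust
/// Reserved command ids (values taken from the Linux virtio_balloon uapi header).
pub const CMD_ID_STOP: u32 = 0;
pub const CMD_ID_DONE: u32 = 1;
/// First id usable for an actual run (assumption: anything above the reserved ids).
const FIRST_RUN_ID: u32 = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HintState {
    Idle,
    Running,
    Finished,
}

/// Hypothetical model of one hinting-capable balloon device.
#[derive(Debug)]
pub struct HintingModel {
    /// Command id currently exposed to the guest in config space.
    pub config_cmd_id: u32,
    pub state: HintState,
    /// Bytes hinted during the current run.
    pub hinted_bytes: u64,
    next_id: u32,
}

impl HintingModel {
    pub fn new() -> Self {
        Self {
            config_cmd_id: CMD_ID_STOP, // assumed initial value
            state: HintState::Idle,
            hinted_bytes: 0,
            next_id: FIRST_RUN_ID,
        }
    }

    /// Device side ("start"): publish a fresh command id to trigger a run.
    pub fn start_run(&mut self) -> u32 {
        let id = self.next_id;
        // Wrap around without ever reusing the reserved ids.
        self.next_id = if id == u32::MAX { FIRST_RUN_ID } else { id + 1 };
        self.config_cmd_id = id;
        self.state = HintState::Running;
        self.hinted_bytes = 0;
        id
    }

    /// Driver side: hint one free range. Hints for a stale id are ignored.
    pub fn hint_range(&mut self, cmd_id: u32, len: u64) -> bool {
        if self.state != HintState::Running || cmd_id != self.config_cmd_id {
            return false;
        }
        self.hinted_bytes = self.hinted_bytes.saturating_add(len);
        true
    }

    /// Driver side: all free ranges have been hinted.
    pub fn driver_finished(&mut self) {
        if self.state == HintState::Running {
            self.state = HintState::Finished;
        }
    }

    /// Device side ("stop"): allow the guest to reclaim the hinted pages.
    pub fn allow_reclaim(&mut self) {
        self.config_cmd_id = CMD_ID_DONE;
        self.state = HintState::Idle;
    }
}
```

The three device-facing operations map onto the three control points listed above: `start_run` starts a run, reading `state` monitors it, and `allow_reclaim` lets the guest take its pages back.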
Add API endpoints to manage free page hinting. There are three
endpoints: Start, to begin a new free page hinting run; Status, to track
the state of the hinting run; and Stop, to stop the hinting run and
allow the guest to reclaim the reported pages.

Signed-off-by: Jack Thomson <[email protected]>
Add metrics to track free page hinting and reporting. For both features,
track the number of ranges reported, the number of errors encountered
while freeing, and the total amount of memory freed.

Signed-off-by: Jack Thomson <[email protected]>
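
The three counters named in the commit above could be kept as shared atomic values, one set per feature. This is only a sketch using the standard library; the struct and field names are hypothetical and do not reflect the metric names added in this PR.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter set; one instance each for hinting and reporting.
#[derive(Debug, Default)]
pub struct FreePageMetrics {
    /// Number of ranges received from the guest.
    pub range_count: AtomicU64,
    /// Number of ranges that could not be freed.
    pub free_fails: AtomicU64,
    /// Total bytes successfully freed.
    pub freed_bytes: AtomicU64,
}

impl FreePageMetrics {
    /// Record the outcome of processing one range of `len` bytes.
    pub fn record_range(&self, len: u64, freed: bool) {
        self.range_count.fetch_add(1, Ordering::Relaxed);
        if freed {
            self.freed_bytes.fetch_add(len, Ordering::Relaxed);
        } else {
            self.free_fails.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

Relaxed ordering is sufficient here because the counters are independent statistics and are not used to synchronise other memory accesses.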
@JackThomson2 JackThomson2 force-pushed the free_page_hinting_reporting branch from 8ef7916 to 5a1b473 Compare October 27, 2025 12:07
Add new resources to the HTTP API to enable testing of the hinting
functionality.

Signed-off-by: Jack Thomson <[email protected]>
Add integration tests for free page hinting and reporting, both
functional and performance tests.

Update fast_page_helper so it can run in a oneshot mode that does not
require the signal to track performance.

Add functional tests to ensure that hinting and reporting reduce the
RSS of the guest as expected. Update the reduce RSS test to touch memory
to reduce the chance of flakiness.

Add performance tests for the balloon device. The first tracks the CPU
overhead of hinting and reporting. The second measures the faulting
latency while reporting is running in the guest.

Signed-off-by: Jack Thomson <[email protected]>
Add integration tests for free page hinting and reporting, asserting
that the features are enabled correctly and that the config space
updates triggered by hinting are set as expected.

Signed-off-by: Jack Thomson <[email protected]>
While the traditional balloon device would not be able to reclaim memory
when backed by huge pages, it could still technically be used to
restrict memory usage in the guest.

Hinting and reporting, by contrast, report ranges in bigger sizes (4 MB
by default). Because of this, it is possible for the host to reclaim the
huge pages backing the guest.

Update the performance tests for the balloon when backed by huge pages,
adding variants to the size reduction tests to ensure hinting and
reporting can reduce the RSS of the guest.

Move the inflation test to performance to ensure it runs sequentially in
CI; otherwise the host can be exhausted of huge pages.

Signed-off-by: Jack Thomson <[email protected]>
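
The point about range size in the commit above comes down to simple arithmetic: the host can only give back a huge page when a reported range covers it entirely. A small sketch, assuming 2 MiB huge pages; the function name is hypothetical.

```rust
/// Huge page size assumed for this illustration (2 MiB).
pub const HUGE_PAGE_SIZE: u64 = 2 << 20;

/// Count the huge pages that lie entirely inside `[addr, addr + len)`.
///
/// Only these pages can be released in full; partially covered huge
/// pages at either end of the range stay resident.
pub fn whole_huge_pages(addr: u64, len: u64, huge_page_size: u64) -> u64 {
    if huge_page_size == 0 {
        return 0;
    }
    let end = match addr.checked_add(len) {
        Some(end) => end,
        None => return 0,
    };
    // Index of the first huge page boundary at or after `addr` (round up).
    let first = match addr.checked_add(huge_page_size - 1) {
        Some(v) => v / huge_page_size,
        None => return 0,
    };
    // Index of the last huge page boundary at or before `end` (round down).
    let last = end / huge_page_size;
    last.saturating_sub(first)
}
```

With these assumptions, an aligned 4 MB range covers two whole huge pages, whereas a single 4 KiB page, as handled by the traditional balloon, covers none.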
@JackThomson2 JackThomson2 force-pushed the free_page_hinting_reporting branch from 5a1b473 to 8dcf4fd Compare October 27, 2025 14:09
@JackThomson2 changed the title from "virtio-balloon: Add free page reporting hinting" to "[RFC] virtio-balloon: Add free page reporting hinting" on Oct 28, 2025
// The feature bitmap for virtio balloon.
const VIRTIO_BALLOON_F_STATS_VQ: u32 = 1; // Enable statistics.
const VIRTIO_BALLOON_F_DEFLATE_ON_OOM: u32 = 2; // Deflate balloon on OOM.
const VIRTIO_BALLOON_F_FREE_PAGE_REPORTING: u32 = 5; // Enable free page reportin
Contributor

g
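
For readers unfamiliar with how these constants are used: each value is a bit position in the feature bitmap offered to the guest. The sketch below shows one way the bitmap could be assembled. The values 1, 2 and 5 are taken from the snippet above; bit 3 for free page hinting comes from the virtio specification, and the constant name for it, along with both function names, is a hypothetical choice for illustration.

```rust
pub const VIRTIO_BALLOON_F_STATS_VQ: u32 = 1; // Enable statistics.
pub const VIRTIO_BALLOON_F_DEFLATE_ON_OOM: u32 = 2; // Deflate balloon on OOM.
// Bit number from the virtio spec; the constant name is an assumption.
pub const VIRTIO_BALLOON_F_FREE_PAGE_HINTING: u32 = 3;
pub const VIRTIO_BALLOON_F_FREE_PAGE_REPORTING: u32 = 5; // Enable free page reporting.

/// Build the feature bitmap offered to the guest from the enabled options.
pub fn avail_features(stats: bool, deflate_on_oom: bool, hinting: bool, reporting: bool) -> u64 {
    let mut features = 0u64;
    if stats {
        features |= 1u64 << VIRTIO_BALLOON_F_STATS_VQ;
    }
    if deflate_on_oom {
        features |= 1u64 << VIRTIO_BALLOON_F_DEFLATE_ON_OOM;
    }
    if hinting {
        features |= 1u64 << VIRTIO_BALLOON_F_FREE_PAGE_HINTING;
    }
    if reporting {
        features |= 1u64 << VIRTIO_BALLOON_F_FREE_PAGE_REPORTING;
    }
    features
}

/// Check whether a feature bit is set in a bitmap.
pub fn has_feature(features: u64, bit: u32) -> bool {
    features & (1u64 << bit) != 0
}
```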

EventFd::new(libc::EFD_NONBLOCK).map_err(BalloonError::EventFd)?,
EventFd::new(libc::EFD_NONBLOCK).map_err(BalloonError::EventFd)?,
EventFd::new(libc::EFD_NONBLOCK).map_err(BalloonError::EventFd)?,
EventFd::new(libc::EFD_NONBLOCK).map_err(BalloonError::EventFd)?,
Contributor

nit: at this stage a loop would look sensible
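
One possible shape for the loop suggested above, sketched with only the standard library so it stands alone: a generic helper that calls a fallible constructor a fixed number of times and stops at the first error. `build_all` is a hypothetical helper, not code from this PR; in the balloon device the closure would be the `EventFd::new(libc::EFD_NONBLOCK).map_err(BalloonError::EventFd)` expression from the snippet.

```rust
/// Call `ctor` `count` times and collect the results.
///
/// Collecting into `Result<Vec<T>, E>` short-circuits: the first `Err`
/// is returned and no further constructor calls are made.
pub fn build_all<T, E>(
    count: usize,
    mut ctor: impl FnMut() -> Result<T, E>,
) -> Result<Vec<T>, E> {
    (0..count).map(|_| ctor()).collect()
}
```

If a fixed-size array is required rather than a `Vec`, the result can be converted afterwards with `try_into`, at the cost of one more error case to handle.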

parameters:
- name: body
in: body
description: When the device completes the hinting whether we shoud automatically ack this.
Contributor

s/shoud/should/

#include <sys/mman.h> // mmap
#include <time.h> // clock_gettime
#include <fcntl.h> // open
#include <getopt.h> // getopt
Contributor

nit: can extract the update to the helper into a commit and explain the changes

Contributor Author

Makes sense, I will do!


time.sleep(1)

# Get the firecracker pid, and open an ssh connection.
Contributor

is the bit about the ssh connection relevant here?

Contributor Author

I'll drop that, good catch.

time.sleep(1)
microvm.api.balloon_hinting_start.patch()
elif method == "reporting":
time.sleep(2)
Contributor

any reason why reporting requires a longer delay than hinting?

Contributor Author

Reporting is expected to start in ~2 seconds and hinting in my testing takes ~200ms so that's why I've picked these. I can add a comment as they do seem like magic numbers


# Wait for the deflate to complete.
_ = get_stable_rss_mem_by_pid(firecracker_pid)
if method == "none":
Contributor

is this not a "traditional" device?

Contributor Author

Good catch thanks

}

#[test]
fn test_process_hinting() {
Contributor

nit: consider splitting this test into multiple self-contained ones, each testing its own scenario.

Contributor Author

Will do thanks!

with attempt:
return int(self.jailer.pid_file.read_text(encoding="ascii"))

@cached_property
Contributor

nit: can go to a separate commit as test refactoring

Contributor Author

This is pulled from the mem hot-plugging PR so will be able to drop it once that lands :)

time.sleep(sleep_duration)


# pylint: disable=C0103
Contributor

nit: moving the test before making changes can go to a separate commit.

Contributor Author

Will do thanks!

2 participants