Extend live migration protocol for postcopy and ondemand paging #8264

Merged
sboeuf merged 8 commits into cloud-hypervisor:main from sboeuf:offload_snapshot
Jun 24, 2026

Conversation

@sboeuf sboeuf commented May 21, 2026

Member

Following up on the introduction of the offload daemon through #8403, the
current PR extends the live migration protocol to allow page faults to be
handled from the source (source VM or offload daemon).

In the context of remote live migration, that gives the ability to perform
what's called postcopy, while in the offload daemon case, we call it ondemand
paging. In both cases, the VMM relies on the userfaultfd mechanism.

This new feature brings the offload daemon to parity with the internal
snapshot/restore implementation.

@sboeuf sboeuf requested a review from a team as a code owner May 21, 2026 12:27
@sboeuf sboeuf force-pushed the offload_snapshot branch from 6276033 to cb69c03 Compare May 21, 2026 13:35
@phip1611 phip1611 self-requested a review May 21, 2026 14:37

@likebreath likebreath left a comment

Member

Very neat idea. I like how this lets us offload functionality out of the core VMM implementation.

Overall looks good, though I haven't dug into the reference restore_daemon implementation yet. One more thought: I think we should support 'keep-alive' on the offload_snapshot endpoint, which would align better with the generic 'snapshot' expectation.

Comment thread cloud-hypervisor/src/bin/ch-remote.rs Outdated
Comment thread cloud-hypervisor/src/bin/ch-remote.rs Outdated
Comment thread vmm/src/lib.rs
@sboeuf sboeuf force-pushed the offload_snapshot branch from cb69c03 to 4dd352b Compare May 22, 2026 09:40
@sboeuf

sboeuf commented May 22, 2026

Member Author

@likebreath thanks for the quick review :)
After having an offline conversation with @rbradford, we agreed it would be even simpler to avoid introducing a new API endpoint given this was more of an alias rather than a completely new endpoint. Therefore, I've removed the commits related to ch-remote and to adding the two new endpoints.
The summary is that Cloud Hypervisor can already support something like an offload daemon thanks to its migration protocol, and this PR only introduces a reference implementation for such a daemon so that we can run some integration tests.

@sboeuf sboeuf force-pushed the offload_snapshot branch from 4dd352b to 2d8062b Compare May 22, 2026 09:45

@likebreath likebreath left a comment

Member

> we agreed it would be even simpler to avoid introducing a new API endpoint given this was more of an alias rather than a completely new endpoint.

Makes perfect sense. Some comments below about the reference daemon implementation.

Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
@sboeuf

sboeuf commented Jun 3, 2026

Member Author

Just a summary of the proposal from this updated PR:

Goal

We'd like a way to allow CH's users to implement the features they need for snapshot/restore (things like encrypting guest RAM on the fly, or sending the snapshot over the network instead of persisting it to local disk, etc.), without overloading CH with these features.

Offload daemon

One way we think this is achievable is by reusing the live migration protocol so that an offload daemon can behave as a destination VM (for the snapshot case), and as a source VM (for the restore case). The existing protocol gives us this ability and we've been able to verify that we can make snapshot/restore work from the offload daemon (almost) the same way it works with CH's internal snapshot/restore.

What's missing

One thing that is missing to be at parity with the current snapshot/restore support is userfaultfd. Given that userfaultfd can't be entirely handled from the daemon (because the setup has to happen from CH's process to apply to the right VMAs), we must extend CH to support it. And since we're talking about using the live migration support, that basically means we have to add post-copy (uffd) support to the current live migration protocol.

The proposal

Adding post-copy support to the current live migration protocol fits well with the live migration promise, and by adding it to the protocol, we can achieve both post-copy over the network AND fast restore from an offloaded daemon since we'd expect the daemon to serve pages on demand through the extended protocol.
I'd like to get some feedback since this is a first draft of how this could be shaped. Also, I've tried to keep things as simple as possible on the post-copy support for remote migration, but we could also think about pre-copy + post-copy if we wanted to optimize migration time.

Comment thread docs/snapshot_restore.md Outdated
@sboeuf sboeuf force-pushed the offload_snapshot branch 4 times, most recently from f700950 to 069cbb4 Compare June 5, 2026 11:05
@rbradford

Member

Still under active development so drafting.

@rbradford rbradford marked this pull request as draft June 8, 2026 12:07
@sboeuf sboeuf force-pushed the offload_snapshot branch 3 times, most recently from e53d618 to d8e6fb8 Compare June 9, 2026 13:12
@sboeuf sboeuf marked this pull request as ready for review June 9, 2026 13:12
@sboeuf

sboeuf commented Jun 9, 2026

Member Author

Undrafted since it's now ready for reviews.

@sboeuf sboeuf force-pushed the offload_snapshot branch from d8e6fb8 to 268583c Compare June 10, 2026 08:20
@rbradford

Member

@sboeuf Your new test failed.

@sboeuf sboeuf force-pushed the offload_snapshot branch from 268583c to 5f47c5c Compare June 10, 2026 12:03
@sboeuf

sboeuf commented Jun 10, 2026

Member Author

> @sboeuf Your new test failed.

Yes it should be fixed now.

@sboeuf sboeuf force-pushed the offload_snapshot branch from 5f47c5c to cd9aea5 Compare June 10, 2026 15:39
@sboeuf

sboeuf commented Jun 11, 2026

Member Author

@phip1611 @Coffeeri @saravan2 @rbradford @likebreath this PR is now ready for reviews. I wanted to highlight the fact that I can split it into two separate PRs given that the first half of this PR is about introducing the offload daemon without altering the live migration protocol, while the second part is about extending the live migration protocol to support postcopy for both live migration and the offload on-demand restore.
For now, I think it is simpler to have a full picture of what this is trying to achieve, which is why I submitted everything through this PR.

@phip1611

Member

I really want to review this in depth but I can't manage it this week. I will handle it with priority next week.

@sboeuf sboeuf force-pushed the offload_snapshot branch from d24b60f to 249910f Compare June 19, 2026 12:21
@sboeuf

sboeuf commented Jun 19, 2026

Member Author

@rbradford @phip1611 I've updated this PR, which is rebased on latest main branch (therefore contains the recent addition for the offload daemon). It is mainly oriented around the postcopy addition to the live migration protocol, which allows both remote live migration and local offload daemon to handle ondemand paging.

@rbradford

Member

> @rbradford @phip1611 I've updated this PR, which is rebased on latest main branch (therefore contains the recent addition for the offload daemon). It is mainly oriented around the postcopy addition to the live migration protocol, which allows both remote live migration and local offload daemon to handle ondemand paging.

Maybe update title & summary and resolve obsolete comment threads?

@sboeuf

sboeuf commented Jun 19, 2026

Member Author

> > @rbradford @phip1611 I've updated this PR, which is rebased on latest main branch (therefore contains the recent addition for the offload daemon). It is mainly oriented around the postcopy addition to the live migration protocol, which allows both remote live migration and local offload daemon to handle ondemand paging.
>
> Maybe update title & summary and resolve obsolete comment threads?

Yes this is done now. PR description is updated, all comments should be addressed and have been resolved.

@rbradford

Member

> > > @rbradford @phip1611 I've updated this PR, which is rebased on latest main branch (therefore contains the recent addition for the offload daemon). It is mainly oriented around the postcopy addition to the live migration protocol, which allows both remote live migration and local offload daemon to handle ondemand paging.
> >
> > Maybe update title & summary and resolve obsolete comment threads?
>
> Yes this is done now. PR description is updated, all comments should be addressed and have been resolved.

Title still says "Introduce offloaded snapshot/restore" - I think it has already been introduced.

@sboeuf sboeuf changed the title Introduce offloaded snapshot/restore Extend live migration protocol for postcopy and ondemand paging Jun 19, 2026
@sboeuf sboeuf force-pushed the offload_snapshot branch from 249910f to 2bf2123 Compare June 19, 2026 14:28

@rbradford rbradford left a comment

Member

This is looking good but I would go through and check that there is consistency in the terminology. I see ondemand and on-demand and fault and connection and dial and channel all intermixed.

Feel free to merge after carefully reviewing that.

Comment thread vm-migration/src/protocol.rs Outdated
Comment thread vm-migration/src/protocol.rs Outdated
Comment thread vmm/src/lib.rs Outdated
Comment thread vmm/src/lib.rs Outdated
Comment thread vmm/src/lib.rs Outdated
Comment thread vmm/src/lib.rs Outdated
@sboeuf sboeuf force-pushed the offload_snapshot branch 2 times, most recently from 35e6800 to ba75bec Compare June 22, 2026 14:18
@phip1611

Member

I'm going to review this tomorrow!

@sboeuf sboeuf force-pushed the offload_snapshot branch from ba75bec to a5e6e95 Compare June 23, 2026 09:21

@phip1611 phip1611 left a comment

Member

Great work, @sboeuf - merci!

Since live migration should be a first-class citizen in CH, my review is rather comprehensive and partly nitpicky - sorry about that 😁

I think we are close, though. No critical concerns from my side, and I am happy to help bring this over the finish line together.

Comment thread vm-migration/src/protocol.rs
Comment thread vm-migration/src/protocol.rs
Comment thread vm-migration/src/protocol.rs Outdated
Comment thread vmm/src/memory_manager.rs Outdated
Comment thread vmm/src/api/mod.rs
Comment thread offload_daemon/src/main.rs
Comment thread vmm/src/lib.rs Outdated
Comment thread vmm/src/lib.rs Outdated
Comment thread vmm/src/lib.rs Outdated
Command::PageFault => {
let range = MemoryRange::read_from(&mut socket)?;
let len = range.length as usize;
const MAX_PAGE: usize = 2 << 20; // 2 MiB

Member

Hm... I am pretty sure that we should just transfer 4k pages (in the case of TCP). I also think KVM is doing transparent huge-page splitting into 4k.

Could you please elaborate what the current situation with huge pages and normal pages is and perhaps document here in the code somehow?

Member Author

What if the guest memory on the destination VM is backed by hugepages from hugetlbfs (I'm not talking about THP)? That means these pages can be 2M or 1G in size, and therefore the page content that is expected must be 2M or 1G in size. (I need to fix this to handle 1G as well).
Am I missing something? Are we not supporting hugetlbfs with live migration?

@phip1611 phip1611 Jun 24, 2026

Member

I was thinking in the wrong direction - apologies. Everything is fine. I was mixing this up with precopy, where dirty logging does transparent 4K splitting. There is no need for transparent huge page splitting in a postcopy setup where we do not have to transmit the same pages over and over again.
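To make the hugetlbfs point in this thread concrete, here is a minimal sketch of the kind of check a fault handler could apply: the requested range has to cover exactly one page of whatever size backs the region. The function `validate_fault`, the `backing_page_size` parameter and the three x86-64 page sizes are assumptions made for illustration, not code from this PR.

```rust
/// Page sizes considered in this sketch (x86-64 values, an assumption).
const SIZE_4K: u64 = 4 << 10;
const SIZE_2M: u64 = 2 << 20;
const SIZE_1G: u64 = 1 << 30;

/// Check that a faulting range covers exactly one page of the size backing
/// the region. `backing_page_size` is a hypothetical per-region property.
pub fn validate_fault(gpa: u64, length: u64, backing_page_size: u64) -> Result<(), String> {
    if ![SIZE_4K, SIZE_2M, SIZE_1G].contains(&backing_page_size) {
        return Err(format!("unsupported backing page size: {backing_page_size}"));
    }
    // A hugetlbfs-backed region must be populated one full huge page at a time.
    if length != backing_page_size {
        return Err(format!("expected {backing_page_size} bytes, got {length}"));
    }
    if gpa % backing_page_size != 0 {
        return Err(format!("address {gpa:#x} is not page aligned"));
    }
    Ok(())
}
```

With such a check, a 4 KiB request against a 2 MiB hugetlbfs region is rejected instead of being silently served with too little data.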

Comment thread vmm/src/lib.rs
sboeuf added 7 commits June 23, 2026 08:31
Introducing PageFault as the new wire command needed by both postcopy
live migration and on demand restore from the offload daemon. This new
command describes the need from the destination to fault the page
content in. This request describes the page through a MemoryRange
structure, and the response can be either 0 or the actual page size.

In case it is 0, that means the source had access to the guest memory
and was able to copy the page content directly. In case the response is
the actual page size, there is a payload associated which contains the
page content.

We can expect local live migration and offload restore to run locally
and therefore have access to the guest memory. The remote live migration
over the network is the case where we would expect the page content to
be sent over the wire.

This command is served through an additional connection happening on the
UNIX or TCP socket. The goal is to keep the same codepath between local
and remote migrations. This additional channel allows PageFault commands
to be issued asynchronously so they can be served without blocking the
main connection.

A connection role is introduced in order to identify an additional
connection related to pre-copy memory versus the newly introduced
channel for serving post-copy requests.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
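The request/response exchange described in this commit message could look roughly like the sketch below. It only illustrates the "0 or page size, then payload" rule; the `MemoryRange` field names, the little-endian `u64` encoding and the helper functions `write_response` / `read_response` are assumptions for this example and may not match the actual wire format in `vm-migration/src/protocol.rs`.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for the protocol's `MemoryRange`; the field layout
/// and little-endian encoding are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryRange {
    pub gpa: u64,
    pub length: u64,
}

impl MemoryRange {
    pub fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.gpa.to_le_bytes())?;
        w.write_all(&self.length.to_le_bytes())
    }

    pub fn read_from<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; 8];
        r.read_exact(&mut buf)?;
        let gpa = u64::from_le_bytes(buf);
        r.read_exact(&mut buf)?;
        let length = u64::from_le_bytes(buf);
        Ok(Self { gpa, length })
    }
}

/// Source side: reply to a fault. `None` means the page was already copied
/// into guest memory directly, so only a zero length goes on the wire.
pub fn write_response<W: Write>(w: &mut W, page: Option<&[u8]>) -> io::Result<()> {
    match page {
        None => w.write_all(&0u64.to_le_bytes()),
        Some(data) => {
            w.write_all(&(data.len() as u64).to_le_bytes())?;
            w.write_all(data)
        }
    }
}

/// Destination side: read the reply. Returns `None` for a zero length,
/// otherwise the page content, which must match the requested length.
pub fn read_response<R: Read>(r: &mut R, range: &MemoryRange) -> io::Result<Option<Vec<u8>>> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    let len = u64::from_le_bytes(buf);
    if len == 0 {
        return Ok(None);
    }
    if len != range.length {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "unexpected page length"));
    }
    let mut page = vec![0u8; len as usize];
    r.read_exact(&mut page)?;
    Ok(Some(page))
}
```

Because the helpers are generic over `Read` and `Write`, the same code would work over a UNIX or a TCP stream, which matches the stated goal of keeping one codepath for local and remote migrations.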
Extract the page content provider out of the userfaultfd handler so it
can be plugged with different backends in followup commits.

No functional change intended.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
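One possible shape for such a pluggable page content provider is a small trait that the fault handler depends on, with each backend implementing it. The sketch below is illustrative only: `PageSource`, `SnapshotSource` and `resolve_fault` are hypothetical names, and the in-memory backend stands in for whatever the real handler reads from.

```rust
use std::io;

/// Hypothetical trait: anything able to produce the content of a guest page.
pub trait PageSource {
    /// Fill `dst` with the page content found at `offset` in guest memory.
    fn fetch(&mut self, offset: u64, dst: &mut [u8]) -> io::Result<()>;
}

/// Example backend reading from an in-memory snapshot image.
pub struct SnapshotSource {
    pub image: Vec<u8>,
}

impl PageSource for SnapshotSource {
    fn fetch(&mut self, offset: u64, dst: &mut [u8]) -> io::Result<()> {
        // Reject ranges that overflow or fall outside the image.
        let end = offset
            .checked_add(dst.len() as u64)
            .filter(|end| *end <= self.image.len() as u64)
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range outside snapshot"))?;
        dst.copy_from_slice(&self.image[offset as usize..end as usize]);
        Ok(())
    }
}

/// Fault handling written only against the trait, so a socket-backed
/// source could be swapped in without touching this function.
pub fn resolve_fault(source: &mut dyn PageSource, offset: u64, page_size: usize) -> io::Result<Vec<u8>> {
    let mut page = vec![0u8; page_size];
    source.fetch(offset, &mut page)?;
    Ok(page)
}
```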
Introducing a migration mode to both sides of the migration (send and
receive), so that a user can describe which way the memory should be
migrated between the source and destination VMs.

For now, we only introduce `precopy` and `postcopy` as viable options,
but we can expect other modes (more optimized) to be added in the
future.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-8
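A mode option like this is commonly modelled as an enum parsed from the user-supplied string. The sketch below assumes a type named `MemoryMode` and a `String` error; only the two values `precopy` and `postcopy` come from the commit message, everything else is hypothetical.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical representation of the migration mode option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryMode {
    Precopy,
    Postcopy,
}

impl FromStr for MemoryMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "precopy" => Ok(Self::Precopy),
            "postcopy" => Ok(Self::Postcopy),
            // Future modes would be added as new variants and match arms.
            other => Err(format!("unknown migration mode: {other}")),
        }
    }
}

impl fmt::Display for MemoryMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::Precopy => "precopy",
            Self::Postcopy => "postcopy",
        })
    }
}
```

Rejecting unknown strings at parse time keeps both sides of the migration from silently falling back to a mode the user did not ask for.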
Add the socket-backed UffdMemorySource that resolves each fault by
sending a Command::PageFault request to the peer over a dedicated fault
connection.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Plumb the SocketUffdMemorySource into the receiving side of live
migration. When memory_mode=postcopy is requested, the destination
brings up a dedicated fault connection, registers userfaultfd on the
restored memory regions, and serves guest pages on demand over that
connection while the VM resumes early.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Add an --ondemand flag to the offload daemon's restore subcommand to
support the post-copy mechanism from the live migration protocol.

In on-demand mode, the daemon creates empty memfds to back the guest
memory and sends them over to the VMM. This lets the VM start quickly,
right after the memfds are mapped into CH's address space.

At runtime, when the guest accesses a page (or the prefault handler
requests it), the daemon faults it in by copying the page content into
its shared memory mapping, then replies to the PageFault request so the
VMM can consider the page present.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
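As a rough illustration of the on-demand path described in this commit message, the sketch below uses plain byte slices to stand in for the snapshot data and for the daemon's mapping of the shared memfd. `serve_fault_locally` and its parameters are hypothetical names, not the daemon's actual code. Returning 0 mirrors the PageFault response that tells the VMM the page was copied in place.

```rust
/// Copy one page from the snapshot into the shared guest memory mapping and
/// return the response length to send back: 0, since no payload is needed
/// when the source can write guest memory directly.
pub fn serve_fault_locally(
    snapshot: &[u8],
    shared: &mut [u8],
    offset: usize,
    len: usize,
) -> Result<u64, String> {
    let end = offset
        .checked_add(len)
        .ok_or_else(|| "range overflows".to_string())?;
    if end > snapshot.len() || end > shared.len() {
        return Err(format!("range {offset:#x}..{end:#x} is out of bounds"));
    }
    // Only the faulting page is populated; the rest stays untouched until
    // the guest (or a prefault pass) asks for it.
    shared[offset..end].copy_from_slice(&snapshot[offset..end]);
    Ok(0)
}
```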
Wire up the source side of postcopy migration over TCP. When
`mode=postcopy` is requested on vm.send-migration, the source skips
the pre-copy dirty-tracking loop and lets the destination resume early,
then serves guest pages on demand over a dedicated connection.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
@sboeuf sboeuf force-pushed the offload_snapshot branch from a5e6e95 to deb894e Compare June 23, 2026 17:11
@phip1611

phip1611 commented Jun 23, 2026

Member

Thanks for incorporating my remarks. Will review again tomorrow morning!

Applying seccomp filtering to the migration postcopy thread running on
the source VM during migration.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-8
@sboeuf sboeuf force-pushed the offload_snapshot branch from deb894e to d26ed86 Compare June 23, 2026 18:14
@sboeuf

sboeuf commented Jun 23, 2026

Member Author

> Thanks for incorporating my remarks. Will review again tomorrow morning!

Sounds good! I think I've addressed all of your comments, please let me know if I missed/forgot anything.

@phip1611 phip1611 left a comment

Member

Thank you!

@sboeuf sboeuf added this pull request to the merge queue Jun 24, 2026
Merged via the queue into cloud-hypervisor:main with commit cc98a23 Jun 24, 2026
38 checks passed

6 participants