Introduce offloaded snapshot/restore #8264

Open
sboeuf wants to merge 11 commits into cloud-hypervisor:main from sboeuf:offload_snapshot

@sboeuf sboeuf commented May 21, 2026

Member

By relying on the existing local live migration support and reusing the
semantics and the protocol associated with it, we intend to provide a
way for snapshotting and restoring a VM to/from a dedicated process that
we can call the offload daemon.

By allowing an external process to perform the snapshot/restore actions on
behalf of Cloud Hypervisor, we give our users the opportunity to
implement their own offload daemon. The goal is to avoid bloating
Cloud Hypervisor with numerous features related to snapshot/restore, and
let the user decide how to perform the snapshot/restore actions. One
example is that we can decide to encrypt the guest RAM on the fly in
order to avoid writing an unencrypted version to local disk. Another
example is to be able to send guest RAM and associated state/config data
over the network without having to persist the data first to local
storage.

There might be other reasons to choose an offload daemon to
perform the snapshot/restore of the VM, but in every case, this empowers
the user to make their own choice.

Also, since we'd like to support the userfaultfd mechanism so that the offload
daemon proposal is at parity with the internal snapshot/restore implementation,
the post-copy feature had to be added to live migration.
With the live migration protocol now supporting post-copy, we can expect both
remote migration over TCP to use the post-copy mechanism and
offloaded VM restore to let the daemon fault the
pages in.

This is a large PR that might need to be cut into smaller pieces, but it gives a
global understanding of the end goal and what it takes to achieve
it.

@sboeuf sboeuf requested a review from a team as a code owner May 21, 2026 12:27
@sboeuf sboeuf force-pushed the offload_snapshot branch from 6276033 to cb69c03 Compare May 21, 2026 13:35
@phip1611 phip1611 self-requested a review May 21, 2026 14:37

@likebreath likebreath left a comment

Member

Very neat idea. I like how this lets us offload functionality out of the core VMM implementation.

Overall looks good, though I haven't dug into the reference restore_daemon implementation yet. One more thought: I think we should support 'keep-alive' on the offload_snapshot endpoint, which would align better with the generic 'snapshot' expectation.

Comment thread cloud-hypervisor/src/bin/ch-remote.rs Outdated
Comment thread cloud-hypervisor/src/bin/ch-remote.rs Outdated
Comment thread vmm/src/lib.rs
@sboeuf sboeuf force-pushed the offload_snapshot branch from cb69c03 to 4dd352b Compare May 22, 2026 09:40
sboeuf commented May 22, 2026

Member Author

@likebreath thanks for the quick review :)
After an offline conversation with @rbradford, we agreed it would be even simpler to avoid introducing a new API endpoint, given this was more of an alias than a completely new endpoint. Therefore, I've removed the commits that touched ch-remote and added the two new endpoints.
The summary is that Cloud Hypervisor can already support something like an offload daemon thanks to its migration protocol, and this PR only introduces a reference implementation of such a daemon so that we can run some integration tests.

@sboeuf sboeuf force-pushed the offload_snapshot branch from 4dd352b to 2d8062b Compare May 22, 2026 09:45

@likebreath likebreath left a comment

Member

> we agreed it would be even simpler to avoid introducing a new API endpoint given this was more of an alias rather than a completely new endpoint.

Makes perfect sense. Some comments below about the reference daemon implementation.

Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
sboeuf commented Jun 3, 2026

Member Author

Just a summary of the proposal from this updated PR:

Goal

We'd like a way to allow CH's users to implement the features they need for snapshot/restore (such as encrypting guest RAM on the fly, or sending the snapshot over the network instead of persisting it to local disk), without overloading CH with these features.

Offload daemon

One way we think this is achievable is by reusing the live migration protocol so that an offload daemon can behave as a destination VM (for the snapshot case), and as a source VM (for the restore case). The existing protocol gives us this ability and we've been able to verify that we can make snapshot/restore work from the offload daemon (almost) the same way it works with CH's internal snapshot/restore.

What's missing

One thing that is missing to be at parity with the current snapshot/restore support is userfaultfd. Since userfaultfd can't be handled entirely from the daemon (the setup has to happen in CH's process so that it applies to the right VMAs), we must extend CH to support it. And because we rely on the live migration support, that basically means adding post-copy (uffd) support to the current live migration protocol.

The proposal

Adding post-copy support to the current live migration protocol fits well with the live migration promise, and by adding it to the protocol, we can achieve both post-copy over the network AND fast restore from an offloaded daemon since we'd expect the daemon to serve pages on demand through the extended protocol.
I'd like to get some feedback since this is a first draft of how this could be shaped. Also, I've tried to keep the post-copy support for remote migration as simple as possible, but we could also think about pre-copy + post-copy if we wanted to optimize migration time.

Comment thread docs/snapshot_restore.md
Anonymous memory is rejected with the same error message that local
live migration produces.
- Orchestrator-supplied network FDs (today carried by `vm.restore`'s
`net_fds` field) are **not** plumbed through `vm.receive-migration`,

@saravan2 saravan2 Jun 4, 2026

Member

Heads up that I have an upcoming PR plumbing vfio_fds through vm.receive-migration over the SCM_RIGHTS channel, mirroring how vm.restore handles fd substitution. The mechanism I implemented for vfio_fds can be extended to cover net_fds in vm.receive-migration. If you want, I can fold net_fds into my commits so that the offload daemon can receive orchestrator-supplied network FDs.

Member Author

Ah that's good to know! If you think that's not too much work then yes, otherwise we can still remove the limitation later.

@sboeuf sboeuf force-pushed the offload_snapshot branch 4 times, most recently from f700950 to 069cbb4 Compare June 5, 2026 11:05
@rbradford

Member

Still under active development so drafting.

@rbradford rbradford marked this pull request as draft June 8, 2026 12:07
@sboeuf sboeuf force-pushed the offload_snapshot branch 3 times, most recently from e53d618 to d8e6fb8 Compare June 9, 2026 13:12
@sboeuf sboeuf marked this pull request as ready for review June 9, 2026 13:12
sboeuf commented Jun 9, 2026

Member Author

Undrafted since it's now ready for reviews.

@sboeuf sboeuf force-pushed the offload_snapshot branch from d8e6fb8 to 268583c Compare June 10, 2026 08:20
@rbradford

Member

@sboeuf Your new test failed.

@sboeuf sboeuf force-pushed the offload_snapshot branch from 268583c to 5f47c5c Compare June 10, 2026 12:03
sboeuf commented Jun 10, 2026

Member Author

> @sboeuf Your new test failed.

Yes it should be fixed now.

sboeuf added 3 commits June 10, 2026 08:38
Expose VmMigrationConfig as a public facing structure that can be used
by an offload daemon to act as if it was the VM to migrate to, or the VM
to migrate from.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Add a new dedicated binary meant to serve as a reference
implementation for validating that offloaded snapshot/restore works, and
to be exercised through tests in general.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
sboeuf added 8 commits June 10, 2026 08:38
Move next_data_extent and write_region_sparse out of memory_manager.rs
into a new vmm::sparse module so the snapshot writer, the restore
reader, and the offload daemon can share one implementation.

No functional change intended.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Copy only populated extents when writing the snapshot file and when
filling the restore memfd, leaving unwritten ranges as holes. Both
the on-disk snapshot and the restored guest RAM stay sparse, so that
untouched guest pages cost no disk space or host memory.

This brings the offload daemon to feature parity with CH's internal
implementation of snapshot/restore.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
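
For readers following along, the "copy only populated extents" idea from the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: it works on in-memory buffers and treats any page containing a non-zero byte as populated, whereas the in-tree `next_data_extent` and `write_region_sparse` helpers presumably ask the file for its data extents. The names `data_extents` and `copy_sparse` are hypothetical.

```rust
/// Return the (offset, length) of each contiguous run of populated pages.
/// ASSUMPTION: "populated" means "contains a non-zero byte"; this stands in
/// for whatever extent query the real helpers use.
/// `page_size` must be non-zero.
fn data_extents(src: &[u8], page_size: usize) -> Vec<(usize, usize)> {
    let mut extents: Vec<(usize, usize)> = Vec::new();
    for (i, page) in src.chunks(page_size).enumerate() {
        if page.iter().all(|&b| b == 0) {
            continue; // hole: nothing to record, nothing to copy
        }
        let offset = i * page_size;
        // Grow the previous extent when this page directly follows it.
        if let Some(last) = extents.last_mut() {
            if last.0 + last.1 == offset {
                last.1 += page.len();
                continue;
            }
        }
        extents.push((offset, page.len()));
    }
    extents
}

/// Copy only the populated extents of `src` into `dst` and return the number
/// of bytes copied. Ranges that are holes in `src` are never written in `dst`.
/// `dst` must be at least as long as `src`.
fn copy_sparse(src: &[u8], dst: &mut [u8], page_size: usize) -> usize {
    let mut copied = 0;
    for (offset, len) in data_extents(src, page_size) {
        dst[offset..offset + len].copy_from_slice(&src[offset..offset + len]);
        copied += len;
    }
    copied
}

fn main() {
    // Four 4-byte "pages": hole, data, data, hole.
    let src = [0u8, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0];
    let mut dst = [0u8; 16];
    let copied = copy_sparse(&src, &mut dst, 4);
    println!("extents: {:?}, bytes copied: {}", data_extents(&src, 4), copied);
}
```

Because holes are skipped rather than written, a destination backed by a sparse file or a fresh memfd keeps those ranges unallocated, which is the property the commit message describes.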
Extend the snapshot/restore documentation so that it explains the
goals behind this offloaded snapshot/restore feature and how to use
it in practice, and documents the protocol used by the offload
daemon so that anyone can write their own daemon.

By relying on the existing local live migration support and reusing the
semantics and the protocol associated with it, we intend to provide a
way for snapshotting and restoring a VM to/from a dedicated process that
we can call the offload daemon.

By allowing an external process to perform the snapshot/restore actions on
behalf of Cloud Hypervisor, we give our users the opportunity to
implement their own offload daemon. The goal is to avoid bloating
Cloud Hypervisor with numerous features related to snapshot/restore, and
let the user decide how to perform the snapshot/restore actions. One
example is that we can decide to encrypt the guest RAM on the fly in
order to avoid writing an unencrypted version to local disk. Another
example is to be able to send guest RAM and associated state/config data
over the network without having to persist the data first to local
storage.

There might be other reasons to choose an offload daemon to
perform the snapshot/restore of the VM, but in every case, this empowers
the user to make their own choice.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Introduce PageFault as the new wire command needed by both post-copy
live migration and on-demand (fast) restore from the offload daemon.
This new command expresses the destination's need to fault the
page content in. The request describes the page through a MemoryRange
structure, and the response can be either 0 or the actual page size.

In case it's 0, that means the source had access to the guest memory and
was able to copy the page content directly. In case the response is the
actual page size, there's a payload associated which contains the page
content.

We can expect local live migration and offload restore to run locally
and therefore have access to the guest memory. The remote live migration
over the network is the case where we would expect the page content to
be sent over the wire.

This command is served through an additional connection happening on the
UNIX or TCP socket. The goal is to keep the same codepath between local
and remote migrations. This additional channel allows PageFault commands
to be issued asynchronously so they can be served without blocking the
main connection.

A connection role is introduced in order to identify an additional
connection related to pre-copy memory versus the newly introduced
channel for serving post-copy requests.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
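
To make the PageFault exchange described in the commit above more concrete, here is a rough sketch of the request/response rule: a reply of 0 means the source copied the page into guest memory itself, and a reply equal to the page size is followed by the page content. Everything about the byte layout is an assumption made for illustration (little-endian `u64` fields, the `gpa`/`length` field names, the `PageFaultReply` enum and the helper functions); the actual encoding is defined by the PR, not by this snippet.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical stand-in for the MemoryRange structure named in the commit
/// message; the real field names and layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MemoryRange {
    gpa: u64,
    length: u64,
}

/// The two outcomes the commit message describes for a PageFault request.
#[derive(Debug, PartialEq, Eq)]
enum PageFaultReply {
    /// Reply length 0: the source had access to guest memory and copied the
    /// page content directly, so no payload follows.
    CopiedInPlace,
    /// Reply length equal to the page size: the page content follows.
    Payload(Vec<u8>),
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn write_request<W: Write>(w: &mut W, range: MemoryRange) -> io::Result<()> {
    w.write_all(&range.gpa.to_le_bytes())?;
    w.write_all(&range.length.to_le_bytes())
}

fn read_request<R: Read>(r: &mut R) -> io::Result<MemoryRange> {
    let gpa = read_u64(r)?;
    let length = read_u64(r)?;
    Ok(MemoryRange { gpa, length })
}

fn write_reply<W: Write>(w: &mut W, reply: &PageFaultReply) -> io::Result<()> {
    match reply {
        PageFaultReply::CopiedInPlace => w.write_all(&0u64.to_le_bytes()),
        PageFaultReply::Payload(page) => {
            w.write_all(&(page.len() as u64).to_le_bytes())?;
            w.write_all(page)
        }
    }
}

fn read_reply<R: Read>(r: &mut R, page_size: u64) -> io::Result<PageFaultReply> {
    match read_u64(r)? {
        0 => Ok(PageFaultReply::CopiedInPlace),
        n if n == page_size => {
            let mut page = vec![0u8; n as usize];
            r.read_exact(&mut page)?;
            Ok(PageFaultReply::Payload(page))
        }
        n => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unexpected reply length {n}"),
        )),
    }
}

fn main() {
    // Destination side: ask for one 4 KiB page.
    let mut wire = Vec::new();
    write_request(&mut wire, MemoryRange { gpa: 0x1000, length: 4096 }).unwrap();

    // Source side (remote case): decode the request, answer with the content.
    let req = read_request(&mut Cursor::new(&wire)).unwrap();
    let mut reply = Vec::new();
    let page = vec![0xab; req.length as usize];
    write_reply(&mut reply, &PageFaultReply::Payload(page)).unwrap();

    // Destination side: decode the reply.
    let decoded = read_reply(&mut Cursor::new(&reply), 4096).unwrap();
    println!("request {req:?} answered with payload: {}", decoded != PageFaultReply::CopiedInPlace);
}
```

In the local migration and offload restore cases the commit message expects the `CopiedInPlace` branch, since the peer shares the guest memory; the `Payload` branch corresponds to remote migration over the network.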
Extract the page content provider out of the userfaultfd handler so it
can be plugged with different backends in followup commits.

No functional change intended.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
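
As an illustration of what "extracting the page content provider" can look like, the sketch below separates the fault-resolution logic from where the bytes come from. It is not the code from this PR: the `PageSource` trait, both backends and `resolve_fault` are hypothetical names, plain byte slices stand in for guest memory, and a real handler would install the page through userfaultfd rather than with a slice copy.

```rust
use std::collections::HashMap;

/// Hypothetical provider interface: anything able to produce one page.
trait PageSource {
    /// Fill `page` with the content found at `offset` in guest memory.
    fn fetch(&mut self, offset: u64, page: &mut [u8]) -> Result<(), String>;
}

/// Backend reading from a snapshot image held in memory (stands in for a file).
struct SnapshotSource {
    image: Vec<u8>,
}

impl PageSource for SnapshotSource {
    fn fetch(&mut self, offset: u64, page: &mut [u8]) -> Result<(), String> {
        let start = offset as usize;
        let end = start.checked_add(page.len()).ok_or("offset overflow")?;
        let src = self.image.get(start..end).ok_or("range outside snapshot")?;
        page.copy_from_slice(src);
        Ok(())
    }
}

/// Backend standing in for a peer reached over a socket: the map plays the
/// role of the remote side, and `requests` records what would be sent.
struct PeerSource {
    remote_pages: HashMap<u64, Vec<u8>>,
    requests: Vec<u64>,
}

impl PageSource for PeerSource {
    fn fetch(&mut self, offset: u64, page: &mut [u8]) -> Result<(), String> {
        self.requests.push(offset); // a real backend would issue a request here
        let data = self.remote_pages.get(&offset).ok_or("peer has no such page")?;
        if data.len() != page.len() {
            return Err("peer returned a page of the wrong size".to_string());
        }
        page.copy_from_slice(data);
        Ok(())
    }
}

/// Resolve one fault: align the address down to its page and let the source
/// fill that page. `page_size` must be non-zero.
fn resolve_fault<S: PageSource>(
    source: &mut S,
    guest: &mut [u8],
    fault_addr: u64,
    page_size: u64,
) -> Result<(), String> {
    let offset = fault_addr - fault_addr % page_size;
    let start = offset as usize;
    let end = start + page_size as usize;
    let page = guest.get_mut(start..end).ok_or("fault outside guest memory")?;
    source.fetch(offset, page)
}

fn main() {
    let page_size = 8;
    let mut guest = vec![0u8; 32];

    // Same handler, two different backends.
    let mut file_like = SnapshotSource { image: (0..32).collect() };
    resolve_fault(&mut file_like, &mut guest, 13, page_size).unwrap();

    let mut peer = PeerSource {
        remote_pages: HashMap::from([(24, vec![7u8; 8])]),
        requests: Vec::new(),
    };
    resolve_fault(&mut peer, &mut guest, 30, page_size).unwrap();

    println!("guest memory: {guest:?}, peer requests: {:?}", peer.requests);
}
```

Keeping the handler generic over the source is what lets a later backend resolve faults through a different transport without touching the fault-handling path.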
Add the socket-backed UffdMemorySource that resolves each fault by
sending a Command::PageFault request to the peer over a dedicated fault
connection. This connection is brought up and ready to serve before
restoring the VM.

Also extend receive-migration to accept a new postcopy boolean
parameter that lets the destination know whether to expect a postcopy
migration or an on-demand restore.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Add a --lazy flag to the offload daemon's restore subcommand to support
the postcopy mechanism from the live migration protocol.

In this lazy mode, the daemon creates empty memfds to back the
guest memory and sends them over to the VMM. This allows the VM to be
started quickly after the memfd is mapped into CH's address space.

At runtime, when the guest accesses the pages (or when the prefault
handler requests the pages), the daemon faults every page in by copying the
page content to its shared memory mapping. Once the page content is
copied, it replies to the PageFault request to notify the VMM that it
can consider the page present.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Add a postcopy=on knob to the vm.send-migration endpoint so that a remote
migration over TCP can resume the destination's VM and stream pages on
demand instead of running the pre-copy dirty-tracking loop.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
@sboeuf sboeuf force-pushed the offload_snapshot branch from 5f47c5c to cd9aea5 Compare June 10, 2026 15:39

4 participants