Extend live migration protocol for postcopy and ondemand paging by sboeuf · Pull Request #8264 · cloud-hypervisor/cloud-hypervisor

sboeuf · 2026-05-21T12:27:11Z

Following up on the introduction of the offload daemon through #8403, the
current PR extends the live migration protocol to allow page faults to be
handled from the source (source VM or offload daemon).

In the context of remote live migration, that gives the ability to perform
what's called postcopy, while in the offload daemon case, we call it ondemand
paging. In both cases, the VMM relies on the userfaultfd mechanism.

This new feature brings the offload daemon to parity with the internal
snapshot/restore implementation.

likebreath

Very neat idea. I like how this lets us offload functionality out of the core VMM implementation.

Overall looks good, though I haven't dug into the reference restore_daemon implementation yet. One more thought: I think we should support 'keep-alive' on the offload_snapshot endpoint, which would align better with the generic 'snapshot' expectation.

sboeuf · 2026-05-22T09:43:46Z

@likebreath thanks for the quick review :)
After having an offline conversation with @rbradford, we agreed it would be even simpler to avoid introducing a new API endpoint given this was more of an alias rather than a completely new endpoint. Therefore, I've removed the commits related to ch-remote and to adding the two new endpoints.
The summary is that Cloud Hypervisor can already support something like an offload daemon thanks to its migration protocol, and this PR only introduces a reference implementation for such a daemon so that we can run some integration tests.

likebreath

we agreed it would be even simpler to avoid introducing a new API endpoint given this was more of an alias rather than a completely new endpoint.

Makes perfect sense. Some comments below about the reference daemon implementation.

sboeuf · 2026-06-03T15:36:41Z

Just a summary of the proposal from this updated PR:

Goal

We'd like a way to allow CH's users to implement the features they need for snapshot/restore (things like encrypting guest RAM on the fly, or sending the snapshot over the network instead of persisting it to local disk, etc.), without overloading CH with these features.

Offload daemon

One way we think this is achievable is by reusing the live migration protocol so that an offload daemon can behave as a destination VM (for the snapshot case), and as a source VM (for the restore case). The existing protocol gives us this ability and we've been able to verify that we can make snapshot/restore work from the offload daemon (almost) the same way it works with CH's internal snapshot/restore.

What's missing

One thing that is missing to reach parity with the current snapshot/restore support is userfaultfd. Given userfaultfd can't be entirely handled from the daemon (because the setup has to happen from CH's process to apply to the right VMAs), we must extend CH to support it. And since we're talking about using the live migration support, that basically means we would have to add post-copy (uffd) support to the current live migration protocol.

The proposal

Adding post-copy support to the current live migration protocol fits well with the live migration promise, and by adding it to the protocol, we can achieve both post-copy over the network AND fast restore from an offloaded daemon since we'd expect the daemon to serve pages on demand through the extended protocol.
I'd like to get some feedback since this is a first draft of how this could be shaped. Also, I've tried to keep things as simple as possible on the post-copy support for remote migration but we could also think about pre-copy + post-copy if we wanted to optimize migration time.

rbradford · 2026-06-08T12:07:31Z

Still under active development so drafting.

sboeuf · 2026-06-09T13:12:55Z

Undrafted since it's now ready for reviews.

rbradford · 2026-06-10T12:02:38Z

@sboeuf Your new test failed.

sboeuf · 2026-06-10T12:04:55Z

Yes it should be fixed now.

sboeuf · 2026-06-11T07:05:16Z

@phip1611 @Coffeeri @saravan2 @rbradford @likebreath this PR is now ready for reviews. I wanted to highlight the fact that I can split it into two separate PRs given that the first half of this PR is about introducing the offload daemon without altering the live migration protocol, while the second part is about extending the live migration protocol to support postcopy for both live migration and the offload on-demand restore.
For now, I think it is simpler to have a full picture of what this is trying to achieve, which is why I submitted everything through this PR.

phip1611 · 2026-06-11T14:21:14Z

I really want to review this in depth but I can't manage it this week. Will handle it with priority next week

phip1611

First of all, many thanks for the work you have put into this.

Before I get into implementation details, I would like to add some context. About 18 months ago, my team and I built a fairly comprehensive live-migration benchmarking infrastructure for QEMU/KVM. It allowed us to observe a VM during migration in detail from several perspectives:

the VM host and QEMU
the VM itself
an external workload connected to the VM

One of the main lessons we learned is that a production-grade migration implementation should not rely on postcopy alone. In practice, you usually want precopy first, with a switch to postcopy only if precopy does not converge.

Another important finding was that postcopy needs multiple connections to work well under realistic production workloads (which QEMU still doesn't support to this day).

Concern: Single-connection postcopy

Please do not take this as criticism of the current work. I understand that this is an initial implementation and that support for additional connections can be added later. I mainly want to make sure we have a shared understanding of the limitations and requirements here.

QEMU, even today, only supports postcopy over a single connection. The precopy phase before switching to postcopy can use multiple connections (they call this "multifd"), but postcopy itself cannot. In our testing, this becomes a serious bottleneck for large VMs. For example, when migrating a VM with around 120 vCPUs and 1 TB or more of RAM while running an intensive workload, the VM can remain effectively unusable for dozens of seconds after the migration because all vCPUs compete for memory over a single connection.

For postcopy to work well in such scenarios, I believe we need at least two kinds of connection pools:

a pool of connections that proactively fetches the remaining memory in postcopy phase
a priority pool of connections used by vCPUs when they fault on missing pages

For that reason, I would be concerned about treating a single-channel postcopy design, where memory is fetched only on page faults, as sufficient for production-grade postcopy in Cloud Hypervisor. It may be a reasonable first step, but I think we should be explicit that this is not the final architecture we want for a robust production implementation. We should ensure that we do not set any technical debt in stone here.
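
A minimal sketch of the priority split between the two pools, assuming a single scheduler that always serves vCPU faults before background fetches. It models only the ordering, not real connections; `Request`, `PageScheduler`, `push` and `pop` are hypothetical names and are not taken from Cloud Hypervisor or QEMU.

```rust
use std::collections::VecDeque;

/// A page request, tagged by where it came from (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Request {
    /// A vCPU is blocked on this page: must be served first.
    Fault(u64),
    /// Proactive fetch of memory that has not been touched yet.
    Background(u64),
}

/// Two FIFO queues; the fault queue always drains before the background one.
#[derive(Default)]
pub struct PageScheduler {
    faults: VecDeque<u64>,
    background: VecDeque<u64>,
}

impl PageScheduler {
    /// Queue a request in the pool matching its origin.
    pub fn push(&mut self, req: Request) {
        match req {
            Request::Fault(page) => self.faults.push_back(page),
            Request::Background(page) => self.background.push_back(page),
        }
    }

    /// Return the next request to serve, preferring pending faults.
    pub fn pop(&mut self) -> Option<Request> {
        if let Some(page) = self.faults.pop_front() {
            return Some(Request::Fault(page));
        }
        self.background.pop_front().map(Request::Background)
    }
}
```

With this ordering, a fault queued after several background fetches is still returned first by `pop`, which is the behaviour the priority pool is meant to provide.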

Thanks for the great work and I'm very much looking forward to the discussion!

PS: @sboeuf I tried contacting you over Slack to schedule a meeting - is this okay or do you prefer email?

sboeuf · 2026-06-15T17:44:14Z

@phip1611 thanks for providing some context, that's interesting! I'm happy that you've spotted the two main issues in terms of long-term production support, but given this was a first attempt at implementing the offload daemon with the same set of features we already have for snapshot/restore, I had to introduce postcopy support. And because I have introduced this postcopy (PageFault) support at the protocol level, I thought it would be interesting to see it happen for live migration as well.
But you're right, we could greatly improve the PageFault handling by having multiple connections for that, and I think this isn't blocked in any way for the future given we can add as many extra connections as we need (as long as we identify the ConnRole).
On the pre-copy/post-copy front, this comes down to evaluating the right trigger for moving from pre-copy to post-copy at some point during the pre-copy iterations. We can discuss this further in a dedicated issue.
And in general, I also agree that we shouldn't treat this PR as the offload-daemon/postcopy work being done, but I felt there were already so many things in there that I wanted to make sure we agreed on the basics before iterating with more follow-up PRs.

sboeuf · 2026-06-17T08:18:26Z

@rbradford @phip1611 I've split this PR by opening a "part 1" PR #8403 with only the introduction of the new offload daemon. Once this is merged, I'll rebase the current PR.

sboeuf · 2026-06-19T12:28:31Z

@rbradford @phip1611 I've updated this PR, which is rebased on latest main branch (therefore contains the recent addition for the offload daemon). It is mainly oriented around the postcopy addition to the live migration protocol, which allows both remote live migration and local offload daemon to handle ondemand paging.

rbradford · 2026-06-19T12:29:55Z

Maybe update title & summary and resolve obsolete comment threads?

sboeuf · 2026-06-19T12:40:00Z

Yes this is done now. PR description is updated, all comments should be addressed and have been resolved.

rbradford · 2026-06-19T13:25:54Z

Title still says "Introduce offloaded snapshot/restore" - I think it has already been introduced.

Introducing PageFault as the new wire command needed by both postcopy live migration and on demand restore from the offload daemon.
This new command describes the need from the destination to fault the page content in. This request describes the page through a MemoryRange structure, and the response can be either 0 or the actual page size. In case it is 0, that means the source had access to the guest memory and was able to copy the page content directly. In case the response is the actual page size, there is a payload associated which contains the page content.
We can expect local live migration and offload restore to run locally and therefore have access to the guest memory. The remote live migration over the network is the case where we would expect the page content to be sent over the wire.
This command is served through an additional connection happening on the UNIX or TCP socket. The goal is to keep the same codepath between local and remote migrations. This additional channel allows PageFault commands to be issued asynchronously so they can be served without blocking the main connection.
A connection role is introduced in order to identify an additional connection related to pre-copy memory versus the newly introduced channel for serving post-copy requests.
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
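
The two possible replies described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: the actual layout lives in vm-migration/src/protocol.rs and is not shown in this thread, so the 8-byte little-endian length prefix, the `MemoryRange` fields, and the names `PageFaultReply`, `write_reply` and `read_reply` are all hypothetical.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for the MemoryRange carried by a PageFault request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryRange {
    pub gpa: u64,
    pub length: u64,
}

/// The two outcomes described in the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum PageFaultReply {
    /// Length 0: the source copied the page directly into guest memory.
    CopiedInPlace,
    /// Length equal to the page size: the page content follows as a payload.
    Payload(Vec<u8>),
}

/// Write a reply as an 8-byte little-endian length plus optional payload
/// (assumed framing, not the real wire format).
pub fn write_reply<W: Write>(w: &mut W, reply: &PageFaultReply) -> io::Result<()> {
    match reply {
        PageFaultReply::CopiedInPlace => w.write_all(&0u64.to_le_bytes()),
        PageFaultReply::Payload(data) => {
            w.write_all(&(data.len() as u64).to_le_bytes())?;
            w.write_all(data)
        }
    }
}

/// Read the reply for `range`; any length other than 0 or `range.length`
/// is rejected as invalid.
pub fn read_reply<R: Read>(r: &mut R, range: MemoryRange) -> io::Result<PageFaultReply> {
    let mut len = [0u8; 8];
    r.read_exact(&mut len)?;
    match u64::from_le_bytes(len) {
        0 => Ok(PageFaultReply::CopiedInPlace),
        n if n == range.length => {
            let mut data = vec![0u8; n as usize];
            r.read_exact(&mut data)?;
            Ok(PageFaultReply::Payload(data))
        }
        n => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unexpected PageFault reply length {}", n),
        )),
    }
}
```

In this sketch a local peer (local migration or offload restore) would answer with `CopiedInPlace`, while a remote source would answer with `Payload`, so the destination can share one code path for both.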

Extract the page content provider out of the userfaultfd handler so it can be plugged with different backends in followup commits. No functional change intended.
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
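
One way to picture this extraction is a small trait that the fault handler depends on, with backends plugged in behind it. The sketch below is an assumption about the shape of such an interface, not the trait from the PR: `PageSource`, `BufferSource` and `handle_fault` are hypothetical names, and the backend shown simply reads from an in-memory image.

```rust
use std::convert::TryFrom;
use std::io;

/// Hypothetical provider interface; the real trait in the PR may differ.
pub trait PageSource {
    /// Fill `dst` with the content of the guest page starting at `gpa`.
    fn fetch_page(&mut self, gpa: u64, dst: &mut [u8]) -> io::Result<()>;
}

/// Example backend serving pages out of an in-memory image.
pub struct BufferSource {
    pub image: Vec<u8>,
}

fn out_of_range() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidInput, "page outside of the image")
}

impl PageSource for BufferSource {
    fn fetch_page(&mut self, gpa: u64, dst: &mut [u8]) -> io::Result<()> {
        // Bounds are checked explicitly so a bad address is an error, not a panic.
        let start = usize::try_from(gpa).map_err(|_| out_of_range())?;
        let end = start.checked_add(dst.len()).ok_or_else(out_of_range)?;
        let src = self.image.get(start..end).ok_or_else(out_of_range)?;
        dst.copy_from_slice(src);
        Ok(())
    }
}

/// Stand-in for the fault handler: it only knows about the trait, so a
/// different backend can be swapped in without touching this function.
pub fn handle_fault(
    source: &mut dyn PageSource,
    gpa: u64,
    page_size: usize,
) -> io::Result<Vec<u8>> {
    let mut page = vec![0u8; page_size];
    source.fetch_page(gpa, &mut page)?;
    Ok(page)
}
```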

Introducing a migration mode to both sides of the migration (send and receive), so that a user can describe which way the memory should be migrated between the source and destination VMs. For now, we only introduce `precopy` and `postcopy` as viable options, but we can expect other modes (more optimized) to be added in the future.
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-8
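
A minimal sketch of how the two mode strings could map to a type, assuming a plain enum parsed from the `precopy` and `postcopy` values named above. The `MigrationMode` name and its error type are hypothetical; the PR's own definition is not shown in this thread.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical representation of the migration mode option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MigrationMode {
    Precopy,
    Postcopy,
}

impl FromStr for MigrationMode {
    type Err = String;

    /// Accept exactly the two values mentioned in the commit message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "precopy" => Ok(Self::Precopy),
            "postcopy" => Ok(Self::Postcopy),
            other => Err(format!("unknown migration mode '{}'", other)),
        }
    }
}

impl fmt::Display for MigrationMode {
    /// Print the mode using the same spelling it is parsed from.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::Precopy => "precopy",
            Self::Postcopy => "postcopy",
        })
    }
}
```

Keeping the mode as an enum rather than a boolean leaves room for the additional modes the commit message anticipates.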

Adding the socket backed UffdMemorySource that resolves each fault by sending a Command::PageFault request to the peer over a dedicated fault connection. This connection is brought up and ready to serve before restoring the VM.
Also plumbing the receiving side of the live migration so that postcopy relies on the new SocketUffdMemorySource implementation.
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
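
The request/reply exchange on the dedicated fault connection might look roughly like the sketch below. It is generic over any `Read + Write` stream (for instance a `UnixStream` or `TcpStream`) and uses an assumed framing of two little-endian `u64` values for the request and a `u64` length for the reply. `SocketPageSource`, `resolve` and the toy peer `serve_one` are hypothetical and do not reproduce the PR's SocketUffdMemorySource.

```rust
use std::convert::TryInto;
use std::io::{self, Read, Write};

/// Hypothetical socket-backed page source living on the destination side.
pub struct SocketPageSource<S> {
    stream: S,
}

impl<S: Read + Write> SocketPageSource<S> {
    pub fn new(stream: S) -> Self {
        Self { stream }
    }

    /// Resolve one fault. Returns Ok(true) when the page content was received
    /// into `dst`, and Ok(false) when the peer replied 0 (copied in place).
    pub fn resolve(&mut self, gpa: u64, dst: &mut [u8]) -> io::Result<bool> {
        // Request: guest address and length (assumed framing).
        self.stream.write_all(&gpa.to_le_bytes())?;
        self.stream.write_all(&(dst.len() as u64).to_le_bytes())?;

        // Reply: a length, then the payload if the length is non-zero.
        let mut len = [0u8; 8];
        self.stream.read_exact(&mut len)?;
        match u64::from_le_bytes(len) {
            0 => Ok(false),
            n if n == dst.len() as u64 => {
                self.stream.read_exact(dst)?;
                Ok(true)
            }
            _ => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "bad PageFault reply length",
            )),
        }
    }
}

/// Toy peer used for illustration: answers one request with a page filled
/// with the low byte of the 4 KiB page number.
pub fn serve_one<S: Read + Write>(stream: &mut S) -> io::Result<()> {
    let mut header = [0u8; 16];
    stream.read_exact(&mut header)?;
    let gpa = u64::from_le_bytes(header[..8].try_into().unwrap());
    let len = u64::from_le_bytes(header[8..].try_into().unwrap());
    stream.write_all(&len.to_le_bytes())?;
    stream.write_all(&vec![(gpa >> 12) as u8; len as usize])
}
```

Running `serve_one` on one end of a `std::os::unix::net::UnixStream::pair()` in a separate thread and calling `resolve` on the other end is enough to exercise the exchange locally.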

Add an --ondemand flag to the offload daemon's restore subcommand to support the post-copy mechanism from the live migration protocol.
In on-demand mode, the daemon creates empty memfds to back the guest memory and sends them over to the VMM. This lets the VM start quickly, right after the memfds are mapped into CH's address space. At runtime, when the guest accesses a page (or the prefault handler requests it), the daemon faults it in by copying the page content into its shared memory mapping, then replies to the PageFault request so the VMM can consider the page present.
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
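
The daemon-side behaviour described here can be modelled in a few lines. In this sketch a plain `Vec<u8>` stands in for the memfd mapping shared with the VMM, and the reply is the value 0 because the page is copied in place. `OnDemandRestore`, `serve_fault` and `shared` are hypothetical names, and the real daemon would work on mapped memfds rather than vectors.

```rust
/// Hypothetical model of the daemon state in on-demand mode.
pub struct OnDemandRestore {
    /// Saved guest RAM (in the real daemon, read from the snapshot).
    snapshot: Vec<u8>,
    /// Stand-in for the memfd mapping shared with the VMM; starts empty.
    shared: Vec<u8>,
    page_size: usize,
    /// Tracks which pages have already been faulted in.
    present: Vec<bool>,
}

impl OnDemandRestore {
    pub fn new(snapshot: Vec<u8>, page_size: usize) -> Self {
        assert!(page_size > 0 && snapshot.len() % page_size == 0);
        let pages = snapshot.len() / page_size;
        Self {
            shared: vec![0; snapshot.len()],
            present: vec![false; pages],
            snapshot,
            page_size,
        }
    }

    /// Serve one PageFault for the page containing `offset`. Returns the reply
    /// length, which is always 0 here because the content is copied in place,
    /// or None when the offset falls outside guest memory.
    pub fn serve_fault(&mut self, offset: usize) -> Option<u64> {
        let index = offset / self.page_size;
        if index >= self.present.len() {
            return None;
        }
        if !self.present[index] {
            let start = index * self.page_size;
            let end = start + self.page_size;
            self.shared[start..end].copy_from_slice(&self.snapshot[start..end]);
            self.present[index] = true;
        }
        Some(0)
    }

    /// Read-only view of the shared mapping, as the VMM would see it.
    pub fn shared(&self) -> &[u8] {
        &self.shared
    }
}
```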

Wire up the source side of postcopy migration over TCP. When `mode=postcopy` is requested on vm.send-migration, the source skips the pre-copy dirty-tracking loop and lets the destination resume early, then serves guest pages on demand over a dedicated connection.
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7

rbradford

This is looking good but I would go through and check that there is consistency in the terminology. I see ondemand and on-demand and fault and connection and dial and channel all intermixed.

Feel free to merge after carefully reviewing that.

sboeuf requested a review from a team as a code owner May 21, 2026 12:27

sboeuf force-pushed the offload_snapshot branch from 6276033 to cb69c03 Compare May 21, 2026 13:35

phip1611 self-requested a review May 21, 2026 14:37

likebreath reviewed May 22, 2026

Comment thread cloud-hypervisor/src/bin/ch-remote.rs Outdated

Comment thread cloud-hypervisor/src/bin/ch-remote.rs Outdated

Comment thread vmm/src/lib.rs

sboeuf force-pushed the offload_snapshot branch from cb69c03 to 4dd352b Compare May 22, 2026 09:40

sboeuf force-pushed the offload_snapshot branch from 4dd352b to 2d8062b Compare May 22, 2026 09:45

likebreath reviewed May 28, 2026

sboeuf force-pushed the offload_snapshot branch from 2d8062b to 18152cb Compare June 3, 2026 14:53

sboeuf mentioned this pull request Jun 3, 2026

vm-migration: Add migration protocol versioning #8316

Merged

saravan2 reviewed Jun 4, 2026

Comment thread docs/snapshot_restore.md Outdated

sboeuf force-pushed the offload_snapshot branch 4 times, most recently from f700950 to 069cbb4 Compare June 5, 2026 11:05

rbradford marked this pull request as draft June 8, 2026 12:07

sboeuf force-pushed the offload_snapshot branch 3 times, most recently from e53d618 to d8e6fb8 Compare June 9, 2026 13:12

sboeuf marked this pull request as ready for review June 9, 2026 13:12

sboeuf force-pushed the offload_snapshot branch from d8e6fb8 to 268583c Compare June 10, 2026 08:20

sboeuf force-pushed the offload_snapshot branch from 268583c to 5f47c5c Compare June 10, 2026 12:03

sboeuf force-pushed the offload_snapshot branch from 5f47c5c to cd9aea5 Compare June 10, 2026 15:39

phip1611 requested changes Jun 15, 2026

Comment thread docs/snapshot_restore.md Outdated

Comment thread vm-migration/src/protocol.rs Outdated

Comment thread vmm/src/api/mod.rs Outdated

Comment thread vmm/src/lib.rs Outdated

Comment thread cloud-hypervisor/tests/integration.rs Outdated

sboeuf force-pushed the offload_snapshot branch 3 times, most recently from 647020d to 23daa83 Compare June 17, 2026 08:14

sboeuf force-pushed the offload_snapshot branch 9 times, most recently from 1425f64 to d24b60f Compare June 19, 2026 12:10

rbradford reviewed Jun 19, 2026

Comment thread offload_daemon/src/main.rs Outdated

sboeuf force-pushed the offload_snapshot branch from d24b60f to 249910f Compare June 19, 2026 12:21

sboeuf changed the title ~~Introduce offloaded snapshot/restore~~ Extend live migration protocol for postcopy and ondemand paging Jun 19, 2026

sboeuf added 6 commits June 19, 2026 07:24

sboeuf force-pushed the offload_snapshot branch from 249910f to 2bf2123 Compare June 19, 2026 14:28

rbradford approved these changes Jun 19, 2026

Comment thread vm-migration/src/protocol.rs

Comment thread vm-migration/src/protocol.rs

Comment thread vmm/src/lib.rs

Comment thread vmm/src/lib.rs

Comment thread vmm/src/lib.rs

Comment thread vmm/src/lib.rs
