Offloaded snapshot/restore (part 1) #8403

Merged
sboeuf merged 6 commits into cloud-hypervisor:main from sboeuf:offload_snapshot_part1
Jun 18, 2026

Conversation

@sboeuf

@sboeuf sboeuf commented Jun 17, 2026

Member

By relying on the existing local live migration support and reusing the
semantics and the protocol associated with it, we intend to provide a
way for snapshotting and restoring a VM to/from a dedicated process that
we can call the offload daemon.

By allowing an external process to perform the snapshot/restore actions
on behalf of Cloud Hypervisor, we give our users the opportunity to
implement their own offload daemon. The goal is to avoid bloating
Cloud Hypervisor with numerous features related to snapshot/restore, and
to let the user decide how the snapshot/restore actions are performed.
For example, a daemon could encrypt the guest RAM on the fly to avoid
writing an unencrypted copy to local disk. Another daemon could send the
guest RAM and the associated state/config data over the network without
first persisting them to local storage.

There may be other reasons to use an offload daemon to perform the
snapshot/restore of the VM, but in every case, this empowers the user to
make their own choice.

@sboeuf sboeuf requested a review from a team as a code owner June 17, 2026 08:05
@sboeuf sboeuf requested review from phip1611 and rbradford and removed request for a team June 17, 2026 08:06
@phip1611 phip1611 requested a review from tpressure June 17, 2026 08:15
@rbradford

Member

@sboeuf Oh noes! The CI failed on your new test

@sboeuf

sboeuf commented Jun 17, 2026

Member Author

@sboeuf Oh noes! The CI failed on your new test

Yes, I'm fixing it; that's a problem linked to the split of the original PR :)

@sboeuf sboeuf force-pushed the offload_snapshot_part1 branch from 0acc069 to 0e0bbba June 17, 2026 09:11
Comment thread offload_daemon/src/main.rs Outdated
@sboeuf sboeuf force-pushed the offload_snapshot_part1 branch from 0e0bbba to 3a49e96 June 17, 2026 09:20
Comment thread vmm/src/sparse.rs
Comment thread vmm/src/lib.rs Outdated
@sboeuf sboeuf force-pushed the offload_snapshot_part1 branch 6 times, most recently from 130c1c8 to e39187c June 17, 2026 19:08

@phip1611 phip1611 left a comment

Member

Thanks for working on this. Left a few remarks.

Comment thread vmm/src/lib.rs Outdated
Comment thread offload_daemon/src/main.rs
Comment thread offload_daemon/src/main.rs
Comment thread offload_daemon/src/main.rs
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs Outdated
Comment thread offload_daemon/src/main.rs
Comment thread vmm/src/api/mod.rs Outdated
sboeuf added 6 commits June 18, 2026 02:24
Expose VmMigrationConfig as a public-facing structure that can be used
by an offload daemon to act as if it were the VM to migrate to, or the
VM to migrate from.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Add a new dedicated binary meant to serve as a reference implementation
for validating that offloaded snapshot/restore works, and meant, in
general, to be exercised through tests.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Move next_data_extent and write_region_sparse out of memory_manager.rs
into a new vmm::sparse module so the snapshot writer, the restore
reader, and the offload daemon can share one implementation.

No functional change intended.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
Copy only populated extents when writing the snapshot file and when
filling the restore memfd, leaving unwritten ranges as holes. Both
the on-disk snapshot and the restored guest RAM stay sparse, so that
untouched guest pages cost no disk space or host memory.

This brings the offload daemon closer to feature parity with CH's
internal implementation of snapshot/restore. At this point, the only
missing piece is on-demand paging.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
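
The sparse copy described in the commit above can be illustrated with a minimal Rust sketch. It is not the code from `vmm::sparse`: the names `next_nonzero_extent` and `write_sparse` are hypothetical, and populated data is detected here by scanning for all-zero 4 KiB pages, which is an assumption standing in for whatever mechanism the real `next_data_extent` uses. Each extent is written at its original offset and zero pages are skipped by seeking, so a regular file receiving the output keeps holes on filesystems that support them.

```rust
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Assumed granularity for hole detection (hypothetical, 4 KiB pages).
const PAGE_SIZE: usize = 4096;

/// Hypothetical helper: returns the `(start, end)` byte range of the next
/// run of pages containing at least one non-zero byte, searching from
/// `offset`. Returns `None` once only zero pages remain.
fn next_nonzero_extent(buf: &[u8], mut offset: usize) -> Option<(usize, usize)> {
    let is_zero_page = |start: usize| {
        let end = (start + PAGE_SIZE).min(buf.len());
        buf[start..end].iter().all(|&b| b == 0)
    };
    // Skip pages that are entirely zero: they become holes.
    while offset < buf.len() && is_zero_page(offset) {
        offset += PAGE_SIZE;
    }
    if offset >= buf.len() {
        return None;
    }
    let start = offset;
    // Grow the extent until the next all-zero page or the end of the buffer.
    while offset < buf.len() && !is_zero_page(offset) {
        offset += PAGE_SIZE;
    }
    Some((start, offset.min(buf.len())))
}

/// Hypothetical helper: writes `buf` into `out` at identical offsets,
/// seeking over zero pages instead of writing them.
/// Returns the number of bytes actually written.
fn write_sparse<W: Write + Seek>(out: &mut W, buf: &[u8]) -> io::Result<usize> {
    let mut written = 0;
    let mut offset = 0;
    while let Some((start, end)) = next_nonzero_extent(buf, offset) {
        out.seek(SeekFrom::Start(start as u64))?;
        out.write_all(&buf[start..end])?;
        written += end - start;
        offset = end;
    }
    Ok(written)
}

fn main() -> io::Result<()> {
    // Three pages of "guest RAM": data, zeros, data.
    let mut ram = vec![0u8; 3 * PAGE_SIZE];
    ram[10] = 0xaa;
    ram[2 * PAGE_SIZE + 5] = 0xbb;

    // An in-memory Cursor stands in for the snapshot file here; it fills
    // skipped ranges with zeros rather than creating real holes.
    let mut out = Cursor::new(Vec::new());
    let written = write_sparse(&mut out, &ram)?;
    println!("wrote {} of {} bytes", written, ram.len());
    Ok(())
}
```

When the destination is a `std::fs::File`, the caller would also call `set_len(buf.len() as u64)` afterwards so that a trailing run of zero pages remains a hole instead of shortening the file. Scanning for zero bytes reads every page; asking the kernel for allocated ranges (for example via `lseek` with `SEEK_DATA`/`SEEK_HOLE`) avoids that cost, which is one reason a real implementation may prefer it.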
Extend the snapshot/restore documentation to explain the goals behind
this offloaded snapshot/restore feature and how to use it in practice,
and to document the protocol used by the offload daemon so that anyone
can write their own daemon.

Signed-off-by: Sebastien Boeuf <sboeuf@meta.com>
Assisted-by: Claude:claude-opus-4-7
@sboeuf sboeuf force-pushed the offload_snapshot_part1 branch from e39187c to 2902197 June 18, 2026 09:24
Comment thread offload_daemon/src/main.rs
@sboeuf

sboeuf commented Jun 18, 2026

Member Author

@phip1611 are we good to go with this PR?

@phip1611 phip1611 left a comment

Member

Thanks for addressing all concerns so fast. LGTM 🎉

@sboeuf sboeuf added this pull request to the merge queue Jun 18, 2026
Merged via the queue into cloud-hypervisor:main with commit 2f2f709 Jun 18, 2026
39 checks passed
3 participants