Offloaded snapshot/restore (part 1)#8403
Merged
@sboeuf Oh noes! The CI failed on your new test
Author
Yes I'm fixing it, that's a problem linked to the split of the original PR :)
0acc069 to 0e0bbba
rbradford reviewed Jun 17, 2026
0e0bbba to 3a49e96
rbradford approved these changes Jun 17, 2026
130c1c8 to e39187c
phip1611 requested changes Jun 18, 2026
phip1611 left a comment
Thanks for working on this. Left a few remarks.
Expose VmMigrationConfig as a public-facing structure so that an offload daemon can act as either the destination VM or the source VM of a migration. Signed-off-by: Sebastien Boeuf <sboeuf@meta.com> Assisted-by: Claude:claude-opus-4-7
Add a new dedicated binary meant to serve as a reference implementation for validating that offloaded snapshot/restore works, and to be used by tests in general. Signed-off-by: Sebastien Boeuf <sboeuf@meta.com> Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Sebastien Boeuf <sboeuf@meta.com> Assisted-by: Claude:claude-opus-4-7
Move next_data_extent and write_region_sparse out of memory_manager.rs into a new vmm::sparse module so the snapshot writer, the restore reader, and the offload daemon can share one implementation. No functional change intended. Signed-off-by: Sebastien Boeuf <sboeuf@meta.com> Assisted-by: Claude:claude-opus-4-7
Copy only populated extents when writing the snapshot file and when filling the restore memfd, leaving unwritten ranges as holes. Both the on-disk snapshot and the restored guest RAM stay sparse, so untouched guest pages cost no disk space or host memory. This brings the offload daemon closer to feature parity with CH's internal implementation of snapshot/restore. The only missing piece at this point is on-demand paging. Signed-off-by: Sebastien Boeuf <sboeuf@meta.com> Assisted-by: Claude:claude-opus-4-7
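For readers unfamiliar with the idea of copying only populated extents, here is a minimal sketch of it. It is not the code from this PR: the signatures of `next_data_extent` and `write_region_sparse` are not shown in this conversation, so the names `populated_extents`, `copy_sparse`, and `PAGE_SIZE` below are hypothetical. The sketch also swaps in a simple technique, scanning a buffer for non-zero pages, in place of whatever extent discovery the real implementation uses.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical page granularity used to decide whether a range is populated.
const PAGE_SIZE: usize = 4096;

/// Return (offset, length) runs of pages that contain at least one non-zero
/// byte. Assumption: this zero-page scan stands in for real extent discovery.
fn populated_extents(buf: &[u8]) -> Vec<(u64, u64)> {
    let mut extents: Vec<(u64, u64)> = Vec::new();
    for (i, page) in buf.chunks(PAGE_SIZE).enumerate() {
        if page.iter().all(|&b| b == 0) {
            continue; // untouched page: leave it as a hole
        }
        let offset = (i * PAGE_SIZE) as u64;
        let len = page.len() as u64;
        // Merge with the previous extent when the two are contiguous.
        if let Some(last) = extents.last_mut() {
            if last.0 + last.1 == offset {
                last.1 += len;
                continue;
            }
        }
        extents.push((offset, len));
    }
    extents
}

/// Copy only the given extents from `src` to `dst`, seeking over the gaps so
/// the destination never receives the bytes in between.
fn copy_sparse<R: Read + Seek, W: Write + Seek>(
    src: &mut R,
    dst: &mut W,
    extents: &[(u64, u64)],
) -> io::Result<u64> {
    let mut copied = 0;
    for &(offset, len) in extents {
        src.seek(SeekFrom::Start(offset))?;
        dst.seek(SeekFrom::Start(offset))?;
        let n = io::copy(&mut src.by_ref().take(len), &mut *dst)?;
        if n != len {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short extent"));
        }
        copied += n;
    }
    Ok(copied)
}
```

When the destination is a regular file on a file system that supports sparse files, seeking past the current end before writing typically leaves the skipped range as a hole. A real writer would also need to set the final file length when the last extent ends before the end of the region.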
Extend the snapshot/restore documentation to explain the goals behind the offloaded snapshot/restore feature, how to use it in practice, and the protocol used by the offload daemon, so that anyone can write their own daemon. By relying on the existing local live migration support and reusing the semantics and the protocol associated with it, we intend to provide a way for snapshotting and restoring a VM to/from a dedicated process that we can call the offload daemon. By allowing an external process to perform the snapshot/restore actions on behalf of Cloud Hypervisor, we give our users the opportunity to implement their own offload daemon. The goal is to avoid bloating Cloud Hypervisor with numerous features related to snapshot/restore, and let the user decide how to perform the snapshot/restore actions. One example is that we can decide to encrypt the guest RAM on the fly in order to avoid writing an unencrypted version to local disk. Another example is to be able to send guest RAM and associated state/config data over the network without having to persist the data first to local storage. There might be other reasons to go with an offload daemon to perform the snapshot/restore of the VM, but in every case, this empowers the user to make their own choice. Signed-off-by: Sebastien Boeuf <sboeuf@meta.com> Assisted-by: Claude:claude-opus-4-7
e39187c to 2902197
phip1611 reviewed Jun 18, 2026
Author
@phip1611 are we good to go with this PR?
phip1611 approved these changes Jun 18, 2026
phip1611 left a comment
Thanks for addressing all concerns so fast. LGTM 🎉
By relying on the existing local live migration support and reusing the
semantics and the protocol associated with it, we intend to provide a
way for snapshotting and restoring a VM to/from a dedicated process that
we can call the offload daemon.
By allowing an external process to perform the snapshot/restore actions on
behalf of Cloud Hypervisor, we give our users the opportunity to
implement their own offloaded daemon. The goal is to avoid bloating
Cloud Hypervisor with numerous features related to snapshot/restore, and
let the user decide how to perform the snapshot/restore actions. One
example is that we can decide to encrypt the guest RAM on the fly in
order to avoid writing an unencrypted version to local disk. Another
example is to be able to send guest RAM and associated state/config data
over the network without having to persist the data first to local
storage.
There might be other reasons to go with an offloaded daemon to
perform the snapshot/restore of the VM, but in every case, this empowers
the user to make their own choice.