CPU Profiles part 2: MSR adjustment logic #8442
Draft
olivereanderson wants to merge 21 commits into
KVM defines feature MSRs as MSRs that expose host capabilities and processor features. CPU profiles will describe how to adjust such feature MSRs similarly to how they describe CPUID modifications. We take the first step in this direction by introducing a method on the hypervisor trait to obtain a list of the indices of the supported feature MSRs.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
CPU profiles will describe a list of MSRs they explicitly permit. When applying the profile, we want to check that the host has all the required MSRs; otherwise, the host is incompatible with the profile. To check which MSRs the host supports, we start by exposing the hypervisor's get_msr_index_list method, which lists most of the MSRs supported by the host.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
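As a rough illustration of this check, the sketch below compares a profile's required MSR indices with the list the host reports and returns the ones that are missing. The function name `missing_msrs` is hypothetical, and the host list is passed in as a plain slice; in practice it would come from the hypervisor (for KVM, for example, via KVM_GET_MSR_INDEX_LIST).

```rust
use std::collections::HashSet;

/// Returns the MSR indices required by a CPU profile that the host does not
/// report as supported (hypothetical helper, not Cloud Hypervisor's API).
/// The result is sorted and deduplicated so error messages are deterministic.
pub fn missing_msrs(required: &[u32], host_supported: &[u32]) -> Vec<u32> {
    let host: HashSet<u32> = host_supported.iter().copied().collect();
    let mut missing: Vec<u32> = required
        .iter()
        .copied()
        .filter(|index| !host.contains(index))
        .collect();
    missing.sort_unstable();
    missing.dedup();
    missing
}
```

An empty result means the host satisfies the profile's MSR requirements; anything else would be reported as an incompatibility.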
Note that even though a host might support all MSRs required by the CPU profile, it might still expose some MSRs that are not compatible with the given CPU profile. Such MSRs are not necessarily guarded by CPUID, and we cannot simply set them to zero either. However, KVM provides a means to filter MSRs, and we intend to use this capability to prevent the guest from accessing MSRs that are not permitted by the CPU profile. We start by adding a type representing an MsrFilter.

NOTE: We can consider removing our MsrFilter type once rust-vmm/kvm#359 is integrated in Cloud Hypervisor.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
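A minimal sketch of what one range of such a filter could look like, loosely modelled on KVM's `struct kvm_msr_filter_range` (flags, number of MSRs, base index and a bitmap in which a set bit permits access). The type name, the constants and the exact bit layout below are assumptions for illustration; they are not the `MsrFilter` type added in this PR.

```rust
/// Access kinds a range applies to (hypothetical constants, loosely modelled
/// on KVM's KVM_MSR_FILTER_READ / KVM_MSR_FILTER_WRITE flags).
pub const FILTER_READ: u32 = 1 << 0;
pub const FILTER_WRITE: u32 = 1 << 1;

/// A contiguous range of MSR indices with a bitmap in which a set bit means
/// "access permitted" for the MSR at `base + bit_position`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MsrFilterRange {
    pub flags: u32,
    pub base: u32,
    pub nmsrs: u32,
    pub bitmap: Vec<u8>,
}

impl MsrFilterRange {
    /// Creates a range covering `nmsrs` MSRs starting at `base`, with every
    /// MSR initially denied (all bits cleared).
    pub fn new(flags: u32, base: u32, nmsrs: u32) -> Self {
        let bytes = (nmsrs as usize + 7) / 8;
        Self { flags, base, nmsrs, bitmap: vec![0; bytes] }
    }

    /// Bit offset of `index` within this range, if the range covers it.
    fn offset(&self, index: u32) -> Option<usize> {
        index
            .checked_sub(self.base)
            .filter(|&off| off < self.nmsrs)
            .map(|off| off as usize)
    }

    /// Marks `index` as permitted; returns false if the range does not cover it.
    pub fn allow(&mut self, index: u32) -> bool {
        match self.offset(index) {
            Some(off) => {
                self.bitmap[off / 8] |= 1 << (off % 8);
                true
            }
            None => false,
        }
    }

    /// Whether this range permits access to `index`.
    pub fn is_allowed(&self, index: u32) -> bool {
        self.offset(index)
            .map(|off| self.bitmap[off / 8] & (1 << (off % 8)) != 0)
            .unwrap_or(false)
    }
}
```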
Our second step towards denying guests access to MSRs that are incompatible with the CPU profile is to add the missing msr_filter method on the KvmVm type. We emphasize that this is a temporary workaround until rust-vmm/kvm#359 is integrated in CHV.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We place a new method on the Vm trait, which we will later call from the vmm crate in order to deny the guest access to MSRs that are incompatible with a selected CPU profile (if any). We only provide an implementation for the KVM backend for now and leave MSHV for later.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
To help callers construct correct filter ranges when calling set_msr_filter, we provide a method that conveys the maximum number of filter ranges the hypervisor backend permits. We follow the existing precedent in Cloud Hypervisor of returning `&'static T` from methods on the hypervisor-related interfaces as a means to provide constants associated with types. This works around the fact that associated constants are not dyn compatible.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
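The workaround can be sketched as follows: a trait with an associated constant cannot be used as `dyn Trait`, so the constant is exposed through a method returning `&'static T` instead. The method name `max_msr_filter_ranges` is hypothetical, and the value 16 is assumed to mirror KVM's `KVM_MSR_FILTER_MAX_RANGES`; the real trait in Cloud Hypervisor has many more methods.

```rust
/// Declaring `const MAX_MSR_FILTER_RANGES: usize;` here would make the trait
/// unusable as `dyn Vm`, so the constant is exposed through a method instead.
pub trait Vm {
    /// Maximum number of MSR filter ranges this backend accepts
    /// (hypothetical method name).
    fn max_msr_filter_ranges(&self) -> &'static usize;
}

/// Assumed to mirror KVM's KVM_MSR_FILTER_MAX_RANGES.
static KVM_MAX_MSR_FILTER_RANGES: usize = 16;

pub struct KvmVm;

impl Vm for KvmVm {
    fn max_msr_filter_ranges(&self) -> &'static usize {
        &KVM_MAX_MSR_FILTER_RANGES
    }
}
```

For a `Copy` type like `usize`, returning by value would also be dyn compatible; returning `&'static T` additionally works for constants that are not `Copy`.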
CPU profiles will be serialized to JSON by the upcoming CPU profile generation tool, and we want MSRs to be serialized as hex strings.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We create 64-bit analogues of the already existing hex (de)serializer helper functions for the 32-bit case. These are necessary because the CPU profile needs associated data describing how to adjust feature MSRs, whose values are 64 bits wide. In this case we prefer a small amount of code duplication over macros and/or traits, since we do not expect to need further variants of these helpers.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
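A minimal sketch of the formatting and parsing such 64-bit helpers would wrap. In Cloud Hypervisor these are serde (de)serializer helpers; the sketch below avoids serde and shows only the string conversion, and both the function names and the zero-padded `0x`-prefixed format are assumptions rather than the format the profile JSON actually uses.

```rust
use std::num::ParseIntError;

/// Formats a 64-bit value as a zero-padded hex string such as
/// "0x000000000000010a" (assumed format).
pub fn u64_to_hex(value: u64) -> String {
    format!("{value:#018x}")
}

/// Parses a hex string with an optional "0x" prefix back into a u64.
pub fn u64_from_hex(s: &str) -> Result<u64, ParseIntError> {
    let digits = s.strip_prefix("0x").unwrap_or(s);
    u64::from_str_radix(digits, 16)
}
```

A serde `serialize_with`/`deserialize_with` helper could then call these two functions on the field value.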
We introduce a type analogous to CpuidOutputRegisterAdjustment, but for feature MSRs.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We introduce a type describing MSR adjustments associated with a CPU profile. The upcoming CPU profile generation tool will serialize instances of this struct when generating a CPU profile.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
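To make the idea of these two types concrete, here is a sketch assuming a mask-and-replace scheme: keep the host bits selected by a mask and force some bits to a profile-defined value. The field names `mask`/`replacements`, the type names and the `guest_values` helper are all hypothetical; the actual types in this PR may be structured differently.

```rust
/// How to derive a guest-visible feature MSR value from the host value
/// (hypothetical field names and semantics).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FeatureMsrAdjustment {
    /// Bits of the host value that are kept as-is.
    pub mask: u64,
    /// Bits forced to 1 regardless of the host value.
    pub replacements: u64,
}

impl FeatureMsrAdjustment {
    pub fn apply(&self, host_value: u64) -> u64 {
        (host_value & self.mask) | self.replacements
    }
}

/// MSR-related data a CPU profile could carry (hypothetical layout).
#[derive(Debug, Clone, Default)]
pub struct MsrAdjustments {
    /// MSRs the guest may access at all.
    pub permitted_msrs: Vec<u32>,
    /// Feature MSRs whose values are adjusted.
    pub feature_msrs: Vec<(u32, FeatureMsrAdjustment)>,
}

impl MsrAdjustments {
    /// Computes guest values for all adjusted feature MSRs from a lookup of
    /// host values; the first MSR the host lacks is returned as the error.
    pub fn guest_values(
        &self,
        host: impl Fn(u32) -> Option<u64>,
    ) -> Result<Vec<(u32, u64)>, u32> {
        self.feature_msrs
            .iter()
            .map(|&(index, adj)| host(index).map(|v| (index, adj.apply(v))).ok_or(index))
            .collect()
    }
}
```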
When Cloud Hypervisor has not been configured for (KVM) Hyper-V, we want to adapt the CPU profiles so that they do not require the existence of Hyper-V related MSRs. The first step, introduced here, is to create a list of such MSRs.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
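One way such a list could be used is sketched below: when Hyper-V is not enabled, Hyper-V synthetic MSRs are dropped from the MSRs a profile requires. The three indices are well-known Hyper-V synthetic MSRs, but the list is deliberately incomplete, and the names `HYPERV_MSRS` and `required_msrs` are hypothetical.

```rust
/// A few Hyper-V synthetic MSR indices; illustrative, not exhaustive.
pub const HYPERV_MSRS: &[u32] = &[
    0x4000_0000, // HV_X64_MSR_GUEST_OS_ID
    0x4000_0001, // HV_X64_MSR_HYPERCALL
    0x4000_0002, // HV_X64_MSR_VP_INDEX
];

/// Drops Hyper-V MSRs from a profile's required list unless the VM is
/// configured with (KVM) Hyper-V support.
pub fn required_msrs(profile_msrs: &[u32], hyperv_enabled: bool) -> Vec<u32> {
    profile_msrs
        .iter()
        .copied()
        .filter(|index| hyperv_enabled || !HYPERV_MSRS.contains(index))
        .collect()
}
```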
We introduce a method on the CpuProfile enum that computes the required MSR related updates in order to be compatible with the CPU profile.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
In order to safely apply CPU profiles, we need to ensure that the host's feature MSRs that are permitted by the CPU profile are also compatible with the values the CPU profile dictates. KVM_SET_MSRS takes care of checking compatibility for most of the feature MSRs, on both Intel and AMD CPUs, that we will permit CPU profiles to have (more on this in the upcoming CPU profile generation tool PRs), but there is one exception: userspace may set any value for the Intel-exclusive IA32_ARCH_CAPABILITIES MSR without KVM complaining. We thus introduce our own compatibility check for IA32_ARCH_CAPABILITIES.

One might argue that, since KVM_SET_MSRS is called relatively late when creating or receiving a VM, it would be preferable to run compatibility checks for all permitted feature MSRs earlier, which would also produce more informative debug logs. However, those additional checks would add too much code that is not strictly necessary, which is why we decided against them in this patch set.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
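A deliberately simplified version of such a check is sketched below: every bit the profile sets in IA32_ARCH_CAPABILITIES (MSR 0x10a) must also be set on the host. Many of these bits advertise the absence of a vulnerability (bit 0, RDCL_NO, for example), so advertising one the host lacks would mislead the guest. Treating every bit uniformly is an assumption; the check in this PR may apply per-bit rules, and `check_arch_capabilities` is a hypothetical name.

```rust
/// Index of the Intel IA32_ARCH_CAPABILITIES MSR.
pub const IA32_ARCH_CAPABILITIES: u32 = 0x10a;

/// Simplified check: bits the profile advertises must be a subset of the
/// host's bits. On failure, returns the offending bits.
pub fn check_arch_capabilities(host: u64, profile: u64) -> Result<(), u64> {
    let unsupported = profile & !host;
    if unsupported == 0 {
        Ok(())
    } else {
        Err(unsupported)
    }
}
```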
We introduce a function somewhat analogous to `generate_common_cpuid`, except that it is only relevant for CPU profiles. This function is more "high level" than the CpuProfile::required_msr_updates method and is intended to be called from the vmm crate.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
Currently the KVM and MSHV Vm implementations have internal buffers used as scratch space when snapshotting/restoring MSRs. Upon construction, the hypervisor sets an entry in this buffer for essentially each MSR it supports. There are a few exceptions, such as MSRs that are treated as sregs or MCE banks, but these are not relevant here. With CPU profiles, however, some of the MSR entries in the buffer may no longer be permitted, and we do not want to attempt to save or restore their state. Since the hypervisor crate (where KvmVm resides) has no awareness of CPU profiles, it is more appropriate for the buffer to live in CpuManager, allowing dynamic updates based on the active CPU profile. Note that since we work with the Vm abstraction in the vmm crate, we also needed to adapt the MSHV implementation, even though CPU profiles will initially only be available with KVM. This commit builds on commit dd47149 on the Cyberus Technology fork by Philipp Schuster.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
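The sketch below shows the idea of a CpuManager-owned scratch buffer that only contains entries for MSRs the active profile permits. `MsrEntry` and `MsrScratchBuffer` are hypothetical stand-ins, not the hypervisor crate's types; `None` for `permitted` models the host profile, where every supported MSR is kept.

```rust
use std::collections::BTreeSet;

/// Minimal stand-in for a hypervisor MSR entry (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsrEntry {
    pub index: u32,
    pub data: u64,
}

/// Scratch buffer used when saving/restoring vCPU MSR state.
#[derive(Debug, Default)]
pub struct MsrScratchBuffer {
    entries: Vec<MsrEntry>,
}

impl MsrScratchBuffer {
    /// Builds one zeroed entry per supported MSR; with `Some(permitted)`,
    /// MSRs outside the profile's permitted set are left out.
    pub fn new(supported: &[u32], permitted: Option<&BTreeSet<u32>>) -> Self {
        let entries = supported
            .iter()
            .copied()
            .filter(|index| permitted.map_or(true, |p| p.contains(index)))
            .map(|index| MsrEntry { index, data: 0 })
            .collect();
        Self { entries }
    }

    /// Indices of the MSRs whose state will be saved/restored.
    pub fn indices(&self) -> Vec<u32> {
        self.entries.iter().map(|e| e.index).collect()
    }
}
```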
When a non-host CPU profile is selected, the guest should only be able to access MSRs that the CPU profile permits. We thus introduce a function that takes a list of permitted MSRs and produces a filter that permits essentially only the given MSRs. The logic in this commit is somewhat complex, but we test it extensively with property-based testing.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
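A sketch of one possible strategy for this: sort and deduplicate the permitted MSRs, greedily group them into ranges of bounded span, and set one bitmap bit per permitted MSR, relying on a default-deny policy for everything else. `build_filter`, `FilterRange` and the `max_span` parameter (a stand-in for whatever bitmap size limit the backend imposes) are hypothetical; the function in this PR may split ranges differently.

```rust
/// MSRs `base..base + nmsrs`; bit i of `bitmap` is set when MSR `base + i`
/// is permitted (hypothetical representation).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FilterRange {
    pub base: u32,
    pub nmsrs: u32,
    pub bitmap: Vec<u8>,
}

/// Greedily groups permitted MSRs into ranges spanning fewer than `max_span`
/// indices each. Fails if more than `max_ranges` ranges would be needed.
pub fn build_filter(
    permitted: &[u32],
    max_span: u32,
    max_ranges: usize,
) -> Result<Vec<FilterRange>, String> {
    assert!(max_span > 0, "max_span must be positive");
    let mut msrs = permitted.to_vec();
    msrs.sort_unstable();
    msrs.dedup();

    // First pass: split the sorted indices into groups of bounded span.
    let mut groups: Vec<Vec<u32>> = Vec::new();
    for msr in msrs {
        let fits = groups.last().is_some_and(|g| msr - g[0] < max_span);
        if fits {
            groups.last_mut().expect("checked above").push(msr);
        } else {
            groups.push(vec![msr]);
        }
    }
    if groups.len() > max_ranges {
        return Err(format!("{} ranges needed, at most {} allowed", groups.len(), max_ranges));
    }

    // Second pass: turn each group into a base/length/bitmap triple.
    Ok(groups
        .into_iter()
        .map(|group| {
            let base = group[0];
            let nmsrs = group[group.len() - 1] - base + 1;
            let mut bitmap = vec![0u8; (nmsrs as usize + 7) / 8];
            for msr in group {
                let off = (msr - base) as usize;
                bitmap[off / 8] |= 1 << (off % 8);
            }
            FilterRange { base, nmsrs, bitmap }
        })
        .collect())
}
```

Properties such as "every permitted MSR has its bit set" and "no range exceeds `max_span`" are natural candidates for the kind of property-based tests the commit message mentions.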
We introduce a method on the CpuManager that prepares MSR related changes required by the selected CPU profile. This method is called upon constructing the CpuManager in `Vm::create_cpu_manager`, thus setting up all state that we need in order to obtain MSR compatibility with the chosen CPU profile upon creating (or restoring) vCPUs. The `feature_msr` field introduced in this commit will be utilized to set MSRs on vCPUs in a follow-up commit.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
We refactor the CPU configuration functionality so that we can set feature MSRs according to a CPU profile.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
In the context of live migration, or more generally restoring a snapshot, we can only trust compatibility with the selected CPU profile as long as setting the feature MSRs defined by the CPU profile succeeds. This means that we have to change the current behavior, which only logs a warning for MSRs that cannot be set, to instead return an error if the MSR is declared crucial by the caller. Alternatively, we could perform compatibility checks for all necessary feature MSRs before calling `Vcpu::set_state`, but then we would have to introduce a lot of complex code. We thus prefer to let KVM do these checks for us when setting the MSRs.

Signed-off-by: Oliver Anderson <oliver.anderson@cyberus-technology.de>
On-behalf-of: SAP oliver.anderson@sap.com
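The changed behavior can be sketched as a loop around a KVM_SET_MSRS-style call, which reports how many leading entries were set before the first failure. A failing crucial MSR aborts with an error, while any other failing MSR is logged and skipped as before. `set_msrs_checked`, `MsrEntry` and the injected `set_msrs` closure are hypothetical; the closure stands in for the real vCPU call so the logic can be exercised without KVM.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsrEntry {
    pub index: u32,
    pub data: u64,
}

/// `set_msrs` is assumed to return how many leading entries it set before
/// the first failure (as KVM_SET_MSRS does). Returns the skipped indices.
pub fn set_msrs_checked(
    entries: &[MsrEntry],
    crucial: &HashSet<u32>,
    mut set_msrs: impl FnMut(&[MsrEntry]) -> usize,
) -> Result<Vec<u32>, String> {
    let mut skipped = Vec::new();
    let mut remaining = entries;
    while !remaining.is_empty() {
        let done = set_msrs(remaining);
        if done >= remaining.len() {
            break; // everything left was set successfully
        }
        let failed = remaining[done];
        if crucial.contains(&failed.index) {
            // Profile compatibility can no longer be trusted: abort.
            return Err(format!("failed to set crucial MSR {:#x}", failed.index));
        }
        // Previous behavior for non-crucial MSRs: warn and carry on.
        eprintln!("warning: could not set MSR {:#x}, skipping", failed.index);
        skipped.push(failed.index);
        remaining = &remaining[done + 1..];
    }
    Ok(skipped)
}
```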
Author comment:
@phip1611 and @tpressure, could you please give me a first review here, since there are quite a few changes compared with our fork?
This is the second PR in a series adding support for x86_64 CPU profiles to Cloud Hypervisor, as requested in #7068.
The x86_64 CPU profile feature in its entirety is currently tracked in a comment on #7068, but you may also want to read the "Full implementation overview" section below to get the big picture of what the complete implementation will look like.
Reviewers unfamiliar with CPU profiles/templates/models may want to jump directly to the "Motivation and background" section in this description first.
What this PR does
This PR does not add any user-facing functionality to Cloud Hypervisor. All it does is introduce data structures and logic for updating and filtering MSRs in accordance with an (at this point hypothetical) CPU profile.
The logic introduced here is (mostly) agnostic to the CPU vendor (as long as it is an x86_64 CPU), but focuses on the KVM hypervisor. This means that there are a few `unimplemented!()` invocations in the MSHV path, but these paths should not be reached when the host CPU profile is used.
If/when CPU profile support for MSHV is implemented, it should be possible to reuse a lot of the code introduced in this PR.
Differences with the implementation in the Cyberus Technology fork
For those familiar with the working prototype available in the Cyberus Technology fork, there are a few differences in the implementation to note here:
Our working prototype
We have a full implementation for Intel CPUs on our fork.
Generating a new CPU profile
See our docs for instructions on how to generate a new CPU profile for your own Intel CPU(s).
Experimenting with existing CPU profiles
You can also try running Cloud Hypervisor with one of our pre-generated profiles, Intel Sapphire Rapids or Intel Skylake: simply add `,profile=sapphire-rapids` or `,profile=skylake` to the `--cpus <cpu option>` argument when bringing up the VM.
How we have tested our implementation
We have performed a range of tests on deployments of various sizes including:
Full implementation overview
At a high level our implementation consists of:
`CpuProfile` enum with a variant per CPU profile, together with logic for extracting the compressed JSON files.
We plan to bring all of this functionality to Cloud Hypervisor through several PRs:
The CPU profile generation tool in more detail
Our CPU profile generation tool is aware of pretty much all (Intel) CPUID (sub) leaves and MSRs as well as CPUID and MSR entries tied to KVM.
When executed, the tool uses these hard-coded lists together with our specified policies to select which CPU features become part of the generated CPU profile and which are excluded from it.
We will do our best to make the tool automatically warn or error when it encounters CPUID leaves and/or MSRs it is not aware of as this is a good sign that the tool needs to be updated.
If/when individual reserved CPUID and/or MSR bits within a CPUID or MSR register become specified, one may also want to update the CPU profile generation tool to take this into account. If this is forgotten, we expect the main consequence to be that new profiles have a slightly reduced set of supported features. We can fix this upon detection by regenerating the profile (with a "v2" suffix for backward compatibility reasons).
Additional binary size per new CPU profile
Our experiments show that with compression each new CPU profile adds between 3 and 4 KB to the Cloud hypervisor binary. Without compression the pretty printed JSON files sum up to about 50 - 60 KB per CPU profile.
Further optimizations to binary size are definitely possible, but we consider 3 - 4 KB per CPU profile good enough for the time being.
Motivation and background
Recall that software is usually developed to run on a variety of processors with various features. For the software to dynamically discover which hardware features it may use, one typically uses the CPUID instruction to query the CPU for information or, in some (often more low-level) cases, reads so-called MSRs (model-specific registers) to obtain relevant processor-specific information.
In the context of live migration, this can lead to a time-of-check to time-of-use (TOCTOU) bug: the guest obtains processor information through CPUID or MSRs on the migration source and then makes decisions based on these findings on the migration destination, which may have a different processor that does not support the same instructions as the migration source.
To mitigate this problem, Cloud Hypervisor performs CPUID checks at the beginning of a live migration (MSR checks are done by KVM, but it is debatable whether these are sufficient). If the CPUID entries reported by the destination's hypervisor are not compatible with those of the source VM, the migration is aborted.
Such compatibility checks, although important, are also somewhat limiting in clusters whose hosts/nodes run on different processors. There is no way to perform a live migration from a host with, say, an Intel Granite Rapids processor to a destination with an Intel Sapphire Rapids processor, even if the guest is not using any functionality beyond the capabilities of the Intel Sapphire Rapids machine.
The aforementioned checks also prevent migrations from older to newer hardware in some cases where the older hardware supports deprecated CPU features (such as Intel MPX).
There are also certain CPU features that are unlikely to ever give accurate results in the context of live migration such as performance counters and debugging capabilities. We do not want guests making decisions based on these capabilities during live migration, even when all CPUs involved are identical!
Luckily, hypervisors are capable of manipulating what guests see when executing the CPUID instruction or reading MSRs, and we can and will use this to our advantage. A CPU profile is thus a recipe for adjusting CPUID (sub)leaves and MSRs to hide certain CPU features from guests. This can vastly increase the number of nodes/hosts in a cluster that a VM booted with a CPU profile may later live migrate to!
FAQ
What about other ISAs?
We focus on x86_64 for the time being. Other architectures such as ARM and RISC-V will only have the host CPU profile for the foreseeable future, unless someone steps up and wants to tackle either of them now. Due to the vastly different nature of these architectures, we expect relatively little overlap with our work here on x86_64 CPUs.
What about live migration between Intel and AMD CPUs?
It might work with an extremely limited minimal profile, but I don't think that would be suitable for production workloads.
The intended use-cases are thus Intel <-> Intel and AMD <-> AMD live migrations.