Live migration fails because host CPU features differ

Bug #1903822 reported by chengsheng
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: chengsheng

Bug Description

When a VM is live migrated without specifying a destination host, nova calls libvirt's compareCPU() to compare CPU features, but this check is not reliable. For example, when the hosts run different kernel versions, compareCPU() can pass even though the CPU features actually available on the hosts differ. When the VM is then live migrated to such a host, it fails to start.

Steps to reproduce:
1. Prepare two compute nodes with the same CPU features, both running CentOS 7.6 with kernel 3.10.
2. Live migration between the two nodes without specifying a host works normally.
3. Upgrade the kernel of one of the nodes to 4.19.
4. Live migrate a VM without specifying a host. The VM migrates from one node to the other, but an error is reported during startup. The error log is as follows:
    Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: xsaves: libvirtError: operation failed: guest CPU doesn't match specification: missing features: xsaves

OpenStack version: Queens
Other versions should have the same problem.

chengsheng (chengsheng)
description: updated
chengsheng (chengsheng)
Changed in nova:
assignee: nobody → chengsheng (chengsheng)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/762330

Changed in nova:
status: New → In Progress
sean mooney (sean-k-mooney) wrote :

This looks more like a libvirt bug than a nova bug.
There are better APIs in libvirt that we could use to do a stricter test than we currently do, but I suspect the issue is that the libvirt you are using predates the xsaves feature or does not know about it, so it did not check it. We delegate CPU compatibility checking to libvirt rather than doing it in nova directly.

chengsheng (chengsheng) wrote :

The reason is that the 3.10 kernel does not define xsaves. libvirt still reports 'xsaves' in capabilities, but KVM does not provide it.
The CPU features taken from capabilities need to be replaced with the CPU features from domcapabilities.
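The gap between the two feature sources can be illustrated with a minimal sketch. It assumes the usual layout of `virsh capabilities` and `virsh domcapabilities` output, and the XML snippets, model name, and helper names are illustrative, not taken from the hosts in this report. It also only compares explicitly listed features and ignores those implied by the CPU model name.

```python
import xml.etree.ElementTree as ET

# Illustrative XML only; real output from `virsh capabilities` and
# `virsh domcapabilities` is much longer.
HOST_CAPS_XML = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Example-Model</model>
      <feature name='xsaves'/>
      <feature name='vmx'/>
    </cpu>
  </host>
</capabilities>
"""

DOM_CAPS_XML = """
<domainCapabilities>
  <cpu>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Example-Model</model>
      <feature policy='require' name='vmx'/>
    </mode>
  </cpu>
</domainCapabilities>
"""


def host_cpu_features(caps_xml):
    """Features listed for the physical host CPU in capabilities XML."""
    root = ET.fromstring(caps_xml)
    return {f.get("name") for f in root.findall("./host/cpu/feature")}


def guest_usable_features(domcaps_xml):
    """Features the hypervisor can give a host-model guest (policy='require')."""
    root = ET.fromstring(domcaps_xml)
    mode = root.find("./cpu/mode[@name='host-model']")
    if mode is None or mode.get("supported") != "yes":
        return set()
    return {
        f.get("name")
        for f in mode.findall("feature")
        if f.get("policy") == "require"
    }


# Features the host CPU advertises but the hypervisor does not offer to guests.
only_on_host = host_cpu_features(HOST_CAPS_XML) - guest_usable_features(DOM_CAPS_XML)
print(sorted(only_on_host))
```

With the sample data, the only feature reported by capabilities but not offered through domcapabilities is `xsaves`, which mirrors the mismatch described above.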

Kashyap Chamarthy (kashyapc) wrote :

I agree with chengsheng here.

Sean, this is not a libvirt bug; this is a Nova bug ... as we're not doing the CPU comparison properly. These problems will also occur when you try to migrate a guest from CentOS-7 to CentOS-8.

The immediate solution, as I suggested in this spec[1], is to use the compareHypervisorCPU() and baselineHypervisorCPU() APIs.

So we _need_ this fix: https://review.opendev.org/c/openstack/nova/+/762330/

[1] https://opendev.org/openstack/nova-specs/commit/70811da221035044e27
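A minimal sketch of a hypervisor-aware check, assuming the libvirt-python binding `virConnect.compareHypervisorCPU(emulator, arch, machine, virttype, xmlCPU, flags)` and that `None` is accepted for `emulator` and `machine`. The helper name `hypervisor_can_run` is hypothetical, and this is not the code in the proposed review.

```python
# Result codes of virCPUCompareResult in libvirt's C API, copied here so
# the sketch does not need the libvirt module at import time.
VIR_CPU_COMPARE_INCOMPATIBLE = 0
VIR_CPU_COMPARE_IDENTICAL = 1
VIR_CPU_COMPARE_SUPERSET = 2


def hypervisor_can_run(conn, guest_cpu_xml, arch="x86_64", virttype="kvm"):
    """Return True if the hypervisor behind `conn` can provide the given CPU.

    `conn` is expected to be a libvirt connection to the destination host;
    `guest_cpu_xml` is a <cpu> element describing the guest CPU.
    """
    # emulator and machine are left as None (assumption: libvirt then
    # falls back to its defaults for the given arch and virttype).
    result = conn.compareHypervisorCPU(
        None, arch, None, virttype, guest_cpu_xml, 0
    )
    return result in (VIR_CPU_COMPARE_IDENTICAL, VIR_CPU_COMPARE_SUPERSET)
```

Unlike compareCPU(), which compares against the physical host CPU, this call asks what the hypervisor on that host can actually provide, so a feature such as `xsaves` that the older kernel does not expose to KVM would be expected to make the result incompatible.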

norman shen (jshen28) wrote :

In fact, I believe using `virsh capabilities` to compare CPU features is incorrect, at least for host-model:

* libvirt uses the QEMU QMP command `query-cpu-model-expansion` to retrieve the list of CPU features and whether each one is supported
* some features, like `monitor`, are not implemented, so they are not required and do not affect live migration at all
* even if cpuid differs between two hosts, that does not imply live migration is forbidden, because it depends on whether QEMU implements the feature or not
* also, some features like `invtsc` or `hypervisor` are not part of any common CPU model but are provided by QEMU, and they will actually be required in the live domain
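As a rough illustration of the first point, the sketch below extracts the enabled feature flags from a `query-cpu-model-expansion` reply. The reply shown is a shortened, made-up sample that assumes the usual `return.model.props` layout, and `supported_features` is a hypothetical helper name.

```python
import json

# Illustrative reply only; a real reply lists many more properties.
SAMPLE_REPLY = """
{"return": {"model": {"name": "base",
                      "props": {"vmx": true,
                                "xsaves": false,
                                "invtsc": false,
                                "model-id": "Example CPU"}}}}
"""


def supported_features(qmp_reply_json):
    """Names of boolean CPU properties that QEMU reports as enabled."""
    props = json.loads(qmp_reply_json)["return"]["model"]["props"]
    # Non-boolean properties (strings, integers) are not feature flags,
    # so only values that are exactly True are kept.
    return {name for name, value in props.items() if value is True}


print(sorted(supported_features(SAMPLE_REPLY)))
```

In this sample `xsaves` is reported as unsupported by QEMU even though the host CPU might advertise it, which is the situation the bullets above describe.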

Of course I am no expert in libvirt or QEMU, so I apologize in advance if my understanding is incorrect...

Besides, if live migration from A to B succeeds, live migration back (with no stop/reboot in between) should also work, but with the current nova code it does not.
