Live migration fails when host CPU features differ

Bug #1903822 reported by chengsheng on 2020-11-11
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: chengsheng

Bug Description

When a VM is live migrated without a specified destination host, nova calls libvirt's compareCPU() to compare CPU features, but this check is not reliable. For example, when the hosts run different kernel versions, compareCPU() can pass even though the CPU features actually available on the hosts differ. When the VM is then live migrated to such a host, it fails to start.

Steps to reproduce:
1. Prepare two compute nodes with the same CPU features, both running CentOS 7.6 with kernel 3.10.
2. Live migration between the two nodes without specifying a destination host works normally.
3. Upgrade the kernel of one of the nodes to 4.19.
4. Live migrate a VM without specifying a destination host. The VM is migrated from one node to the other, but an error is reported during startup. The error log is as follows:
    Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: xsaves: libvirtError: operation failed: guest CPU doesn't match specification: missing features: xsaves

OpenStack version: queens
Other versions should have the same problem.

chengsheng (chengsheng) on 2020-11-11
description: updated
chengsheng (chengsheng) on 2020-11-11
Changed in nova:
assignee: nobody → chengsheng (chengsheng)

Fix proposed to branch: master
Review: https://review.opendev.org/762330

Changed in nova:
status: New → In Progress
sean mooney (sean-k-mooney) wrote :

This looks more like a libvirt bug than a nova bug.
There are better APIs in libvirt that we can use to do a stricter test than we currently do, but I suspect the issue is that the libvirt you are using predates the xsaves feature or does not know about it, so it did not check for it. We delegate CPU compatibility checking to libvirt rather than doing it in nova directly.

chengsheng (chengsheng) wrote :

The reason is that the 3.10 kernel does not define xsaves. libvirt still reports 'xsaves' in its capabilities output, but KVM does not provide it.
The fix is to take the CPU features from domcapabilities instead of from capabilities.
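
To illustrate the difference between the two sources, here is a minimal sketch that lists CPU features reported in the host capabilities XML but not required in the host-model CPU of the domain capabilities XML. The two XML snippets are trimmed, made-up samples (real `virsh capabilities` and `virsh domcapabilities` output is much longer and will differ per host), and the helper names are hypothetical, not nova code.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed sample of host capabilities XML (assumption).
CAPABILITIES_XML = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Broadwell</model>
      <feature name='vmx'/>
      <feature name='xsaves'/>
    </cpu>
  </host>
</capabilities>
"""

# Made-up, heavily trimmed sample of domain capabilities XML (assumption).
DOMCAPABILITIES_XML = """
<domainCapabilities>
  <cpu>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Broadwell</model>
      <feature policy='require' name='vmx'/>
    </mode>
  </cpu>
</domainCapabilities>
"""


def host_cpu_features(caps_xml):
    """Feature names listed for the host CPU in capabilities XML."""
    root = ET.fromstring(caps_xml)
    return {f.get("name") for f in root.findall("./host/cpu/feature")}


def hypervisor_cpu_features(domcaps_xml):
    """Feature names required by the host-model CPU in domcapabilities XML."""
    root = ET.fromstring(domcaps_xml)
    features = root.findall("./cpu/mode[@name='host-model']/feature")
    return {f.get("name") for f in features if f.get("policy") == "require"}


def features_only_in_capabilities(caps_xml, domcaps_xml):
    """Features the host reports that the hypervisor does not offer guests."""
    return host_cpu_features(caps_xml) - hypervisor_cpu_features(domcaps_xml)


if __name__ == "__main__":
    print(sorted(features_only_in_capabilities(CAPABILITIES_XML,
                                               DOMCAPABILITIES_XML)))
```

In this sample, 'xsaves' appears only in the host capabilities, which mirrors the mismatch described in this comment.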

Kashyap Chamarthy (kashyapc) wrote :

I agree with chengsheng here.

Sean, this is not a libvirt bug; this is a Nova bug ... as we're not doing the CPU comparison properly. These problems will also occur when you try to migrate a guest from CentOS-7 to CentOS-8.

The immediate solution, as I suggested in this spec[1], is to use the compareHypervisorCPU() and baselineHypervisorCPU() APIs.

So we _need_ this fix: https://review.opendev.org/c/openstack/nova/+/762330/

[1] https://opendev.org/openstack/nova-specs/commit/70811da221035044e27
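
As a rough sketch of the compareHypervisorCPU() approach mentioned in this comment (not the code from the proposed review), the check below asks libvirt whether the hypervisor on the destination host can provide a given guest CPU. The function name `guest_cpu_fits_hypervisor` is hypothetical. The `conn` argument is assumed to be a libvirt-python connection, for example one returned by `libvirt.open("qemu:///system")`. The result constants are the values of libvirt's virCPUCompareResult enum, copied here so the sketch does not need libvirt to be importable.

```python
# Values of libvirt's virCPUCompareResult enum, copied so that this sketch
# can be read and exercised without libvirt-python installed.
VIR_CPU_COMPARE_ERROR = -1
VIR_CPU_COMPARE_INCOMPATIBLE = 0
VIR_CPU_COMPARE_IDENTICAL = 1
VIR_CPU_COMPARE_SUPERSET = 2


def guest_cpu_fits_hypervisor(conn, guest_cpu_xml, emulator=None, arch=None,
                              machine=None, virttype=None):
    """Return True if the hypervisor on this host can provide the guest CPU.

    conn is assumed to be a libvirt connection object. Passing None for
    emulator, arch, machine and virttype is assumed to let libvirt pick
    its defaults for the host.
    """
    result = conn.compareHypervisorCPU(
        emulator, arch, machine, virttype, guest_cpu_xml, 0)
    # IDENTICAL or SUPERSET means the hypervisor CPU covers the guest CPU.
    return result in (VIR_CPU_COMPARE_IDENTICAL, VIR_CPU_COMPARE_SUPERSET)
```

Unlike compareCPU(), which compares against the host CPU as libvirt detects it, this call compares against the CPU the hypervisor can actually provide on the host, which is the distinction the comments above turn on.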
