Live migration fails despite identical CPUs/capabilities

Bug #2061569 reported by macchese
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | New | Undecided | Unassigned |

Bug Description

charm bundle: openstack-2023.2
nova-compute: 28.0.1 channel 2023.2/stable
ubuntu: jammy .15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

I have two identical hosts, op1 and op4, both Dell PowerEdge R7425 servers acting as nova-compute nodes (hypervisors).
ubuntu@juju:~/juju/controller$ openstack hypervisor list
+----+---------------------+-----------------+--------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP      | State |
+----+---------------------+-----------------+--------------+-------+
|  7 | op1.maas            | QEMU            | xxxxxxxxxxxx | up    |
|  8 | op4.maas            | QEMU            | xxxxxxxxxxxx | up    |
+----+---------------------+-----------------+--------------+-------+

1) microcode is the same:

root@op1:~# virsh capabilities |grep microcode
      <microcode version='134222446'/>

root@op4:~# virsh capabilities |grep microcode
      <microcode version='134222446'/>

2) libvirt capabilities features are identical:
root@op1:~# virsh capabilities |grep feature
      <feature name='ht'/>
      <feature name='monitor'/>
      <feature name='osxsave'/>
      <feature name='xsaves'/>
      <feature name='cmp_legacy'/>
      <feature name='extapic'/>
      <feature name='skinit'/>
      <feature name='wdt'/>
      <feature name='tce'/>
      <feature name='topoext'/>
      <feature name='perfctr_core'/>
      <feature name='perfctr_nb'/>
      <feature name='invtsc'/>
      <feature name='clzero'/>
      <feature name='xsaveerptr'/>
      <feature name='npt'/>
      <feature name='lbrv'/>
      <feature name='svm-lock'/>
      <feature name='nrip-save'/>
      <feature name='tsc-scale'/>
      <feature name='vmcb-clean'/>
      <feature name='flushbyasid'/>
      <feature name='decodeassists'/>
      <feature name='pause-filter'/>
      <feature name='pfthreshold'/>
    <migration_features>
    </migration_features>
    <features>
    </features>
    <features>
    </features>

root@op4:~# virsh capabilities |grep feature
      <feature name='ht'/>
      <feature name='monitor'/>
      <feature name='osxsave'/>
      <feature name='xsaves'/>
      <feature name='cmp_legacy'/>
      <feature name='extapic'/>
      <feature name='skinit'/>
      <feature name='wdt'/>
      <feature name='tce'/>
      <feature name='topoext'/>
      <feature name='perfctr_core'/>
      <feature name='perfctr_nb'/>
      <feature name='invtsc'/>
      <feature name='clzero'/>
      <feature name='xsaveerptr'/>
      <feature name='npt'/>
      <feature name='lbrv'/>
      <feature name='svm-lock'/>
      <feature name='nrip-save'/>
      <feature name='tsc-scale'/>
      <feature name='vmcb-clean'/>
      <feature name='flushbyasid'/>
      <feature name='decodeassists'/>
      <feature name='pause-filter'/>
      <feature name='pfthreshold'/>
    <migration_features>
    </migration_features>
    <features>
    </features>
    <features>
    </features>

3) NUMA topology is identical as well:
root@op1:~# lscpu | grep -i numa
NUMA node(s): 8
NUMA node0 CPU(s): 0,8,16,24,32,40,48,56
NUMA node1 CPU(s): 2,10,18,26,34,42,50,58
NUMA node2 CPU(s): 4,12,20,28,36,44,52,60
NUMA node3 CPU(s): 6,14,22,30,38,46,54,62
NUMA node4 CPU(s): 1,9,17,25,33,41,49,57
NUMA node5 CPU(s): 3,11,19,27,35,43,51,59
NUMA node6 CPU(s): 5,13,21,29,37,45,53,61
NUMA node7 CPU(s): 7,15,23,31,39,47,55,63

root@op4:~# lscpu | grep -i numa
NUMA node(s): 8
NUMA node0 CPU(s): 0,8,16,24,32,40,48,56
NUMA node1 CPU(s): 2,10,18,26,34,42,50,58
NUMA node2 CPU(s): 4,12,20,28,36,44,52,60
NUMA node3 CPU(s): 6,14,22,30,38,46,54,62
NUMA node4 CPU(s): 1,9,17,25,33,41,49,57
NUMA node5 CPU(s): 3,11,19,27,35,43,51,59
NUMA node6 CPU(s): 5,13,21,29,37,45,53,61
NUMA node7 CPU(s): 7,15,23,31,39,47,55,63

4) live migration fails:

openstack server list --all-project
+--------------------------------------+------+--------+-------------------+--------------------------+---------+
| ID                                   | Name | Status | Networks          | Image                    | Flavor  |
+--------------------------------------+------+--------+-------------------+--------------------------+---------+
| 6b75b329-bda5-4348-a25e-ec1b9102b398 | u1   | ACTIVE | hub=xxxxxxxxxxxxx | N/A (booted from volume) | f1.mini |
+--------------------------------------+------+--------+-------------------+--------------------------+---------+

#openstack server migrate --live-migration 6b75b329-bda5-4348-a25e-ec1b9102b398
No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-a3489211-cb92-4744-a3b1-1df96bbcb3f2)

log from nova-cloud-controller (debug mode) here:
https://paste.ubuntu.com/p/kF3kmcXtQH/

where op4.maas is the source hypervisor and op1.maas is the destination one: they have the same hardware, RAM and ceph storage.

log from destination nova-compute
504b0aa43f982b0f26ebf4 230876c80d504b0aa43f982b0f26ebf4] Instance launched has CPU info: {"arch": "x86_64", "model": "EPYC-IBPB", "vendor": "AMD", "topology": {"cells": 8, "sockets": 1, "cores": 4, "threads": 2}, "features": ["pclmuldq", "tsc-scale", "clzero", "avx2", "osvw", "sse4a", "perfctr_core", "decodeassists", "3dnowprefetch", "rdrand", "fpu", "lbrv", "syscall", "apic", "cx8", "lm", "clflush", "pause-filter", "sha-ni", "sse4.1", "fxsr_opt", "avx", "sse4.2", "pfthreshold", "nrip-save", "lahf_lm", "smap", "xsavec", "cx16", "mtrr", "pge", "aes", "nx", "mmxext", "fxsr", "misalignsse", "xsaveerptr", "fma", "svm-lock", "ssse3", "bmi1", "sse", "mce", "invtsc", "de", "msr", "fsgsbase", "pat", "bmi2", "vmcb-clean", "sse2", "cr8legacy", "vme", "skinit", "arat", "tce", "npt", "sep", "ibpb", "perfctr_nb", "flushbyasid", "pse36", "mca", "popcnt", "smep", "cmp_legacy", "mmx", "wdt", "pdpe1gb", "tsc", "topoext", "xsaves", "xsave", "abm", "rdseed", "xgetbv1", "xsaveopt", "pni", "adx", "monitor", "extapic", "svm", "pae", "cmov", "pse", "ht", "rdtscp", "movbe", "f16c", "clflushopt"]}
2024-04-15 18:15:40.776 2182518 ERROR nova.virt.libvirt.driver [None req-c7cbc9ca-4cbc-4679-832f-f07f9854d4b8 f616f0a309e741d7a1a8416852da6692 e175daa0e5464c38b424cfa2e0b457d3 - - 230876c80d504b0aa43f982b0f26ebf4 230876c80d504b0aa43f982b0f26ebf4] CPU doesn't have compatibility.
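The manual comparison in steps 1) and 2) relies on reading two grep outputs side by side. As a minimal sketch, assuming the full `virsh capabilities` XML from each host has been saved to a string (the function names `host_cpu_summary` and `diff_hosts` are illustrative, not part of nova or libvirt), the same check can be done mechanically:

```python
import xml.etree.ElementTree as ET


def host_cpu_summary(capabilities_xml):
    """Extract model, microcode version and feature names from the
    <host><cpu> element of `virsh capabilities` output."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host><cpu> element found")
    microcode = cpu.find("microcode")
    return {
        "model": cpu.findtext("model"),
        "microcode": microcode.get("version") if microcode is not None else None,
        "features": {f.get("name") for f in cpu.findall("feature")},
    }


def diff_hosts(xml_a, xml_b):
    """Report how the host CPU descriptions of two hypervisors differ."""
    a, b = host_cpu_summary(xml_a), host_cpu_summary(xml_b)
    return {
        "model_differs": a["model"] != b["model"],
        "microcode_differs": a["microcode"] != b["microcode"],
        "only_in_a": sorted(a["features"] - b["features"]),
        "only_in_b": sorted(b["features"] - a["features"]),
    }
```

An all-empty result for op1 and op4 would only confirm what the grep output already suggests: the hosts describe themselves identically, so the failure has to come from the comparison step rather than from a real hardware difference.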

macchese (max-liccardo)
description: updated
summary: - Live migration fails despite identical CPUs with Host filter ignoring hosts
+ Live migration fails despite identical CPUs/capabilities
Andrew Bonney (andrewbonney) wrote :
Uggla (rene-ribaud) wrote :

Hello Max, thanks for submitting this issue.

The CPU comparison is done by a libvirt function.

First, can you please verify that the libvirt/kvm/qemu versions are identical on both compute nodes?

If they are identical, can you provide the source and destination debug logs from the compute nodes,
and the virsh dumpxml output of your VM?

As soon as completed, please set this ticket to new again.

Changed in nova:
status: New → Triaged
status: Triaged → Incomplete
Uggla (rene-ribaud) wrote :

Oops thanks Andrew, I have just seen the bug: https://bugs.launchpad.net/nova/+bug/2023035

@Max please try the workaround as proposed by Andrew, as the fix is not yet ready.

Changed in nova:
status: Incomplete → Confirmed
macchese (max-liccardo) wrote :

I set the skip options for both startup and destination, but it doesn't work.

libvirt, qemu, kvm and nova are the same on both hosts:
root@op1:/var/log/nova# dpkg -l |egrep "libvirt|kvm|qemu|nova"
ii ipxe-qemu 1.21.1+git-20220113.fbbdc3926-0ubuntu1 all PXE boot firmware - ROM images for qemu
ii ipxe-qemu-256k-compat-efi-roms 1.0.0+git-20150424.a25a16d-0ubuntu4 all PXE boot firmware - Compat EFI ROM images for qemu
ii libvirt-clients 8.0.0-1ubuntu7.10 amd64 Programs for the libvirt library
ii libvirt-daemon 8.0.0-1ubuntu7.10 amd64 Virtualization daemon
ii libvirt-daemon-config-network 8.0.0-1ubuntu7.10 all Libvirt daemon configuration files (default network)
ii libvirt-daemon-config-nwfilter 8.0.0-1ubuntu7.10 all Libvirt daemon configuration files (default network filters)
ii libvirt-daemon-driver-qemu 8.0.0-1ubuntu7.10 amd64 Virtualization daemon QEMU connection driver
ii libvirt-daemon-system 8.0.0-1ubuntu7.10 amd64 Libvirt daemon configuration files
ii libvirt-daemon-system-systemd 8.0.0-1ubuntu7.10 all Libvirt daemon configuration files (systemd)
ii libvirt0:amd64 8.0.0-1ubuntu7.10 amd64 library for interfacing with different virtualization systems
ii nova-api-metadata 3:28.0.1-0ubuntu1~cloud0 all OpenStack Compute - metadata API frontend
ii nova-common 3:28.0.1-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 3:28.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 3:28.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 3:28.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python3-libvirt 8.0.0-1build1 amd64 libvirt Python 3 bindings
ii python3-nova 3:28.0.1-0ubuntu1~cloud0 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:18.4.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - 3.x
ii qemu-block-extra 1:6.2+dfsg-2ubuntu6.18 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-system-common 1:6.2+dfsg-2ubuntu6.18 amd64 QEMU full system emulation b...
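
The version check requested earlier in the thread can also be scripted instead of compared by eye. A minimal sketch, assuming the `dpkg -l` output of each host has been captured as text (the helper names `parse_dpkg_list` and `version_mismatches` are hypothetical):

```python
def parse_dpkg_list(text):
    """Map package name -> version for installed ('ii') rows of `dpkg -l` output."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows look like: ii <package> <version> <arch> <description...>
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


def version_mismatches(host_a_text, host_b_text):
    """Return {package: (version_on_a, version_on_b)} for every package whose
    version differs or that is installed on only one of the two hosts."""
    a = parse_dpkg_list(host_a_text)
    b = parse_dpkg_list(host_b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

An empty dictionary means both hosts report the same versions for every listed package; a package missing on one side shows up with `None` in that position.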


Changed in nova:
status: Confirmed → New
macchese (max-liccardo) wrote :

I also tried modifying the cpu mode of the VM:

<cpu mode="host-model" check="partial">
  <topology sockets="1" dies="1" cores="1" threads="1"/>
</cpu>

It fails in the same way as above.

description: updated
Andrew Bonney (andrewbonney) wrote :

It looks like you may have the config workarounds defined in the wrong place. They need to be under the existing [workarounds] section of the config rather than [DEFAULT]
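
A quick way to verify the placement is to parse nova.conf and report which section each workaround option actually sits in. This is a minimal sketch, assuming the two options in question are `skip_cpu_compare_on_dest` and `skip_cpu_compare_at_startup` (the thread only calls them "skip on startup and dest"); the helper names are hypothetical. Python's `configparser` normally merges `[DEFAULT]` into every section, which would hide exactly this mistake, so the sketch renames the parser's default section to keep `[DEFAULT]` distinct.

```python
import configparser

# Assumed option names; the thread does not spell them out.
WORKAROUND_OPTIONS = ("skip_cpu_compare_on_dest", "skip_cpu_compare_at_startup")


def locate_options(conf_text, options=WORKAROUND_OPTIONS):
    """Return {option: [sections in which it is set]} for nova.conf text."""
    parser = configparser.ConfigParser(
        default_section="__unused__",  # treat [DEFAULT] as an ordinary section
        interpolation=None,            # leave '%' characters in values alone
        strict=False,                  # tolerate duplicated sections/options
    )
    parser.read_string(conf_text)
    found = {opt: [] for opt in options}
    for section in parser.sections():
        for opt in options:
            if parser.has_option(section, opt):
                found[opt].append(section)
    return found


def misplaced(conf_text, expected="workarounds"):
    """Options that are missing (empty list) or set outside [workarounds]."""
    return {
        opt: sections
        for opt, sections in locate_options(conf_text).items()
        if sections != [expected]
    }
```

Running `misplaced(open("/etc/nova/nova.conf").read())` on each compute node should return an empty dictionary once the options are in the right section; the path is the usual default and may differ on a charm-managed host.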

macchese (max-liccardo) wrote :

Oh, sorry.
I deployed with Juju and the OpenStack charms, and it seems the charm cannot populate the [workarounds] section properly.
I modified nova.conf by hand and now it works!
So what next? I should keep using these workarounds until live migration is patched, right?

P.S.: FYI, I filed this bug with the Juju charmers:
https://bugs.launchpad.net/charm-nova-compute/+bug/2062033

Elod Illes (elod-illes)
tags: added: libvirt live-migration