Input/output error during migration

Bug #1817268 reported by Mathieu Corbin
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

Operating system: Ubuntu 18.04.2 LTS
qemu version: 2.11.1, but also reproduced with 3.1.0 (compiled manually).
virsh --version: 4.0.0

Hello,

I am having an issue with migration of UEFI virtual machines. If the --copy-storage-inc and the --tunnelled libvirt flags are used together, the migration fails. The same command for non-uefi virtual machines (e.g the same libvirt xml without the <nvram> and <loader> tags) works.

The command/output error is:

virsh migrate --verbose --live --p2p --tunnelled --copy-storage-inc --change-protection --abort-on-error testuefi qemu+tcp://<ip>/system
error: internal error: qemu unexpectedly closed the monitor: Receiving block device images
2019-02-21T16:20:15.263261Z qemu-system-x86_64: error while loading state section id 2(block)
2019-02-21T16:20:15.263996Z qemu-system-x86_64: load of migration failed: Input/output error

If I remove one of the --tunnelled or the --copy-storage-inc flag, it works, for example:

virsh migrate --verbose --live --p2p --copy-storage-inc --change-protection --abort-on-error testuefi qemu+tcp://<ip>/system
Migration: [100 %]

virsh migrate --verbose --live --p2p --tunnelled --change-protection --abort-on-error testuefi qemu+tcp://<ip>/system
Migration: [100 %]

I have no idea why those two flags combined together produce an error, and only for UEFI virtual machines.

here is the libvirt xml definition:

<domain type='kvm' id='4'>
  <name>testuefi</name>
  <uuid>ce12de05-ec09-4b4b-a27a-47003a511bda</uuid>
  <description>CentOS 4.5 (32-bit)</description>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <shares>878</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Apache Software Foundation</entry>
      <entry name='product'>CloudStack KVM Hypervisor</entry>
      <entry name='uuid'>ce12de05-ec09-4b4b-a27a-47003a511bda</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.11'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/testuefi_VARS.fd</nvram>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Westmere</model>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='pclmuldq'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/testmigration.qcow2'/>
      <backingStore type='file' index='1'>
        <format type='raw'/>
        <source file='/var/lib/libvirt/images/b3e4b880-0611-43bc-9d71-9cdac138f6e2'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver cache='none'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
     <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='06:6a:20:00:00:55'/>
      <source bridge='public'/>
      <target dev='vnet4'/>
      <model type='virtio'/>
      <driver queues='2'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Here is the qemu command on the destination host:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name guest=testuefi-VM,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-testuefi-VM/master-key.aes -machine pc-i440fx-2.11,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Server,vmx=on,pcid=on,ssbd=on,hypervisor=on -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/testuefi-VM_VARS.fd,if=pflash,format=raw,unit=1 -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid b340b117-1704-4ccf-93a7-21303b12dd7f -smbios 'type=1,manufacturer=Apache Software Foundation,product=CloudStack KVM Hypervisor,uuid=b340b117-1704-4ccf-93a7-21303b12dd7f' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-14-testuefi-VM/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/testmigration.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive if=none,id=drive-ide0-1-0,readonly=on,cache=none -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=37 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=06:a0:66:00:00:0c,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc vnc=unix:/var/run/qemu/b340b117-1704-4ccf-93a7-21303b12dd7f.sock -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -msg timestamp=on

Thanks,

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Hi Mathieu,
  How big is your testuefi-VM_VARS.fd file ? Does the problem go away if you pad it to 1MB?

I've seen a problem migrating pflash where the files aren't 1MB (?) multiples but only using the 'old style' block migration; normally you get an NBD based block migration but when you select tunneling libvirt can't tunnel the nbd stream so falls back to the old style migration and hits this bug.

Revision history for this message
Mathieu Corbin (mathieucorbin) wrote :

Hi David,

Thanks you for your help.

The VARS file is 128K.

I increased the "/var/lib/libvirt/qemu/nvram/testuefi_VARS.fd" var file to 1M, but had this error during migration:

error: internal error: qemu unexpectedly closed the monitor: 2019-02-22T09:52:34.098833Z qemu-system-x86_64: Length mismatch: system.flash1: 0x100000 in != 0x20000: Invalid argument
2019-02-22T09:52:34.098940Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'

So I destroyed the machine, removed the "/var/lib/libvirt/qemu/nvram/testuefi_VARS.fd" var file, increased the /usr/share/OVMF/OVMF_VARS.fd file in both hypervisors (src and dest) to 1M, recreated the machine and now the migration works:

virsh migrate --verbose --live --p2p --tunnelled --copy-storage-inc --change-protection --abort-on-error testuefi qemu+tcp://<ip>/system
Migration: [100 %]

I still have to test if increasing the VARS file do not cause issues on the virtual machine.

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Yep that Length mismatch is expected - the source and destination do have to match.

Revision history for this message
Mathieu Corbin (mathieucorbin) wrote :

Thanks you, so it is a bug and not the expected behavior right ?

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Yes, I think so; although old-style block migration doesn't get much work on it now; so probably the fix I'd recommend for most cases now would be to pad the var file.

Revision history for this message
Thomas Huth (th-huth) wrote :

Looking through old bug tickets... can you still reproduce this issue with the latest version of QEMU? Or could we close this ticket nowadays?

Changed in qemu:
status: New → Incomplete
Revision history for this message
Mathieu Corbin (mathieucorbin) wrote :

I guess the bug still exists, I fixed it back in the time by repackaging OVMF_VARS.fd (padded to be 1M).

I will try to find some time to mount an environment to reproduce the issue again.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.