live-migration ci failure on nfs shared storage

Bug #1535232 reported by Timofey Durakov on 2016-01-18
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Unassigned

Bug Description

gate-tempest-dsvm-multinode-live-migration job fails on shared storage with
tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration test fails.
Libvirt logs on destination node: http://xsnippet.org/361328/

libvirt version: 1.2.2-0ubuntu13.1.16
qemu version: 2.0.0+dfsg-2ubuntu1.21

Timofey Durakov (tdurakov) wrote :
Timofey Durakov (tdurakov) wrote :
Kashyap Chamarthy (kashyapc) wrote :
Download full text (4.3 KiB)

Source QEMU log for instance-00000004
---------------------------------------------------
From here: http://logs.openstack.org/83/252283/79/experimental/gate-tempest-dsvm-multinode-live-migration/070dcc0/logs/subnode-2/libvirt/qemu/instance-00000004.txt.gz

2016-01-13 22:02:39.960+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name instance-00000004 -S -machine pc-i440fx-trusty,accel=tcg,usb=off -cpu core2duo,-monitor,-pse36 -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d3ab1355-4ab5-4859-a393-8e0916ba9432 -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=13.0.0,serial=d86f63d9-0e54-46c5-80e5-5d3192ea802b,uuid=d3ab1355-4ab5-4859-a393-8e0916ba9432,family=Virtual Machine -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000004.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -kernel /opt/stack/data/nova/instances/d3ab1355-4ab5-4859-a393-8e0916ba9432/kernel -initrd /opt/stack/data/nova/instances/d3ab1355-4ab5-4859-a393-8e0916ba9432/ramdisk -append root=/dev/vda console=tty0 console=ttyS0 no_timer_check -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/stack/data/nova/instances/d3ab1355-4ab5-4859-a393-8e0916ba9432/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:74:41:59,bus=pci.0,addr=0x2 -chardev file,id=charserial0,path=/opt/stack/data/nova/instances/d3ab1355-4ab5-4859-a393-8e0916ba9432/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
char device redirected to /dev/pts/0 (label charserial1)
qemu: terminating on signal 15 from pid 8062
2016-01-13 22:04:13.087+0000: shutting down
---------------------------------------------------

Destination QEMU log for instance-00000004
---------------------------------------------------
From here: http://logs.openstack.org/83/252283/79/experimental/gate-tempest-dsvm-multinode-live-migration/070dcc0/logs/libvirt/qemu/instance-00000004.txt.gz

2016-01-13 22:02:58.684+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name instance-00000004 -S -machine pc-i440fx-trusty,accel=tcg,usb=off -cpu core2duo,-monitor,-pse36 -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid d3ab1355-4ab5-4859-a393-8e0916ba9432 -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=13.0.0,serial=d86f63d9-0e54-46c5-80e5-5d3192ea802b,uuid=d3ab1355-4ab5-4859-a393-8e0916ba9432,family=Virtual Machine -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000004.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=contro...

Read more...

tags: added: libvirt
Daniel Berrange (berrange) wrote :
Download full text (3.8 KiB)

Looking at logs from this job:

http://logs.openstack.org/83/252283/96/experimental/gate-tempest-dsvm-multinode-live-migration/469db74/

On the target host QEMU logfile we see

> 2016-01-26 10:30:51.895+0000: shutting down
> load of migration failed

The fact that we see the shutting down message before the "load of migration failed" message strongly suggests that libvirt is proactively shutting down the QEMU. ie the failure to load migration data is *not* the cause of the problem - it is just a side effect of libvirt shutting down QEMU.

There is no sign of errors on the dest host libvirt logs - it is simply told by the source host that migration has failed and to shutdown the new QEMU.

Going back to the source host we see migration has started succesfully and transferred alot of data. eg the last status update before failure shows:

2016-01-26 10:30:51.197+0000: 8591: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"query-migrate","id":"libvirt-144"}' for write with FD -1
2016-01-26 10:30:51.197+0000: 8591: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7f1e0c00db30 msg={"execute":"query-migrate","id":"libvirt-144"}
 fd=-1
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorIOWrite:504 : QEMU_MONITOR_IO_WRITE: mon=0x7f1e0c00db30 buf={"execute":"query-migrate","id":"libvirt-144"}
 len=48 ret=48 errno=11
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorIOProcess:396 : QEMU_MONITOR_IO_PROCESS: mon=0x7f1e0c00db30 buf={"return": {"expected-downtime": 106, "status": "active", "setu
p-time": 2, "total-time": 6613, "ram": {"total": 67776512, "remaining": 323584, "mbps": 95.288421, "transferred": 82742303, "duplicate": 4631, "dirty-pages-rate": 310, "skipped": 0,
"normal-bytes": 83189760, "normal": 20310}}, "id": "libvirt-144"}
 len=312
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorJSONIOProcessLine:157 : Line [{"return": {"expected-downtime": 106, "status": "active", "setup-time": 2, "total-time": 6613, "r
am": {"total": 67776512, "remaining": 323584, "mbps": 95.288421, "transferred": 82742303, "duplicate": 4631, "dirty-pages-rate": 310, "skipped": 0, "normal-bytes": 83189760, "normal"
: 20310}}, "id": "libvirt-144"}]
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorJSONIOProcessLine:177 : QEMU_MONITOR_RECV_REPLY: mon=0x7f1e0c00db30 reply={"return": {"expected-downtime": 106, "status": "active", "setup-time": 2, "total-time": 6613, "ram": {"total": 67776512, "remaining": 323584, "mbps": 95.288421, "transferred": 82742303, "duplicate": 4631, "dirty-pages-rate": 310, "skipped": 0, "normal-bytes": 83189760, "normal": 20310}}, "id": "libvirt-144"}

What is interesting is that we see some I/O error events issued - the first is

2016-01-26 10:30:46.888+0000: 8590: debug : qemuMonitorJSONIOProcessLine:172 : QEMU_MONITOR_RECV_EVENT: mon=0x7f1e0c00db30 event={"timestamp": {"seconds": 1453804246, "microseconds": 883853}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "operation": "write", "action": "report"}}

Initially migration is still continuing despite these I/O error reports

After we have received quite alot of them though, we eventually see migration fail

2016-01-26 ...

Read more...

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Changed in nova:
status: Triaged → Confirmed
Sujitha (sujitha-neti) on 2016-03-25
Changed in nova:
assignee: nobody → Sujitha (sujitha-neti)
Sujitha (sujitha-neti) on 2016-03-31
Changed in nova:
assignee: Sujitha (sujitha-neti) → nobody
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers