colo: secondary vm crash during loadvm
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| QEMU | Expired | Undecided | Unassigned | |
Bug Description
Following the document 'COLO-FT.txt', I tested the COLO feature on my hosts. It seemed to go well at first, but after a while the secondary VM crashed. The stack trace is as follows:
#0 0x00007f191456dc37 in raise () from /lib/x86_
#1 0x00007f1914571028 in abort () from /lib/x86_
#2 0x00007f1914566bf6 in ?? () from /lib/x86_
#3 0x00007f1914566ca2 in __assert_fail () from /lib/x86_
#4 0x0000564154ad9147 in pcibus_reset (qbus=0x5641567
#5 0x0000564154a07cdb in qbus_reset_one (bus=0x56415676
#6 0x0000564154a0d721 in qbus_walk_children (bus=0x56415676
post_
at hw/core/bus.c:68
#7 0x0000564154a08b4d in qdev_walk_children (dev=0x56415675
post_
at hw/core/qdev.c:617
#8 0x0000564154a0d6e5 in qbus_walk_children (bus=0x56415659
post_
at hw/core/bus.c:59
#9 0x0000564154a07df5 in qbus_reset_all (bus=0x56415659
#10 0x0000564154a07e3a in qbus_reset_all_fn (opaque=
#11 0x0000564154a0e222 in qemu_devices_reset () at hw/core/reset.c:69
#12 0x00005641548b3b47 in pc_machine_reset () at /vms/git/
#13 0x0000564154972ca7 in qemu_system_reset (report=false) at vl.c:1697
#14 0x0000564154b9d007 in colo_process_
#15 0x00007f1914907184 in start_thread () from /lib/x86_
#16 0x00007f1914634bed in clone () from /lib/x86_
(gdb) frame 4
#4 0x0000564154ad9147 in pcibus_reset (qbus=0x5641567
warning: Source file is more recent than executable.
311 assert(
(gdb) ^CQuit
(gdb) p bus->irq_count[i]
$1 = -1

The QEMU version is the 2.9.0 release.
The 'irq_count' and 'irq_state' values are sent by the primary VM and loaded by the secondary VM. When the primary VM sends them, they may not be in a consistent state, so 'bus->irq_count[i]' sometimes becomes '-1' on the secondary VM.
After I deleted the assertions and tested it several more times, it worked well.
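
To illustrate the explanation above, here is a small standalone C model. It is not QEMU code: the `model_*` names and the single-device setup are hypothetical, and it assumes that a reset deasserts each device's interrupt line and adjusts the bus counter by the difference from the device's recorded state. Under that assumption, a loaded snapshot in which the device state says "asserted" while the bus counter is already 0 leaves the counter at -1 after the reset, which is the value seen in the gdb session.

```c
#include <assert.h>

#define MODEL_NIRQ 4  /* hypothetical number of bus interrupt lines */

/* Simplified stand-in for the bus: one counter per interrupt line. */
struct model_bus {
    int irq_count[MODEL_NIRQ];  /* devices currently asserting each line */
};

/* Simplified stand-in for a device attached to that bus. */
struct model_dev {
    struct model_bus *bus;
    int pin;        /* bus line this device drives */
    int irq_state;  /* 1 if the device records its line as asserted */
};

/* Set the device's line level and apply the delta to the bus counter. */
static void model_set_irq(struct model_dev *dev, int level)
{
    int change = level - dev->irq_state;

    if (change == 0) {
        return;  /* nothing to propagate */
    }
    dev->irq_state = level;
    dev->bus->irq_count[dev->pin] += change;
}

/* Reset the device (deassert its line) and return the bus counter,
 * which a consistent state would leave at 0. */
static int model_reset(struct model_dev *dev)
{
    model_set_irq(dev, 0);
    return dev->bus->irq_count[dev->pin];
}

/* Build a bus and device from "loaded" values, as if restored from a
 * snapshot, then reset and report the resulting counter. */
static int model_reset_after_load(int loaded_irq_state, int loaded_irq_count)
{
    struct model_bus bus = { {0} };
    struct model_dev dev = { &bus, 0, loaded_irq_state };

    bus.irq_count[0] = loaded_irq_count;
    return model_reset(&dev);
}
```

In this model, `model_reset_after_load(1, 1)` and `model_reset_after_load(0, 0)` represent consistent snapshots and leave the counter at 0, while `model_reset_after_load(1, 0)` represents a snapshot taken between the two updates and leaves it at -1. Removing the assertion hides the symptom but does not make the transferred state consistent.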