Many crashes with "memslot_get_virt: slot_id 170 too big"-type errors in 2.12.0 rc2

Bug #1762558 reported by Adam Williamson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

Since qemu 2.12.0 rc2 - qemu-2.12.0-0.6.rc2.fc29 - landed in Fedora Rawhide, just about all of our openQA-automated tests of Rawhide guests which run with qxl / SPICE graphics in the guest have died partway in, always shortly after the test switches from the installer (an X environment) to a console on a tty. qemu is, I think, hanging. There are always some errors like this right around the time of the hang:

[2018-04-09T20:13:42.0736 UTC] [debug] QEMU: id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
[2018-04-09T20:13:42.0736 UTC] [debug] QEMU: id 1, group 1, virt start 7f42dbc00000, virt end 7f42dfbfe000, generation 0, delta 7f42dbc00000
[2018-04-09T20:13:42.0736 UTC] [debug] QEMU: id 2, group 1, virt start 7f42d7a00000, virt end 7f42dba00000, generation 0, delta 7f42d7a00000
[2018-04-09T20:13:42.0736 UTC] [debug] QEMU:
[2018-04-09T20:13:42.0736 UTC] [debug] QEMU: (process:45812): Spice-CRITICAL **: memslot.c:111:memslot_get_virt: slot_id 218 too big, addr=da8e21fbda8e21fb

or occasionally like this:

[2018-04-09T20:13:58.0717 UTC] [debug] QEMU: id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
[2018-04-09T20:13:58.0720 UTC] [debug] QEMU: id 1, group 1, virt start 7ff093c00000, virt end 7ff097bfe000, generation 0, delta 7ff093c00000
[2018-04-09T20:13:58.0720 UTC] [debug] QEMU: id 2, group 1, virt start 7ff08fa00000, virt end 7ff093a00000, generation 0, delta 7ff08fa00000
[2018-04-09T20:13:58.0720 UTC] [debug] QEMU:
[2018-04-09T20:13:58.0720 UTC] [debug] QEMU: (process:25622): Spice-WARNING **: memslot.c:68:memslot_validate_virt: virtual address out of range
[2018-04-09T20:13:58.0720 UTC] [debug] QEMU: virt=0x0+0x18 slot_id=0 group_id=1
[2018-04-09T20:13:58.0721 UTC] [debug] QEMU: slot=0x0-0x0 delta=0x0
[2018-04-09T20:13:58.0721 UTC] [debug] QEMU:
[2018-04-09T20:13:58.0721 UTC] [debug] QEMU: (process:25622): Spice-WARNING **: display-channel.c:2426:display_channel_validate_surface: invalid surface_id 1048576
[2018-04-09T20:14:14.0728 UTC] [debug] QEMU: id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
[2018-04-09T20:14:14.0728 UTC] [debug] QEMU: id 1, group 1, virt start 7ff093c00000, virt end 7ff097bfe000, generation 0, delta 7ff093c00000
[2018-04-09T20:14:14.0728 UTC] [debug] QEMU: id 2, group 1, virt start 7ff08fa00000, virt end 7ff093a00000, generation 0, delta 7ff08fa00000
[2018-04-09T20:14:14.0728 UTC] [debug] QEMU:
[2018-04-09T20:14:14.0728 UTC] [debug] QEMU: (process:25622): Spice-CRITICAL **: memslot.c:122:memslot_get_virt: address generation is not valid, group_id 1, slot_id 0, gen 110, slot_gen 0

The same tests running on Fedora 28 guests on the same hosts are not hanging, and the same tests were not hanging right before the qemu package got updated, so this seems very strongly tied to the new qemu.

Revision history for this message
Thomas Huth (th-huth) wrote :

These error messages ("memslot_get_virt") do not come from QEMU, but from spice, so please report this problem to the Spice project first (see https://www.spice-space.org/support.html for how to file a bug there).

Revision history for this message
Adam Williamson (awilliamson) wrote :

Nothing about SPICE changed in the affected time frame. This started happening between 2018-04-02 and 2018-04-07. The last time SPICE was changed in Rawhide was on 2018-02-09. However, qemu was bumped from rc1 to rc2 on 2018-04-05.

It's possible that https://bugzilla.redhat.com/show_bug.cgi?id=1564210 is involved; the offending mesa for that bug was built on 2018-04-03, so it also fits the time frame. However, the bug is not happening in Fedora 28 tests, and that same mesa change was sent to Fedora 28, so this seems less likely. The only related thing I've yet found that changed in Rawhide but not Fedora 28 during the time in which this bug started happening on Rawhide but not Fedora 28 is qemu.

Revision history for this message
Adam Williamson (awilliamson) wrote :

...on the other hand, I was clearly not thinking straight in associating this with the qemu version bump in Rawhide, because we don't *run* that qemu. We use the qemu from the worker host, not from the image under test, and the worker hosts are not running Rawhide, and their qemu hasn't changed during the time this problem appeared.

It still seems like the change that triggered this problem is something that changed in Rawhide between 2018-04-02 and 2018-04-07, but whatever that is, it almost certainly isn't qemu. You can close this issue for now, then. Sorry for the thinko.

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

IMHO it's best to keep this open until we find out what's going on; it's not impossible it's something that's changed in qemu, and even if it isn't qemu's fault then you won't be the only person who ends up reporting it here, so it'll be good to get the answer.

Revision history for this message
Adam Williamson (awilliamson) wrote :

Up to you, of course. Just realized I didn't mention here that I also reported this downstream, and since it turns out to be not triggered by a qemu change I've been doing most of the investigation there:

https://bugzilla.redhat.com/show_bug.cgi?id=1565354

So far it's looking like the change that triggered this is going from kernel 4.16rc7 to kernel 4.17rc4.

Revision history for this message
Thomas Huth (th-huth) wrote :

The QEMU project is currently considering to move its bug tracking to
another system. For this we need to know which bugs are still valid
and which could be closed already. Thus we are setting older bugs to
"Incomplete" now.

If you still think this bug report here is valid, then please switch
the state back to "New" within the next 60 days, otherwise this report
will be marked as "Expired". Or please mark it as "Fix Released" if
the problem has been solved with a newer version of QEMU already.

Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Revision history for this message
Adam Williamson (awilliamson) wrote :

This got resolved along the way and wasn't really a qemu bug anyway.

Changed in qemu:
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.