qemu-system-x86_64 crashed with SIGABRT when using option -vga qxl

Bug #1755912 reported by Leonardo Müller
22
This bug affects 2 people
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned
qemu (Ubuntu)
Fix Released
Medium
Unassigned
Bionic
Fix Released
Undecided
Unassigned

Bug Description

[Impact]

 * There are conditions where the vga/qxl driver can crash the qemu
   process.

 * It is like a very complex case of a non initialized var - without the
   fix it might try to ask for updates without having a valid primary
   surface.

 * Backport from upstream https://git.qemu.org/?p=qemu.git;a=commit;h=5bd5c27c7d284d01477c5cc022ce22438c46bf9f to avoid the crash

[Test Case]

 * Sometimes booting xubuntu was reported to be enough, at other times it was needed to change resolution a few times to trigger.

  # get xubuntu iso (actually other UI Isos should do as well)
  $ qemu-system-x86_64 -vga qxl -enable-kvm -cpu host -smp cores=2,threads=2 -m 2048 -cdrom xubuntu-18.04-desktop-amd64.iso
  # If it boots successfully, change resolution until it crashes.
  $ while true ; do xrandr --output Virtual-0 --mode 640x480 ; sleep 1 ; xrandr --output Virtual-0 --mode 1280x720 ; sleep 1 ; xrandr --output Virtual-0 --mode 1920x1080 ; sleep 1 ; done

 * Without the fix that will trigger the qemu crash

[Regression Potential]

 * The change "just" adds QXL_MODE_UNDEFINED as one more trigger to leave
   the rendering update. That sounds rather safe. But thinking hard on
   potential updates I could think of theoretical setups that were in
   undefined mode all the time (unlikely or impossible I think) that now
   would get no updates anymore. Well I really don't think this is an
   issue, but since this section should be open thinking on "potential"
   regressions that is what comes to my mind.

[Other Info]

 * Thanks to Leonardo for most of the bisecting and discussion work!

---

When using qemu-system-x86_64 with the option -vga qxl, it crashes. The easiest way to crash it is by trying to change the guest's resolution. However, the system may randomly crash too, not happening only when changing resolution. Here is the terminal output of one of these random crashes:

--------

$ qemu-system-x86_64 -hda /dev/sdb -m 2048 -enable-kvm -cpu host -vga qxl -nodefaults -netdev user,id=hostnet0 -device virtio-net-pci,id=net0,netdev=hostnet0
WARNING: Image format was not specified for '/dev/sdb' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.

(process:21313): Spice-WARNING **: 16:01:45.759: display-channel.c:2431:display_channel_validate_surface: canvas address is 0x7f8eb948ab18 for 0 (and is NULL)

(process:21313): Spice-WARNING **: 16:01:45.759: display-channel.c:2432:display_channel_validate_surface: failed on 0

(process:21313): Spice-CRITICAL **: 16:01:45.759: display-channel.c:2035:display_channel_update: condition `display_channel_validate_surface(display, surface_id)' failed
Abortado (imagem do núcleo gravada)

--------

I was running QEMU as a normal user which is on the groups kvm and disk. Initially I supposed the problem was because I was running QEMU as root, but as a normal user this happens too.

I have tested with guests with different Ubuntu version: 18.04, 17.10 and 16.04. It is happening with them all.

ProblemType: Crash
DistroRelease: Ubuntu 18.04
Package: qemu-system-x86 1:2.11+dfsg-1ubuntu4
ProcVersionSignature: Ubuntu 4.15.0-10.11-generic 4.15.3
Uname: Linux 4.15.0-10-generic x86_64
ApportVersion: 2.20.8-0ubuntu10
Architecture: amd64
CurrentDesktop: XFCE
Date: Wed Mar 14 17:13:52 2018
ExecutablePath: /usr/bin/qemu-system-x86_64
InstallationDate: Installed on 2017-06-13 (273 days ago)
InstallationMedia: Xubuntu 17.04 "Zesty Zapus" - Release amd64 (20170412)
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
MachineType: LENOVO 80UG
ProcCmdline: qemu-system-x86_64 -hda /dev/sdb -smp cpus=2 -m 512 -enable-kvm -cpu host -vga qxl -nodefaults -netdev user,id=hostnet0 -device virtio-net-pci,id=net0,netdev=hostnet0
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-10-generic.efi.signed root=UUID=6b4ae5c0-c78c-49a6-a1ba-029192618a7a ro quiet
Signal: 6
SourcePackage: qemu
StacktraceTop:
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
 () at /usr/lib/x86_64-linux-gnu/libspice-server.so.1
Title: qemu-system-x86_64 crashed with SIGABRT
UpgradeStatus: Upgraded to bionic on 2017-10-20 (145 days ago)
UserGroups: adm bluetooth cdrom dialout dip disk kvm libvirt lpadmin netdev plugdev sambashare sudo
dmi.bios.date: 07/10/2017
dmi.bios.vendor: LENOVO
dmi.bios.version: 0XCN43WW
dmi.board.asset.tag: NO Asset Tag
dmi.board.name: Toronto 4A2
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40679 WIN
dmi.chassis.asset.tag: NO Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo ideapad 310-14ISK
dmi.modalias: dmi:bvnLENOVO:bvr0XCN43WW:bd07/10/2017:svnLENOVO:pn80UG:pvrLenovoideapad310-14ISK:rvnLENOVO:rnToronto4A2:rvrSDK0J40679WIN:cvnLENOVO:ct10:cvrLenovoideapad310-14ISK:
dmi.product.family: IDEAPAD
dmi.product.name: 80UG
dmi.product.version: Lenovo ideapad 310-14ISK
dmi.sys.vendor: LENOVO

Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :
Revision history for this message
Apport retracing service (apport) wrote :

StacktraceTop:
 spice_logv (log_domain=0x7fb001524195 "Spice", args=0x7fafbf9fe600, format=0x7fb001525015 "condition `%s' failed", function=0x7fb001527ef0 <__func__.47520> "display_channel_update", strloc=0x7fb001527c0f "display-channel.c:2035", log_level=G_LOG_LEVEL_CRITICAL) at log.c:183
 spice_log (log_level=log_level@entry=G_LOG_LEVEL_CRITICAL, strloc=strloc@entry=0x7fb001527c0f "display-channel.c:2035", function=function@entry=0x7fb001527ef0 <__func__.47520> "display_channel_update", format=format@entry=0x7fb001525015 "condition `%s' failed") at log.c:196
 display_channel_update (display=0x56421590aa30, surface_id=0, area=area@entry=0x56421590ee1c, clear_dirty=1, qxl_dirty_rects=qxl_dirty_rects@entry=0x7fafbf9fe770, num_dirty_rects=num_dirty_rects@entry=0x7fafbf9fe76c) at display-channel.c:2035
 handle_dev_update_async (opaque=0x56421590ebe0, payload=0x56421590ee10) at red-worker.c:428
 dispatcher_handle_single_read (dispatcher=0x56421590e080) at dispatcher.c:284

Revision history for this message
Apport retracing service (apport) wrote : Stacktrace.txt
Revision history for this message
Apport retracing service (apport) wrote : StacktraceSource.txt
Revision history for this message
Apport retracing service (apport) wrote : ThreadStacktrace.txt
Changed in qemu (Ubuntu):
importance: Undecided → Medium
tags: removed: need-amd64-retrace
Revision history for this message
Simon Quigley (tsimonq2) wrote :

Subscribed Christian Ehrhardt, who might have an idea.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hmm,
I have no good idea unfortunately.
I tried it a few times (18.04 desktop guest 8 resolution changes) - showed no issue for me.
Is this depending on the type of guests that you run?

I drive ti through libvirt, which adds quite some variables in the new format.
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
You might experiment if some of them set would mitigate your issue.

Looking at the warnings - they are from display_channel_validate_surface but they are "safe" which means they check, detect it is null and then skips.
But maybe later on something else crashes on the same canvas being Null maybe?

Are the warnings like:
Spice-WARNING **: 16:01:45.759: display-channel.c:2431:display_channel_validate_surface: canvas address is 0x7f8eb948ab18 for 0 (and is NULL)
appearing right before the crash or when you start?

The crash itself seems to be:
display_channel_update -> display_channel_validate_surface (which emits the warnings) -> spice_warning -> spice_log -> spice_logv
This crashes if >= a certain log level - the warning above triggers it.
So the question is why is the canvas Null?

2429 if (!display->priv->surfaces[surface_id].context.canvas) {
2430 spice_warning("canvas address is %p for %d (and is NULL)\n",
2431 &(display->priv->surfaces[surface_id].context.canvas), surface_id);
2432 spice_warning("failed on %d", surface_id);
2433 return FALSE;
2434 }

handle_dev_update_async gets that value indirectly via
  RedWorker *worker = opaque;
  ...
  display_channel_update(worker->display_channel,
So the update that kills the pointer might be anywhere in between in this async paths.
I'm not a subject matter expert on this async UI updating :-/

If this really affects all releases way back including the latest we should try to build you the latest from source and if it affects this as well open the bug against upstream as well. As there we need a fix/discussion first then.
Can you compile qemu from source for yourself for this check or do you need help with that.

Changed in qemu (Ubuntu):
status: New → Incomplete
Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :

I was able to build it from source. In the first try I was unable to test because it hadn't the spice protocol enabled.

The interface QEMU is using is different, as it has a menu bar with "Machine" and "View" with some options, but I could test and I could reproduce the crash. To clarify, I have recorded the VMs crashing.

Please note I have the notebook screen and a external monitor, so the resolution is a bit strange.

With the command: ./qemu-system-x86_64 -hda ~/Downloads/xubuntu-bionic-desktop-amd64-2018-04-22.iso -m 1536 -vga qxl -enable-kvm -cpu host -smp cpus=2

https://mega.nz/#!98ZiHY5b!ZOaNjb1OaZVj0V80GRjkqafOAL2UinVlAEiTP9aazdk

With the command: ./qemu-system-x86_64 -hda ~/Downloads/xubuntu-bionic-desktop-amd64-2018-04-22.iso -m 1536 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x3 -enable-kvm -cpu host -smp cpus=2

https://mega.nz/#!lwRDRA7I!no-6S6cWxmRf8Q8cjSWfN269PM9DISjLmN7QM6LYBC4

The examples are with Live ISOs, Using -cdrom (the most correct option) instead of -hda still have the same crashes. I can reproduce with a installed VM and with a physical system started in QEMU.

Additional effects I've seen are the guests freezing instead of crashing and a significant RAM memory peak when starting the VM using libvirt. If high amounts of memory are being used in the moment the VM is started, approximately 128 MB of memory goes to SWAP that is never freed. The only way to freed it is to umount the SWAP partition (I tested until SWAP had 658 MB of memory).

What I meant is that with a Bionic host, the guests crash regardless of what the guest is. For now I've tested only with Ubuntu and its derivatives, from 12.04 to 18.04 and they consistently have the issue. I'm downloading CentOS but the download is slow, so it will take some time. I will add the CentOS guest test results after I test it.

Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :

The download of CentOS 7 was finished and I downloaded Fedora Workstation 27 later. With both as guests the VM still crash, so it's not a issue exclusive to Ubuntu guests.

The crash may happen after 8 resolution changes (CentOS) or in the first one (Fedora 27) or if the system is running and suddenly crashes (Xubuntu 17.10 in a external HDD).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Leonardo, with that confirmed:
- not dependent on the guest distribution
- affecting latest upstream
- good logs on the crash

The videos are not needed but nice to proove your case (just as my pre-analysis of the code path is nice but likely not useful to a developer that regularly works on that code sections).

Overall this should really be ready for upstreams attention, I added a Qemu task which will auto mirror this to the Mailing list.

Changed in qemu (Ubuntu):
status: Incomplete → Confirmed
information type: Private → Public
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The bug traces so far had no private information, so I opened up the state to be visible to everyone.

Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :

QEMU from git apparently is fixed, but Ubuntu's version is still problematic.

Using an Xubuntu 18.04 guest, it's possible to reproduce the crash using:

while true ; do xrandr --output Virtual-0 --mode 640x480 ; sleep 1 ; xrandr --output Virtual-0 --mode 1280x720 ; sleep 1 ; xrandr --output Virtual-0 --mode 1920x1080 ; sleep 1 ; done

In less than 20 seconds the guest crash with:

(process:16447): Spice-CRITICAL **: 15:34:52.047: display-channel.c:2035:display_channel_update: condition `display_channel_validate_surface(display, surface_id)' failed
Abortado (imagem do núcleo gravada)

tags: added: cosmic
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Very interesting, Still not triggering for me :-/
Could you check if the PPA in [1] (with qemu 2.12 planned for Cosmic) already fixes it for you?

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3306/+packages

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :

Unfortunately it did not work, the error is still the same although it used the GTK interface this time. The command I used was:

$ qemu-system-x86_64 -vga qxl -enable-kvm -cpu host -smp cores=2,threads=2 -m 2048 -cdrom xubuntu-18.04-desktop-amd64.iso

I have noticed Ubuntu's QEMU and this QEMU don't have the OpenGL option enabled, may this be related to the issue?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Yeah we switched to gtk.
With OpenGL which do you mean - the virtgl based support or some other config option?

Since it still fails to reproduce for me, but you already have a self build qemu that is good.
Could you based on this build 2.12 from source as well and confirm that it fails.
If it does you could bisect from 2.12 to master what fixed it so that we can consider that patch.
Otherwise if 2.12 from source in your own build works lets take a look at the build options that you mentioned.

Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :

I meant the option --enable-opengl and, for old versions, --enable-gtk-gl. I know it is required to use virtual machines using Intel GVT-g with dma-buf, and is a option strangely absent from QEMU configuration from Ubuntu's build. Without it, virgl fails too, making virt-manager have an option that does not work (under Spice Display, OpenGL option does not work).

The 2.12 build does crash when tested. That while loop is pretty efficient to trigger it. The git version can keep running it for hours straight with no problem, while the problematic versions crash in seconds.

As QEMU configuration is a result mixed between packages found automatically and those manually set by me, here is the configure command and its results from my QEMU build: https://paste.ubuntu.com/p/z9vnFdTnkD/

Please note that I had no need to build other targets for my use, so the configuration is much smaller than the one used by Ubuntu's QEMU.

Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :

The bisect is done and this is the result:

5bd5c27c7d284d01477c5cc022ce22438c46bf9f is the first new commit
commit 5bd5c27c7d284d01477c5cc022ce22438c46bf9f
Author: Gerd Hoffmann <email address hidden>
Date: Fri Apr 27 13:55:28 2018 +0200

    qxl: fix local renderer crash

    Make sure we only ask the spice local renderer for display updates in
    case we have a valid primary surface. Without that spice is confused
    and throws errors in case a display update request (triggered by
    screendump for example) happens in parallel to a mode switch and hits
    the race window where the old primary surface is gone and the new isn't
    establisted yet.

    Cc: <email address hidden>
    Fixes: https://bugzilla.redhat.com//show_bug.cgi?id=1567733
    Signed-off-by: Gerd Hoffmann <email address hidden>
    Reviewed-by: Marc-André Lureau <email address hidden>
    Message-id: <email address hidden>

:040000 040000 ed86f864314483660b3c2abe045361ae7c98a5dc f0d55b3e98dd0864d6686fba052d83ae5545d007 M hw

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Nice, thanks Leonardo for the Bisect - lets call upstream Fixed Committed then (as there is no 2.13 released yet).

For the separate gl discussion feel free to subscribe/chime in on bug 1657409

Currently 2.12 is on its way into Cosmic.
I'll add and test that patch afterwards, it looks small and safe to me.

Changed in qemu:
status: New → Fix Committed
Changed in qemu (Ubuntu):
status: Confirmed → Triaged
tags: added: qemu-18.10 server-next
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.12+dfsg-3ubuntu3

---------------
qemu (1:2.12+dfsg-3ubuntu3) cosmic; urgency=medium

  * d/p/lp-1755912-qxl-fix-local-renderer-crash.patch: Fix an issue triggered
    by migrations with UI frontends or frequent guest resolution changes
    (LP: #1755912)

 -- Christian Ehrhardt <email address hidden> Thu, 19 Jul 2018 08:26:52 +0200

Changed in qemu (Ubuntu):
status: Triaged → Fix Released
Changed in qemu (Ubuntu Bionic):
status: New → Triaged
Thomas Huth (th-huth)
Changed in qemu:
status: Fix Committed → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Leonardo, or anyone else affected,

Accepted qemu into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:2.11+dfsg-1ubuntu7.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in qemu (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Leonardo Müller (leozinho29-eu) wrote :

Thank you for the effort to backport this bug fix, I have tested the proposed version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.5) and the crash when changing resolution is no longer happening. Unfortunately, after some time (about 10 minutes, much longer than without this fix), the guest freezes.

I reported that in https://bugs.launchpad.net/qemu/+bug/1787070

I'll change the tag to verification-done-bionic, as this particular issue was corrected. The other issue still exists in QEMU upstream.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Note: The code change already passed the general regression checks on the identical content against a PPA (Also on the weekend prior to the full maturing period I'll have another automated run on proposed).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Running the Bionic ISO like:
$ qemu-system-x86_64 -cpu host -smp cores=4,threads=2 -boot d -m 2048 -enable-kvm -vga qxl -vnc :21 -cdrom ubuntu-18.04-desktop-amd64.iso

Attaching like:
$ vncviewer FullColor=1 AutoSelect=0 10.245.168.42:5921
(alternatives on tigervnc)
Well for me it had "-k de" as well :-)

Then boot into the "try Ubuntu" live CD mode.
There opened a terminal to loop on xrandr.

Running the loop of above in that guest to crash it after a while.

Upgrade to proposed:
$ sudo apt install qemu-system-x86=1:2.11+dfsg-1ubuntu7.5
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  samba vde2 qemu-block-extra sgabios ovmf
The following packages will be upgraded:
  qemu-system-x86
1 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
Need to get 5.168 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-proposed/main amd64 qemu-system-x86 amd64 1:2.11+dfsg-1ubuntu7.5 [5.168 kB]
Fetched 5.168 kB in 1s (7.666 kB/s)
(Reading database ... 127990 files and directories currently installed.)
Preparing to unpack .../qemu-system-x86_1%3a2.11+dfsg-1ubuntu7.5_amd64.deb ...
Unpacking qemu-system-x86 (1:2.11+dfsg-1ubuntu7.5) over (1:2.11+dfsg-1ubuntu7.4) ...
Setting up qemu-system-x86 (1:2.11+dfsg-1ubuntu7.5) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...

Running the loop again with the same setup - no crash in 15 minutes - assume that means good now.
I'd be glad about someone else checking the result as well, best someone formerly affected by it.
(and having a tick in the eye for seeing my right screen change sizes and flicker all the time)

Setting verified

tags: added: verification-done
removed: verification-needed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thanks Leonardo for your check as well!
I agree that this particular issue here is fixed as we hoped.
The other one with the freeze did not occur for me, you might open another bug if you have any pointers how we could go on on this.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Arr reading is hard today, you had the other bug open already ... grml :-)

Never the less - This bug here is fixed in the proposed version and for now that is the important part.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Subscribed there as well now to stay on top of it if upstream gets to a conclusion.

Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu7.5

---------------
qemu (1:2.11+dfsg-1ubuntu7.5) bionic; urgency=medium

  [Christian Ehrhardt]
  * d/p/lp-1755912-qxl-fix-local-renderer-crash.patch: Fix an issue triggered
    by migrations with UI frontends or frequent guest resolution changes
    (LP: #1755912)

  [ Murilo Opsfelder Araujo ]
  * d/p/ubuntu/target-ppc-extend-eieio-for-POWER9.patch: Backport to
    extend eieio for POWER9 emulation (LP: #1787408).

 -- Christian Ehrhardt <email address hidden> Tue, 21 Aug 2018 11:25:45 +0200

Changed in qemu (Ubuntu Bionic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.