20170817 - ISO hangs on boot on qemu with splash screen enabled and qxl graphics driver

Bug #1711358 reported by Jean-Baptiste Lallement on 2017-08-17
This bug affects 7 people
Affects            Status        Importance  Assigned to   Milestone
linux (Ubuntu)     Fix Released  High        Seth Forshee
plymouth (Ubuntu)  Confirmed     Undecided   Unassigned
ubiquity (Ubuntu)  Confirmed     High        Unassigned

Bug Description

Test Case
1. Boot artful desktop 20170817 under qemu with qxl
2. Wait until ubiquity-dm is displayed

Expected result
It boots

Actual result
It hangs on the splash screen and the dots below the Ubuntu logo are not blinking. If the option 'splash' is removed from the boot command line it boots successfully.

It also boots successfully with -vga std, but in that case the splash screen is in text mode rather than graphical as with the qxl driver.

- It started with kernel 4.12.
- It is still happening with kernel 4.13.
- It only happens when booting from an ISO and not on an installed system.
- It works with cirrus/vga, but not qxl
- It works if nosplash is set (or splash removed from the boot command line)
- It works on Xenial but not on Artful

Jean-Baptiste Lallement (jibel) wrote :

manifest of the last known good image

Jean-Baptiste Lallement (jibel) wrote :

Manifest of the first broken image

Jean-Baptiste Lallement (jibel) wrote :

diff between the manifests

tags: added: artful
description: updated
summary: - 20170817 - ISO hangs on boot with splash screen enabled
+ 20170817 - ISO hangs on boot on qemu with splash screen enabled and qxl
+ graphics driver
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubiquity (Ubuntu):
status: New → Confirmed

I can confirm the issue, and I dup'ed bug 1714638 onto this one, where people first assumed it was an error in the virt stack.
From there TL;DR:
- works with cirrus/vga, but not qxl
- works if nosplash is set
- works on Xenial but not on Artful

Kev Bowring (flocculant) wrote :

Whatever it is causing the issue - it's not specific to Ubuntu as Xubuntu affected as well.

Changed in ubiquity (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Confirmed

Realized a plymouth task would be right when I checked for the bug number to pass it along.

Iain Lane (laney) on 2017-09-12
tags: added: rls-aa-incoming
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in plymouth (Ubuntu):
status: New → Confirmed
tags: added: rls-aa-notfixing
removed: rls-aa-incoming
Brian Murray (brian-murray) wrote :

A comment in a duplicate bug of this states the following:

"Virsh adds video as default like:
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
    </video>

Matched the cmdline: -device qxl-vga

The default of libvirt is:
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
    </video>

Matches the cmdline: -device cirrus-vga,id=video0

Switching back to the default cirrus graphics console makes it work again.

Also type='vga' which matched qemu-cmdline "std" works.
So both qemu defaults (cirrus = old, vga = new), and the libvirt default (cirrus) work.
But the qxl as selected by virt-manager fails."

So it seems like this problem will only occur if virt-manager is used to create the virtual machine. Am I understanding that correctly?

It happens even if you start a VM directly with qemu from the command line with the qxl driver (option -vga qxl) independently of virt-manager.

For example, the following command will hang on boot:
/usr/bin/qemu-system-x86_64 -m 2G -smp 2 -localtime -no-reboot -cpu core2duo -enable-kvm -drive file=/tmp/disk.img,if=ide,media=disk -boot menu=on -soundhw all -display sdl -vga qxl -monitor stdio -cdrom /home/j-lallement/iso/ubuntu/artful-desktop-amd64.iso

@bdmurray, I'd like you to reconsider the notfixing tag. It's a real issue for QA because qxl is the only driver that works with wayland. The system falls back to X with the standard vga driver and does not boot after installation with cirrus.
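
To compare the display models mentioned in this thread side by side, a small helper can generate one qemu command line per model. This is only a sketch: it keeps a subset of the flags from the command above, the ISO file name is a placeholder, and the helper name `qemu_argv` is made up for illustration.

```python
from typing import List

# Display models discussed in this report. Per the thread, "qxl" hangs on
# the splash screen while "std" and "cirrus" boot.
VGA_MODELS = ("qxl", "std", "cirrus")


def qemu_argv(iso_path: str, vga: str = "qxl") -> List[str]:
    """Build a qemu argument list that boots an ISO with the given -vga model."""
    if vga not in VGA_MODELS:
        raise ValueError(f"unexpected vga model: {vga!r}")
    return [
        "qemu-system-x86_64",
        "-m", "2G",
        "-smp", "2",
        "-enable-kvm",
        "-vga", vga,
        "-cdrom", iso_path,
    ]


if __name__ == "__main__":
    # Placeholder ISO name; substitute the image under test.
    for model in VGA_MODELS:
        print(" ".join(qemu_argv("artful-desktop-amd64.iso", model)))
```

Only the `-vga` value changes between the printed commands, which makes it easier to attribute a hang to the display model alone.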

description: updated
Stefan Bader (smb) wrote :

Unfortunately I don't seem to have access to the latest images and this happening only when booting the ISO makes it hard(er) to grab data. I did finally succeed by using "virsh dump --memory-only <domain> <file>" and then "sudo strings <file>|less". And it looks like the qxl drm driver is crashing. I put the data here for reference but if you could do the same with an iso containing the 4.13 kernel, that would be great.

kernel BUG at /build/linux-cK2WUa/linux-4.12.0/drivers/gpu/drm/ttm/ttm_bo_util.c:589!
invalid opcode: 0000 [#1] SMP
Modules linked in: overlay nls_utf8 isofs dm_mirror dm_region_hash dm_log qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse virtio_blk virtio_net pata_acpi floppy
CPU: 0 PID: 264 Comm: plymouthd Not tainted 4.12.0-13-generic #14-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8e62faee5500 task.stack: ffffa5bc40614000
RIP: 0010:ttm_bo_kmap+0x1b5/0x260 [ttm]
RSP: 0018:ffffa5bc40617bb8 EFLAGS: 00010283
RAX: ffff8e62fa364e90 RBX: ffff8e62fa333c00 RCX: ffff8e62fa333e90
RDX: 0000000000000300 RSI: 0000000000000000 RDI: ffff8e62fa333c58
RBP: ffffa5bc40617bf8 R08: ffff8e62fa333d28 R09: 0000000000000400
R10: 0000000000000008 R11: 000000000000157d R12: ffff8e62fa39e6b0
R13: 0000000000000000 R14: ffff8e62fa2f3cf8 R15: 0000000000000000
FS: 00007f2a9f914b80(0000) GS:ffff8e62fde00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056218cba50eb CR3: 000000007a276000 CR4: 00000000000006f0
Call Trace:
 ? qxl_bo_kunmap_atomic_page+0x85/0x90 [qxl]
 qxl_bo_kmap+0x42/0x70 [qxl]
 qxl_draw_dirty_fb+0x1f5/0x420 [qxl]
 qxl_framebuffer_surface_dirty+0xa0/0xf0 [qxl]
 ? __kmalloc+0x1bb/0x1f0
 drm_mode_dirtyfb_ioctl+0x17e/0x1c0 [drm]
 drm_ioctl+0x213/0x4d0 [drm]
 ? drm_mode_getfb+0x110/0x110 [drm]
 ? __hrtimer_init+0xb0/0xb0
 do_vfs_ioctl+0xa5/0x610
 ? wake_up_q+0x80/0x80
 SyS_ioctl+0x79/0x90
 entry_SYSCALL_64_fastpath+0x1e/0xa9
RIP: 0033:0x7f2a9f0024d7
RSP: 002b:00007ffd3c26bf98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd3c34c080 RCX: 00007f2a9f0024d7
RDX: 00007ffd3c26bfd0 RSI: 00000000c01864b1 RDI: 0000000000000009
RBP: 00007ffd3c26bff0 R08: ffffffffffffff98 R09: 00007ffd3c26bf70
R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000b64
R13: 00007ffd3c26c00c R14: 0000000000000000 R15: 00007f2a9f6f72d0
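
The "dump memory, then run strings over it" approach described above can be approximated in a few lines. This is a minimal sketch, not the tooling actually used in this report: it works on a bytes object already in memory, whereas a real `virsh dump --memory-only` file would need to be read in chunks. The function names and the marker list are assumptions chosen for illustration.

```python
import re
from typing import Iterator, List, Sequence


def printable_strings(data: bytes, min_len: int = 4) -> Iterator[str]:
    """Yield runs of at least min_len printable ASCII bytes, like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def find_oops_markers(
    data: bytes,
    markers: Sequence[str] = ("kernel BUG at", "Call Trace:", "RIP:"),
) -> List[str]:
    """Return the printable strings that contain any of the oops markers."""
    return [s for s in printable_strings(data) if any(m in s for m in markers)]


if __name__ == "__main__":
    # Tiny fabricated blob standing in for a memory dump.
    blob = b"\x00\x01kernel BUG at ttm_bo_util.c:589!\x00\xffRIP: 0010:ttm_bo_kmap\x00"
    for line in find_oops_markers(blob):
        print(line)
```

Filtering on a few markers avoids paging through the whole dump by hand, at the cost of missing log lines that were split across non-contiguous memory.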

Trace with 4.13
kernel BUG at /build/linux-jlxc7t/linux-4.13.0/drivers/gpu/drm/ttm/ttm_bo_util.c:589!
invalid opcode: 0000 [#1] SMP
Modules linked in: overlay nls_utf8 isofs dm_mirror dm_region_hash dm_log qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse pata_acpi e1000 floppy
CPU: 0 PID: 210 Comm: plymouthd Not tainted 4.13.0-11-generic #12-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
task: ffff9b5639bb9600 task.stack: ffffbac6806ac000
RIP: 0010:ttm_bo_kmap+0x1b5/0x260 [ttm]
RSP: 0018:ffffbac6806afb80 EFLAGS: 00010202
RAX: ffff9b5639831f90 RBX: ffff9b5639b18000 RCX: ffff9b5639b18290
RDX: 0000000000000300 RSI: 0000000000000000 RDI: ffff9b5639b18058
RBP: ffffbac6806afbc0 R08: ffff9b5639b18128 R09: 0000000000000400
R10: 0000000000000008 R11: 0000000000001559 R12: ffff9b5639e206a8
R13: 0000000000000000 R14: ffff9b563ab52b40 R15: 0000000000000000
FS: 00007f203aef2b80(0000) GS:ffff9b563dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f245a8d3f20 CR3: 000000007a0a7000 CR4: 00000000000006f0
Call Trace:
 ? qxl_bo_kunmap_atomic_page+0x85/0x90 [qxl]
 qxl_bo_kmap+0x42/0x70 [qxl]
 qxl_draw_dirty_fb+0x1f5/0x420 [qxl]
 qxl_framebuffer_surface_dirty+0xa0/0xf0 [qxl]
 ? __kmalloc+0x1c5/0x200
 drm_mode_dirtyfb_ioctl+0x17e/0x1c0 [drm]
 ? drm_mode_getfb+0x110/0x110 [drm]
 drm_ioctl_kernel+0x5d/0xb0 [drm]
 drm_ioctl+0x31b/0x3d0 [drm]
 ? drm_mode_getfb+0x110/0x110 [drm]
 ? ep_poll+0xa7/0x3b0
 do_vfs_ioctl+0xa5/0x610
 ? wake_up_q+0x80/0x80
 SyS_ioctl+0x79/0x90
 entry_SYSCALL_64_fastpath+0x1e/0xa9
RIP: 0033:0x7f203a5cedd7
RSP: 002b:00007ffdbe6da488 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000564f2479f3a0 RCX: 00007f203a5cedd7
RDX: 00007ffdbe6da4c0 RSI: 00000000c01864b1 RDI: 0000000000000009
RBP: 0000000000000002 R08: ffffffffffffff80 R09: 00007ffdbe6da460
R10: 000000000000000a R11: 0000000000000246 R12: 00007ffdbe6da760
R13: 00000000ffffffff R14: 0000000000000000 R15: 00007f203acd32d0
Code: d0 49 8b be 80 00 00 00 48 c1 e6 0c 41 f6 46 62 04 74 4a 49 03 7e 70 4c 01 e7 e8 97 33 b3 ca 48 89 03 44 8b 45 d0 e9 18 ff ff ff <0f> 0b 4b 8d 7c 2c 58 44 89 45 c4 e8 0b b0 3c cb 44 8b 45 c4 e9
RIP: ttm_bo_kmap+0x1b5/0x260 [ttm] RSP: ffffbac6806afb80
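
To compare the 4.12 and 4.13 call traces frame by frame, each `symbol+0xoffset/0xsize [module]` line can be parsed into a small record. The sketch below is illustrative only: the `Frame` type and `parse_frame` helper are invented here. It ignores stray characters after the frame, which dump extraction can leave behind, but a stray hex digit glued onto the size field would be misread as part of the size.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str
    offset: int             # offset into the function
    size: int               # total size of the function
    module: Optional[str]   # None when no [module] tag follows the frame
    reliable: bool          # False for "?" entries the unwinder is unsure of


FRAME_RE = re.compile(
    r"^\s*(?P<q>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one call-trace line; return None for non-frame lines."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return Frame(
        symbol=m.group("sym"),
        offset=int(m.group("off"), 16),
        size=int(m.group("size"), 16),
        module=m.group("mod"),
        reliable=m.group("q") is None,
    )


if __name__ == "__main__":
    print(parse_frame(" qxl_draw_dirty_fb+0x1f5/0x420 [qxl]"))
```

Comparing only the `symbol` and `module` fields of the reliable frames shows that both kernels fail on the same path: `qxl_framebuffer_surface_dirty` to `qxl_draw_dirty_fb` to `qxl_bo_kmap`, ending in `ttm_bo_kmap`.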

This bug is also reproducible with the end user stage of an OEM installation which runs ubiquity-dm like a boot from an ISO.

I think 4.14 (from Canonical kernel team's ppa) fixes the issue. I installed this kernel during the OEM prepare stage, then the image boots fine during the end user setup.

Stefan Bader (smb) wrote :

Hm, would be good if you noted down more exactly which version/ppa. I assume it is the mainline builds and likely 4.14~rc2.

Stefan Bader (smb) wrote :

Just an update, as I do not know how much will get accomplished here this week. The info from the crash indicates that a swap linked list which is supposed to be empty has some elements on it. There is no obvious change related to that in 4.14~rc1 (which is the upstream version the tested kernel is based on), so we need to figure out which change we need. Something in reference counting feels like it could be related, but it is hard to say, and we might need to bisect to find it.

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1711358

tags: added: iso-testing
Stefan Bader (smb) wrote :

So I think I can now at least explain the weirdness. Booting from the ISO as well as the first boot in OEM mode seem to be in a way where grub does not use graphics mode and hands that off to a vt. In that case, when the kernel starts there will be no VESA framebuffer which later gets switched over to the qxl framebuffer.

So what I think broke is the init of the framebuffer drawing context, which potentially is not needed when it is switched over from VESA to qxl. As soon as there was one successful boot in OEM mode, it looks like graphics handoff is used. So after that both 4.12 and 4.14 (and likely 4.13 as well) will successfully boot as well.

One way I was able to make this work immediately was to interrupt the boot and replace $vt_handoff by vt.handoff=7 (or =1). And then boot on with that setting. In the crashed boots this was not set at all.
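
The manual edit described above, replacing `$vt_handoff` with an explicit `vt.handoff=7`, can be expressed as a small string transformation. This is a sketch under assumptions: the helper name and the sample `linux` line are hypothetical and not taken from the actual grub configuration on the image.

```python
def set_vt_handoff(linux_line: str, vt: int = 7) -> str:
    """Return the grub 'linux' line with an explicit vt.handoff=<vt> setting.

    Replaces the $vt_handoff placeholder if present, leaves an existing
    vt.handoff= setting untouched, and otherwise appends the setting.
    """
    setting = f"vt.handoff={vt}"
    if "$vt_handoff" in linux_line:
        return linux_line.replace("$vt_handoff", setting)
    if "vt.handoff=" in linux_line:
        return linux_line
    return f"{linux_line.rstrip()} {setting}"


if __name__ == "__main__":
    # Hypothetical boot entry, for illustration only.
    print(set_vt_handoff("linux /vmlinuz quiet splash $vt_handoff"))
```

The default of 7 matches the value used in the workaround; passing `vt=1` gives the alternative mentioned alongside it.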

Seth Forshee (sforshee) wrote :

I ran a bisect which identifies this as the first commit with this bug:

3538e80a869b drm: qxl: Atomic phase 1: Implement mode_set_nofb

Seth Forshee (sforshee) wrote :

I found that a fix for this was committed upstream quite recently. I cherry-picked it to artful and it resolves the issue in my testing. Now applied to artful/master-next.

Changed in linux (Ubuntu):
assignee: nobody → Seth Forshee (sforshee)
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-16.19

---------------
linux (4.13.0-16.19) artful; urgency=low

  * 20170817 - ISO hangs on boot on qemu with splash screen enabled and qxl
    graphics driver (LP: #1711358)
    - qxl: fix framebuffer unpinning

  * [Bug] USB controller failed to respond on Denverton after loading
    intel_th_pci module (LP: #1715833)
    - SAUCE: PCI: Disable broken RTIT_BAR of Intel TH

  * CVE-2017-5123
    - waitid(): Add missing access_ok() checks

 -- Seth Forshee <email address hidden> Wed, 11 Oct 2017 12:33:10 -0500

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released