Repeatable oops qxl_enc_commit

Bug #1247906 reported by Dave Gilbert on 2013-11-04
This bug affects 1 person
Affects         Status    Importance  Assigned to  Milestone
Linux           Unknown   Medium
linux (Ubuntu)            High        Unassigned

Bug Description

I've got an Ubuntu Trusty guest running under a Fedora 20 pre-beta host; the guest oopses reliably.

To repeat:
    Set up KVM with the guest configured with QXL graphics
    Install openssh-server in the guest

    Boot it and then send a ctrl-alt-f1
    Problem 1: Corrupt graphics instead of a console

    Now ssh into the guest
    Send a ctrl-alt-f2
    Problem 2: Run dmesg on the guest to see the backtrace

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.12.0-1-generic 3.12.0-1.3
ProcVersionSignature: Ubuntu 3.12.0-1.3-generic 3.12.0-rc7
Uname: Linux 3.12.0-1-generic x86_64
ApportVersion: 2.12.6-0ubuntu1
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon Nov 4 17:08:38 2013
HibernationDevice: RESUME=UUID=0190ef1f-ced8-4fbc-9fc3-bd9f73c329db
InstallationDate: Installed on 2013-10-20 (14 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Beta amd64 (20131012)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Bochs Bochs
MarkForUpload: True
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.12.0-1-generic root=UUID=3072ba2d-eda3-4789-9a31-38240b2aae52 ro quiet splash vt.handoff=7
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.12.0-1-generic N/A
 linux-backports-modules-3.12.0-1-generic N/A
 linux-firmware 1.117
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to trusty on 2013-11-02 (2 days ago)
dmi.bios.date: 01/01/2011
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2011:svnBochs:pnBochs:pvr:cvnBochs:ct1:cvr:
dmi.product.name: Bochs
dmi.sys.vendor: Bochs

Dave Gilbert (ubuntu-treblig) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue occur in a previous version of Ubuntu, or is this a new issue?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.12 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-saucy/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Dave Gilbert (ubuntu-treblig) wrote :

Answer to the 1st question first:
 I've got this on both a saucy and a trusty guest, so it's not new in Trusty, but Raring doesn't seem to exhibit it.

Linux saucy 3.12.0-031200-generic #201311031935 SMP Mon Nov 4 00:36:54 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
still oopsing - slightly different backtrace and it took three vt changes to trigger it:

[ 66.200441] Kernel BUG at ffffffffa01114e0 [verbose debug info unavailable]
[ 66.200445] invalid opcode: 0000 [#1] SMP
[ 66.200457] Modules linked in: bnep rfcomm bluetooth qxl ttm drm_kms_helper ppdev drm psmouse i2c_piix4 serio_raw virtio_balloon virtio_console lp microcode mac_hid parport_pc parport floppy
[ 66.200462] CPU: 0 PID: 1033 Comm: Xorg Not tainted 3.12.0-031200-generic #201311031935
[ 66.200464] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 66.200465] task: ffff880078c02f60 ti: ffff880078cd2000 task.ti: ffff880078cd2000
[ 66.200475] RIP: 0010:[<ffffffffa01114e0>] [<ffffffffa01114e0>] qxl_send_monitors_config+0x150/0x160 [qxl]
[ 66.200477] RSP: 0018:ffff880078cd3698 EFLAGS: 00010246
[ 66.200478] RAX: ffffc900003d8000 RBX: ffff88007a5c0000 RCX: ffffffffa011aeb8
[ 66.200480] RDX: ffffffffa011a640 RSI: ffffffffa011b769 RDI: ffff88007a5c0000
[ 66.200481] RBP: ffff880078cd36b8 R08: 0000000000000000 R09: 0000000000000000
[ 66.200483] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc900003d2004
[ 66.200484] R13: ffff88007986fd68 R14: ffff88007c39e000 R15: ffff8800795af700
[ 66.200487] FS: 00007fb411855980(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 66.200489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 66.200490] CR2: 00007fa4549980e0 CR3: 000000007957a000 CR4: 00000000000006f0
[ 66.200502] Stack:
[ 66.200506] ffff880079863000 ffff88007c39e420 0000000000000000 ffff88007a5c0000
[ 66.200509] ffff880078cd3718 ffffffffa01117f0 0000030000000000 0000000000000400
[ 66.200512] ffff880000000300 ffff880000000001 0000000000000000 ffff88007a5c0000
[ 66.200513] Call Trace:
[ 66.200521] [<ffffffffa01117f0>] qxl_write_monitors_config_for_encoder+0x130/0x210 [qxl]
[ 66.200527] [<ffffffffa01118eb>] qxl_enc_commit+0x1b/0x40 [qxl]
[ 66.200534] [<ffffffffa003c4c1>] drm_crtc_helper_set_mode+0x431/0x5c0 [drm_kms_helper]
[ 66.200542] [<ffffffffa003d74a>] drm_crtc_helper_set_config+0x8fa/0xb70 [drm_kms_helper]
[ 66.200561] [<ffffffffa00ae5cc>] drm_mode_set_config_internal+0x5c/0xe0 [drm]
[ 66.200567] [<ffffffffa003b4a6>] drm_fb_helper_set_par+0x66/0xe0 [drm_kms_helper]
[ 66.200572] [<ffffffff813e7c03>] fb_set_var+0x283/0x3a0
[ 66.200578] [<ffffffff810a5d05>] ? check_preempt_wakeup+0x165/0x260
[ 66.200581] [<ffffffff810a4751>] ? update_curr+0x141/0x200
[ 66.200585] [<ffffffff8101e3ee>] ? __switch_to_xtra+0x14e/0x180
[ 66.200589] [<ffffffff813f56a4>] fbcon_blank+0x1e4/0x2d0
[ 66.200594] [<ffffffff81475190>] do_unblank_screen.part.17+0xa0/0x180
[ 66.200597] [<ffffffff814752b8>] do_unblank_screen+0x48/0x80
[ 66.200601] [<ffffffff8146a4d5>] complete_change_console+0x65/0xf0
[ 66.200605] [<ffffffff8146b68c>] vt_ioctl+0x112c/0x11d0
[ 66.200608] [<ffffffff810963b3>] ? __wake_up+0x53/0x70
[ 66.200613] [<...
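
As a reading aid for traces like the one above, here is a minimal Python sketch that pulls the symbol, offset and module out of each `[<address>] symbol+0xoff/0xsize [module]` entry. The function name `parse_frames` and the dictionary layout are illustrative assumptions, not part of any kernel tooling. Entries prefixed with `?` are flagged separately, since on x86 the `?` marks an address found on the stack that the unwinder could not confirm as part of the call chain.

```python
import re

# Matches call-trace entries such as:
#   [ 66.200527] [<ffffffffa01118eb>] qxl_enc_commit+0x1b/0x40 [qxl]
#   [ 66.200578] [<ffffffff810a5d05>] ? check_preempt_wakeup+0x165/0x260
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return one dict per stack-frame entry found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a stack-frame line (registers, raw stack words, ...)
        frames.append({
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "module": match.group("module"),  # None for core-kernel symbols
            "reliable": match.group("unreliable") is None,
        })
    return frames
```

Filtering the result on `frame["reliable"] and frame["module"] == "qxl"` leaves only the confirmed qxl frames, which in the trace above are the ones leading into `qxl_send_monitors_config`.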


Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

v3.8 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-raring
v3.9 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-saucy
v3.10 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-saucy
v3.11-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11-rc1-saucy/

You don't have to test every kernel, just up until the kernel that first has this bug.

Thanks in advance!

tags: added: performing-bisect
Changed in linux (Ubuntu):
importance: Medium → High
Dave Gilbert (ubuntu-treblig) wrote :

Linux trusty 3.12.0-1-generic #3-Ubuntu SMP Tue Oct 29 18:41:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
  corrupted console on 1st c-a-f1, oops on c-a-f3 (even via grub menu)

Linux trusty 3.10.0-031000-generic #201306301935 SMP Sun Jun 30 23:36:16 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
  got oops on 4th console change
Linux trusty 3.9.0-030900-generic #201305071030 SMP Tue May 7 14:32:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
  corrupted consoles - but no oops
Linux trusty 3.8.0-030800-generic #201302181935 SMP Tue Feb 19 00:36:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
  corrupted consoles - but no oops

so the oops itself looks like it landed between 3.9.0 and 3.10.0, but I'm suspicious that the corrupted consoles in all of them are symptoms of the same bug.

(an fc19 guest running 3.11.6 doesn't oops and has nice clean consoles).
I'll dig into this a bit.
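
The narrowing step above can be written down as a small Python sketch. It assumes the four results reported in this comment (with the Trusty 3.12.0-1 distro kernel entered as "3.12.0") and uses the hypothetical helper name `regression_window`. It only tracks the oops, not the console corruption, which showed up on every kernel tested.

```python
# (kernel version, oops observed?) as reported in the comment above
results = [
    ("3.12.0", True),
    ("3.10.0", True),
    ("3.9.0", False),
    ("3.8.0", False),
]


def version_key(version):
    """Turn '3.10.0' into (3, 10, 0) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """Return (last_good, first_bad) for an oldest-to-newest scan.

    Raises ValueError if a good kernel appears after a bad one, because
    then the results do not describe a single regression point.
    """
    last_good = None
    first_bad = None
    for version, oopsed in sorted(results, key=lambda r: version_key(r[0])):
        if oopsed:
            if first_bad is None:
                first_bad = version
        else:
            if first_bad is not None:
                raise ValueError(f"{version} is good but newer than {first_bad}")
            last_good = version
    return last_good, first_bad


print(regression_window(results))  # ('3.9.0', '3.10.0')
```

That window is the range a commit-level bisect would then have to cover.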

Dave Gilbert (ubuntu-treblig) wrote :

Today's drm-next kernel also fails; linux-headers-3.12.0-996_3.12.0-996.201311070425_all.deb


I'm running a FC20 x86-64 pre-beta with an Ubuntu guest under KVM
with spice and can reliably trigger an oops in the guest.
The host is running qemu-kvm-1.6.1-1.fc20.x86_64

The oops happens on both Ubuntu's distro kernels (since about 3.10) and anything else recent including current drm-next (212c444ba 7th November) that I've built.
The user space is Ubuntu Trusty, and X (with Unity etc) works fine.

Note there is also a corrupt text console prior to the oops.

To trigger:
Boot guest and let it sit at lightdm
ssh in
send a ctrl-alt-f1 via virt-manager
 * see a very corrupt text console
send a ctrl-alt-f2
(might oops at this point - check with dmesg via the ssh)
send a ctrl-alt-f3
send a ctrl-alt-f4

I've never had it get past the 4th one without oopsing; with debug on, it oopses at the second switch.
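
The key presses above were sent through virt-manager. A minimal sketch of scripting the same sequence from the host instead, assuming libvirt's `virsh send-key` command is available there; the domain name `trusty-guest` and the helper `vt_switch_command` are hypothetical. The commands are only built here, not executed.

```python
def vt_switch_command(domain, vt):
    """Build the virsh command that sends ctrl-alt-F<vt> to a libvirt guest."""
    if not 1 <= vt <= 12:
        raise ValueError("only F1..F12 are handled in this sketch")
    # KEY_* names come from the Linux keycode set, virsh's default codeset.
    return ["virsh", "send-key", domain,
            "KEY_LEFTCTRL", "KEY_LEFTALT", f"KEY_F{vt}"]


# The trigger sequence from the steps above: f1, f2, f3, f4.
# "trusty-guest" is a placeholder domain name.
commands = [vt_switch_command("trusty-guest", vt) for vt in (1, 2, 3, 4)]
for command in commands:
    print(" ".join(command))
```

Each list can be handed to `subprocess.run` on the host, checking `dmesg` over ssh between switches as described above.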

Here is a log taken with some drm debug turned on:

It is sitting at lightdm waiting for me to log in, so I ssh in and do:
echo 255 > debug
and do ctrl-alt-f1

[ 266.165815] [drm:drm_crtc_helper_set_config],
[ 266.165817] [drm:drm_crtc_helper_set_config], [CRTC:3] [FB:33] #connectors=1 (x y) (0 0)
[ 266.165821] [drm:drm_crtc_helper_set_config], crtc has no fb, full mode set
[ 266.165823] [drm:qxl_best_encoder],
[ 266.165823] [drm:drm_crtc_helper_set_config], encoder changed, full mode switch
[ 266.165824] [drm:drm_crtc_helper_set_config], crtc changed, full mode switch
[ 266.165825] [drm:drm_crtc_helper_set_config], [CONNECTOR:4:Virtual-1] to [CRTC:3]
[ 266.165826] [drm:drm_crtc_helper_set_config], attempting to set mode from userspace
[ 266.165828] [drm:drm_mode_debug_printmodeline], Modeline 32:"1024x768" 60 63500 1024 1072 1176 1328 768 771 775 798 0x8 0x6
[ 266.165830] [drm:qxl_enc_mode_fixup],
[ 266.165845] [drm:drm_crtc_helper_set_mode], [CRTC:3]
[ 266.165846] [drm:qxl_enc_prepare],
[ 266.165847] [drm:qxl_enc_dpms],
[ 266.165847] [drm:qxl_enc_dpms],
[ 266.165848] [drm:qxl_enc_dpms],
[ 266.165849] [drm:qxl_crtc_prepare], current: 1024x768+0+0 (1).
[ 266.165850] [drm:qxl_crtc_mode_set], 0x0: not a native mode
[ 266.165851] [drm:qxl_crtc_mode_set], +0+0 (1024,768) => (1024,768)
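
To compare runs of drm debug output like the excerpt above, it can help to reduce each line to the function that logged it. A minimal Python sketch, assuming the `[drm:function_name]` prefix format seen in this log; the helper names are illustrative.

```python
import re
from collections import Counter

# Captures the function name from prefixes such as "[drm:qxl_enc_dpms],"
DRM_FUNC_RE = re.compile(r"\[drm:(\w+)\]")


def drm_call_sequence(log_text):
    """Return the drm debug function names in the order they were logged."""
    return [match.group(1) for match in DRM_FUNC_RE.finditer(log_text)]


def drm_call_counts(log_text):
    """Count how many debug lines each drm function produced."""
    return Counter(drm_call_sequence(log_text))
```

Diffing the sequences from a working and a failing VT switch is one way to spot where the two paths diverge.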

We have now got a heavily corrupt text console (nothing readable)

I then do a ctrl-alt-f2 here.
[ 276.164189] [drm:qxl_monitors_config_set], 0:1024x768+0+0
[ 276.164207] [drm:drm_crtc_helper_set_mode], [ENCODER:5:Virtual-5] set [MODE:32:1024x768]
[ 276.164209] [drm:qxl_enc_mode_set],
[ 276.164212] [drm:qxl_crtc_commit],
[ 276.164215] [drm:qxl_write_monitors_config_for_encoder], setting head 0 to +0+0 1024x768 out of 1
[ 276.164239] ------------[ cut here ]------------
[ 276.164240] Kernel BUG at ffffffffa00c42d6 [verbose debug info unavailable]
[ 276.164244] invalid opcode: 0000 [#1] SMP
[ 276.164267] Modules linked in: rfcomm bnep bluetooth ppdev(F) nfsd(F) auth_rpcgss(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) snd_hda_intel snd_hda_codec snd_hwdep(F) snd_pcm(F) microcode(F) psmouse(F) snd_page_alloc(F) serio_raw(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) virtio_console snd_seq(F) snd_seq_device(F) snd_timer(F) snd(F) soundcore(F) qxl parport_pc(F) ttm drm_kms_helper drm i2c_piix4 mac_hid lp(F) parport(F) floppy(F)
[ 2...


Dave Gilbert (ubuntu-treblig) wrote :

It also fails with the set I built from drm-next, so I've reported it upstream in the spice/qxl bug tracker and added the link here.


The heavily corrupted console got me thinking and there's a more telling/simpler
way to see the problem:

Boot guest to lightdm

ssh in twice and get root.

in the 1st ssh do a chvt 1
  This doesn't return

so that's probably the underlying problem.
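
A check for "this command doesn't return" can be automated with a timeout. A minimal Python sketch, with the hypothetical helper name `returns_within`; running it against `chvt` inside the guest as root is an assumption about how it would be used, not something done in this report.

```python
import subprocess


def returns_within(cmd, timeout_s):
    """Run cmd and report whether it exited within timeout_s seconds.

    subprocess.run kills the child when the timeout expires, so a
    command that hangs does not linger after the check.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


# Intended use inside the guest (needs permission to switch VTs):
#   if not returns_within(["chvt", "1"], 10):
#       print("chvt 1 did not return within 10 s")
```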
In the 2nd ssh I did an
echo t > /proc/sysrq-trigger

and for chvt I got:

[ 85.553746] chvt S ffff88007fd14500 0 1800 1799 0x00000000
[ 85.553746] ffff88006b8ddd08 0000000000000002 ffff88006b8ddfd8 0000000000014500
[ 85.553746] ffff88006b8ddfd8 0000000000014500 ffff880067815ec0 ffff88006b8ddd9c
[ 85.553746] ffff880067815ec0 0000000000005607 ffff880036991c00 00000000fffffffa
[ 85.553746] Call Trace:
[ 85.553746] [<ffffffff81710659>] schedule+0x29/0x70
[ 85.553746] [<ffffffff8145409a>] __vt_event_wait.isra.0.part.1+0x5a/0x90
[ 85.553746] [<ffffffff81089020>] ? wake_up_atomic_t+0x30/0x30
[ 85.553746] [<ffffffff81454285>] vt_waitactive+0x65/0xb0
[ 85.553746] [<ffffffff8106e069>] ? ns_capable+0x29/0x50
[ 85.553746] [<ffffffff81454bf7>] vt_ioctl+0x7b7/0x11c0
[ 85.553746] [<ffffffff81448d5d>] tty_ioctl+0x26d/0xbc0
[ 85.553746] [<ffffffff8104f46f>] ? kvm_clock_read+0x1f/0x30
[ 85.553746] [<ffffffff8101b8a9>] ? sched_clock+0x9/0x10
[ 85.553746] [<ffffffff8109b45d>] ? sched_clock_local+0x1d/0x80
[ 85.553746] [<ffffffff811c4615>] do_vfs_ioctl+0x2e5/0x4d0
[ 85.553746] [<ffffffff8109c0b4>] ? vtime_account_user+0x54/0x60
[ 85.553746] [<ffffffff811c4881>] SyS_ioctl+0x81/0xa0
[ 85.553746] [<ffffffff8171ba7f>] tracesys+0xe1/0xe6

with the X processes in:
[ 85.553746] Xorg x ffff88007fc14500 0 950 928 0x00000000
[ 85.553746] ffff88006e48b510 0000000000000002 ffff88006e48bfd8 0000000000014500
[ 85.553746] ffff88006e48bfd8 0000000000014500 ffff880078968000 ffff880078968650
[ 85.553746] ffff880078967ff0 ffff88006d995ec0 ffff880078967ff0 ffff880078968000
[ 85.553746] Call Trace:
[ 85.553746] [<ffffffff81710659>] schedule+0x29/0x70
[ 85.553746] [<ffffffff81066edf>] do_exit+0x6ff/0xa50
[ 85.553746] [<ffffffff817142af>] oops_end+0xaf/0x150
[ 85.553746] [<ffffffff810172bb>] die+0x4b/0x70
[ 85.553746] [<ffffffff817139f0>] do_trap+0x60/0x170
[ 85.553746] [<ffffffff81014512>] do_invalid_op+0xa2/0x100
[ 85.553746] [<ffffffffa00d12d6>] ? qxl_send_monitors_config+0x136/0x140 [qxl]
[ 85.553746] [<ffffffff81088ec8>] ? finish_wait+0x58/0x70
[ 85.553746] [<ffffffffa00d4a2a>] ? wait_for_io_cmd_user+0x20a/0x3c0 [qxl]
[ 85.553746] [<ffffffff8171d09e>] invalid_op+0x1e/0x30
[ 85.553746] [<ffffffffa00d12d6>] ? qxl_send_monitors_config+0x136/0x140 [qxl]
[ 85.553746] [<ffffffffa00d15da>] qxl_enc_commit+0x12a/0x220 [qxl]
[ 85.553746] [<ffffffffa00ac1b1>] drm_crtc_helper_set_mode+0x381/0x510 [drm_kms_helper]
[ 85.553746] [<ffffffffa00ad7d5>] drm_crtc_helper_set_config+0x9c5/0xb20 [drm_kms_helper]
[ 85.553746] [<ffffffffa00545fd>] drm_mode_set_config_internal+0x5d/0xe0 [drm]
[ 85.553746] [<ffffffffa00ab681>] drm_fb_helper_set_par+0x71/0xf0 [drm_kms_helper]
[ 85.553746] [<ffffffff813d1db1>] fb_set_var+0x191/0x430
[ 85.553746] [<ffffffff8109694d>] ? ttwu_do_activate.constprop.75+0x5d/0x...


Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Dave Gilbert (ubuntu-treblig) wrote :

This has 'gone away' on trusty but not saucy; I think it's the X server update, but can't be sure.
Looking at the debug I sent upstream, I'm wondering if the 1st problem is the X server stopping the VT change from happening,
with things going downhill from there.

On the Ubuntu 'Trusty' guest this problem has still gone away, but it's still there with the 'Saucy' guest. Trusty has just had an X and spice update, so looking at that last trace I posted, I wonder if the problem is X stopping the first chvt from working and then, once in that state, further chvts breaking things?
(I guess it going away is a good thing - but if the kernel oops was still triggerable with a bad X server I guess that's still a problem)

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/spice/spice-gtk/issues/45.

Changed in linux:
status: Confirmed → Unknown