Xorg indefinitely hangs in kernelspace at least 2-3 times a day

Bug #1813620 reported by Jaak Ristioja
This bug affects 3 people.
Affects: linux (Ubuntu); Status: Confirmed; Importance: Undecided; Assigned to: Unassigned

Bug Description

The graphical environment seems to hang when scrolling web pages in Firefox. It doesn't happen on every scroll, so I can't reproduce it reliably, but it has happened at least 5 times in the past 7 days. It happens in Kubuntu VMs running under KVM and QEMU (3.1.0) on multiple host machines.

This issue only started happening sometime this month. These VMs were originally running 18.04, so I hoped that upgrading them to 18.10 would help, but it didn't. I'm guessing that some kernel update for both bionic and cosmic causes this.

Since this bug makes the graphical environment totally unresponsive, it has already cost me some data, though fortunately not much.

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: linux-image-4.18.0-13-generic 4.18.0-13.14
ProcVersionSignature: Ubuntu 4.18.0-13.14-generic 4.18.17
Uname: Linux 4.18.0-13-generic x86_64
ApportVersion: 2.20.10-0ubuntu13.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: jotik 1962 F.... pulseaudio
Date: Mon Jan 28 18:09:57 2019
HibernationDevice: RESUME=UUID=9dfc65ab-a122-4254-8806-a580fea3c953
InstallationDate: Installed on 2015-10-18 (1197 days ago)
InstallationMedia: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.18.0-13-generic root=UUID=78ee528a-a065-4522-b2ca-a1959850a740 ro console=ttyS0 verbose
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-13-generic N/A
 linux-backports-modules-4.18.0-13-generic N/A
 linux-firmware 1.175.1
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to cosmic on 2019-01-20 (7 days ago)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.11.0-20180624_124920-prayer
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-2.3
dmi.modalias: dmi:bvnSeaBIOS:bvr1.11.0-20180624_124920-prayer:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-2.3:cvnQEMU:ct1:cvrpc-i440fx-2.3:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-2.3
dmi.sys.vendor: QEMU
---
ProblemType: Bug
ApportVersion: 2.20.10-0ubuntu27.1
Architecture: amd64
DistroRelease: Ubuntu 19.04
InstallationDate: Installed on 2015-10-18 (1387 days ago)
InstallationMedia: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
Package: linux (not installed)
Tags: disco
Uname: Linux 5.2.0-050200rc1-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: Upgraded to disco on 2019-04-16 (112 days ago)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True

Jaak Ristioja (jotik) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Jaak Ristioja (jotik) wrote : Re: Xorg indefinitely hangs in kernelspace during scrolling a web page in Firefox

This still occurs with 19.04 (Disco Dingo) and kernel 5.0.0-13-generic, and it does not seem to be Firefox-specific: I've also had this happen when using some KDE apps. It seems to occur 2-3 times a day with frequent use, and each time it requires a forced reboot (because Ubuntu never reboots gracefully while Xorg is hung in kernelspace).

[170138.010150] INFO: task Xorg:879 blocked for more than 120 seconds.
[170138.011257] Not tainted 5.0.0-13-generic #14-Ubuntu
[170138.012182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[170138.013506] Xorg D 0 879 790 0x00400004
[170138.013508] Call Trace:
[170138.013513] __schedule+0x2d0/0x840
[170138.013514] schedule+0x2c/0x70
[170138.013516] schedule_preempt_disabled+0xe/0x10
[170138.013517] __ww_mutex_lock.isra.11+0x3e0/0x750
[170138.013519] __ww_mutex_lock_slowpath+0x16/0x20
[170138.013520] ww_mutex_lock+0x34/0x50
[170138.013525] ttm_eu_reserve_buffers+0x1f9/0x2e0 [ttm]
[170138.013528] qxl_release_reserve_list+0x67/0x150 [qxl]
[170138.013530] ? qxl_bo_pin+0x11d/0x200 [qxl]
[170138.013532] qxl_cursor_atomic_update+0x1b0/0x2e0 [qxl]
[170138.013540] drm_atomic_helper_commit_planes+0xb9/0x220 [drm_kms_helper]
[170138.013544] drm_atomic_helper_commit_tail+0x2b/0x70 [drm_kms_helper]
[170138.013547] commit_tail+0x67/0x70 [drm_kms_helper]
[170138.013555] drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
[170138.013570] drm_atomic_commit+0x4a/0x50 [drm]
[170138.013574] drm_atomic_helper_update_plane+0xe9/0x100 [drm_kms_helper]
[170138.013581] __setplane_atomic+0xd6/0x120 [drm]
[170138.013588] drm_mode_cursor_universal+0x145/0x270 [drm]
[170138.013596] drm_mode_cursor_common+0x18f/0x200 [drm]
[170138.013603] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
[170138.013609] drm_mode_cursor2_ioctl+0xe/0x10 [drm]
[170138.013615] drm_ioctl_kernel+0xad/0xf0 [drm]
[170138.013616] ? ___sys_recvmsg+0x16c/0x200
[170138.013622] drm_ioctl+0x233/0x410 [drm]
[170138.013629] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
[170138.013631] ? ep_scan_ready_list.constprop.23+0x1f0/0x200
[170138.013633] do_vfs_ioctl+0xa9/0x640
[170138.013634] ? __sys_recvmsg+0x88/0xa0
[170138.013635] ksys_ioctl+0x67/0x90
[170138.013637] __x64_sys_ioctl+0x1a/0x20
[170138.013638] do_syscall_64+0x5a/0x110
[170138.013639] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[170138.013640] RIP: 0033:0x7f3a80734417
[170138.013644] Code: Bad RIP value.
[170138.013645] RSP: 002b:00007ffcae8e3488 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[170138.013646] RAX: ffffffffffffffda RBX: 000055aaa657c610 RCX: 00007f3a80734417
[170138.013647] RDX: 00007ffcae8e34c0 RSI: 00000000c02464bb RDI: 000000000000000e
[170138.013647] RBP: 00007ffcae8e34c0 R08: 0000000000000040 R09: 0000000000000010
[170138.013647] R10: 000000000000003f R11: 0000000000003246 R12: 00000000c02464bb
[170138.013648] R13: 000000000000000e R14: 0000000000000000 R15: 000055aaa657a450
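The register dump shows which call Xorg was stuck in: ORIG_RAX 0x10 is syscall 16 (ioctl on x86_64), RDI 0xe is the file descriptor (14), and RSI 0xc02464bb is the ioctl request code. The sketch below splits a request code into its fields using the asm-generic _IOC bit layout that x86_64 uses. The names decode_ioctl, struct ioctl_fields and struct cursor2_mirror are illustrative helpers, not kernel APIs; the mirror struct copies the field list of struct drm_mode_cursor2 from the DRM uapi header only to confirm its size.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a Linux ioctl request number, using the asm-generic _IOC
 * layout that x86_64 also uses: bits 0-7 nr, 8-15 type, 16-29 size,
 * 30-31 direction. */
struct ioctl_fields {
    unsigned dir;   /* 0 none, 1 write (to kernel), 2 read, 3 read+write */
    unsigned type;  /* ioctl "magic" byte; 'd' (0x64) for DRM */
    unsigned nr;    /* command number within that type */
    unsigned size;  /* size in bytes of the argument struct */
};

static struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f;
    f.nr = req & 0xffu;
    f.type = (req >> 8) & 0xffu;
    f.size = (req >> 16) & 0x3fffu;
    f.dir = (req >> 30) & 0x3u;
    return f;
}

/* Local mirror of struct drm_mode_cursor2's fields (drm_mode.h), used
 * only to check that its size matches the decoded size field. */
struct cursor2_mirror {
    uint32_t flags, crtc_id;
    int32_t x, y;
    uint32_t width, height, handle;
    int32_t hot_x, hot_y;
};
static_assert(sizeof(struct cursor2_mirror) == 0x24, "nine 32-bit fields");
```

decode_ioctl(0xc02464bb) gives direction 3 (read+write), type 'd', command 0xbb and a 36-byte argument. That is consistent with DRM_IOCTL_MODE_CURSOR2, defined as DRM_IOWR(0xBB, struct drm_mode_cursor2), and with the drm_mode_cursor2_ioctl frame in the trace: Xorg was blocked while updating the cursor plane.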

summary: - Xorg indefinitely hangs in kernelspace during scrolling a web page in
- Firefox
+ Xorg indefinitely hangs in kernelspace at least 2-3 times a day
Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v5.1-rc7 kernel [0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed", and attach dmesg.

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.1-rc7/

Jaak Ristioja (jotik) wrote :

Still happens with 5.1-rc7.

[175938.066756] INFO: task Xorg:903 blocked for more than 120 seconds.
[175938.074630] Not tainted 5.1.0-050100rc7-generic #201904282131
[175938.094726] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[175938.100947] Xorg D 0 903 806 0x00400004
[175938.100955] Call Trace:
[175938.100961] __schedule+0x2d3/0x840
[175938.100963] schedule+0x2c/0x70
[175938.100964] schedule_preempt_disabled+0xe/0x10
[175938.100966] __ww_mutex_lock.isra.11+0x3e0/0x750
[175938.100968] __ww_mutex_lock_slowpath+0x16/0x20
[175938.100969] ww_mutex_lock+0x34/0x50
[175938.100974] ttm_eu_reserve_buffers+0x1f9/0x2e0 [ttm]
[175938.100978] qxl_release_reserve_list+0x67/0x150 [qxl]
[175938.100980] ? qxl_bo_pin+0xaa/0x190 [qxl]
[175938.100982] qxl_cursor_atomic_update+0x1b0/0x2e0 [qxl]
[175938.100990] drm_atomic_helper_commit_planes+0xb9/0x220 [drm_kms_helper]
[175938.100994] drm_atomic_helper_commit_tail+0x2b/0x70 [drm_kms_helper]
[175938.100998] commit_tail+0x67/0x70 [drm_kms_helper]
[175938.101006] drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
[175938.101019] drm_atomic_commit+0x4a/0x50 [drm]
[175938.101023] drm_atomic_helper_update_plane+0xe9/0x100 [drm_kms_helper]
[175938.101031] __setplane_atomic+0xd3/0x120 [drm]
[175938.101039] drm_mode_cursor_universal+0x142/0x270 [drm]
[175938.101047] drm_mode_cursor_common+0x18e/0x200 [drm]
[175938.101072] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
[175938.101079] drm_mode_cursor2_ioctl+0xe/0x10 [drm]
[175938.101085] drm_ioctl_kernel+0xb0/0x100 [drm]
[175938.101088] ? ___sys_recvmsg+0x16c/0x200
[175938.101094] drm_ioctl+0x233/0x410 [drm]
[175938.101101] ? drm_mode_cursor_ioctl+0x60/0x60 [drm]
[175938.101103] ? timerqueue_add+0x57/0x90
[175938.101105] ? enqueue_hrtimer+0x3c/0x90
[175938.101107] do_vfs_ioctl+0xa9/0x640
[175938.101109] ? fput+0x13/0x20
[175938.101110] ? __sys_recvmsg+0x88/0xa0
[175938.101111] ksys_ioctl+0x67/0x90
[175938.101112] __x64_sys_ioctl+0x1a/0x20
[175938.101114] do_syscall_64+0x5a/0x110
[175938.101115] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[175938.101117] RIP: 0033:0x7fb8ae27b417
[175938.101120] Code: Bad RIP value.
[175938.101121] RSP: 002b:00007ffefbee6878 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[175938.101122] RAX: ffffffffffffffda RBX: 0000557c8e388610 RCX: 00007fb8ae27b417
[175938.101122] RDX: 00007ffefbee68b0 RSI: 00000000c02464bb RDI: 000000000000000e
[175938.101123] RBP: 00007ffefbee68b0 R08: 0000000000000040 R09: 0000000000000004
[175938.101123] R10: 000000000000003f R11: 0000000000003246 R12: 00000000c02464bb
[175938.101124] R13: 000000000000000e R14: 0000000000000000 R15: 0000557c8e386450

tags: added: kernel-bug-exists-upstream
Kai-Heng Feng (kaihengfeng) wrote :

Sorry for the belated reply. Does this issue still happen on 5.2-rc1? If it does, please report it to the following maintainers via email:
Dave Airlie <email address hidden> (maintainer:DRM DRIVER FOR QXL VIRTUAL GPU)
Gerd Hoffmann <email address hidden> (maintainer:DRM DRIVER FOR QXL VIRTUAL GPU)
David Airlie <email address hidden> (maintainer:DRM DRIVERS)
Daniel Vetter <email address hidden> (maintainer:DRM DRIVERS)
<email address hidden> (open list:DRM DRIVER FOR QXL VIRTUAL GPU)
<email address hidden> (open list:DRM DRIVER FOR QXL VIRTUAL GPU)
<email address hidden> (open list:DRM DRIVERS)
<email address hidden> (open list)

Jaak Ristioja (jotik) wrote : ProcCpuinfoMinimal.txt

apport information

tags: added: apport-collected disco
description: updated
Jaak Ristioja (jotik) wrote : ProcEnviron.txt

apport information

Jaak Ristioja (jotik) wrote :

Sorry for the spam. I'm not used to those Ubuntu bug reporting tools.

I've been running 5.2.0-050200rc1-generic ever since you suggested trying the mainline RC. This bug did not recur until now. Here is a new dmesg.

Jaak Ristioja (jotik) wrote :

Reported by e-mail to recipients listed in comment 6. Here's a link to the post in the archive of the Linux Foundation virtualization mailing list:

https://lists.linuxfoundation.org/pipermail/virtualization/2019-August/043083.html

Jesse Brandeburg (jesse-brandeburg) wrote :

Here is a better link, which lets anyone review the thread or reply from a local email client: https://<email address hidden>/

Kai-Heng Feng (kaihengfeng) wrote :
Jaak Ristioja (jotik) wrote :

I found that the latest drm-tip kernel build for amd64 was at https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/2019-11-08/ because, according to their READMEs, the more recent ones had failed to compile.

The 2019-11-08 kernel (5.4.0-994-generic) boots fine, but after logging in, starting a GUI application and clicking a few times, the screen goes black and the kernel logs this:

[ 24.557334] [TTM] Out of kernel memory
[ 24.558380] qxl 0000:00:02.0: object_init failed for (4096, 0x00000001)
[ 24.559604] [drm:qxl_gem_object_create [qxl]] *ERROR* Failed to allocate GEM object (48, 1, 4096, -12)
[ 24.561124] [drm:qxl_alloc_ioctl [qxl]] *ERROR* qxl_alloc_ioctl: failed to create gem ret=-12
[ 24.832000] [TTM] Buffer eviction failed
[ 24.832011] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
[ 24.832029] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 24.832170] [TTM] Buffer eviction failed
[ 24.832174] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
[ 24.832178] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 24.832226] [TTM] Buffer eviction failed
[ 24.832229] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
[ 24.832231] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 24.832400] [TTM] Buffer eviction failed
[ 24.832404] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
[ 24.832407] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 24.834433] [TTM] Buffer eviction failed
[ 24.834437] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
[ 24.834440] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 25.302729] [TTM] Out of kernel memory
[ 25.303803] qxl 0000:00:02.0: object_init failed for (4096, 0x00000001)
[ 25.305162] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 25.355703] [TTM] Out of kernel memory
[ 25.356727] qxl 0000:00:02.0: object_init failed for (4096, 0x00000001)
[ 25.358251] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 25.419575] [TTM] Out of kernel memory
[ 25.421420] qxl 0000:00:02.0: object_init failed for (4096, 0x00000001)
[ 25.422819] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 25.438340] [TTM] Out of kernel memory
[ 25.439412] qxl 0000:00:02.0: object_init failed for (4096, 0x00000001)
[ 25.441711] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[ 25.445922] [TTM] Out of kernel memory
[ 25.446973] qxl 0000:00:02.0: object_init failed for (16384, 0x00000001)
[ 25.448748] [drm:qxl_gem_object_create [qxl]] *ERROR* Failed to allocate GEM object (16384, 1, 4096, -12)
[ 25.450842] [drm:qxl_alloc_ioctl [qxl]] *ERROR* qxl_alloc_ioctl: failed to create gem ret=-12
[ 25.452379] [TTM] Out of kernel memory
[ 25.453410] qxl 0000:00:02.0: object_init failed for (16384, 0x00000001)
[ 25.454580] [drm:qxl_gem_object_create [qxl]] *ERROR* Failed to allocate GEM object (16384, 1, 4096, -12)
[ 25.456191] [drm:qxl_alloc_ioctl [qxl]] *ERROR* qxl_alloc_ioctl: failed to create gem ret=-12
[ 25.458033] [TTM] Out of kernel...
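The -12 in these messages (ret=-12, and the last value in "Failed to allocate GEM object (..., -12)") is a negated errno value, the usual kernel convention for error returns. On Linux, 12 is ENOMEM, which agrees with the "[TTM] Out of kernel memory" lines. A minimal helper to translate such values; the name kernel_ret_str is hypothetical:

```c
#include <errno.h>
#include <string.h>

/* Kernel functions report failure as a negative errno value (e.g. the
 * "ret=-12" above); negate it to look up the user-space description. */
static const char *kernel_ret_str(int ret)
{
    if (ret >= 0)
        return "no error";
    return strerror(-ret);
}
```

On glibc, kernel_ret_str(-12) returns "Cannot allocate memory".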

QuCheng (qucheng) wrote :

I hit the same problem too, so I enabled lockdep in kernel 4.19.67. The log is:

[17672.449102] RIP: 0033:0x7ff5680c4437
[17672.449643] Code: 00 00 90 48 8b 05 59 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 29 aa 0c 00 f7 d8 64 89 01 48
[17672.452380] RSP: 002b:00007ffcf6fa8a38 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[17672.453502] RAX: 0000000000000000 RBX: 00007ffcf6fa8da0 RCX: 00007ff5680c4437
[17672.454558] RDX: 00007ffcf6fa8ab0 RSI: 0000000040106442 RDI: 000000000000000e
[17672.455613] RBP: 00007ffcf6fa8ab0 R08: 000000000000000e R09: 0000000100311000
[17672.456667] R10: 0000000000000001 R11: 0000000000003246 R12: 0000000040106442
[17672.457731] R13: 000000000000000e R14: 0000000000000000 R15: 0000000000000000

[17672.459051] ================================================
[17672.459901] WARNING: lock held when returning to user space!
[17672.460762] 4.19.67 #6 Tainted: G E
[17672.461630] ------------------------------------------------
[17672.462475] Xorg/2857 is leaving the kernel with locks still held!
[17672.463394] 2 locks held by Xorg/2857:
[17672.463960] #0: 0000000092bc956a (reservation_ww_class_acquire){+.+.}, at: qxl_release_reserve_list+0x63/0x150 [qxl]
[17672.465532] #1: 0000000014261f9b (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x3c1/0x5c0 [ttm]

So the ww lock is not released when returning to user space. After parsing the addresses, the held lock corresponds to the ww_mutex_lock_slow_interruptible call in ttm_eu_reserve_buffers. I reviewed ttm_eu_reserve_buffers and qxl_release_reserve_list in the master kernel (5.10), and the function logic is still the same, so I believe this issue still exists in the latest kernel code.
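To make the invariant in this analysis concrete, here is a simplified, single-threaded userspace model of the general reserve-with-backoff pattern used with wound/wait mutexes: reserve every buffer or none, and on -EDEADLK release everything, take the contended lock with the slow path, then retry. This is a sketch under stated assumptions, not the TTM or qxl code: all names (model_buf, model_lock, reserve_all, ...) are hypothetical, contention and signals are simulated with flags, and the acquire context (lock #0 in the lockdep output) is not modelled. What it shows is the property lockdep checked here: an error path must leave nothing held, and on success the caller owns the locks and must release them before returning to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one buffer's reservation lock (not kernel code).
 * `contend` makes the fast-path lock fail once with -EDEADLK, as if an
 * older acquire context held it; `interrupt_slow` makes the slow-path
 * lock fail with -EINTR, as if a signal arrived while sleeping. */
struct model_buf {
    bool locked;
    bool contend;
    bool interrupt_slow;
};

static int held; /* locks currently held by the calling task */

static int model_lock(struct model_buf *b)
{
    if (b->contend) {
        b->contend = false;         /* contend only on the first attempt */
        return -EDEADLK;
    }
    b->locked = true;
    held++;
    return 0;
}

static int model_lock_slow(struct model_buf *b)
{
    if (b->interrupt_slow)
        return -EINTR;
    b->locked = true;
    held++;
    return 0;
}

static void release_all(struct model_buf *bufs, int n)
{
    for (int i = 0; i < n; i++) {
        if (bufs[i].locked) {
            bufs[i].locked = false;
            held--;
        }
    }
}

/* Reserve all buffers or none. On -EDEADLK: back off (drop every lock),
 * sleep on the contended lock via the slow path, then restart. */
static int reserve_all(struct model_buf *bufs, int n)
{
restart:
    for (int i = 0; i < n; i++) {
        if (bufs[i].locked)
            continue;               /* already taken by the slow path */
        int ret = model_lock(&bufs[i]);
        if (ret == 0)
            continue;
        release_all(bufs, n);       /* back off */
        if (ret == -EDEADLK)
            ret = model_lock_slow(&bufs[i]);
        if (ret != 0) {
            /* lockdep-style check: an error return leaves nothing held */
            assert(held == 0);
            return ret;
        }
        goto restart;
    }
    return 0;
}
```

With contention on one buffer and no signal, reserve_all retries and ends up holding all n locks; if the slow path is interrupted, it returns -EINTR with held at 0. A real lock leak of the kind lockdep reported corresponds to an error or success path where that count is not brought back to zero before the ioctl returns.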
