Critical upstream bugfix missing in Ubuntu 18.04 - frequent Xorg crash after suspend

Bug #1776887 reported by Alan Jenkins
This bug affects 17 people
Affects         Status         Importance  Assigned to  Milestone
linux (Ubuntu)  Triaged        High        Unassigned
Bionic          Fix Committed  High        Unassigned

Bug Description

== SRU Justification ==

This upstream bug has been confirmed to affect Ubuntu users[1]. As per the fix commit (below), the most frequent symptom is a crash of Xorg/Xwayland, i.e. killing the entire GUI, when a laptop is woken from system sleep. Frequency of the bug is described as once every few days[2].

[1] E.g. this user confirms the bug & very specific workaround: https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1760450/comments/11
[2] E.g. this log of crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1553979#c23

This is a bug in blk-core.c. It is not specific to any one hardware driver. Technically the suspend bug is triggered by the SCSI core - which is used by *all SATA devices*.

The commit also includes a test which quickly and reliably proves the existence of a horrifying bug.

I guess you might avoid this bug only if you have root on NVMe. The other way to avoid the Xorg crash is to not use all your RAM, so there's no memory pressure that leads to cold pages of Xorg being swapped out. Also, you won't reproduce the Xorg crash if you suspend and resume immediately. (This frustrated my tests at one point; it only triggered after I left the system suspended over lunch :).
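
As a rough illustration of the conditions above, the following sketch reports how much swap is in use and whether a device path looks like NVMe. The helper names swap_used_kib and root_is_nvme are made up for this example, and the /dev/nvme* pattern is an assumption: it will not recognise a root filesystem layered on LVM or dm-crypt.

```shell
# Sketch only; both helpers are hypothetical and written for this example.

# Print swap in use (SwapTotal - SwapFree, in KiB) from a meminfo-style file.
swap_used_kib() {
    awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {print t - f}' "${1:-/proc/meminfo}"
}

# Succeed if the given device path looks like an NVMe device (assumed naming).
root_is_nvme() {
    case $1 in
        /dev/nvme*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage (not executed here):
#   swap_used_kib
#   root_is_nvme "$(findmnt -n -o SOURCE /)" && echo "root is on NVMe"
```

A non-zero swap figure together with a non-NVMe root only suggests that the system could be exposed; it does not prove the bug is present.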

Fix: "block: do not use interruptible wait anywhere"

in kernel 4.17: https://github.com/torvalds/linux/commit/1dc3039bc87ae7d19a990c3ee71cfd8a9068f428

in kernel 4.16.8: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.16.y&id=7859056bc73dea2c3714b00c83b253d4c22bf7b6

lack of fix in 4.15.0-24.26 (ubuntu 18.04): https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/block/blk-core.c?id=Ubuntu-4.15.0-24.26#n856

I.e., this bug is still present in Ubuntu source package linux-4.15.0-24.26 (and 4.15.0-23.25). I attach hardware details (lspci-vnvn.log) of a system where this bug is known to happen.

Regards
Alan

WORKAROUND: Use kernel parameter:
scsi_mod.scan=sync
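
One possible way to make the workaround persistent, assuming the system boots through GRUB and keeps its kernel command line in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (the usual Ubuntu layout, but check your own system). add_kernel_param is a hypothetical helper written for this sketch, not an existing tool.

```shell
# Sketch only: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# add_kernel_param FILE PARAM  (hypothetical helper, relies on GNU sed -i)
add_kernel_param() {
    grub_file=$1
    param=$2
    # Leave the file alone if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$grub_file"
}

# Possible usage (not executed here; back up the file first):
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub scsi_mod.scan=sync   # run as root
#   sudo update-grub
# After a reboot, the active command line can be checked with:
#   grep -o 'scsi_mod.scan=[a-z]*' /proc/cmdline
```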

== Fix ==
1dc3039bc87a ("block: do not use interruptible wait anywhere")

== Regression Potential ==
Low. This patch has been sent to stable, so it has had additional
upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1776887

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Alan Jenkins (aj504) wrote :

Sorry, it's not convenient for me to test Ubuntu at the moment.

I abuse the above instructions to assert that this bug is confirmed, citing the URLs provided. (1: The patch+description linked for kernel 4.17, 2: the lack of fix evidenced in the link for kernel 4.15.0-24.26).

I appeal to authority based on me being the author of the fix, which was merged to the Linux kernel :).

Furthermore, I do so on behalf of two Ubuntu users active on the Ubuntu bug linked above.[1] In the first instance, I analysed the crash dump and explained the very distinct signature which it shows, indicating this bug. The second user confirmed that they suffered this bug and used a very specific workaround to avoid it.

[1] https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1760450/

(The Ubuntu bug is marked as affecting 7 people overall. I am certain this is an understatement. In my comment on that bug, I mentioned how the nature of the crashes made them hard for users to identify. Part of this matches the experience we had in Fedora: automatically reported crashes were not reliably detected as duplicates, because the fatal SIGBUS signal can happen at a number of different points.)

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: patch
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1776887

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
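
The rule in the two bullets above could be sketched as a small helper. test_kernel_packages is a hypothetical name, and the comparison relies on GNU sort -V being available; it only prints the package name prefixes and installs nothing.

```shell
# Sketch only: print which .deb package families a test kernel needs,
# following the 4.15 cut-off described in the note above.
test_kernel_packages() {
    version=$1
    # sort -V orders version strings; if 4.15 sorts first (or ties),
    # the given version is 4.15 or newer.
    lowest=$(printf '%s\n%s\n' "$version" 4.15 | sort -V | head -n 1)
    if [ "$lowest" = 4.15 ]; then
        echo "linux-modules linux-modules-extra linux-image-unsigned"
    else
        echo "linux-image linux-image-extra"
    fi
}

# Possible usage (not executed here), from the directory holding the .debs:
#   test_kernel_packages 4.15.0-23
#   sudo dpkg -i linux-modules-*.deb linux-modules-extra-*.deb linux-image-unsigned-*.deb
```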

Thanks in advance!

Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
penalvch (penalvch)
tags: added: cherry-pick kernel-fixed-upstream regression-release reverse-bisect-done
description: updated
Alan Jenkins (aj504) wrote :

Awesome. Thanks.

I can't test suspend in a VM, and I don't have a physical Ubuntu install.

The block developers arranged a more convenient test for the underlying issue in https://github.com/osandov/blktests/

Before (i.e. linux-image-4.15.0-23-generic):

$ sudo ./check block/016
block/016 (send a signal to a process waiting on a frozen queue)
block/016 (send a signal to a process waiting on a frozen queue) [failed]
    runtime ... 8.112s
    --- tests/block/016.out 2018-06-15 10:05:09.080764871 +0100
    +++ results/nodev/block/016.out.bad 2018-06-15 10:05:27.236769759 +0100
    @@ -1,2 +1,3 @@
     Running block/016
    +dd: error reading '/dev/nullb0': Input/output error
     Test complete

After (i.e. 4.15.0-23.26~lp1776887):

$ sudo ./check block/016
block/016 (send a signal to a process waiting on a frozen queue) [passed]
    runtime 8.112s ... 10.086s
$

I don't see any problem with the patched VM, though obviously it would be nice to have someone test both normal suspend and the pm_test reproducer on a physical install.
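
For anyone wanting to script the before/after comparison above, here is a minimal sketch. blktests_status is a hypothetical helper, and it assumes the output of ./check looks like the lines quoted in this comment (test name at the start of the line, status in square brackets at the end).

```shell
# Sketch only: print the last status ("passed", "failed", ...) that
# ./check reported for the named test, reading its output on stdin.
blktests_status() {
    grep "^$1 " \
        | grep -o '\[[a-z ]*\]$' \
        | tail -n 1 \
        | sed 's/^\[//; s/\]$//'
}

# Possible usage (not executed here):
#   git clone https://github.com/osandov/blktests/
#   cd blktests
#   sudo ./check block/016 | blktests_status block/016
```

A result of "failed" on block/016 points at the unpatched behaviour; "passed" only shows that this particular test no longer trips.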

Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

I mistakenly filed this exact same bug against xorg-server [1], until Alan Jenkins made a comment [2] about this one. I'm surprised that more people haven't reported it or registered as affected users.

[1] https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1775593
[2] https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1775593/comments/10

Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

I'd like to test the patched kernel but I've stumbled across this error when I try to install the linux-image-unsigned package and I haven't found out how to solve it:

paulo:~/Downloads$ sudo dpkg -i linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26_lp1776887_amd64.deb
dpkg: regarding linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26_lp1776887_amd64.deb containing linux-image-unsigned-4.15.0-23-generic:
 linux-image-unsigned-4.15.0-23-generic conflicts with linux-image-4.15.0-23-generic
  linux-image-4.15.0-23-generic (version 4.15.0-23.25) is present and installed.

dpkg: error processing archive linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26_lp1776887_amd64.deb (--install):
 conflicting packages - not installing linux-image-unsigned-4.15.0-23-generic
Errors were encountered while processing:
 linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26_lp1776887_amd64.deb

I suspect I'm missing something very basic.

My current kernel image is: 4.15.0-23.25

Joseph Salisbury (jsalisbury) wrote :

Did you install the linux-modules, linux-modules-extra packages first?

Joseph Salisbury (jsalisbury) wrote :
description: updated
Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

Sorry for the delay, @jsalisbury, I hadn't subscribed myself to this bug's emails. Yes, I installed all linux-modules* and linux-headers* packages first.

Maxim Loparev (laplandersan) wrote :

Also affected by this bug on Xenial with linux-generic-hwe-16.04 4.15.0.24.46

Rahul (rsidd120) wrote :

I'm affected by this bug. I tried the patched kernel posted by jsalisbury above (including modules and headers packages), but it boots in low-res mode (640x480 -- should be 1920x1080), and glxinfo shows the vendor as VMware and device as llvmpipe (should be Intel and Mesa). Is any other information required?

Rahul (rsidd120) wrote :

Forgot to say, this is Xubuntu 18.04. Previous kernel was 4.15.0-23.25. I am trying 4.15.0-24.26 from the repo now.

Alan Jenkins (aj504) wrote :

Is there an estimate for when users will receive the SRU (which was submitted on 2018-06-29)? So far I've seen 8 separate questions on askubuntu.com which match this bug. Poke :-).

Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Alan Jenkins (aj504) wrote :

I don't see it! The changelog doesn't include this fix:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776887

The fix never reached the Git tree to start with, unless it was force-pushed away when I wasn't looking? The interruptible wait remains present in blk-core.c:

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/block/blk-core.c?id=Ubuntu-4.15.0-29.31#n856

Alan Jenkins (aj504) wrote :

Also, I can confirm Maxim Loparev's comment #12. The current -hwe kernels in Ubuntu 16.04 "Xenial" are also missing this critical kernel fix. Please can you either update the list of affected Ubuntu versions, or let us know if the bug needs reporting separately?

Link to HEAD source: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/tree/block/blk-core.c?h=hwe#n856

Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

I've just upgraded to 4.15.0.29.31 from proposed, and I can confirm Alan Jenkins' comment #17: the bug is still present:

$ uname -r
4.15.0-29-generic

$ sudo ./check block/016
block/016 (send a signal to a process waiting on a frozen queue) [failed]
    runtime ... 8,390s
    --- tests/block/016.out 2018-07-18 10:04:54.136492698 -0300
    +++ results/nodev/block/016.out.bad 2018-07-18 10:22:55.171574748 -0300
    @@ -1,2 +1,3 @@
     Running block/016
    +dd: error reading '/dev/nullb0': Input/output error
     Test complete

Kleber Sacilotto de Souza (kleber-souza) wrote :

Hi all,

The fix for this issue has been sent to our kernel mailing-list for review but has not yet been applied to the bionic git tree. The automated message regarding this bug being fixed was caused by another commit that was wrongly tagged with the same Launchpad bug number. So the fix is *not yet* available in any kernel build.

We apologize for the misunderstanding.

Changed in linux (Ubuntu Bionic):
status: Fix Committed → In Progress
tags: removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-29.31

---------------
linux (4.15.0-29.31) bionic; urgency=medium

  * linux: 4.15.0-29.31 -proposed tracker (LP: #1782173)

  * [SRU Bionic][Cosmic] kernel panic in ipmi_ssif at msg_done_handler
    (LP: #1777716)
    - ipmi_ssif: Fix kernel panic at msg_done_handler

  * Update to ocxl driver for 18.04.1 (LP: #1775786)
    - misc: ocxl: use put_device() instead of device_unregister()
    - powerpc: Add TIDR CPU feature for POWER9
    - powerpc: Use TIDR CPU feature to control TIDR allocation
    - powerpc: use task_pid_nr() for TID allocation
    - ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
    - ocxl: Expose the thread_id needed for wait on POWER9
    - ocxl: Add an IOCTL so userspace knows what OCXL features are available
    - ocxl: Document new OCXL IOCTLs
    - ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()

  * Critical upstream bugfix missing in Ubuntu 18.04 - frequent Xorg crash after
    suspend (LP: #1776887)
    - ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL

  * Hard LOCKUP observed on stressing Ubuntu 18 04 (LP: #1777194)
    - powerpc: use NMI IPI for smp_send_stop
    - powerpc: Fix smp_send_stop NMI IPI handling

  * IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
    CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n (LP: #1773964)
    - rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

  * [Regression] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383:
    comm stress-ng: bg 4705: bad block bitmap checksum (LP: #1781709)
    - SAUCE: Revert "UBUNTU: SAUCE: ext4: fix ext4_validate_inode_bitmap: comm
      stress-ng: Corrupt inode bitmap"
    - SAUCE: ext4: check for allocation block validity with block group locked

linux (4.15.0-28.30) bionic; urgency=medium

  * linux: 4.15.0-28.30 -proposed tracker (LP: #1781433)

  * Cannot set MTU higher than 1500 in Xen instance (LP: #1781413)
    - xen-netfront: Fix mismatched rtnl_unlock
    - xen-netfront: Update features after registering netdev

linux (4.15.0-27.29) bionic; urgency=medium

  * linux: 4.15.0-27.29 -proposed tracker (LP: #1781062)

  * [Regression] EXT4-fs error (device sda1): ext4_validate_inode_bitmap:99:
    comm stress-ng: Corrupt inode bitmap (LP: #1780137)
    - SAUCE: ext4: fix ext4_validate_inode_bitmap: comm stress-ng: Corrupt inode
      bitmap

linux (4.15.0-26.28) bionic; urgency=medium

  * linux: 4.15.0-26.28 -proposed tracker (LP: #1780112)

  * failure to boot with linux-image-4.15.0-24-generic (LP: #1779827) // Cloud-
    init causes potentially huge boot delays with 4.15 kernels (LP: #1780062)
    - random: Make getrandom() ready earlier

linux (4.15.0-25.27) bionic; urgency=medium

  * linux: 4.15.0-25.27 -proposed tracker (LP: #1779354)

  * hisi_sas_v3_hw: internal task abort: timeout and not done. (LP: #1777736)
    - scsi: hisi_sas: Update a couple of register settings for v3 hw

  * hisi_sas: Add missing PHY spinlock init (LP: #1777734)
    - scsi: hisi_sas: Add missing PHY spinlock init

  * hisi_sas: improve read performance by pre-allocating slot DMA buffers
    (LP: #1777727)
    - scsi: hisi_sas: use dma_zalloc_cohe...

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Released
Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

I had installed 4.15.0-29.31 from proposed. Supposing the packages had been updated since, I reinstalled everything:

Start-Date: 2018-07-20 14:30:02
Commandline: apt install --reinstall linux-generic linux-headers-4.15.0-29 linux-headers-4.15.0-29-generic linux-headers-generic linux-image-4.15.0-29-generic linux-image-generic linux-libc-dev linux-modules-4.15.0-29-generic linux-modules-extra-4.15.0-29-generic linux-tools-4.15.0-29 linux-tools-4.15.0-29-generic linux-tools-common linux-tools-virtual
Requested-By: paulo (1000)
Reinstall: linux-headers-generic:amd64 (4.15.0.29.31), linux-image-4.15.0-29-generic:amd64 (4.15.0-29.31), linux-image-generic:amd64 (4.15.0.29.31), linux-headers-4.15.0-29:amd64 (4.15.0-29.31), linux-tools-4.15.0-29:amd64 (4.15.0-29.31), linux-modules-extra-4.15.0-29-generic:amd64 (4.15.0-29.31), linux-tools-4.15.0-29-generic:amd64 (4.15.0-29.31), linux-modules-4.15.0-29-generic:amd64 (4.15.0-29.31), linux-headers-4.15.0-29-generic:amd64 (4.15.0-29.31), linux-tools-virtual:amd64 (4.15.0.29.31), linux-generic:amd64 (4.15.0.29.31)
Upgrade: linux-libc-dev:amd64 (4.15.0-24.26, 4.15.0-29.31), linux-tools-common:amd64 (4.15.0-24.26, 4.15.0-29.31)
End-Date: 2018-07-20 14:31:58

and rebooted:

paulo:~$ uname -r
4.15.0-29-generic

but the patch still hasn't been included:

paulo:~/src/blktests (master)$ sudo ./check block/016
block/016 (send a signal to a process waiting on a frozen queue) [failed]
    runtime 8,390s ... 8,210s
    --- tests/block/016.out 2018-07-18 10:04:54.136492698 -0300
    +++ results/nodev/block/016.out.bad 2018-07-20 14:41:04.159809741 -0300
    @@ -1,2 +1,3 @@
     Running block/016
    +dd: error reading '/dev/nullb0': Input/output error
     Test complete

The changelog mentions a patch that has no relation to this bug:

  * Critical upstream bugfix missing in Ubuntu 18.04 - frequent Xorg crash after
    suspend (LP: #1776887)
    - ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL
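
A quick way to check a changelog for the real fix, rather than trusting the bug number, is to search for the commit subject quoted in this bug's description. This is only a sketch: changelog_has_fix is a hypothetical helper, and fetching the changelog with apt-get changelog for the linux-modules package is an assumption about where the kernel changelog is easiest to obtain.

```shell
# Sketch only: succeed if the changelog text on stdin lists the fix.
FIX_SUBJECT='block: do not use interruptible wait anywhere'

changelog_has_fix() {
    # -F treats the subject as a fixed string, -q suppresses output.
    grep -qF -- "$FIX_SUBJECT"
}

# Possible usage (not executed here; needs network access):
#   apt-get changelog "linux-modules-$(uname -r)" | changelog_has_fix \
#       && echo "fix listed" || echo "fix not listed"
```

A match in the changelog is weaker evidence than a passing block/016 run, since it says nothing about which kernel is actually booted.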

penalvch (penalvch) wrote :

Reverting Fix Released given:
1) diff contained no blk-core changes -> https://launchpad.net/ubuntu/+source/linux/4.15.0-29.31
2) Kleber Sacilotto de Souza explicitly mentioned the updates were not in -Proposed -> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776887/comments/20

Changed in linux (Ubuntu Bionic):
status: Fix Released → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-29.31

Changed in linux (Ubuntu):
status: In Progress → Fix Released
penalvch (penalvch) wrote :

Reverting "Fix Released" as requested commit not in diff:
https://launchpad.net/ubuntu/+source/linux/4.15.0-29.31

Changed in linux (Ubuntu):
status: Fix Released → In Progress
Rahul (rsidd120) wrote :

Any progress on this? The bug is still biting me.

Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

4.15.0-30.32 has just been released, and the patch still hasn't been included.

Derek L (ddl-lp) wrote :
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

I've just booted into 4.15.0-31-generic that hit proposed, and it seems that the patch was included. From the changelog:

block: do not use interruptible wait anywhere

Just to make sure:

paulo:~/src/blktests (master)$ sudo ./check block/016
block/016 (send a signal to a process waiting on a frozen queue) [passed]
    runtime 8,210s ... 10,261s

Jason (linuxguy39) wrote :

Can anyone confirm whether this is patched in the new kernel out today, 4.15.0-32-generic?

Thanks

Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

It was already patched in the previous kernel, 4.15.0-31-generic.

Paulo Marcel Coelho Aragão (marcelpaulo) wrote :

I left my laptop suspended for a couple of hours while I went out on some errands, and X crashed when I tried to resume it.

I'm running:

paulo:~$ uname -r
4.15.0-32-generic

Even though it passed the blktests test, the bug is still biting :-(

A crash report was generated. Where should I send it?

Luca Capasso (ubuluca)
Changed in linux (Ubuntu):
status: Fix Committed → Confirmed
penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Brad Figg (brad-figg)
tags: added: cscc
Julien Olivier (julo) wrote :

This bug is still very much alive in Ubuntu 19.10...

John Shakespeare (shakespeare1) wrote :

This bug affects me. Severely. It's more than once per day.
shakespeare@RYZEN:~$ uname -r
5.0.0-33-generic

Alan Jenkins (aj504) wrote :

That's not enough information. At this point you should report a new bug, please.

Notice the linked bug #1760450 includes a crash trace. (Not saying for sure you'd be able to do the same). The fatal signal is SIGBUS, as opposed to the usual SIGSEGV or abort. If you've got a SIGSEGV or an abort instead, you don't have this bug.

If you do, you can link to this bug for the historical reminder. (The analysis of these crash traces is a bit unusual).

I'd be pretty surprised if there was a regression on this. Axboe demanded a regression test for blktests, so it can't be exactly the same. And the cause of the bug was something pretty silly during the transition to blk-mq... I don't know why it would get introduced elsewhere.
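
As a first rough check of the signal before filing anything, something like the sketch below may help. classify_xorg_crash is a hypothetical helper, and it assumes the Xorg log records the fatal signal on a line of the form "Caught signal 7 (Bus error)"; the log paths in the usage comment are also assumptions and vary between setups. It does not replace reading the actual crash trace.

```shell
# Sketch only: classify the first fatal-signal line found in an Xorg log
# on stdin. Prints "sigbus", "other", or "none".
classify_xorg_crash() {
    # Take the first matching line, if any (-m 1 is a GNU grep option).
    line=$(grep -m 1 'Caught signal' || true)
    case $line in
        '') echo none ;;
        *'Bus error'*) echo sigbus ;;
        *) echo other ;;
    esac
}

# Possible usage (not executed here; pick whichever log exists):
#   classify_xorg_crash < /var/log/Xorg.0.log.old
#   classify_xorg_crash < "$HOME/.local/share/xorg/Xorg.0.log.old"
```

Only a "sigbus" result is consistent with this bug; "other" (for example a segmentation fault) points to a different problem.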

John Shakespeare (shakespeare1) wrote :

I'd like to report a bug:
Ubuntu 18.04 (bionic), Gnome 3.28.2
The frequency is typically more than once per day-

shakespeare@RYZEN:~$ uname -r
5.0.0-33-generic

I am using a 4K ROG-1060 display with NVIDIA driver metapackage from nvidia-driver-435 (proprietary, tested)
AMD Ryzen 7 1700X Eight-Core Processor with 16 CPUs at 1879.982 MHz family(23) model(1) stepping(1)
I have 32GB of physical RAM (94% free) with swap 15GB (100% free).
NVIDIA UNIX x86_64 Kernel Module 435.21 Sun Aug 25 08:17:57 CDT 2019

Alan Jenkins (aj504) wrote :

I mean, use the "Report a bug" link at the top-right of this page. Don't add comments to this bug.
