Uh, if anyone else is affected by this, there's a trivial fix upstream already (and a workaround). Hop to it, Ubuntu. gregkh is looking disappointed at you :-). I checked, and it looks like you didn't apply it to your 4.15 tree. See end for links to the fix etc.
For users: The workaround is to add "scsi_mod.scan=sync" on the kernel command line (i.e. edit /etc/default/grub and run `update-grub`).
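A rough sketch of that edit, in case it helps. It assumes a stock Ubuntu-style /etc/default/grub where GRUB_CMDLINE_LINUX_DEFAULT is a single double-quoted line, and GNU sed; the function name `add_scan_sync` is just something I made up for illustration. Back up the file first.

```shell
# Sketch: append scsi_mod.scan=sync to GRUB_CMDLINE_LINUX_DEFAULT.
# Assumes the value is on one line and wrapped in double quotes.
add_scan_sync() {
    grub_file="$1"
    # Do nothing if the parameter is already on that line.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*scsi_mod\.scan=sync' "$grub_file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote.
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 scsi_mod.scan=sync"/' "$grub_file"
}

# Usage (as root), then regenerate the boot config:
#   add_scan_sync /etc/default/grub && update-grub
# After rebooting, confirm the parameter is active:
#   cat /proc/cmdline
```

Running it twice leaves the file unchanged the second time, so it should be safe to re-run.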
Please note:
1. AFAICT this is near-universal.
It affects all desktop users of kernel 4.15/4.16 who use suspend
(and whose workloads use all their RAM).
It could be avoided by not using SCSI, but it does affect all systems with root on SATA.
2. Although this is horrible when it happens (X crash) and can happen on a near-daily basis,
it can be quite difficult for users to analyze and report. For example, the crash doesn't
have one specific backtrace in Xorg. It tends to generate several different backtraces,
non-deterministically. Sometimes, making a coredump fails, presumably due to the same bug
that causes the crash.
I remember that Sosha had to make two attempts at reporting this bug
(though I don't remember what was wrong with the first one).
Also, it's triggered by a medium-term SIGALRM timer in Xorg.
This made it really annoying to reproduce, back when I didn't know the root cause.
I was able to reproduce the memory pressure needed, but it didn't happen
when testing suspend+resume... only when I broke for lunch and left the machine
suspended for long enough :).
Fix: "block: do not use interruptible wait anywhere"
in kernel 4.17: https://github.com/torvalds/linux/commit/1dc3039bc87ae7d19a990c3ee71cfd8a9068f428
in kernel 4.16.8: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.16.y&id=7859056bc73dea2c3714b00c83b253d4c22bf7b6
lack of fix in 4.15.0-23.25 (ubuntu bionic): https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/block/blk-core.c?id=Ubuntu-4.15.0-23.25#n856