Ubuntu
xorg-server package

Bug #1760450
Comment #9

Comment 9 for bug 1760450

Alan Jenkins (aj504) wrote on 2018-05-29: Re: [nvidia] Xorg crashed with signal 7 in _dl_fixup() from _dl_runtime_resolve_xsavec() called from nvidia_drv.so

Uh, if anyone else is affected by this, there's already a trivial fix upstream (and a workaround). Hop to it, Ubuntu. gregkh is looking disappointed at you :-). I checked, and it looks like you didn't apply it to your 4.15 tree. See the end for links to the fix etc.

For users: The workaround is to add "scsi_mod.scan=sync" on the kernel command line (i.e. edit /etc/default/grub and run `update-grub`).
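
For anyone unsure how to do that edit, here is a minimal sketch of the workaround as a script. It assumes the stock Ubuntu layout where the kernel command line lives in the GRUB_CMDLINE_LINUX_DEFAULT="..." line of /etc/default/grub. The helper name add_kernel_param is made up for illustration; it is not an existing tool. Editing the file by hand works just as well.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of the given file, unless it is
# already there. Assumes the value is wrapped in double quotes, as on a
# stock Ubuntu install, and that sed supports -i (GNU sed).
add_kernel_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already on that line (fixed-string match).
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Usage (run as root; commented out so that sourcing this file changes nothing):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub scsi_mod.scan=sync
#   update-grub
# After a reboot, check that the parameter took effect:
#   cat /proc/cmdline
#   cat /sys/module/scsi_mod/parameters/scan
```

The helper is idempotent on purpose, so running it twice does not add the parameter a second time.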

Please note

1. AFAICT this is near-universal.
   It affects all desktop users of kernel 4.15/4.16 who use suspend
   (and whose workloads use all their RAM).
   It could be avoided by not using SCSI, but it does affect all systems with root on SATA.

2. Although this is horrible when it happens (X crash) and can happen on a near-daily basis,
   it can be quite difficult for users to analyze and report. For example, the crash doesn't
   have one specific backtrace in Xorg. It tends to generate several different backtraces,
   non-deterministically. Sometimes, making a coredump fails, presumably due to the same bug
   that causes the crash.

   I remember that Sosha had to make two attempts at reporting this bug
   (though I don't remember what was wrong with the first one).

   Also, it's triggered by a medium-term SIGALRM timer in Xorg.
   This made it really annoying to reproduce, at the time when I didn't know the root cause.
   I was able to reproduce the memory pressure needed, but it didn't happen
   when testing suspend+resume... only when I broke for lunch and left the machine
   suspended for long enough :).

Fix: "block: do not use interruptible wait anywhere"

in kernel 4.17: https://github.com/torvalds/linux/commit/1dc3039bc87ae7d19a990c3ee71cfd8a9068f428

in kernel 4.16.8: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.16.y&id=7859056bc73dea2c3714b00c83b253d4c22bf7b6

lack of fix in 4.15.0-23.25 (ubuntu bionic): https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/block/blk-core.c?id=Ubuntu-4.15.0-23.25#n856
