[nvidia] Xorg crashed with SIGBUS in _dl_fixup() from _dl_runtime_resolve_xsavec() from create_bits_picture() from image_from_pict_internal() from wfb_image_from_pict()
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Fedora | Unknown | Unknown | | |
xorg-server (Ubuntu) | | Medium | Unassigned | |
Bug Description
Happens after logging in to the session (after suspend mode).
ProblemType: Crash
DistroRelease: Ubuntu 18.04
Package: xserver-xorg-core 2:1.19.6-1ubuntu3
ProcVersionSign
Uname: Linux 4.15.0-13-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.9-0ubuntu2
Architecture: amd64
Date: Sun Apr 1 20:34:29 2018
DistroCodename: bionic
DistroVariant: ubuntu
ExecutablePath: /usr/lib/xorg/Xorg
InstallationDate: Installed on 2017-09-02 (211 days ago)
InstallationMedia: Ubuntu-GNOME 17.04 "Zesty Zapus" - Release amd64 (20170412)
ProcCmdline: /usr/lib/xorg/Xorg vt2 -displayfd 3 -auth /run/user/
ProcEnviron:
Signal: 7
SourcePackage: xorg-server
StacktraceTop:
_dl_fixup (l=0x559c10faa5a0, reloc_arg=
_dl_runtime_
?? () from /usr/lib/
wfbComposite () from /usr/lib/
?? () from /usr/lib/
Title: Xorg crashed with signal 7 in _dl_fixup()
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirt lpadmin plugdev sambashare sudo
Apport retracing service (apport) wrote : | #2 |
Changed in xorg-server (Ubuntu): | |
importance: | Undecided → Medium |
tags: | removed: need-amd64-retrace |
Hi Ubuntu users! Signal 7 is SIGBUS. SIGBUS should be relatively unusual on x86 [1].
[1] https:/
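If you want to check the number-to-name mapping yourself, a shell can tell you. This is only a small sketch: the helper name signal_name is made up for illustration, and signal numbers differ between architectures, so the answer applies to the machine you run it on.

```shell
# Hypothetical helper (not part of any tool mentioned in this report):
# print the name of a signal number as this machine defines it.
signal_name() {
    kill -l "$1"
}

signal_name 7    # prints BUS on x86-64 Linux
```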
I'm excited to inform you that Fedora Linux users also started seeing the same root problem. It is tied to the upgrade from kernel v4.14 to v4.15.
Fedora bug report:
https:/
Arch Linux independently identified this as caused by the kernel upgrade:
https:/
It can happen after resume from suspend, not every time but maybe once every three days. We have reports for both Xwayland and Xorg getting a fatal SIGBUS in _dl_fixup(). (While this is actually a secondary crash in xorg_backtrace(), we have a load of SIGBUS traces that have the same primary trace as each other).
Notice the specific faulting instruction in the disassembly you captured: it is not performing a memory access!
=> 0x559c102a4060 <ErrorFSigSafe>: sub $0xd8,%rsp
Instead, notice that this is the first instruction in the function ErrorFSigSafe. This is a big common factor in our traces. (We actually have several different traces captured, with the failing function varying, often along the same call chain).
What's happening is a fault on the instruction fetch. You should be able to confirm this by looking at the address that generated the fault (the si_addr field of struct siginfo; I don't know where the Ubuntu crash collector saves this information).
The kernel failed to load in the page which holds the program code at this point. That's the real problem: some sort of transient IO error during wakeup. Users sometimes see other symptoms of these IO errors as well:
PM: resume devices took 1.017 seconds
Restarting tasks ...
Read-error on swap-device (253:1:836184)
PM: suspend exit
systemd-
and
PM: suspend exit
EXT4-fs error (device dm-2): ext4_find_
Buffer I/O error on dev dm-2, logical block 0, lost sync page write
WARNING: CPU: 1 PID: 748 at fs/buffer.c:1108 mark_buffer_
(and a kernel backtrace)
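To see whether your own machine logged similar errors around a resume, you could filter the kernel log for the messages quoted above. This is a rough sketch, not a complete detector: the function name scan_resume_errors is invented here, and the patterns are just the three message prefixes from this report, so other I/O errors would be missed.

```shell
# Hypothetical helper: read kernel log text on stdin and keep only the
# lines that look like the I/O errors quoted in this report.
scan_resume_errors() {
    grep -E 'Read-error on swap-device|Buffer I/O error|EXT4-fs error'
}

# Example use against the current boot's kernel messages
# (reading the journal may need root or membership of the adm group):
# journalctl -k -b | scan_resume_errors
```

No output means none of these three messages were found, which does not prove the kernel is unaffected.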
Launchpad Janitor (janitor) wrote : | #7 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in xorg-server (Ubuntu): | |
status: | New → Confirmed |
Daniel van Vugt (vanvugt) wrote : Re: [nvidia] Xorg crashed with signal 7 in _dl_fixup() from _dl_runtime_resolve_xsavec() called from nvidia_drv.so | #8 |
I found it hard to follow your log files because of so many errors from extension "<email address hidden>". Maybe try removing your extensions if they're that buggy/noisy.
Also, what version of the nvidia driver are you using?
summary: |
- Xorg crashed with signal 7 in _dl_fixup()
+ Xorg crashed with signal 7 in _dl_fixup() from _dl_runtime_resolve_xsavec()
summary: |
- Xorg crashed with signal 7 in _dl_fixup() from _dl_runtime_resolve_xsavec()
+ [nvidia] Xorg crashed with signal 7 in _dl_fixup() from _dl_runtime_resolve_xsavec() called from nvidia_drv.so
tags: | added: nvidia |
Alan Jenkins (aj504) wrote : | #9 |
Uh, if anyone else is affected by this, there's a trivial fix upstream already (and a workaround). Hop to it, Ubuntu. gregkh is looking disappointed at you :-). I checked, and it looks like you didn't apply it to your 4.15 tree. See the end for links to the fix etc.
For users: The workaround is to add "scsi_mod.
Please note
1. AFAICT this is near-universal.
It affects all desktop users of kernel 4.15/4.16 who use suspend
(and whose workloads use all their RAM).
It could be avoided by not using SCSI, but it does affect all systems with root on SATA.
2. Although this is horrible when it happens (X crash) and can happen on a near-daily basis,
it can be quite difficult for users to analyze and report. For example, the crash doesn't
have one specific backtrace in Xorg. It tends to generate several different backtraces,
non-
that causes the crash.
I remember that Sosha had to make two attempts at reporting this bug
(though I don't remember what was wrong with the first one).
Also, it's triggered by a medium-term SIGALRM timer in Xorg.
This made it really annoying to reproduce, at the time when I didn't know the root cause.
I was able to reproduce the memory pressure needed, but it didn't happen
when testing suspend+resume... only when I broke for lunch and left the machine
suspended for long enough :).
Fix: "block: do not use interruptible wait anywhere"
in kernel 4.17: https:/
in kernel 4.16.8: https:/
lack of fix in 4.15.0-23.25 (ubuntu bionic): https:/
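Going by the versions listed above (4.15 and 4.16 affected, fixed in 4.16.8 and 4.17), a quick version check might look like the sketch below. The function name kernel_maybe_affected is invented for illustration, and the check is only a heuristic: distro kernels can carry the fix as a backport without changing the upstream version number, so a "maybe affected" result is a hint to look further, not a diagnosis.

```shell
# Hypothetical helper: succeed (exit 0) if a kernel release string falls in
# the range this report describes as affected: 4.15.x, or 4.16.x before 4.16.8.
kernel_maybe_affected() {
    ver=${1%%-*}                 # "4.15.0-13-generic" -> "4.15.0"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    case $rest in
        *.*) patch=${rest#*.}; patch=${patch%%[!0-9]*} ;;
        *)   patch=0 ;;          # no patch component, e.g. "4.17"
    esac
    [ -n "$patch" ] || patch=0
    [ "$major" = 4 ] || return 1
    case $minor in
        15) return 0 ;;
        16) [ "$patch" -lt 8 ] ;;
        *)  return 1 ;;
    esac
}

if kernel_maybe_affected "$(uname -r)"; then
    echo "kernel $(uname -r) is in the 4.15 / pre-4.16.8 range"
fi
```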
Sosha (soshaw) wrote : | #10 |
@Daniel
I use nvidia-driver-396.
Julien Olivier (julo) wrote : | #11 |
I confirm that adding "scsi_mod.
Alan Jenkins (aj504) wrote : | #12 |
Thanks for your confirmation, Julien. I have asked Ubuntu to import the proper fix from upstream and they responded very promptly. See:
https:/
They have posted a test kernel. I don't have an Ubuntu install to test it on, only a VM which cannot suspend. It *might* be useful if someone wants to volunteer to try the test kernel.
You could test that your normal suspend still works, and test the command mentioned in the commit. I.e.
$ sudo -i
# dd if=/dev/sda of=/dev/null iflag=direct & \
while killall -SIGUSR1 dd; do sleep 0.1; done & \
echo mem > /sys/power/state ; \
sleep 5; killall dd # stop after 5 seconds
On a "bad" kernel, any time you run this command it should show a message about an IO error. On a "good" kernel, the system will appear to suspend and resume, but there should be no IO error.
Vedant Bhatia (vedant19) wrote : | #13 |
Hi. I faced a similar issue and when I tried to add "scsi_mod.
/usr/sbin/
Would appreciate any help, thanks.
Vedant Bhatia (vedant19) wrote : | #14 |
The problem hasn't occurred so far after adding "scsi_mod.
Alan Jenkins (aj504) wrote : | #15 |
Hi Vedant. Change the line in /etc/default/grub e.g.
GRUB_CMDLINE_
to
GRUB_CMDLINE_
and then re-run update-grub.
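After rebooting, it may be worth confirming that the option actually reached the kernel. A minimal sketch follows, assuming you pass the exact option text yourself; the function name cmdline_has_param and the example option in the comment are placeholders for illustration, not the real workaround string.

```shell
# Hypothetical helper: succeed if the given option appears as a whole word
# on the kernel command line. The second argument is optional and exists
# only so the function can be tried against a file other than /proc/cmdline.
cmdline_has_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each space-separated option on its own line, then match exactly.
    tr -s ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example (replace the placeholder with the option you added in /etc/default/grub):
# cmdline_has_param 'example.option=value' && echo "option is active"
```

Matching whole words avoids a false positive when one option is a prefix of another.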
summary: |
- [nvidia] Xorg crashed with signal 7 in _dl_fixup() from _dl_runtime_resolve_xsavec() called from nvidia_drv.so
+ [nvidia] Xorg crashed with SIGBUS in _dl_fixup() from _dl_runtime_resolve_xsavec() from create_bits_picture() from image_from_pict_internal() from wfb_image_from_pict()
StacktraceTop:
_dl_fixup (l=0x559c10faa5a0, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:84
_dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:125
create_bits_picture (yoff=0x7fff0a6c9594, xoff=0x7fff0a6c9590, has_clip=1, pict=0x559c129a29d0) at ../../../../fb/fbpict.c:325
image_from_pict_internal (pict=pict@entry=0x559c129a29d0, has_clip=has_clip@entry=1, xoff=xoff@entry=0x7fff0a6c9590, yoff=yoff@entry=0x7fff0a6c9594, is_alpha_map=is_alpha_map@entry=0) at ../../../../fb/fbpict.c:457
wfb_image_from_pict (pict=pict@entry=0x559c129a29d0, has_clip=has_clip@entry=1, xoff=xoff@entry=0x7fff0a6c9590, yoff=yoff@entry=0x7fff0a6c9594) at ../../../../fb/fbpict.c:487