Segfault in libmutter-2.so after suspend/resume using wayland. Core files are always truncated and invalid while the default shell is zsh.

Bug #1772638 reported by Thorsten on 2018-05-22
This bug affects 1 person
Affects               Importance  Assigned to
apport (Ubuntu)       Undecided   Unassigned
gnome-shell (Ubuntu)  Medium      Unassigned

Bug Description

Hi,

I have a problem when running the GNOME Wayland session in Ubuntu 18.04: in ~1 out of 5 suspend/resume cycles, a segfault in libmutter-2.so occurs, which then kills my current session. Here is my journalctl output from the crash:

Mai 22 12:39:03 x1 systemd[1]: Reached target Sleep.
Mai 22 12:39:03 x1 systemd[1]: Starting Suspend...
Mai 22 12:39:03 x1 systemd-sleep[2349]: Suspending system...
Mai 22 12:39:03 x1 kernel: PM: suspend entry (s2idle)
Mai 22 12:39:03 x1 kernel: PM: Syncing filesystems ... done.
Mai 22 12:39:03 x1 gnome-shell[1483]: Failed to set CRTC mode 2560x1440: Permission denied
Mai 22 12:39:03 x1 wpa_supplicant[976]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Mai 22 12:39:03 x1 kernel: gnome-shell[1483]: segfault at 20 ip 00007ff704b28b17 sp 00007ffd6c1cd1b0 error 4 in libmutter-2.so.0.0.0[7ff704a3b000+156000]
Mai 22 12:39:03 x1 kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Mai 22 13:11:15 x1 kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
Mai 22 13:11:15 x1 kernel: OOM killer disabled.
Mai 22 13:11:15 x1 kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Mai 22 13:11:15 x1 kernel: Suspending console(s) (use no_console_suspend to debug)
Mai 22 13:11:15 x1 kernel: thinkpad_acpi: unknown possible thermal alarm or keyboard event received
Mai 22 13:11:15 x1 kernel: thinkpad_acpi: unhandled HKEY event 0x6032
Mai 22 13:11:15 x1 kernel: thinkpad_acpi: please report the conditions when this event happened to <email address hidden>
Mai 22 13:11:15 x1 kernel: [drm] GuC: Loaded firmware i915/kbl_guc_ver9_39.bin (version 9.39)
Mai 22 13:11:15 x1 kernel: i915 0000:00:02.0: GuC firmware version 9.39
Mai 22 13:11:15 x1 kernel: i915 0000:00:02.0: GuC submission enabled
Mai 22 13:11:15 x1 kernel: i915 0000:00:02.0: HuC disabled
Mai 22 13:11:15 x1 kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Mai 22 13:11:15 x1 kernel: OOM killer enabled.
Mai 22 13:11:15 x1 kernel: Restarting tasks ... done.
Mai 22 13:11:15 x1 systemd-logind[986]: Lid opened.
Mai 22 13:11:15 x1 kernel: rfkill: input handler enabled
Mai 22 13:11:15 x1 systemd[1]: Started Run anacron jobs.
Mai 22 13:11:15 x1 anacron[2420]: Anacron 2.3 started on 2018-05-22
Mai 22 13:11:15 x1 anacron[2420]: Normal exit (0 jobs run)
Mai 22 13:11:15 x1 kernel: thermal thermal_zone6: failed to read out thermal zone (-61)
Mai 22 13:11:15 x1 tilix[1980]: Error reading events from display: Broken pipe
Mai 22 13:11:15 x1 update-notifier[1901]: Error reading events from display: Broken pipe
Mai 22 13:11:15 x1 systemd-sleep[2349]: System resumed.
Mai 22 13:11:15 x1 kernel: PM: suspend exit
Mai 22 13:11:15 x1 org.gnome.Shell.desktop[1483]: (EE)
Mai 22 13:11:15 x1 org.gnome.Shell.desktop[1483]: Fatal server error:
Mai 22 13:11:15 x1 org.gnome.Shell.desktop[1483]: (EE) failed to read Wayland events: Broken pipe
Mai 22 13:11:15 x1 org.gnome.Shell.desktop[1483]: (EE)
Mai 22 13:11:15 x1 systemd[1]: Started Suspend.
Mai 22 13:11:15 x1 systemd[1]: sleep.target: Unit not needed anymore. Stopping.
Mai 22 13:11:15 x1 systemd[1]: Stopped target Sleep.
Mai 22 13:11:15 x1 systemd[1]: Reached target Suspend.
Mai 22 13:11:15 x1 systemd[1]: suspend.target: Unit not needed anymore. Stopping.
Mai 22 13:11:15 x1 systemd[1]: Stopped target Suspend.
Mai 22 13:11:15 x1 gnome-session[1455]: gnome-session-binary[1455]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Mai 22 13:11:15 x1 gnome-session-binary[1455]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Mai 22 13:11:15 x1 gnome-session-binary[1455]: Unrecoverable failure in required component org.gnome.Shell.desktop
Mai 22 13:11:15 x1 kdeconnectd.desktop[1725]: ICE default IO error handler doing an exit(), pid = 1725, errno = 11
Mai 22 13:11:15 x1 org.gnome.SettingsDaemon.Power.desktop[1611]: xcb_connection_has_error() returned true
Mai 22 13:11:15 x1 firefox.desktop[2050]: xcb_connection_has_error() returned true
Mai 22 13:11:15 x1 pulseaudio[2430]: [pulseaudio] client-conf-x11.c: xcb_connection_has_error() returned true
Mai 22 13:11:15 x1 polkitd(authority=local)[1059]: Unregistered Authentication Agent for unix-session:2 (system bus name :1.74, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Mai 22 13:11:15 x1 rtkit-daemon[1171]: Successfully made thread 2434 of process 2434 (n/a) owned by '1000' high priority at nice level -11.
Mai 22 13:11:15 x1 rtkit-daemon[1171]: Supervising 2 threads of 2 processes of 1 users.
Mai 22 13:11:15 x1 pulseaudio[2434]: [pulseaudio] pid.c: Stale PID file, overwriting.
Mai 22 13:11:15 x1 gdm-password][1430]: pam_unix(gdm-password:session): session closed for user thorsten
Mai 22 13:11:15 x1 gsd-color[1217]: failed to connect to device: Failed to connect to missing device /org/freedesktop/ColorManager/devices/xrandr_AU_Optronics_thorsten_1000
Mai 22 13:11:15 x1 pulseaudio[2434]: [pulseaudio] sink.c: Default and alternate sample rates are the same.
... (follow-up crashes omitted) ...

Here is the corresponding apport.log:
ERROR: apport (pid 2357) Tue May 22 12:39:03 2018: called for pid 1483, signal 11, core limit 0, dump mode 1
ERROR: apport (pid 2357) Tue May 22 12:39:03 2018: executable: /usr/bin/gnome-shell (command line "/usr/bin/gnome-shell")
ERROR: apport (pid 2357) Tue May 22 12:39:03 2018: debug: session gdbus call: (true,)

ERROR: apport (pid 2357) Tue May 22 13:11:15 2018: wrote report /var/crash/_usr_bin_gnome-shell.1000.crash
ERROR: apport (pid 2429) Tue May 22 13:11:15 2018: called for pid 1502, signal 6, core limit 18446744073709551615, dump mode 1
ERROR: apport (pid 2429) Tue May 22 13:11:15 2018: ignoring implausibly big core limit, treating as unlimited
ERROR: apport (pid 2429) Tue May 22 13:11:15 2018: executable: /usr/bin/Xwayland (command line "/usr/bin/Xwayland :0 -rootless -terminate -accessx -core -listen 4 -listen 5 -displayfd 6")
ERROR: apport (pid 2429) Tue May 22 13:11:15 2018: gdbus call error: Error: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files

ERROR: apport (pid 2429) Tue May 22 13:11:15 2018: debug: session gdbus call:
ERROR: apport (pid 2429) Tue May 22 13:11:19 2018: wrote report /var/crash/_usr_bin_Xwayland.1000.crash
ERROR: apport (pid 2429) Tue May 22 13:11:19 2018: writing core dump to /home/thorsten/core (limit: -1)
ERROR: apport (pid 2429) Tue May 22 13:11:20 2018: writing core dump /home/thorsten/core of size 107855872
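For reference, the two limits apport prints for Xwayland are the same value: 18446744073709551615 is 2^64 - 1, the all-ones RLIM_INFINITY on 64-bit Linux, and read as a signed 64-bit integer it is the -1 in "(limit: -1)". The "core limit 0" logged for gnome-shell is consistent with the "core file size ... 0" that 'ulimit -a' shows later in this thread. A small Python sketch of that reinterpretation; describe_core_limit is a hypothetical helper, not part of apport.

```python
import struct

RLIM_INFINITY_U64 = 2**64 - 1  # all bits set: "unlimited" on 64-bit Linux


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def describe_core_limit(limit: int) -> str:
    """Hypothetical helper: turn a logged core limit into plain words."""
    limit &= RLIM_INFINITY_U64  # map the signed view (-1) onto the unsigned one
    if limit == 0:
        return "disabled (0)"
    if limit == RLIM_INFINITY_U64:
        return "unlimited"
    return f"{limit} bytes"


if __name__ == "__main__":
    logged = 18446744073709551615        # "core limit" in the Xwayland entry above
    print(logged == RLIM_INFINITY_U64)   # True
    print(as_signed64(logged))           # -1, as in "(limit: -1)"
    print(describe_core_limit(logged), describe_core_limit(-1), describe_core_limit(0))
```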

Additionally, there is a ~100 MB "core" file in my home directory.

I tried to report the resulting _usr_bin_gnome-shell.1000.crash file with ubuntu-bug, using the workaround from https://bugs.launchpad.net/ubuntu/+source/apport/+bug/994921. However, I get the error: "The problem cannot be reported: Invalid core dump: BFD: warning /tmp/apport_core_ersfx7ym is truncated: expected core file size >= 846290944, found: 216288"
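As far as I can tell, this BFD warning compares the file's actual size with the size implied by its ELF program headers: every segment's p_offset + p_filesz has to fit inside the file. A minimal Python sketch of that kind of check, assuming a 64-bit little-endian ELF core as produced on amd64; expected_core_size and check_core are hypothetical names, not apport or BFD functions.

```python
import os
import struct


def expected_core_size(path: str) -> int:
    """Largest p_offset + p_filesz over the program headers of an ELF64 LE file.

    A complete core file must be at least this large; anything smaller was
    cut short while being written.
    """
    with open(path, "rb") as f:
        ehdr = f.read(64)  # Elf64_Ehdr is 64 bytes
        if len(ehdr) < 64 or ehdr[:4] != b"\x7fELF":
            raise ValueError(f"{path}: not an ELF file")
        if ehdr[4] != 2 or ehdr[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
            raise ValueError(f"{path}: only 64-bit little-endian ELF is handled")
        (e_phoff,) = struct.unpack_from("<Q", ehdr, 32)
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)
        f.seek(e_phoff)
        phdrs = f.read(e_phentsize * e_phnum)
    if len(phdrs) < e_phentsize * e_phnum:
        raise ValueError(f"{path}: program header table itself is truncated")
    high = 0
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags (u32), then p_offset, p_vaddr, p_paddr, p_filesz (u64)
        _type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", phdrs, i * e_phentsize)
        if p_filesz:
            high = max(high, p_offset + p_filesz)
    return high


def check_core(path: str) -> tuple:
    """Return (expected size, actual size, truncated?) for a core file."""
    expected = expected_core_size(path)
    found = os.path.getsize(path)
    return expected, found, found < expected
```

For example, check_core("/home/thorsten/core") would report whether the Xwayland core from the log above is complete.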

Reporting the corresponding _usr_bin_Xwayland.1000.crash file redirects me to https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1731911, but that only seems to indicate that Xwayland crashes because mutter has crashed.

This error only occurs in the Wayland session, not in the Xorg session.

I am running the 4.16.10 mainline kernel because of driver issues with my laptop (ThinkPad X1 Carbon Gen 6), but the crash also occurs with the 4.15 kernel, both with and without loading the GuC firmware.

Thanks!
---
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
CurrentDesktop: GNOME
DisplayManager: gdm3
DistroRelease: Ubuntu 18.04
Package: gnome-shell 3.28.1-0ubuntu2
PackageArchitecture: amd64
Tags: wayland-session bionic
Uname: Linux 4.16.10-041610-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True

Daniel van Vugt (vanvugt) wrote:

Sorry, but we can't diagnose a crash without a crash report, as outlined in the standard response:
https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment

If following those steps does not work, the next step I suggest is to look up the automatic crash reports your system has made:

  https://errors.ubuntu.com/user/ID
  where ID is the contents of /var/lib/whoopsie/whoopsie-id

If you find any relevant links there then please share them here.
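The lookup described above can be scripted. A minimal Python sketch, assuming the ID file holds a single line; crash_report_url is a hypothetical helper name, and reading /var/lib/whoopsie/whoopsie-id may require root.

```python
from pathlib import Path

WHOOPSIE_ID_FILE = Path("/var/lib/whoopsie/whoopsie-id")


def crash_report_url(id_file: Path = WHOOPSIE_ID_FILE) -> str:
    """Return the errors.ubuntu.com page listing this machine's crash reports."""
    whoopsie_id = id_file.read_text().strip()  # may need root to read
    if not whoopsie_id:
        raise ValueError(f"{id_file} is empty")
    return f"https://errors.ubuntu.com/user/{whoopsie_id}"
```

Run with sufficient privileges, print(crash_report_url()) gives the address to open in a browser.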

As a last resort you can also:

  1. Change this bug to Private.
  2. Attach the .crash file to this bug.

but the chances that we can debug it that way are usually low.

Please also run 'apport-collect 1772638' to provide us with more system information.

Changed in gnome-shell (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected bionic wayland-session
description: updated

Hi, thanks for the answer. My problem is that I always get an error like "The problem cannot be reported: Invalid core dump: BFD: warning /tmp/apport_core_ersfx7ym is truncated: expected core file size >= 846290944, found: 216288" (the size after "found" varies).

I found my crashes reported at https://errors.ubuntu.com/user/ID:

From a crash that just happened: https://errors.ubuntu.com/oops/fec4e512-5e60-11e8-90f5-fa163e8d4bab

And one from yesterday: https://errors.ubuntu.com/oops/bf429442-5db2-11e8-8a90-fa163eec78fa

I ran 'apport-collect 1772638' and sent my system information.

Daniel van Vugt (vanvugt) wrote:

Unfortunately, we can't do anything with invalid core files like these. What we can do is try to make them valid in the future...

Please run 'ulimit -a' and send us the output.

And please do the same for 'cat /proc/sys/kernel/core_pipe_limit'.

Is your system low on disk space at all? (run 'df -h')
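The same information can be gathered in one go from Python. A sketch assuming Linux: resource.getrlimit(resource.RLIMIT_CORE) is the byte-based limit behind the "core file size" line of 'ulimit -a' (which the shell reports in blocks), /proc/sys/kernel holds core_pipe_limit and core_pattern, and shutil.disk_usage covers the 'df -h' question. core_dump_settings and read_kernel_sysctl are hypothetical names.

```python
import resource
import shutil
from pathlib import Path
from typing import Optional


def read_kernel_sysctl(name: str) -> Optional[str]:
    """Return the contents of /proc/sys/kernel/<name>, or None if unreadable."""
    try:
        return Path("/proc/sys/kernel", name).read_text().strip()
    except OSError:
        return None


def core_dump_settings(mount_point: str = "/") -> dict:
    """Collect the settings asked about above in a single dictionary."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)  # limits are in bytes
    usage = shutil.disk_usage(mount_point)
    return {
        "core_soft_limit": soft,
        "core_hard_limit": hard,
        "core_unlimited": soft == resource.RLIM_INFINITY,
        "core_pipe_limit": read_kernel_sysctl("core_pipe_limit"),
        # starts with '|' when cores are piped to a helper such as apport
        "core_pattern": read_kernel_sysctl("core_pattern"),
        "free_gib": round(usage.free / 2**30, 1),
    }


if __name__ == "__main__":
    for key, value in core_dump_settings().items():
        print(f"{key}: {value}")
```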

Daniel van Vugt (vanvugt) wrote:

I notice you're using zsh. I vaguely recall other people having trouble with crash reports related to that. Maybe try resetting your default shell to /bin/bash.

Thorsten (thorstenr-42) wrote:

Hi,

I switched my login shell back to bash (chsh -s /bin/bash) and was not able to reproduce the issue in ~30 suspend/resume cycles. That is not yet a meaningful statistic, but is it possible that zsh as the login shell can make mutter segfault? Of course, I will report back if it occurs!
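To double-check which login shell is actually configured after 'chsh', the passwd entry can be read directly. A minimal Python sketch using the standard pwd module; login_shell and uses_zsh are hypothetical helper names.

```python
import os
import pwd
from typing import Optional


def login_shell(uid: Optional[int] = None) -> Optional[str]:
    """Return the login shell stored in the passwd database (what chsh edits)."""
    if uid is None:
        uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_shell
    except KeyError:  # no passwd entry for this uid
        return None


def uses_zsh(uid: Optional[int] = None) -> bool:
    """True if the configured login shell is zsh, whatever its install path."""
    shell = login_shell(uid)
    return shell is not None and os.path.basename(shell) == "zsh"


if __name__ == "__main__":
    print(login_shell(), uses_zsh())
```

The SHELL environment variable of an already running session keeps its old value until the next login, so the passwd entry is the more reliable thing to check.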

ubuntu-bug still fails on the old crash file, probably because it was generated incorrectly?

Thanks, and sorry for this non-default setup.

Here is the additional system information:

ulimit -a
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 62722
    max locked memory (kbytes, -l) 16384
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 62722
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited

cat /proc/sys/kernel/core_pipe_limit
    0

df -h (there should be enough space on /)
    Filesystem Size Used Avail Use% Mounted on
    udev 7,7G 0 7,7G 0% /dev
    tmpfs 1,6G 10M 1,6G 1% /run
    /dev/nvme0n1p6 246G 39G 195G 17% /
    tmpfs 7,8G 41M 7,7G 1% /dev/shm
    tmpfs 5,0M 4,0K 5,0M 1% /run/lock
    tmpfs 7,8G 0 7,8G 0% /sys/fs/cgroup
    /dev/loop0 3,8M 3,8M 0 100% /snap/gnome-system-monitor/39
    /dev/loop3 140M 140M 0 100% /snap/gnome-3-26-1604/64
    /dev/loop4 141M 141M 0 100% /snap/gnome-3-26-1604/59
    /dev/loop2 3,4M 3,4M 0 100% /snap/gnome-system-monitor/36
    /dev/loop1 21M 21M 0 100% /snap/gnome-logs/25
    /dev/loop5 1,7M 1,7M 0 100% /snap/gnome-calculator/154
    /dev/loop6 87M 87M 0 100% /snap/core/4486
    /dev/loop7 2,4M 2,4M 0 100% /snap/gnome-calculator/167
    /dev/loop8 87M 87M 0 100% /snap/core/4571
    /dev/loop9 22M 22M 0 100% /snap/gnome-logs/31
    /dev/loop11 13M 13M 0 100% /snap/gnome-characters/86
    /dev/loop10 13M 13M 0 100% /snap/gnome-characters/69
    /dev/loop12 87M 87M 0 100% /snap/core/4650
    /dev/loop13 141M 141M 0 100% /snap/gnome-3-26-1604/62
    /dev/nvme0n1p2 96M 29M 68M 30% /boot/efi
    /dev/nvme0n1p7 473G 236G 238G 50% /media/thorsten/daten
    tmpfs 1,6G 20K 1,6G 1% /run/user/119
    tmpfs 1,6G 7,1M 1,6G 1% /run/user/1000

Thorsten (thorstenr-42) wrote:

Okay, it's definitely associated with zsh as the login shell. I now have more than 50 successful suspend/resume cycles.

So to summarize:
- mutter crashes in 1 out of 10 suspend/resume cycles when zsh is set as the login shell
- however, I cannot use apport to get a crash report because of zsh

-> I cannot both use apport (requires bash as login shell) and reproduce the segfault (requires zsh as login shell).

Is there still a way to debug the problem? I am still willing to help! From my end it is resolved, since I changed the login shell back to bash and set my terminal to run zsh as a custom command.

Daniel van Vugt (vanvugt) wrote:

At a guess, some Bourne shell script used during suspend/resume is probably missing the '#!/bin/sh' line at its top. That would cause the script to fail if you changed your default shell.

I have no idea where to start looking for such a mistake though.
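This guess can at least be checked mechanically. A file without a '#!' line cannot be exec'd directly (execve fails with ENOEXEC); it only runs if the calling shell or exec wrapper falls back to interpreting it itself, which is one way the configured shell could matter. A minimal Python sketch that lists executable hook files lacking a shebang. The directories /lib/systemd/system-sleep and /etc/pm/sleep.d are assumptions about where suspend/resume hooks live on this system, and scripts_without_shebang is a hypothetical name; a hit is only a candidate, not proof that it causes the crash.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Assumed locations of suspend/resume hooks; adjust for the system being checked.
DEFAULT_HOOK_DIRS = ["/lib/systemd/system-sleep", "/etc/pm/sleep.d"]


def scripts_without_shebang(dirs: Iterable[str] = DEFAULT_HOOK_DIRS) -> List[Path]:
    """List executable regular files in `dirs` that start with neither '#!' nor ELF magic."""
    missing = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue
        for entry in sorted(base.iterdir()):
            if not entry.is_file() or not os.access(entry, os.X_OK):
                continue
            try:
                with entry.open("rb") as f:
                    head = f.read(2)
            except OSError:
                continue
            # Scripts should begin with b"#!"; ELF binaries begin with b"\x7fE".
            if head not in (b"#!", b"\x7fE"):
                missing.append(entry)
    return missing


if __name__ == "__main__":
    for path in scripts_without_shebang():
        print(path)
```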

summary: - segfault in libmutter-2.so after suspend/resume using wayland
+ Segfault in libmutter-2.so after suspend/resume using wayland. Core
+ files are always truncated and invalid while the default shell is zsh.
Changed in gnome-shell (Ubuntu):
status: Incomplete → New
importance: Undecided → Medium

Thorsten (thorstenr-42) wrote:

Okay, sorry: the bug still occurs, just far less frequently (1 out of 100 or so). As stated above, I am now using /bin/bash as the login shell, but I still cannot submit a crash file.
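The crash rates quoted in this thread (~1 in 5, 1 in 10, now 1 in 100) also say something about how much a run of clean cycles proves. Assuming every suspend/resume cycle crashes independently with the same probability p, the chance of n clean cycles in a row is (1 - p)^n. A short Python sketch; p_no_crash is a hypothetical name.

```python
def p_no_crash(rate: float, cycles: int) -> float:
    """Probability of zero crashes in `cycles` independent suspend/resume
    cycles when each cycle crashes with probability `rate`."""
    return (1.0 - rate) ** cycles


if __name__ == "__main__":
    # Rates and cycle counts quoted in this thread.
    for rate, cycles in [(1 / 5, 30), (1 / 10, 50), (1 / 100, 50)]:
        print(f"rate 1/{round(1 / rate)}, {cycles} cycles: "
              f"P(no crash) = {p_no_crash(rate, cycles):.3f}")
```

This prints about 0.001, 0.005 and 0.605: 50 clean cycles would be very unlikely at 1 in 10, but unsurprising at 1 in 100, which fits the crash resurfacing after the shell change.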

This time, an unhandled exception appears in the apport log (/var/log/apport.log).

The segfault in dmesg was
gnome-shell[1493]: segfault at 20 ip 00007fe7ecafeb17 sp 00007fff5e70ba60 error 4 in libmutter-2.so.0.0.0[7fe7eca11000+156000]

but apport could not find PID 1493:

ERROR: apport (pid 5733) Mon May 28 14:48:22 2018: Unhandled exception:
Traceback (most recent call last):
  File "/usr/share/apport/apport", line 451, in <module>
    get_pid_info(pid)
  File "/usr/share/apport/apport", line 68, in get_pid_info
    pidstat = os.stat('/proc/%s/stat' % pid)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/1493/stat'
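The traceback shows apport calling os.stat on /proc/1493/stat after that process had already disappeared. The kernel's sysctl documentation says that with core_pipe_limit set to 0 (the value reported earlier in this thread) the kernel does not wait for the core-collecting helper, so the crashing process's /proc entry is not guaranteed to still exist. A Python sketch of a lookup that tolerates this instead of raising; stat_pid is a hypothetical helper, not apport's actual code.

```python
import os
from typing import Optional


def stat_pid(pid: int) -> Optional[os.stat_result]:
    """stat() /proc/<pid>/stat, returning None if the process is already gone.

    With core_pipe_limit = 0 the kernel does not wait for the core collector,
    so the crashing process may be reaped before the collector inspects /proc.
    """
    try:
        return os.stat(f"/proc/{pid}/stat")
    except FileNotFoundError:
        return None
```

A collector built this way could log that the process vanished and exit cleanly rather than dying with an unhandled exception.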

Daniel van Vugt (vanvugt) wrote:

I wonder whether the truncated core files are caused by a bug in apport. Maybe apport isn't waiting long enough for the core files to be written. They should be around 1 GB each.

Daniel van Vugt (vanvugt) wrote:

... because what we see from your crashes is:

UnreportableReason
Invalid core dump: BFD: warning: /tmp/apport_core_r0cnuvqg is truncated: expected core file size >= 923877376, found: 1114112

UnreportableReason
Invalid core dump: BFD: warning: /tmp/apport_core_8lexsvus is truncated: expected core file size >= 846290944, found: 2162688
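To put numbers on these two reasons, the expected and found sizes can be pulled out of the BFD message. A small Python sketch; parse_truncation and Truncation are hypothetical names, and the regular expression assumes the message wording quoted here.

```python
import re
from typing import NamedTuple

TRUNC_RE = re.compile(r"is truncated: expected core file size >= (\d+), found: (\d+)")


class Truncation(NamedTuple):
    expected: int
    found: int

    @property
    def fraction_written(self) -> float:
        return self.found / self.expected


def parse_truncation(message: str) -> Truncation:
    """Extract (expected, found) byte counts from a BFD truncation warning."""
    m = TRUNC_RE.search(message)
    if m is None:
        raise ValueError("no BFD truncation warning in message")
    return Truncation(int(m[1]), int(m[2]))


if __name__ == "__main__":
    reasons = [
        "Invalid core dump: BFD: warning: /tmp/apport_core_r0cnuvqg is truncated: "
        "expected core file size >= 923877376, found: 1114112",
        "Invalid core dump: BFD: warning: /tmp/apport_core_8lexsvus is truncated: "
        "expected core file size >= 846290944, found: 2162688",
    ]
    for reason in reasons:
        t = parse_truncation(reason)
        print(f"expected {t.expected / 2**20:.0f} MiB, found {t.found / 2**20:.2f} MiB "
              f"({t.fraction_written:.2%} written)")
```

For the two reasons above this gives expected sizes of about 881 and 807 MiB (in line with the ~1 GB estimate) against roughly 1 and 2 MiB found, i.e. only about 0.12% and 0.26% of each core was written.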

beatmaggy (beatmag) wrote:

gnome-shell[1493]: segfault at 20 ip 00007fe7ecafeb17 sp 00007fff5e70ba60 error 4 in libmutter-2.so.0.0.0[7fe7eca11000+156000]

I have this exact segfault too. I'm on a Cherry Trail Atom Z8350.
Previously this segfault didn't occur: the system went into suspend and could be woken via the keyboard.
I didn't want it to suspend, so I changed the suspend settings, but that didn't change anything.
Suspend still occurred every 5 minutes.

So I disabled suspend from the command line using systemctl.
Then the libmutter-2 segfault kept occurring and eventually made the system run out of memory.

I'll try to get a core dump if one is available. But my problem is the libmutter-2 segfault.
I thought 18.04 defaults to Xorg. Why is libmutter-2 segfaulting at all?
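Thorsten's two kernel lines (PIDs 1483 and 1493; the line quoted in this comment is identical to the second) point at the same place: subtracting the mapping base from ip gives offset 0xedb17 in libmutter-2.so.0.0.0 both times, and the fault address is 0x20 both times. Error code 4 decodes (x86 page-fault error bits) to a user-mode read of a page that was not present, which is what reading a field at offset 0x20 through a NULL pointer typically looks like. A Python sketch of this arithmetic; parse_segfault is a hypothetical helper, and it assumes the bracketed base is the start of the library's mapping, so the offset may still need adjusting for the mapping's file offset before being fed to a symbolizer such as addr2line with matching debug symbols.

```python
import re

# Kernel log format: "<comm>[pid]: segfault at ADDR ip IP sp SP error CODE in OBJ[BASE+SIZE]"
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str) -> dict:
    """Extract the fault address, the instruction offset inside the mapped
    object, and the decoded x86 page-fault error bits from a kernel log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip, base, size = (int(m[k], 16) for k in ("ip", "base", "size"))
    err = int(m["err"])
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "offset": ip - base,                # ip relative to the start of the mapping
        "in_mapping": 0 <= ip - base < size,
        "protection_fault": bool(err & 1),  # False: page not present
        "write": bool(err & 2),             # False: read access
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    lines = [
        "gnome-shell[1483]: segfault at 20 ip 00007ff704b28b17 sp 00007ffd6c1cd1b0 "
        "error 4 in libmutter-2.so.0.0.0[7ff704a3b000+156000]",
        "gnome-shell[1493]: segfault at 20 ip 00007fe7ecafeb17 sp 00007fff5e70ba60 "
        "error 4 in libmutter-2.so.0.0.0[7fe7eca11000+156000]",
    ]
    for line in lines:
        info = parse_segfault(line)
        print(info["object"], hex(info["offset"]), hex(info["fault_addr"]))
```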

Thorsten (thorstenr-42) wrote:

Hi, I reinstalled my system, switched from the Xorg to the Wayland session and tried to reproduce the bug. I think this time I was able to send a complete crash report:

https://errors.ubuntu.com/oops/415ba0dc-7077-11e8-997a-fa163e192766

I don't know whether the fresh install, the updates to apport/gdm, or something else fixed the crash reporting, or whether it was just luck.

But I hope this helps identify the issue.

Daniel van Vugt (vanvugt) wrote:

Thanks.

Your new crash report is a known issue (bug 1754949):
https://errors.ubuntu.com/problem/16426125ad8d92ae4dc9ce9e89450153b0a8b665

So this is now a duplicate of bug 1754949.
