gnome-shell and Xwayland sometimes leave $HOME/core files (should be /var/crash files)

Bug #1746874 reported by Daniel van Vugt
This bug affects 4 people
Affects          Status        Importance  Assigned to   Milestone
apport (Ubuntu)  Fix Released  High        Brian Murray
  Bionic         Fix Released  High        Brian Murray

Bug Description

gnome-shell and Xwayland sometimes leave $HOME/core files (should be /var/crash files). This issue was first mentioned in bug 1746653. But then I noticed my machine doing it too...

This is despite:

$ grep . /proc/sys/kernel/core_*
/proc/sys/kernel/core_pattern:|/usr/share/apport/apport %p %s %c %d %P
/proc/sys/kernel/core_pipe_limit:0
/proc/sys/kernel/core_uses_pid:0

So really there are two problems:

1. ~/core files are created but no /var/crash/ files
2. Because they all have the same name (core_uses_pid == 0), when gnome-shell crashes as a result of an Xwayland crash, you can only get the core from one of them at most.
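The name collision in point 2 can be illustrated with a simplified model of the naming rule documented in core(5) for non-pipe patterns. The function name `core_file_name` is hypothetical, only `%p` and `%%` are expanded, and the PIDs are illustrative; whether apport applies the same rule when it writes a core itself is not established by this sketch.

```python
def core_file_name(core_pattern: str, pid: int, core_uses_pid: int) -> str:
    """Simplified model of kernel core-file naming (non-pipe patterns only).

    Hypothetical helper: expands only %p and %%; the kernel supports more.
    """
    if core_pattern.startswith("|"):
        raise ValueError("pattern pipes to a handler; the kernel writes no file itself")
    # "%%" is a literal percent sign, so ignore it when looking for %p.
    has_pid = "%p" in core_pattern.replace("%%", "")
    name = core_pattern.replace("%%", "\0").replace("%p", str(pid)).replace("\0", "%")
    # core(5): without %p in the pattern, a nonzero core_uses_pid appends ".PID".
    if not has_pid and core_uses_pid != 0:
        name += f".{pid}"
    return name


# Two crashing processes in the same directory with core_uses_pid == 0
# end up with the same file name, so one core overwrites the other.
print(core_file_name("core", 1767, 0), core_file_name("core", 923, 0))
# With core_uses_pid == 1 the names differ.
print(core_file_name("core", 1767, 1), core_file_name("core", 923, 1))
```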

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: gnome-shell 3.26.2-0ubuntu2
ProcVersionSignature: Ubuntu 4.13.0-25.29-generic 4.13.13
Uname: Linux 4.13.0-25-generic x86_64
ApportVersion: 2.20.8-0ubuntu7
Architecture: amd64
Date: Fri Feb 2 15:13:52 2018
DisplayManager:

InstallationDate: Installed on 2017-12-12 (52 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20171211)
SourcePackage: gnome-shell
UpgradeStatus: No upgrade log present (probably fresh install)
---
ApportVersion: 2.20.8-0ubuntu7
Architecture: amd64
CrashReports:
 640:1000:118:3122469:2018-01-29 16:25:03.658845409 +0800:2018-02-02 15:39:18.799970611 +0800:/var/crash/_usr_bin_Xwayland.1000.crash
 640:1000:118:24558024:2018-01-29 16:22:23.127058238 +0800:2018-02-02 15:39:18.887970685 +0800:/var/crash/_usr_bin_gdb.1000.crash
 640:120:118:23982380:2018-01-29 15:52:09.799486456 +0800:2018-01-29 15:51:51.467493515 +0800:/var/crash/_usr_bin_gnome-shell.120.crash
 640:1000:118:6315115:2018-01-29 16:18:12.723424671 +0800:2018-01-29 16:18:10.555428082 +0800:/var/crash/_usr_bin_gnome-shell.1000.crash
 640:120:118:3103046:2018-01-29 15:52:12.923485257 +0800:2018-01-29 15:52:09.859486433 +0800:/var/crash/_usr_bin_Xwayland.120.crash
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2017-12-12 (52 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20171211)
Package: mutter
PackageArchitecture: all
ProcVersionSignature: Ubuntu 4.13.0-25.29-generic 4.13.13
Tags: bionic
Uname: Linux 4.13.0-25-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True

Daniel van Vugt (vanvugt) wrote :

This issue was first mentioned in bug 1746653. But then I noticed my machine doing it too.

Daniel van Vugt (vanvugt) wrote :

I wonder - is this the default kernel behaviour if the pipe command fails?

/proc/sys/kernel/core_pattern:|/usr/share/apport/apport %p %s %c %d %P

summary: - gnome-shell and Xwayland sometimes leave ~/core files (not crash files)
+ gnome-shell and Xwayland sometimes leave $HOME/core files (should be
+ crash files)
description: updated
summary: gnome-shell and Xwayland sometimes leave $HOME/core files (should be
- crash files)
+ /var/crash files)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1746874

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote : ApportLog.txt

apport information

tags: added: apport-collected
description: updated
Daniel van Vugt (vanvugt) wrote : Dependencies.txt

apport information

Daniel van Vugt (vanvugt) wrote : JournalErrors.txt

apport information

Daniel van Vugt (vanvugt) wrote : ProcCpuinfoMinimal.txt

apport information

Daniel van Vugt (vanvugt) wrote : ProcEnviron.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Daniel van Vugt (vanvugt) wrote :

This might be an apport bug alone. Historically apport has had trouble working without a graphical shell. And in all cases of this bug seen so far it was when there wasn't a graphical shell running any more :)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apport (Ubuntu):
status: New → Confirmed
Changed in gnome-shell (Ubuntu):
status: New → Confirmed
Changed in mutter (Ubuntu):
status: New → Confirmed
Brian Murray (brian-murray) wrote :

You could try passing a core file to apport when there is not a graphical shell running via a command like the following:

/usr/share/apport/apport <pid> <signal number> <core file ulimit> <dump mode> < $my-pretend.core

With that you should get a .crash file in /var/crash. Here's an example where 11736 was a running xeyes process and core was a real crash dump file.

 $ /usr/share/apport/apport 11736 11 0 1 < /etc/X11/core
ERROR: apport (pid 11743) Fri Feb 2 14:47:35 2018: called for pid 11736, signal 11, core limit 0, dump mode 1
ERROR: apport (pid 11743) Fri Feb 2 14:47:35 2018: executable: /usr/bin/xeyes (command line "xeyes")
ERROR: apport (pid 11743) Fri Feb 2 14:47:35 2018: debug: session gdbus call: (true,)

ERROR: apport (pid 11743) Fri Feb 2 14:47:49 2018: wrote report /var/crash/_usr_bin_xeyes.1000.crash

Keep in mind that if you stop the apport service, /proc/sys/kernel/core_pattern will revert to the default, which is just core.

 $ cat /proc/sys/kernel/core_pattern
core

So it's likely apport wasn't running on the system and is not the issue.
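A quick way to tell the two situations apart is to check whether the pattern starts with a pipe. The sketch below is illustrative only: `describe_core_pattern` and `SPECIFIERS` are hypothetical names, and the specifier descriptions are paraphrased from the core(5) manual page.

```python
# Meanings of the specifiers that appear in this report, per core(5).
SPECIFIERS = {
    "%p": "PID of the dumped process (in its own PID namespace)",
    "%s": "number of the signal causing the dump",
    "%c": "core file size soft resource limit of the crashing process",
    "%d": "dump mode (same value as prctl PR_GET_DUMPABLE)",
    "%P": "PID of the dumped process (in the initial PID namespace)",
}


def describe_core_pattern(pattern: str) -> dict:
    """Report whether a core_pattern pipes to a handler, and with which arguments."""
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        # Plain file-name template such as "core": no handler is involved.
        return {"piped": False, "handler": None, "args": []}
    parts = pattern[1:].split()
    if not parts:
        raise ValueError("pipe pattern without a handler path")
    args = [(arg, SPECIFIERS.get(arg, "unknown")) for arg in parts[1:]]
    return {"piped": True, "handler": parts[0], "args": args}


info = describe_core_pattern("|/usr/share/apport/apport %p %s %c %d %P")
print(info["handler"])
for arg, meaning in info["args"]:
    print(arg, "=", meaning)
print(describe_core_pattern("core")["piped"])
```

On a live system the input would come from reading /proc/sys/kernel/core_pattern rather than from a string literal.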

Changed in apport (Ubuntu):
status: Confirmed → Incomplete
Brian Murray (brian-murray) wrote :

Information from 'journalctl -u apport' around the time of the core file's creation would be useful, or /var/log/apport.log if apport really was running.

Daniel van Vugt (vanvugt) wrote :

Turning core files into crash files after the fact is a bit too late in my case. With Xwayland and gnome-shell both racing to write ~/core I will still be missing one of them.

------------------------------

journalctl -u apport says:

...
-- Reboot --
Feb 05 12:01:27 haz systemd[1]: Starting LSB: automatic crash report generation...
Feb 05 12:01:27 haz apport[740]: * Starting automatic crash report generation: apport
Feb 05 12:01:27 haz apport[740]: ...done.
Feb 05 12:01:27 haz systemd[1]: Started LSB: automatic crash report generation.

------------------------------

$ cat /var/log/apport.log
ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: called for pid 923, signal 5, core limit 0, dump mode 1
ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: executable: /usr/bin/gnome-shell (command line "/usr/bin/gnome-shell")
ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: gdbus call error: Error: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files

ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: debug: session gdbus call:
ERROR: apport (pid 1657) Mon Feb 5 12:03:04 2018: wrote report /var/crash/_usr_bin_gnome-shell.120.crash
ERROR: apport (pid 1946) Mon Feb 5 12:05:58 2018: another apport instance is already running, aborting
ERROR: apport (pid 1941) Mon Feb 5 12:05:55 2018: called for pid 1767, signal 6, core limit 18446744073709551615, dump mode 1
ERROR: apport (pid 1941) Mon Feb 5 12:05:55 2018: ignoring implausibly big core limit, treating as unlimited
ERROR: apport (pid 1941) Mon Feb 5 12:05:55 2018: executable: /usr/bin/Xwayland (command line "/usr/bin/Xwayland :0 -rootless -terminate -core -listen 4 -listen 5 -displayfd 6")
ERROR: apport (pid 1941) Mon Feb 5 12:05:55 2018: gdbus call error: Error: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files

ERROR: apport (pid 1941) Mon Feb 5 12:05:55 2018: debug: session gdbus call:
ERROR: apport (pid 1941) Mon Feb 5 12:05:58 2018: wrote report /var/crash/_usr_bin_Xwayland.1000.crash
ERROR: apport (pid 1941) Mon Feb 5 12:05:58 2018: writing core dump to /home/dan/core (limit: -1)
ERROR: apport (pid 1941) Mon Feb 5 12:05:59 2018: writing core dump /home/dan/core of size 91402240
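The "implausibly big" core limit in the log above is 2**64 - 1, the all-ones 64-bit value that Linux uses for RLIM_INFINITY, which reads as -1 when interpreted as a signed integer. A minimal sketch of that arithmetic follows; `as_signed64` is a hypothetical helper, and apport's actual cutoff for "implausibly big" is not shown in this report.

```python
def as_signed64(raw: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    if not 0 <= raw < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return raw - 2**64 if raw >= 2**63 else raw


core_limit = 18446744073709551615  # value logged for pid 1767
print(core_limit == 2**64 - 1)
print(as_signed64(core_limit))
```

This matches the later log line that reports the limit as -1 when the core is written to /home/dan/core.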

------------------------------

$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P

Changed in apport (Ubuntu):
status: Incomplete → Confirmed
Changed in apport (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
tags: added: rls-bb-incoming
Daniel van Vugt (vanvugt) wrote :

In case it's not clear to all readers yet: Xwayland and gnome-shell are co-dependent processes. If one crashes then the other will also crash. So we need apport to be capable of handling both crashes simultaneously.

no longer affects: gnome-shell (Ubuntu)
no longer affects: linux (Ubuntu)
no longer affects: mutter (Ubuntu)
description: updated
Steve Langasek (vorlon)
tags: removed: rls-bb-incoming
Julian Andres Klode (juliank) wrote :

I added a merge proposal https://code.launchpad.net/~juliank/apport/lp1746874/+merge/337558 that simply makes apport wait for the lock instead of failing if apport is already running. I made it time out after 30 seconds; I'd hope that's enough.
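The idea of waiting for the lock rather than aborting can be sketched as below. This is not the code from the merge proposal: the function name `wait_for_lock`, the polling loop, the polling interval, and the use of `flock` are all assumptions made for illustration. Only the 30-second timeout comes from the comment above.

```python
import fcntl
import os
import time


def wait_for_lock(path: str, timeout: float = 30.0, poll_interval: float = 0.1) -> int:
    """Take an exclusive lock on path, waiting up to timeout seconds.

    Returns the locked file descriptor; closing it releases the lock.
    Raises TimeoutError if another holder keeps the lock for the whole wait.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Non-blocking attempt, so the loop can enforce the deadline itself.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                raise TimeoutError(f"could not lock {path} within {timeout}s")
            time.sleep(poll_interval)
```

With this shape, a second crash handler started while the first is still writing its report would block until the first finishes, and only give up if the first takes longer than the timeout.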

Brian Murray (brian-murray) wrote :

Looking at the log file Daniel provided I'm not certain 30 seconds is enough:

ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: called for pid 923, signal 5, core limit 0, dump mode 1
ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: executable: /usr/bin/gnome-shell (command line "/usr/bin/gnome-shell")
ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: gdbus call error: Error: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files

ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: debug: session gdbus call:
ERROR: apport (pid 1657) Mon Feb 5 12:03:04 2018: wrote report /var/crash/_usr_bin_gnome-shell.120.crash
ERROR: apport (pid 1946) Mon Feb 5 12:05:58 2018: another apport instance is already running, aborting
ERROR: apport (pid 1941) Mon Feb 5 12:05:55 2018: called for pid 1767, signal 6, core limit 18446744073709551615, dump mode 1

It is strange though that the crash file is written at 12:03:04. I wonder what is going on after that.

Julian Andres Klode (juliank) wrote :

So, what we're seeing is that 1657 ended, but 1941 and 1946 started at roughly the same time. Hence 1941 got the lock and 1946 did not. The run of 1657 took less than 20 seconds, so it seems to me 30 seconds is probably fine, but we can go up to a minute.
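The timing argument can be checked mechanically from the log. The sketch below uses an excerpt of the lines Daniel posted; the regular expression and the helper name `apport_runs` are illustrative assumptions about the log format, inferred only from those lines.

```python
import re
from datetime import datetime

# Excerpt of /var/log/apport.log as quoted earlier in this report.
LOG = """\
ERROR: apport (pid 1657) Mon Feb 5 12:02:48 2018: called for pid 923, signal 5, core limit 0, dump mode 1
ERROR: apport (pid 1657) Mon Feb 5 12:03:04 2018: wrote report /var/crash/_usr_bin_gnome-shell.120.crash
ERROR: apport (pid 1946) Mon Feb 5 12:05:58 2018: another apport instance is already running, aborting
ERROR: apport (pid 1941) Mon Feb 5 12:05:55 2018: called for pid 1767, signal 6, core limit 18446744073709551615, dump mode 1
ERROR: apport (pid 1941) Mon Feb 5 12:05:59 2018: writing core dump /home/dan/core of size 91402240
"""

LINE = re.compile(r"^ERROR: apport \(pid (\d+)\) (\w{3} \w{3} +\d+ [\d:]{8} \d{4}): ")


def apport_runs(log_text: str) -> dict:
    """Map each apport pid to the (first, last) timestamps it logged."""
    runs = {}
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        pid = int(match.group(1))
        when = datetime.strptime(match.group(2), "%a %b %d %H:%M:%S %Y")
        first, last = runs.get(pid, (when, when))
        runs[pid] = (min(first, when), max(last, when))
    return runs


runs = apport_runs(LOG)
for pid, (first, last) in sorted(runs.items()):
    print(pid, first.time(), last.time(), (last - first).total_seconds())
```

In this excerpt pid 1657 spans 16 seconds, and the single line from pid 1946 falls inside the interval covered by pid 1941, which is consistent with 1946 being the instance that lost the lock.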

Brian Murray (brian-murray) wrote :

Got it, I missed the difference between 1941 and 1946. Thanks for the clarification.

Changed in apport (Ubuntu Bionic):
assignee: nobody → Brian Murray (brian-murray)
status: Triaged → In Progress
Daniel van Vugt (vanvugt) wrote :

Fix committed to lp:apport at revision 3186.

Changed in apport (Ubuntu Bionic):
status: In Progress → Fix Committed
Brian Murray (brian-murray) wrote :

There are some test failures we need to sort out here though.

Changed in apport (Ubuntu Bionic):
status: Fix Committed → In Progress
tags: added: id-5a7c7b8d92fe9d5d80fead28
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apport - 2.20.9-0ubuntu1

---------------
apport (2.20.9-0ubuntu1) bionic; urgency=medium

  * New upstream release:
    - apport/sandboxutils.py: when installing extra packages do not install
      the debug versions of them as this was installing gdb-dbg. If debug
      versions of a package are specifically required they can be passed as
      an extra-package.
    - backends/packaging-apt-dpkg.py: when reusing a sandbox do not remove
      conflicting packages when they conflict with themselves, or when they
      conflict with but also provide a virtual package.
    - bin/apport-retrace: add a --no-stracktrace-source option that does not
      do the work of creating a StacktraceSource field in the retraced report
      thereby decreasing the time to retrace.
    - data/apport: wait for lock, with 30s timeout (LP: #1746874)

 -- Brian Murray <email address hidden> Wed, 14 Feb 2018 10:21:20 -0800

Changed in apport (Ubuntu Bionic):
status: In Progress → Fix Released