dbus timeout-ed during an upgrade, taking services down including gdm

Bug #1871538 reported by Christian Ehrhardt 
36
This bug affects 4 people
Affects Status Importance Assigned to Milestone
D-Bus
Unknown
Unknown
systemd
New
Unknown
accountsservice (Ubuntu)
Status tracked in Impish
Focal
Undecided
Unassigned
Groovy
Undecided
Unassigned
Hirsute
Undecided
Unassigned
Impish
Undecided
Unassigned
dbus (Ubuntu)
Status tracked in Impish
Focal
High
Unassigned
Groovy
Undecided
Unassigned
Hirsute
Undecided
Unassigned
Impish
High
Unassigned
gnome-shell (Ubuntu)
Status tracked in Impish
Focal
Undecided
Unassigned
Groovy
Undecided
Unassigned
Hirsute
Undecided
Unassigned
Impish
Undecided
Unassigned

Bug Description

This morning I found my computer on the login screen.
But not the one of the screen log, no a new one - so something must have crashed.

Logging in again confirmed that all apps were gone and the gnome shell was brought down what seems like triggered by a background update o accountsservice.

As always things are not perfectly clear :-/
The following goes *back* in time through my logs one by one.

Multiple apps crashed at 06:09, but we will find later that this is a follow on issue of the underlying gnome/X/... recycling.
-rw-r----- 1 paelzer whoopsie 52962868 Apr 8 06:09 _usr_bin_konversation.1000.crash
-rw-r----- 1 paelzer whoopsie 986433 Apr 8 06:09 _usr_lib_x86_64-linux-gnu_libexec_drkonqi.1000.crash

rdkit was failing fast and giving up (that will be a different bug, it just seems broken on my system):
Apr 08 06:10:13 Keschdeichel systemd[1]: Started RealtimeKit Scheduling Policy Service.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Successfully called chroot.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Successfully dropped privileges.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Successfully limited resources.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: pthread_create failed: Resource temporarily unavailable
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Canary thread running.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Exiting canary thread.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Demoting known real-time threads.
Apr 08 06:10:13 Keschdeichel rtkit-daemon[1729333]: Demoted 0 threads.
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Failed with result 'exit-code'.
Apr 08 06:10:13 Keschdeichel dbus-daemon[1208]: [system] Activating via systemd: service name='org.freedesktop.RealtimeKit1' unit='rtkit-daemon.service' requested by ':1.1176' (uid=121 pid=>
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Start request repeated too quickly.
Apr 08 06:10:13 Keschdeichel systemd[1]: rtkit-daemon.service: Failed with result 'exit-code'.
Apr 08 06:10:13 Keschdeichel systemd[1]: Failed to start RealtimeKit Scheduling Policy Service.
Apr 08 06:10:13 Keschdeichel bluetoothd[1729331]: Bluetooth daemon 5.53

But that already was only triggered by a gnome restart that kicked of earlier:

Apr 08 06:09:27 Keschdeichel systemd[1726656]: Started GNOME Shell on Wayland.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Shell on Wayland.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Session is initialized.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Wayland Session.
Apr 08 06:09:27 Keschdeichel systemd[1726656]: Reached target GNOME Session (session: gnome-login).

X was recycleing before:
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopping GNOME Shell on X11...
...
Apr 08 06:09:22 Keschdeichel /usr/lib/gdm3/gdm-x-session[10710]: (EE) systemd-logind: ReleaseControl failed: Unknown object '/org/freedesktop/login1/session/_32'.
Apr 08 06:09:22 Keschdeichel /usr/lib/gdm3/gdm-x-session[10710]: (II) Server terminated successfully (0). Closing log file.

It seems like some internal service broke and everything that followed was a secondary issue to that:
Apr 08 06:09:19 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: wpa_supplicant.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: thermald.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel thermald[1256]: [WARN]Terminating ...
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Got SIGTERM, quitting.
Apr 08 06:09:19 Keschdeichel systemd[1]: udisks2.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel ModemManager[1308]: <info> Caught signal, shutting down...
Apr 08 06:09:19 Keschdeichel systemd[1]: switcheroo-control.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe78:26c2.
Apr 08 06:09:19 Keschdeichel systemd[1]: accounts-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface strswanbr2.IPv4 with address 192.168.2.0.
Apr 08 06:09:19 Keschdeichel systemd[1]: gdm.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Apr 08 06:09:19 Keschdeichel systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface strswanbr1.IPv4 with address 192.168.1.0.
Apr 08 06:09:19 Keschdeichel systemd[1]: systemd-machined.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lxdbr0.IPv6 with address fd42:fa18:c923:35d5::1.
Apr 08 06:09:19 Keschdeichel systemd-machined[1255]: Failed to drop reference to machine scope, ignoring: Transport endpoint is not connected
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lxdbr0.IPv4 with address 10.253.194.1.
Apr 08 06:09:19 Keschdeichel systemd[1]: switcheroo-control.service: Succeeded.
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface wlp4s0.IPv6 with address fe80::6832:c873:a074:2c7b.
Apr 08 06:09:19 Keschdeichel systemd[1]: systemd-machined.service: Succeeded.
Apr 08 06:09:19 Keschdeichel bluetoothd[1724347]: Terminating
Apr 08 06:09:19 Keschdeichel systemd[1]: avahi-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface wlp4s0.IPv4 with address 10.7.0.42.
Apr 08 06:09:19 Keschdeichel systemd[1]: ModemManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface enp0s31f6.IPv6 with address fe80::80a8:743:6b37:e561.
Apr 08 06:09:19 Keschdeichel systemd[1]: polkit.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface enp0s31f6.IPv4 with address 10.7.0.69.
Apr 08 06:09:19 Keschdeichel systemd[1]: bolt.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lo.IPv6 with address ::1.
Apr 08 06:09:19 Keschdeichel systemd[1]: rtkit-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
Apr 08 06:09:19 Keschdeichel systemd[1]: packagekit.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel wpa_supplicant[1265]: nl80211: deinit ifname=p2p-dev-wlp4s0 disabled_11b_rates=0
Apr 08 06:09:19 Keschdeichel systemd[1]: colord.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: upower.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: bluetooth.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: fwupd.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 06:09:19 Keschdeichel xdg-desktop-por[11011]: xdg-desktop-portal-gtk: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Apr 08 06:09:19 Keschdeichel NetworkManager[1209]: <info> [1586318959.7797] caught SIGTERM, shutting down normally.
Apr 08 06:09:19 Keschdeichel avahi-daemon[1204]: avahi-daemon 0.7 exiting.
Apr 08 06:09:19 Keschdeichel systemd[1]: rtkit-daemon.service: Succeeded.
Apr 08 06:09:19 Keschdeichel systemd[1]: Starting Process error reports when automatic reporting is enabled...
Apr 08 06:09:19 Keschdeichel systemd[1]: colord.service: Succeeded.
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopped target GNOME X11 Session (session: ubuntu).
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopped target Current graphical user session.
Apr 08 06:09:19 Keschdeichel systemd[10683]: unicast-local-avahi.path: Succeeded.
Apr 08 06:09:19 Keschdeichel systemd[10683]: Stopped Path trigger for Avahi .local domain notifications.
Apr 08 06:09:19 Keschdeichel systemd[10683]: update-notifier-release.path: Succeeded.
Apr 08 06:09:19 Keschdeichel gnome-shell[11036]: gnome-shell: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.

The last things before was systemd reload - maybe due to a background upgrade?

Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: Unknown group "power" in message bus configuration file
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: [system] Reloaded configuration
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: Unknown group "power" in message bus configuration file
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: [system] Reloaded configuration
Apr 08 06:08:54 Keschdeichel dbus-daemon[1208]: Unknown group "power" in message bus configuration file
Apr 08 06:08:54 Keschdeichel systemd[1]: Reloading.

And indeed there was an upgrade going on:

40773 2020-04-08 06:08:46 status installed libc-bin:amd64 2.31-0ubuntu7
40774 2020-04-08 06:08:54 startup archives unpack
40775 2020-04-08 06:08:54 upgrade accountsservice:amd64 0.6.55-0ubuntu10 0.6.55-0ubuntu11
40776 2020-04-08 06:08:54 status half-configured accountsservice:amd64 0.6.55-0ubuntu10
40777 2020-04-08 06:08:54 status unpacked accountsservice:amd64 0.6.55-0ubuntu10
40778 2020-04-08 06:08:54 status half-installed accountsservice:amd64 0.6.55-0ubuntu10
40779 2020-04-08 06:08:54 status triggers-pending dbus:amd64 1.12.16-2ubuntu2
40780 2020-04-08 06:09:19 status unpacked accountsservice:amd64 0.6.55-0ubuntu11
40781 2020-04-08 06:09:20 upgrade libaccountsservice0:amd64 0.6.55-0ubuntu10 0.6.55-0ubuntu11
40782 2020-04-08 06:09:20 status triggers-pending libc-bin:amd64 2.31-0ubuntu7
40783 2020-04-08 06:09:20 status half-configured libaccountsservice0:amd64 0.6.55-0ubuntu10
40784 2020-04-08 06:09:20 status unpacked libaccountsservice0:amd64 0.6.55-0ubuntu10
40785 2020-04-08 06:09:20 status half-installed libaccountsservice0:amd64 0.6.55-0ubuntu10
40786 2020-04-08 06:09:20 status unpacked libaccountsservice0:amd64 0.6.55-0ubuntu11

Nothing suspicious before that time.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: accountsservice 0.6.55-0ubuntu11
ProcVersionSignature: Ubuntu 5.3.0-45.37~18.04.1-generic 5.3.18
Uname: Linux 5.3.0-45-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu24
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Apr 8 07:51:03 2020
InstallationDate: Installed on 2018-10-12 (543 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
SourcePackage: accountsservice
UpgradeStatus: Upgraded to focal on 2020-04-03 (4 days ago)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
tags: added: champagne
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Most likely this was bug 1843982. So let's use that for now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Another crash just happened:

Apr 08 10:28:06 Keschdeichel systemd[1]: Reloading.
Apr 08 10:28:06 Keschdeichel systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run>
Apr 08 10:28:17 Keschdeichel dbus-daemon[1726352]: [session uid=1000 pid=1726352] Activating via systemd: service name='org.freedesktop.Tracker1' unit='tracker-store.service' requested by '>
Apr 08 10:28:17 Keschdeichel systemd[10683]: Starting Tracker metadata database store and lookup manager...
Apr 08 10:28:17 Keschdeichel dbus-daemon[1726352]: [session uid=1000 pid=1726352] Successfully activated service 'org.freedesktop.Tracker1'
Apr 08 10:28:17 Keschdeichel systemd[10683]: Started Tracker metadata database store and lookup manager.
Apr 08 10:28:32 Keschdeichel systemd[1]: We couldn't coldplug machine-qemu\x2d4\x2df\x2drtkit.scope, proceeding anyway: Connection timed out
Apr 08 10:28:32 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 10:28:32 Keschdeichel avahi-daemon[1726370]: Got SIGTERM, quitting.
Apr 08 10:28:32 Keschdeichel systemd[1]: wpa_supplicant.service: Unexpected error response from GetNameOwner(): Connection terminated
Apr 08 10:28:32 Keschdeichel ModemManager[1727115]: <info> Caught signal, shutting down...
Apr 08 10:28:32 Keschdeichel systemd[1]: thermald.service: Unexpected error response from GetNameOwner(): Connection terminated

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Daniel,
none of the crashes that I had has the same signature as those that are reported on the dup.
Furthermore as I outlined the crashes seem to be secondary issues after soemthing breaks and recycles gnome-shell.

I'd ask for re-triage as that doesn't seem to be the same thing to me.

Revision history for this message
Sebastien Bacher (seb128) wrote :

@Christian, could you add the journalctl log from that session, that might include some hints. Also do you have any gnome-shell/gdm crash collected in /var/crash?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.

Changed in accountsservice (Ubuntu):
status: New → Incomplete
Changed in dbus (Ubuntu):
status: New → Incomplete
Changed in gnome-shell (Ubuntu):
status: New → Incomplete
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> Also do you have any gnome-shell/gdm crash collected in /var/crash?

No, just these:
$ ll /var/crash/*.crash
-rw-r----- 1 paelzer whoopsie 3589735 Apr 6 08:34 /var/crash/_usr_bin_gnome-clocks.1000.crash
-rw-r----- 1 paelzer whoopsie 53170176 Apr 8 10:36 /var/crash/_usr_bin_konversation.1000.crash
-rw-r----- 1 paelzer whoopsie 986433 Apr 8 06:09 /var/crash/_usr_lib_x86_64-linux-gnu_libexec_drkonqi.1000.crash

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> could you add the journalctl log from that session, that might include some hints

Sure attached here, you see in the initial report and the later comments the time indexes to look out for.

Also FYI for the rtkit issue that you will see in there => bug 1871543

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This looks strongly like bug 1843982. If you disagree and still want a more thorough analysis then please proceed to step 2 in comment #6.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

> 1. Look in /var/crash for crash files and if found run:
> ubuntu-bug YOURFILE.crash
> Then tell us the ID of the newly-created bug.

Again, they all seem to be prior or follow on issues, but they already have IDs in the error tracker.

/var/crash/_usr_bin_gnome-clocks.1000.uploaded
a615fed4-77d0-11ea-a806-fa163ee63de6
/var/crash/_usr_bin_konversation.1000.uploaded
0e56c64e-7976-11ea-ab51-fa163e983629
/var/crash/_usr_lib_x86_64-linux-gnu_libexec_drkonqi.1000.uploaded
29cd6100-7976-11ea-a249-fa163e6cac46

They all show up in my personal error-list at:
https://errors.ubuntu.com/user/350ea932144813c645983ba6ebba1ab8a271bbbcc08853010a42e92011f8802f77829fce9a7ad953c6c31a499d176f6312a0148c01ef7facaaa87be74566a786

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Bug 1843982 is specifically about gnome-shell (and gdm) crashing when accountsservice gets restarted, which is what you mentioned at the top of this bug. So even if you are experiencing multiple bugs, I think that's the one that matches the main bug report here.

Revision history for this message
Sebastien Bacher (seb128) wrote :

@Daniel, that doesn't look like this bug, from the journal

Apr 08 06:09:19 Keschdeichel systemd[1]: We couldn't coldplug machine-qemu\x2d2\x2df\x2dnvidia.scope, proceedin
g anyway: Connection timed out
Apr 08 06:09:19 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner():
 Connection terminated
Apr 08 06:09:19 Keschdeichel systemd[1]: wpa_supplicant.service: Unexpected error response from GetNameOwner():
 Connection terminated

etc for a stack of dbus using system services including gdm

It looks like a problem at the systemd or dbus level, I'm going to mark it invalid for the desktop components

Changed in accountsservice (Ubuntu):
status: Incomplete → Invalid
Changed in gnome-shell (Ubuntu):
status: Incomplete → Invalid
Changed in dbus (Ubuntu):
importance: Undecided → High
Revision history for this message
Sebastien Bacher (seb128) wrote :

@Christian, dbus got reloaded a bunch of time during the upgrade according to the log, that's expected, what isn't is that it stopped responding.
I haven't seen other reports from that so far so it's probably an uncommon bug/issue hit there, I'm unsure how to debug it, especially after facts...

summary: - Gnome Shell crashed on background upgrade of accountsservice
+ dbus timeout-ed during an upgrade, taking services down including gdm
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Sure @seb128 - I'll let you know if it happens again (as I did a cleaning-reboot now).

Revision history for this message
Steve Beattie (sbeattie) wrote :

Hi, I had a similar crash this morning upgrading focal, after trying to get unattended-upgrades to stop spinning on missing focal-security apt list files. In this case, I don't use gnome-shell as my desktop environment, but it still tore down my entire desktop environment and caused gdm3 to respawn.

I've attached the journalctl log from when I started apt upgrade to when I successfully re-logged into my i3 session.

Revision history for this message
Steve Beattie (sbeattie) wrote :

Oh, and I have no crash files in /var/crash/.

tags: added: rls-ff-incoming
tags: added: id-5e988df7fb344884f67bc04f
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (8.8 KiB)

I hit that again today on an upgrade.
Network manager didn't come up again.

Mai 20 11:04:46 Keschdeichel systemd[1]: NetworkManager.service: Unexpected error response from GetNameOwner(): Connection terminated
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9491] caught SIGTERM, shutting down normally.
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9504] dhcp4 (enp0s31f6): canceled DHCP transaction
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9504] dhcp4 (enp0s31f6): state changed extended -> done
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9504] device (enp0s31f6): DHCPv4: trying to acquire a new lease within 90 seconds
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9733] dhcp4 (wlp4s0): canceled DHCP transaction
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9812] dhcp4 (wlp4s0): state changed bound -> done
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9813] device (wlp4s0): DHCPv4: trying to acquire a new lease within 90 seconds
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9817] device (p2p-dev-wlp4s0): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'managed')
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9825] device (strswanbr2-nic): released from master device strswanbr2
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9829] device (strswanbr1-nic): released from master device strswanbr1
Mai 20 11:04:46 Keschdeichel NetworkManager[1330]: <info> [1589965486.9831] device (veth47ecccc5): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0047] device (virbr0-nic): released from master device virbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0050] device (vethcc7405c8): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0069] device (vetha09a9db4): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0077] device (veth754add92): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0094] device (vethc060f004): released from master device virbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0101] device (veth3152ecc9): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.0109] device (vethb6dd0569): released from master device lxdbr0
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1008] device (virbr0): bridge port vnet1 was detached
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1025] manager: NetworkManager state is now CONNECTED_SITE
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1030] device (virbr0): bridge port vnet0 was detached
Mai 20 11:04:47 Keschdeichel NetworkManager[1330]: <info> [1589965487.1629] exiting (success)
Mai 20 11:04:47 Keschdeichel systemd[1]: NetworkManager...

Read more...

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for dbus (Ubuntu Focal) because there has been no activity for 60 days.]

Changed in dbus (Ubuntu Focal):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for dbus (Ubuntu) because there has been no activity for 60 days.]

Changed in dbus (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Steve Beattie (sbeattie) wrote :
Download full text (5.0 KiB)

I also hit this again in focal on 2020-06-25, with an update to systemd 245.4-4ubuntu3.1; I had previously updated dbus to 1.12.16-2ubuntu2.1 on 2020-06-17 without event. It's still an issue at least with updates to systemd in focal.

Similar messages end up in the journal:

Jun 25 13:04:55 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:04:55 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:04:55 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:04:55 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:09 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:09 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:10 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:10 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:10 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:10 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:10 kryten dbus-daemon[1541]: Unknown group "power" in message bus configuration file
Jun 25 13:05:10 kryten systemd[1]: Reloading.
Jun 25 13:05:11 kryten systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
Jun 25 13:05:11 kryten systemd[1]: /lib/systemd/system/fancontrol.service:11: PIDFile= references a path below legacy directory /var/run/, updating /var/run/fancontrol.pid → /run/fancontrol.pid; please update the unit file accordingly.
Jun 25 13:05:36 kryten systemd[1]: We couldn't coldplug machine-qemu\x2d1\x2dkeybase\x2dbionic\x2damd64.scope, proceeding anyway: Connection timed out
Jun 25 13:05:36 kryten dbus-daemon[1541]: [system] Reloaded configuration
Jun 25 13:05:36 kryten audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 25 13:05:36 kryten audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=switcheroo-control comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 25 13:05:36 kryten audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd" exe...

Read more...

Changed in dbus (Ubuntu):
status: Expired → Confirmed
Changed in dbus (Ubuntu Focal):
status: Expired → Confirmed
Changed in systemd:
status: Unknown → New
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The last two updates went smooth for me, not triggering it.
I'm still too afraid to "just update while working" as I did in the past since it could drag things down thou :-/
I was feeling like this could be solved, but reading that Steve is also affected I'm worried that I might just not hit the same race/trigger by accident.

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

I've a similar issue on Groovy. I got it 2 days in a row after my daily package update.
Reverting the machine to a state before the upgrade and re-upgrading the same packages doesn't reproduce the issue. However, when it worked with the same package update the message "Unexpected error response from GetNameOwner()" is not present in the logs.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Hm, so is desktop working on this? Have we figured out at all what is happening? Or just the general symptoms that dbus seems to be non-responsive and then everything goes bad?

Changed in dbus (Ubuntu):
status: Confirmed → Incomplete
Changed in dbus (Ubuntu Focal):
status: Confirmed → Incomplete
Revision history for this message
Sebastien Bacher (seb128) wrote :

@Dimitri desktop isn't working on it at this point, unsure how to figure out where the issue is exactly but it sounds like it could be either dbus or systemd which are owned by foundations

Revision history for this message
Lukas Märdian (slyon) wrote :

By chance (while developing some netplan code), I seem to have found a reproducer for this bug... I don't know what is going on, but calling the 'Trigger' method from the attached 'dbus.c' file will kill the GDM session in a reproducible way.

The invalid `kill(-1, SIGINT);` of line 21, seems to somehow crash the whole DBus/GDM and other services. So there must be some problem inside DBus's error handling, which is somehow triggered during the updates.

 * WARNING: calling the Trigger method via busctl will kill your GDM session
 *
 * compile: gcc -o out dbus.c `pkg-config --cflags --libs libsystemd glib-2.0`
 * execute: sudo ./out
 * trigger: sudo busctl call io.netplan.Netplan /io/netplan/Netplan io.netplan.Netplan Trigger

Revision history for this message
Julian Andres Klode (juliank) wrote :

kill(-1, SIGINT) kills all processes on the system with SIGINT, so this is hardly surprising.

Revision history for this message
Julian Andres Klode (juliank) wrote :

Well, more correctly -1 means to all processes you can send to, and if you're root, well it's everyone.

Revision history for this message
Iain Lane (laney) wrote :

Yeah, sorry, I don't think that's a reproducer for the bug. You killed dbus, gdm and everything else, and they obligingly died on you. :)

My feeling is that this bug is about a deadlock that somehow happens when dbus and/or systemd are reloaded, possibly at the same time. It looks like dbus stops responding to systemd, and so when it calls GetNameOwner() to be able to watch type=dbus units, it gets no reply and then decides to kill all type=dbus units, which is a lot of things. If that's right, it's most likely to be a dbus bug that it can get into this state. But I'm personally at a bit of a loss as to how to approach it without being able to reproduce.

Revision history for this message
Lukas Märdian (slyon) wrote :

Oh.. I wasn't really aware of that (-1 = all) and it looked so similar.

Right, so we still somehow need to find a proper reproducer...

tags: removed: rls-ff-incoming
tags: removed: champagne
Revision history for this message
Gauthier Jolly (gjolly) wrote :

I seems that we are seeing this issue on Azure with Ubuntu >=20.04. The CPC team is running a series of tests before publishing images, those tests spin up ~50 instances in Azure and usually one of them is having this issue.

You can find the system logs of a failing instance here: https://pastebin.ubuntu.com/p/WC4M7JdTGk/ in which you can read the following:

Jun 3 06:57:12 alan-focal-base-uuisspbroykxijgbixkk systemd[1]: accounts-daemon.service: Unexpected error response from GetNameOwner(): Connection terminated
Jun 3 06:57:12 alan-focal-base-uuisspbroykxijgbixkk systemd[1]: systemd-logind.service: Unexpected error response from GetNameOwner(): Connection terminated
Jun 3 06:57:12 alan-focal-base-uuisspbroykxijgbixkk systemd[1]: polkit.service: Unexpected error response from GetNameOwner(): Connection terminated

I built a custom hirsute image with dbus-broker installed and I was unable to reproduce the issue with it.

Revision history for this message
Iain Lane (laney) wrote (last edit ):

That's interesting actually. I thought this was an interaction between reloading dbus-daemon and daemon-reloading systems at the same time. But in this cloud case, I don't think dbus is being reloaded, is it?

It's going to be next to impossible for me to look into this in detail without being able to reproduce it or getting a bit more debug information.

Maybe you could crank up the systemd log level and try to reproduce this?

Lukas Märdian (slyon)
tags: added: rls-ff-incoming rls-hh-incoming
tags: added: rls-ii-incoming
Revision history for this message
Gauthier Jolly (gjolly) wrote :

@laney I built a custom image with system logs level set to debug and I was able to reproduce the issue. You can find the logs attached.

Revision history for this message
Gauthier Jolly (gjolly) wrote :

Also, systemd is actually reloaded by cloud-init. In cloud-init logs, I can read:

2021-06-16 08:54:51,608 - subp.py[DEBUG]: Running command ['systemctl', 'daemon-reload'] with allowed return codes [0] (shell=False, capture=True)
2021-06-16 08:54:51,953 - cc_mounts.py[DEBUG]: Activate mounts: PASS:systemctl daemon-reload
2021-06-16 08:54:51,953 - handlers.py[DEBUG]: finish: init-network/config-mounts: SUCCESS: config-mounts ran successfully

Revision history for this message
Iain Lane (laney) wrote :

Thanks! Here's where I think the problem starts:

Ok, let's authenticate to dbus:

Jun 16 08:54:54 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Bus bus-api-system: changing state AUTHENTICATING → HELLO
...
Jun 16 08:54:54 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Bus bus-api-system: changing state HELLO → RUNNING

Right, systemd thinks it's done it at least. You can see a whole bunch of messages being returned, the connection is working.

After a while (note this first timestamp jump; is that significant? looks to be waiting for udev to settle I think so maybe not; could be snapd related) we decide to do something with a Type=dbus unit:

Jun 16 08:55:25 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=AddMatch cookie=1511 reply_cookie=0 signature=s error-name=n/a error-message=n/a
Jun 16 08:55:25 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Sent message type=method_call sender=n/a destination=org.freedesktop.DBus path=/org/freedesktop/DBus interface=org.freedesktop.DBus member=GetNameOwner cookie=1512 reply_cookie=0 signature=s error-name=n/a error-message=n/a

And another timestamp jump, waiting for this timeout:

Jun 16 08:55:45 alan-hirsute-base-aiamcicscciaelhaktpo dbus-daemon[711]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 45129ms)
Jun 16 08:55:45 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Got message type=method_return sender=org.freedesktop.DBus destination=:1.4 path=n/a interface=n/a member=n/a cookie=42 reply_cookie=1512 signature=s error-name=n/a error-message=n/a

The dbus-daemon didn't think that the connection was authenticated.

The last message AFAICS received on the bus connection was:

Jun 16 08:55:00 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Got message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=Reload cookie=1 reply_cookie=0 signature=n/a error-name=n/a error-message=n/a
Jun 16 08:55:00 alan-hirsute-base-aiamcicscciaelhaktpo systemd[1]: Reloading.

No smoking gun yet, at least not to me. But interesting that the last thing which happened successfully was reloading. Can we have another go please, with these changes vs. an upstream clean image?

  - LogLevel=debug in /etc/systemd/system.conf
  - Using the dbus 1.12.20-2ubuntu1+ppa1 from ppa:ci-train-ppa-service/4587. This dbus is just the one from impish backported to hirsute with verbose debugging turned on.

And then attach the full `journalctl -b <the boot that failed>` which will probably be quite large. :-)

Maybe we'll get some information out of the dbus side to say what it thinks goes wrong with the connection.

** Please note ** The "connection has not authenticated" string does not appear in the earlier logs from Christian or Steve. It could be that there are two bugs here, with similar symptoms that systemd loses its connection to the system bus.

Revision history for this message
Gauthier Jolly (gjolly) wrote :

Ok I will cook an other custom image and try to reproduce.

Concerning this log line:

Jun 16 08:55:45 alan-hirsute-base-aiamcicscciaelhaktpo dbus-daemon[711]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 45129ms)

Please note that during our first investigation with @slyon, we added this
<busconfig>
  <limit name="auth_timeout">90000</limit>
</busconfig>

in /etc/dbus-1/system-local.conf and this particular error disappeared. However the original issue ("Unexpected error response from GetNameOwner()" and slow critical-chain) was still there.

Revision history for this message
Iain Lane (laney) wrote :

I filed an upstream bug with D-Bus: https://gitlab.freedesktop.org/dbus/dbus/-/issues/337

Hopefully a maintainer there will be able to give a better opinion as to what's going on here

tags: added: fr-1460
tags: removed: rls-ff-incoming rls-hh-incoming rls-ii-incoming
Iain Lane (laney)
Changed in accountsservice (Ubuntu Groovy):
status: New → Invalid
Changed in accountsservice (Ubuntu Hirsute):
status: New → Invalid
Changed in gnome-shell (Ubuntu Groovy):
status: New → Invalid
Changed in gnome-shell (Ubuntu Hirsute):
status: New → Invalid
Changed in dbus (Ubuntu Groovy):
status: New → Incomplete
Changed in dbus (Ubuntu Hirsute):
status: New → Incomplete
Revision history for this message
Lukas Märdian (slyon) wrote :

An interesting observation I made is that all journal logs (in this LP bug and the upstream systemd bug) show the "polkit.service: Unexpected error response from GetNameOwner(): Connection terminated" message, in addition to a bunch of other "Connection terminated" errors from changing services across the different installations (NetworkManager.service, accounts-daemon.service, systemd-logind.service, ...). I've also seen logs (from Gauthier) where only polkit.service is failing.

The polkit.service "Unexpected error response from GetNameOwner()" message is a log from systemd effectively telling us that systemd cannot reach org.freedesktop.PolicyKit1 on DBus. But the root cause for that must be that polkitd is not running (or dbus is somehow stuck in a deadlock, not responding anymore). This gives a new twist to this issue, as we might be searching for a phantom in (lib-)systemd... PolicyKit is using glib's gdbus implementation.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.