Dist upgrades from Jammy to Noble crash [Oh no! Something has gone wrong.]

Bug #2054761 reported by Daniel van Vugt
This bug affects 21 people
Affects           Status         Importance  Assigned to      Milestone
mutter (Ubuntu)   Invalid        High        Daniel van Vugt
  Noble           Invalid        Undecided   Unassigned
systemd (Ubuntu)  Invalid        High        Nick Rosbrook
  Noble           Fix Committed  High        Nick Rosbrook

Bug Description

[Impact]
During upgrades from Jammy to Noble, systemd.postinst tries to reexec all running user managers. It does so using a feature that was not added until v250, and attempting this against a v249 daemon results in it being killed instead, which brings down all user sessions.

Hence, during the upgrades, the user session is killed, and the system is left in a bad state.

[Test Plan]

Run an upgrade from Jammy to Noble on Ubuntu desktop. The upgrade should proceed normally (or at least not have the entire session killed by systemd.postinst).

[Where problems could occur]

The fix is to add a version guard around this logic in systemd.postinst. If the version string in the guard were wrong or mistyped, the fix would not work as expected.

[Original Description]

Feb 21 21:39:12 autopkgtest gnome-shell[17945]: Settings schema 'org.gnome.mutter.wayland' does not contain a key named 'xwayland-allow-byte-swapped-clients'
Feb 21 21:39:12 autopkgtest gnome-session-binary[17908]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5
Feb 21 21:39:12 autopkgtest gnome-shell[17959]: Settings schema 'org.gnome.mutter.wayland' does not contain a key named 'xwayland-allow-byte-swapped-clients'
Feb 21 21:39:12 autopkgtest gnome-session-binary[17908]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5

https://errors.ubuntu.com/problem/bf714caff944bed915a3c4321664107c65547d1f
https://errors.ubuntu.com/problem/db8f7e3dfc79e658b9b2aa8c596b014ce4b9f217
https://errors.ubuntu.com/oops/af2e99fc-d101-11ee-8a58-fa163ec8ca8c

Daniel van Vugt (vanvugt) wrote :

Fix released in mutter 45.0.

Changed in mutter (Ubuntu):
status: New → Fix Released
tags: added: rls-jj-incoming
Daniel van Vugt (vanvugt) wrote :

That's weird. The crash came from a system that already had mutter 45 in the process of upgrading (bug 2054319). Given the fix already existed in the new binary, I'm not sure what we could do to stop the old binary from crashing. Maybe more gentle upgrades somehow?

tags: removed: rls-jj-incoming
Daniel van Vugt (vanvugt) wrote :

Also the missing key was introduced in mutter-common version 44.0. It sounds like it must have been installed at the time of the crash, but not compiled yet?

Changed in mutter (Ubuntu):
status: Fix Released → Incomplete
Changed in mutter:
status: Unknown → Fix Released
Daniel van Vugt (vanvugt) wrote :

This appears to be the most common (and currently only) recurring gnome-shell crash being reported in Noble:

https://errors.ubuntu.com/problem/bf714caff944bed915a3c4321664107c65547d1f

Although you have to dig into the individual reports and check their journals to confirm they're bug 2054761.

Changed in mutter (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → Medium
Changed in gnome-shell (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
description: updated
tags: added: noble
Daniel van Vugt (vanvugt) wrote :

Maybe fixing bug 2060423 would also help here?

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/2054761

tags: added: iso-testing
Skia (hyask) wrote :

With the latest upload of mutter (46.0-1ubuntu7), we still have `gnome-shell` crashes during Jammy to Noble upgrades.

Here is a recent oops: https://errors.ubuntu.com/oops/0ec860d4-fd60-11ee-9dae-fa163ec44ecd
Let me know if you need anything else.

description: updated
Daniel van Vugt (vanvugt) wrote :

Bumping importance since this is officially a blocker:
https://discourse.ubuntu.com/t/noble-numbat-24-04-release-status-tracking/44043

Changed in gnome-shell (Ubuntu):
importance: Medium → High
Changed in mutter (Ubuntu):
importance: Medium → High
milestone: none → noble-updates
Changed in gnome-shell (Ubuntu):
milestone: none → noble-updates
Changed in mutter (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
Daniel van Vugt (vanvugt) wrote :

Note to self: Check if bug 2065587 is related.

Daniel van Vugt (vanvugt) wrote :

When trying to reproduce this bug I am seeing fatal errors about failing to find schema 'org.gnome.desktop.peripherals.pointingstick', even though it is installed.

I suspect bug 2065587 is the same issue but am yet to confirm a working fix.

Daniel van Vugt (vanvugt) wrote :

I have now reproduced the original issue twice:

  Settings schema 'org.gnome.mutter.wayland' does not contain a key named 'xwayland-allow-byte-swapped-clients'

What happened was that gnome-shell 42.9 got SIGKILL'd (along with all other user processes). After that, the system tried to restart it as gnome-shell 46.0, which wasn't fully installed/configured yet.

The SIGKILL seems to have come from a systemd restart. I did say yes to automatically restarting services, so maybe I shouldn't have?

may 16 16:17:07 jammytest2 systemd[1]: Reexecuting.
...
may 16 16:17:09 jammytest2 systemd[1]: user@1000.service: Killing process 1228 (gnome-shell) with signal SIGKILL.
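
To check whether another journal shows the same sequence (a systemd reexec followed by a SIGKILL of a user manager's processes), something like the following could be used. This is only a rough sketch: the function name reexec_then_sigkill is made up for illustration, and the patterns are taken from the two log lines above, so they may need adjusting for other locales or systemd versions.

```shell
#!/bin/sh
# Hypothetical helper: reads journal text on stdin and succeeds only if a
# "systemd[1]: Reexecuting." line is later followed by a line where a
# user@UID.service process is killed with SIGKILL.
reexec_then_sigkill() {
    awk '
        /systemd\[1\]: Reexecuting\./ { seen = 1 }
        seen && /user@[0-9]+\.service: Killing process .* with signal SIGKILL/ { found = 1 }
        END { exit !found }
    '
}
```

For example, `journalctl -b | reexec_then_sigkill && echo "looks like this bug"` would scan the current boot's journal.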

tags: added: rls-nn-incoming
Nick Rosbrook (enr0n) wrote :

I believe I figured out what is happening here. In systemd v250, systemd user managers began interpreting the signal RTMIN+25 as a command to daemon-reexec. This was done so that:

systemctl kill --kill-whom='main' --signal='SIGRTMIN+25' 'user@*.service'

could be used as a way to reexec all user instances at once. So, systemd.postinst now contains such a line so that user managers are also reexec'd during package upgrades.

However, since Jammy is running systemd v249, the running user manager at the time of the upgrade does not have this reexec logic, and appears to behave as if it were killed normally. So, the fix should be to not attempt to restart all the user managers if the old version is too old.
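
A minimal sketch of what such a guard could look like, assuming the postinst receives the previously configured version as "$2" (as maintainer scripts do on upgrade). This is not the actual patch: the helper names upstream_major and can_reexec_user_managers are hypothetical, and a real maintainer script would more likely use dpkg --compare-versions than parse the string by hand. The sketch only prints what it would do instead of sending the signal.

```shell
#!/bin/sh
# Hypothetical helper: print the upstream major version of a Debian
# version string, e.g. "249.11-0ubuntu3.12" -> "249".
upstream_major() {
    v=${1#*:}                       # drop an epoch such as "1:" if present
    printf '%s\n' "${v%%[!0-9]*}"   # keep only the leading run of digits
}

# Hypothetical helper: succeed only if the old version is v250 or newer,
# i.e. its user managers treat SIGRTMIN+25 as a reexec request.
can_reexec_user_managers() {
    major=$(upstream_major "$1")
    [ -n "$major" ] && [ "$major" -ge 250 ]
}

# "$2" is the old version on upgrade; it is empty on a fresh install.
old_version=${2:-}
if can_reexec_user_managers "$old_version"; then
    echo "would signal user managers with SIGRTMIN+25"
else
    echo "skipping user manager reexec (old version: ${old_version:-none})"
fi
```

With an old version of 249.11-0ubuntu3.12 the guard takes the "skipping" branch, so a v249 user manager is never sent a signal it would treat as fatal.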

Changed in systemd (Ubuntu):
importance: Undecided → High
assignee: nobody → Nick Rosbrook (enr0n)
status: New → Triaged
tags: added: systemd-sru-next
no longer affects: mutter
Changed in gnome-shell (Ubuntu):
status: Confirmed → Invalid
Changed in mutter (Ubuntu):
status: Confirmed → Invalid
milestone: noble-updates → none
Changed in gnome-shell (Ubuntu):
milestone: noble-updates → none
summary: - gnome-shell crashed with signal 5: Settings schema
- 'org.gnome.mutter.wayland' does not contain a key named 'xwayland-allow-
- byte-swapped-clients'
+ Dist upgrades from Jammy to Noble crash [Oh no! Something has gone
+ wrong.]
Daniel van Vugt (vanvugt) wrote :

Workaround:

1. Ctrl+Alt+F3

2. Log in

3. Some combination of:

   sudo dpkg -i /var/cache/apt/archives/*.deb
   sudo apt --fix-broken install

4. Reboot

It should also be possible for someone to find a simpler workaround than that.

Nick Rosbrook (enr0n)
description: updated
Nick Rosbrook (enr0n) wrote :

This is not a problem for oracular, because no supported upgrade paths to oracular would have an old enough systemd to hit this.

Changed in systemd (Ubuntu Noble):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Nick Rosbrook (enr0n)
Changed in systemd (Ubuntu):
status: Triaged → Invalid
no longer affects: gnome-shell (Ubuntu)
no longer affects: gnome-shell (Ubuntu Noble)
Changed in mutter (Ubuntu Noble):
status: New → Invalid
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Daniel, or anyone else affected,

Accepted systemd into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/255.4-1ubuntu8.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Noble):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-noble
Nick Rosbrook (enr0n) wrote :

I have verified the fix using systemd 255.4-1ubuntu8.1 from noble-proposed. On a Jammy Desktop system, I simply ran an upgrade:

nr@clean-jammy-amd64:~$ wget http://archive.ubuntu.com/ubuntu/dists/noble-proposed/main/dist-upgrader-all/24.04.18/noble.tar.gz
--2024-05-22 11:45:16-- http://archive.ubuntu.com/ubuntu/dists/noble-proposed/main/dist-upgrader-all/24.04.18/noble.tar.gz
Resolving archive.ubuntu.com (archive.ubuntu.com)... 91.189.91.82, 185.125.190.36, 91.189.91.83, ...
Connecting to archive.ubuntu.com (archive.ubuntu.com)|91.189.91.82|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1274850 (1.2M) [application/x-gzip]
Saving to: ‘noble.tar.gz’

noble.tar.gz 100%[===================>] 1.21M --.-KB/s in 0.1s

2024-05-22 11:45:16 (9.20 MB/s) - ‘noble.tar.gz’ saved [1274850/1274850]

nr@clean-jammy-amd64:~$ tar xf noble.tar.gz
nr@clean-jammy-amd64:~$ sudo -E ./noble --frontend DistUpgradeViewNonInteractive
[ ... ]

The upgrade proceeded without issue. Namely, my session was not killed during the upgrade.

nr@clean-jammy-amd64:~$ apt policy systemd
systemd:
  Installed: 255.4-1ubuntu8.1
  Candidate: 255.4-1ubuntu8.1
  Version table:
 *** 255.4-1ubuntu8.1 500
        500 http://archive.ubuntu.com/ubuntu noble-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     255.4-1ubuntu8 500
        500 http://archive.ubuntu.com/ubuntu noble/main amd64 Packages

tags: added: verification-done-noble
removed: verification-needed-noble
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/255.4-1ubuntu8.1)

All autopkgtests for the newly accepted systemd (255.4-1ubuntu8.1) for noble have finished running.
The following regressions have been reported in tests triggered by the package:

389-ds-base/unknown (armhf, s390x)
aide/unknown (s390x)
amavisd-new/unknown (s390x)
apport/2.28.1-0ubuntu3 (arm64)
apport/unknown (s390x)
appstream/unknown (s390x)
apt/2.7.14build2 (armhf)
apt/unknown (s390x)
asterisk/unknown (s390x)
at-spi2-core/unknown (s390x)
ayatana-indicator-session/unknown (s390x)
bind9/unknown (amd64, armhf, i386, s390x)
bolt/unknown (amd64, s390x)
casper/unknown (s390x)
casync/unknown (s390x)
ceph/unknown (s390x)
clamav/unknown (amd64)
clevis/unknown (s390x)
cloudflare-ddns/unknown (s390x)
clutter-1.0/unknown (amd64)
cockpit/unknown (s390x)
collectd/unknown (s390x)
colord/unknown (s390x)
comitup/unknown (amd64, s390x)
conntrack-tools/unknown (amd64)
corosync/3.1.7-1ubuntu3 (arm64)
corosync/unknown (s390x)
corosync-qdevice/unknown (amd64, s390x)
coturn/unknown (armhf, s390x)
cron/unknown (ppc64el, s390x)
crun/unknown (amd64, s390x)
cryptsetup/2:2.7.0-1ubuntu4 (arm64)
csync2/unknown (amd64, s390x)
cups/unknown (amd64, s390x)
dbus/1.14.10-4ubuntu4 (i386)
dbus/unknown (amd64, armhf, s390x)
dbus-broker/unknown (amd64, s390x)
debos/unknown (amd64)
dhcpcd/unknown (armhf, s390x)
dlm/unknown (amd64)
dovecot/unknown (amd64)
dpdk/23.11-1build3 (amd64)
dq/20240101-1 (amd64)
exim4/unknown (amd64, armhf)
expeyes/unknown (amd64)
fcgiwrap/unknown (amd64)
fluidsynth/unknown (amd64, i386)
freedom-maker/unknown (amd64, armhf, i386)
freedombox/unknown (amd64)
freeradius/unknown (amd64)
fwupd/unknown (amd64)
gamemode/unknown (i386)
gdm3/unknown (amd64)
golang-github-coreos-go-systemd/unknown (amd64)
gpsd/unknown (amd64)
gvfs/unknown (s390x)
haproxy/unknown (amd64)
hddemux/unknown (amd64)
hwloc/unknown (amd64, s390x)
incus/unknown (amd64)
init-system-helpers/unknown (amd64)
initramfs-tools/unknown (amd64)
interception-tools/unknown (amd64)
janus/unknown (amd64)
keyman/unknown (amd64)
knot/unknown (amd64)
knot-resolver/unknown (amd64)
libcamera/0.2.0-3fakesync1build6 (amd64, armhf)
libei/1.2.1-1 (amd64)
libinput/unknown (amd64)
liblinux-systemd-perl/unknown (amd64)
libreswan/unknown (amd64)
libsdl2/unknown (amd64)
libsfml/2.6.1+dfsg-2build2 (armhf)
libsfml/unknown (amd64)
libsoup2.4/2.74.3-6ubuntu1 (ppc64el)
libsoup2.4/unknown (amd64)
libsoup3/unknown (amd64)
libusbauth-configparser/unknown (amd64)
libvirt/unknown (amd64)
lighttpd/unknown (amd64)
linux-ibm/unknown (amd64)
logiops/unknown (amd64)
logrotate/unknown (amd64)
mariadb/unknown (amd64)
mediawiki/unknown (amd64)
mir/unknown (amd64)
mkosi/unknown (amd64)
monitoring-plugins-systemd/unknown (amd64)
mosquitto/unknown (amd64)
multipath-tools/unknown (amd64)
munin/unknown (amd64)
mutter/unknown (amd64)
nagios-tang/unknown (amd64, s390x)
ndctl/unknown (armhf, s390x)
network-manager/unknown (amd64, s390x)
nextepc/unknown (amd64)
nix/unknown (amd64, armhf)
nut/unknown (amd64, armhf, s390x)
open-build-service/unknown (amd64)
openssh/unknown (amd64)
openvpn/unknown (amd64)
openzwave/unknown (amd64)
ovn/unknown (amd64)
pam/unknown (amd64, i386)
pdns-recursor/unknown (amd64)
pgagroal/unknown (amd64)
pgbouncer/unknown ...


Nick Rosbrook (enr0n) wrote :

The autopkgtest regressions were all resolved with retries.
