System fails to reboot from live session or ubiquity-dm - squashfs_read_data failed to read block

Bug #1840122 reported by Jean-Baptiste Lallement
This bug affects 4 people
Affects           Status        Importance  Assigned to          Milestone
subiquity         Invalid       Undecided   Unassigned
casper (Ubuntu)   New           Critical    Dimitri John Ledkov
  Bionic          Fix Released  Critical    Unassigned
  Eoan            Fix Released  Critical    Dimitri John Ledkov
  Focal           Fix Released  Undecided   Unassigned
  Hirsute         Won't Fix     Undecided   Unassigned
  Impish          Won't Fix     Undecided   Unassigned
  Jammy           New           Critical    Dimitri John Ledkov
finalrd (Ubuntu)  Fix Released  Undecided   Paride Legovini
  Bionic          New           Undecided   Unassigned
  Eoan            Won't Fix     Undecided   Unassigned
  Focal           New           Undecided   Unassigned
  Hirsute         Won't Fix     Undecided   Unassigned
  Impish          Won't Fix     Undecided   Unassigned
  Jammy           Fix Released  Undecided   Paride Legovini
linux (Ubuntu)    Confirmed     High        Unassigned
  Bionic          Confirmed     High        Unassigned
  Eoan            Won't Fix     High        Unassigned
  Focal           New           Undecided   Unassigned
  Hirsute         Won't Fix     Undecided   Unassigned
  Impish          Won't Fix     Undecided   Unassigned
  Jammy           Confirmed     High        Unassigned

Bug Description

Last known good image: Eoan Ubuntu Desktop 20190715

Similar failures started to appear with the new Eoan-based linux-hwe kernel for the 18.04.4 release.

Test Case:
1. Boot eoan desktop to a live session
2. Wait a couple of minutes until snapd settles
3. Reboot the system from the system menu or from the command line

Expected result:
The system reboots

Actual result:
The system fails to reboot or shut down and displays errors about failing to unmount /cdrom, followed by squashfs errors in a loop.

Unmounting /cdrom...
[FAILED] Failed unmounting /cdrom.
[ OK ] Started Shuts down the "li…" preinstalled system cleanly.
[ OK ] Reached target Final Step.
[ OK ] Started Reboot.
[ OK ] Reached target Reboot.
[ 115.744188] print_req_error: I/O error, dev sr0, sector 1508872 flags 80700
[ 115.768139] print_req_error: I/O error, dev sr0, sector 1508872 flags 0
[ 115.771469] print_req_error: I/O error, dev loop0, sector 1501550 flags 0
[ 115.775824] SQUASHFS error: squashfs_read_data failed to read block 0x2dd2d998

This also causes daily tests to fail and is reproducible both in a VM and on bare metal booted in legacy BIOS mode.
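
For automated daily tests, the failure signature shown in the console output above can be checked mechanically. A minimal sketch, assuming the console output of the shutdown has been captured to a file; the helper name has_reboot_failure and the file name console.log are illustrative and not part of casper or any existing test harness:

```shell
# Return 0 if the captured console log shows the failure signature from
# this report: the /cdrom unmount failure or a SQUASHFS read error.
# has_reboot_failure is a hypothetical helper name.
has_reboot_failure() {
    grep -Eq 'Failed unmounting /cdrom|SQUASHFS error:' "$1"
}

# Example (console.log is an assumed capture file):
#   has_reboot_failure console.log && echo "matches the LP: #1840122 signature"
```

Matching on either message keeps the check usable across kernel versions, since the exact wording of the SQUASHFS line differs between the logs in this report.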

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: casper 1.414
ProcVersionSignature: Ubuntu 5.2.0-10.11-generic 5.2.4
Uname: Linux 5.2.0-10-generic x86_64
ApportVersion: 2.20.11-0ubuntu7
Architecture: amd64
CasperVersion: 1.414
CurrentDesktop: ubuntu:GNOME
Date: Wed Aug 14 08:31:30 2019
LiveMediaBuild: Ubuntu 19.10 "Eoan Ermine" - Alpha amd64 (20190813)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: casper
UpgradeStatus: No upgrade log present (probably fresh install)
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu7
Architecture: amd64
CasperVersion: 1.414
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 19.10
LiveMediaBuild: Ubuntu 19.10 "Eoan Ermine" - Alpha amd64 (20190814)
Package: linux
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.2.0-10.11-generic 5.2.4
Tags: eoan
Uname: Linux 5.2.0-10-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1788 F.... pulseaudio
CasperVersion: 1.414
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 19.10
IwConfig:
 lo no wireless extensions.

 ens3 no wireless extensions.
LiveMediaBuild: Ubuntu 19.10 "Eoan Ermine" - Alpha amd64 (20190814)
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd QEMU USB Tablet
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: file=/cdrom/preseed/username.seed initrd=/casper/initrd --- keyboard-configuration/layoutcode=fr keyboard-configuration/variantcode=oss
ProcVersionSignature: Ubuntu 5.2.0-10.11-generic 5.2.4
RelatedPackageVersions:
 linux-restricted-modules-5.2.0-10-generic N/A
 linux-backports-modules-5.2.0-10-generic N/A
 linux-firmware 1.181
RfKill:

Tags: eoan
Uname: Linux 5.2.0-10-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.12.0-1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-disco
dmi.modalias: dmi:bvnSeaBIOS:bvr1.12.0-1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-disco:cvnQEMU:ct1:cvrpc-i440fx-disco:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-disco
dmi.sys.vendor: QEMU
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1788 F.... pulseaudio
CasperVersion: 1.414
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 19.10
IwConfig:
 lo no wireless extensions.

 ens3 no wireless extensions.
LiveMediaBuild: Ubuntu 19.10 "Eoan Ermine" - Alpha amd64 (20190814)
Lsusb:
 Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd QEMU USB Tablet
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
Package: linux-image-5.2.0-10-generic 5.2.0-10.11
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: file=/cdrom/preseed/username.seed initrd=/casper/initrd --- keyboard-configuration/layoutcode=fr keyboard-configuration/variantcode=oss
ProcVersionSignature: Ubuntu 5.2.0-10.11-generic 5.2.4
RelatedPackageVersions:
 linux-restricted-modules-5.2.0-10-generic N/A
 linux-backports-modules-5.2.0-10-generic N/A
 linux-firmware 1.181
RfKill:

Tags: eoan
Uname: Linux 5.2.0-10-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: 1.12.0-1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-disco
dmi.modalias: dmi:bvnSeaBIOS:bvr1.12.0-1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-disco:cvnQEMU:ct1:cvrpc-i440fx-disco:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-disco
dmi.sys.vendor: QEMU

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :
Changed in casper (Ubuntu):
importance: Undecided → High
status: New → Confirmed
description: updated
tags: added: rls-ee-incoming
description: updated
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

Diff between the manifests of 20190715 and 20190716.

There are 2 suspects from this list:
casper 1.413 (from 1.4.11)
linux-image 5.2.0-8.9 (from 5.0.0-20.21)

Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1840122

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : Dependencies.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : AlsaInfo.txt

apport information

description: updated
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : CRDA.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : CurrentDmesg.txt

apport information

description: updated
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : AlsaInfo.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : CRDA.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : Dependencies.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : Lspci.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : ProcModules.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : PulseList.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : UdevDb.txt

apport information

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
summary: - System fails to reboot from live session or ubiquity-dm
+ System fails to reboot from live session or ubiquity-dm -
+ squashfs_read_data failed to read block
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I had a look at this and got pretty confused! Feels more like a kernel problem than a casper one, somehow.

tags: added: id-5d557981385f317578944153
description: updated
tags: removed: rls-ee-incoming
Norbert (nrbrtx)
tags: added: iso-testing
Revision history for this message
Richard Vajdel (richardv22) wrote :

I tested it with the current iso (2019-09-17).

1. Booted into live session
2. Waited for 30 minutes
3. Restarted via System menu
4. Restart of the system was successful without any error

Can anybody test it and verify whether it works for them too?

Thanks

Revision history for this message
Chris Guiver (guiverc) wrote :

Booted daily Ubuntu x86_64 19.10 (2019-09-17) on
hp 8200 elite sff (i5-2400, 8gb, nvidia quadro 600)

Booted live, waited 5 mins
Suspend system, waited 2 mins (was looking at wrong test-steps screen, noting here as 2 mins had passed before I noticed my mistake)
resumed system, waited 2 mins
Clicked restart top-right of display

No issue, system restarted.

I repeated the test on the same box with shorter (untimed) wait times and without suspending. The restart was much slower this time, but it did complete.

Revision history for this message
Jane Atkinson (irihapeti) wrote :

I'm having this problem when installing in QEMU/KVM VMs. If I disconnect the ISO as per the install instructions, the system hangs; switching to another tty shows the errors from the original report. Leaving the ISO attached allows the system to reboot, but it boots into the ISO.

Revision history for this message
Brian Murray (brian-murray) wrote :

While recreating this with an image from today I noticed that before seeing the squashfs errors, after pressing an error key in the plymouth screen, there were messages regarding "Failed unmounting /cdrom". I've attached a screenshot showing the messages.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

The subiquity image shows similar errors, but it manages to reboot fine (attaching subiquity screenshots).

squashfs errors are still bad though.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :
Revision history for this message
Dimitri John Ledkov (xnox) wrote :
Changed in casper (Ubuntu Eoan):
importance: High → Critical
assignee: nobody → Dimitri John Ledkov (xnox)
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package casper - 1.421

---------------
casper (1.421) eoan; urgency=medium

  * Drop empty directory (doesn't track well in git)
  * Use debian/casper.dirs to install conf.d

casper (1.420) eoan; urgency=medium

  * Add dependency on finalrd for reliable live-session shutdown. Casper
    setup live systems do have layering violations (/ actually depends on
    /cdrom) and it appears that (although masked) the system manages to
    rip it out, resulting in filesystem errors preventing completing the
    shutdown. That's where pivot to finalrd comes in, to save the day and
    blast the running system into oblivion and complete the shutdown. LP:
    #1840122

 -- Dimitri John Ledkov <email address hidden> Tue, 01 Oct 2019 13:05:51 +0100

Changed in casper (Ubuntu Eoan):
status: Confirmed → Fix Released
Revision history for this message
Jane Atkinson (irihapeti) wrote :

Install into QEMU/KVM vm with Ubuntu desktop 20191001.2 completed successfully.

Changed in linux (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → High
Changed in casper (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → Critical
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Jean-Baptiste, or anyone else affected,

Accepted casper into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/casper/1.394.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To properly test it you will need to obtain and boot a daily build of a Live CD for bionic. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in casper (Ubuntu Bionic):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

I verified on Ubuntu Desktop 20200128, that casper from proposed pulls finalrd and that it successfully fixes the issue on both a VM and baremetal.

Marking as verification-done.
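
The dependency part of this verification can be scripted. A sketch, assuming a Debian-style Depends string such as the one printed by dpkg-query -W -f='${Depends}' casper; depends_on is a hypothetical helper that strips version constraints and splits alternatives in the simplest possible way:

```shell
# Succeed if package $2 appears in the comma-separated Depends string $1.
# Alternatives ("a | b") are treated as separate entries, and version
# constraints or architecture qualifiers are stripped before comparing.
depends_on() {
    printf '%s\n' "$1" |
        tr ',|' '\n\n' |
        sed 's/^ *//; s/[ (:].*$//' |
        grep -Fxq "$2"
}

# On a booted image (assumes dpkg-query and the casper package are present):
#   depends_on "$(dpkg-query -W -f='${Depends}' casper)" finalrd && echo "pulls finalrd"
```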

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package casper - 1.394.3

---------------
casper (1.394.3) bionic; urgency=medium

  * Cherrypick from eoan: Add dependency on finalrd for reliable
    live-session shutdown. Casper setup live systems do have layering
    violations (/ actually depends on /cdrom) and it appears that
    (although masked) the system manages to rip it out, resulting in
    filesystem errors preventing completing the shutdown. That's where
    pivot to finalrd comes in, to save the day and blast the running
    system into oblivion and complete the shutdown. LP: #1840122

 -- Dimitri John Ledkov <email address hidden> Tue, 28 Jan 2020 11:09:52 +0000

Changed in casper (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for casper has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Brian Murray (brian-murray) wrote :

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Eoan):
status: Confirmed → Won't Fix
Norbert (nrbrtx)
tags: removed: eoan
Revision history for this message
Paride Legovini (paride) wrote :

I am occasionally seeing this again with the Impish live-server (subiquity) daily ISO images. I saw this happening only on ppc64el and on arm64, and only on preseeded installs (via answers.yaml).

This doesn't prevent the installation from succeeding: what fails is just the post-install reboot. Hard-rebooting the system results in the installed system booting correctly.

This is not easy to reproduce; I went with something along these lines:

for i in {1..100}; do
    echo "Round $i"
    virt-install --os-variant ubuntu18.04 --name paride-impish-preseed --memory 2048 --disk image.qcow2 --disk answers.img --cdrom impish-live-server-ppc64el.iso --noautoconsole --wait -1 || break;
    sleep 5;
    virsh destroy paride-impish-preseed || break;
    virsh undefine paride-impish-preseed || break;
done

When the problem happens virt-install will never exit as the installer system gets caught in the "SQUASHFS error: Failed to read block" error loop. I'm attaching the answers.yaml baked in answers.img (see LP: #1946398 on how to create the answers image file).
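
Because a failing round never returns, an unattended run of the loop hangs instead of reporting the failure. A hedged variant wraps each round in coreutils timeout, which exits with status 124 when the time limit expires. The helper name run_round and the 3600-second limit in the usage comment are assumptions for illustration:

```shell
# Run a command with a time limit in seconds. Returns the command's exit
# status, or 124 if the limit expired (the behaviour of coreutils timeout).
# run_round is a hypothetical helper name.
run_round() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "round hung: no exit after ${limit}s" >&2
    fi
    return "$status"
}

# Usage sketch inside the loop, with the same virt-install arguments as above:
#   run_round 3600 virt-install ... --noautoconsole --wait -1 || break
```

Breaking out of the loop on a hang leaves the stuck VM defined, so its console and dmesg can still be inspected before cleaning up with virsh.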

Note: that answers.yaml preseeds the installation of a snap. I have the impression that it increases the probability of hitting the failure.

=== kernel log excerpt on failure ===

[ 78.374287] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 79.296032] overlayfs: "xino" feature enabled using 32 upper inode bits.
[ 134.464075] overlayfs: "xino" feature enabled using 2 upper inode bits.
[ 355.493023] SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
[ 355.513889] JFS: nTxBlock = 459, nTxLock = 3673
[ 355.536975] ntfs: driver 2.1.32 [Flags: R/O MODULE].
[ 355.564466] QNX4 filesystem 0.2.3 registered.
[ 355.648190] Btrfs loaded, crc32c=crc32c-vpmsum, zoned=yes
[ 372.037306] VFS: busy inodes on changed media sr0
[ 372.039566] sr 0:0:0:2: [sr0] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 372.039572] sr 0:0:0:2: [sr0] tag#19 Sense Key : Not Ready [current]
[ 372.039575] sr 0:0:0:2: [sr0] tag#19 Add. Sense: Medium not present
[ 372.039578] sr 0:0:0:2: [sr0] tag#19 CDB: Read(10) 28 00 00 00 a6 35 00 00 20 00
[ 372.039580] blk_update_request: I/O error, dev sr0, sector 170196 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 372.039633] sr 0:0:0:2: [sr0] tag#20 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 372.039636] sr 0:0:0:2: [sr0] tag#20 Sense Key : Not Ready [current]
[ 372.039638] sr 0:0:0:2: [sr0] tag#20 Add. Sense: Medium not present
[ 372.039641] sr 0:0:0:2: [sr0] tag#20 CDB: Read(10) 28 00 00 00 a6 35 00 00 20 00
[ 372.039642] blk_update_request: I/O error, dev sr0, sector 170196 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 372.039651] blk_update_request: I/O error, dev loop0, sector 36870 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
[ 372.051333] SQUASHFS error: Failed to read block 0x1200dd0: -5
[ 372.051337] SQUASHFS error: Unable to read data cache entry [1200dd0]
[ 372.051339] SQUASHFS error: Unable to read page, block 1200dd0, size 8380
[ 372.051461] sr 0:0:0:2: [sr0] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE c...
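
To see at a glance which block devices are failing in a long dmesg capture like the one above, the I/O error lines can be counted per device. A minimal sketch; io_errors_by_device is a hypothetical helper, and the pattern relies only on the "I/O error, dev NAME," text that appears in both the print_req_error and blk_update_request messages in this report:

```shell
# Print "device count" for every block device that has kernel
# "I/O error, dev ..." lines in the dmesg capture given as $1.
io_errors_by_device() {
    sed -n 's/.*I\/O error, dev \([^, ]*\),.*/\1/p' "$1" |
        sort | uniq -c |
        awk '{ print $2, $1 }'
}

# Example (dmesg.txt is an assumed capture file):
#   io_errors_by_device dmesg.txt
```

Errors on both sr0 and loop0 fit the picture described in this report: the squashfs image sits on a loop device backed by the CD-ROM, so a missing medium surfaces on both.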

Changed in casper (Ubuntu Focal):
status: New → Fix Released
Changed in casper (Ubuntu):
status: Fix Released → New
Revision history for this message
Paride Legovini (paride) wrote :

Attachment: full dmesg.

Revision history for this message
Paride Legovini (paride) wrote :

I have more evidence that the issue is indeed the same.

The original bug was fixed by making casper depend on finalrd, see the explanatory casper 1.420 d/changelog entry reported in comment 33. However it appears that the fix is racy: sometimes finalrd is not triggered in time at shutdown/reboot, and the system gets stuck with the errors that pivoting to /run/initramfs would prevent.

I verified this by jumping in a shell during a subiquity install and running

# systemctl stop finalrd

and logging out. After doing this the reboot doesn't fail anymore. (Note that finalrd.service has

ExecStart=/bin/true
ExecStop=/usr/bin/finalrd

so it actually does its /run/initramfs overlay mount at *stop* (= at shutdown), and not at start.) Apparently this doesn't happen early or fast enough to save the day when rebooting the live session.

I guess we need to enforce some ordering in the shutdown targets execution.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

We should check the ordering of services for stop in a booted live session (desktop / server / next-installer) and then figure out if we can add additional dependencies to finalrd.service (After=) such that its stop is ordered before everything else is stopped.
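
One way to inspect the current ordering is systemctl show -p Before finalrd.service (or -p After), which prints a single KEY=value line listing unit names separated by spaces. A small sketch that checks such a line for a given unit; lists_unit is a hypothetical helper name:

```shell
# Succeed if the property line read from stdin, in the form
# "Before=unit1 unit2 ...", lists the unit given as $1.
lists_unit() {
    unit=$1
    sed 's/^[A-Za-z]*=//' | tr ' ' '\n' | grep -Fxq "$unit"
}

# On a booted live session (assumes systemd and finalrd are present):
#   systemctl show -p Before finalrd.service | lists_unit shutdown.target
```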

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Our installer could stop finalrd before issuing shutdown too, as a workaround.
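
A sketch of that workaround, based on the manual test described earlier in this thread: stopping finalrd runs its ExecStop (/usr/bin/finalrd) before the reboot is requested. The wrapper name and the DRY_RUN switch are assumptions for illustration; this is not subiquity's actual code.

```shell
# Stop finalrd first so that its ExecStop prepares /run/initramfs, then
# request the reboot. With DRY_RUN=1 the commands are only printed.
# reboot_after_finalrd is a hypothetical name.
reboot_after_finalrd() {
    for cmd in "systemctl stop finalrd" "systemctl reboot"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}

# DRY_RUN=1 reboot_after_finalrd   # prints the two commands, runs nothing
```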

Revision history for this message
Paride Legovini (paride) wrote :

Added a subiquity task as this could be worked around there, while I think the kernel tasks could be set to Invalid or Won't Fix...

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I wonder if we need Before= or After= umount.target

Revision history for this message
Paride Legovini (paride) wrote :

From [1]:

> if a unit is configured with After= on another unit,
> the former is stopped before the latter if both are
> shut down.

so maybe we need After=umount.target in finalrd.service.

[1] https://www.freedesktop.org/software/systemd/man/systemd.unit.html

Revision history for this message
Paride Legovini (paride) wrote :

Hmm but umount.target is special (as in systemd.special(7)), so that's not so obvious...

Revision history for this message
Paride Legovini (paride) wrote :

As we have a Conflicts=umount.target, then

> It doesn't matter which of the two ordering dependencies is
> used, because stop jobs are always ordered before start jobs

literally as xnox said. I'm preparing a finalrd with Before=umount.target in a PPA [1].

[1] https://launchpad.net/~paride/+archive/ubuntu/lp1840122-finalrd-race

Paride Legovini (paride)
Changed in finalrd (Ubuntu Eoan):
status: New → Won't Fix
Revision history for this message
Paride Legovini (paride) wrote :

@brian-murray prepared a test Jammy arm64 ISO with finalrd from the PPA above (serial: 20211208.1). I chose arm64 as it's the architecture most heavily affected by this issue (at least in our test environment).

I performed several ISO test runs on the image and the success rate went from <50% to a solid 100%. I think it's enough to say that the fix works. I'll prepare an upload for Jammy, then we'll have to SRU the fix.

Changed in finalrd (Ubuntu):
assignee: nobody → Paride Legovini (paride)
status: New → Triaged
Revision history for this message
Paride Legovini (paride) wrote :

Actually there may be an even better solution:

   Before=shutdown.target

which lintian has been nagging us about:

  https://lintian.debian.org/tags/systemd-service-file-shutdown-problems

"There is race condition between stopping units and systemd getting a request to exit the main loop, so it may proceed with shutdown before all pending stop jobs have been processed."

Changed in finalrd (Ubuntu Jammy):
milestone: none → ubuntu-22.04-feature-freeze
Paride Legovini (paride)
tags: added: rls-jj-incoming
removed: amd64 apport-bug apport-collected id-5d557981385f317578944153 verification-done verification-done-bionic
Revision history for this message
Brian Murray (brian-murray) wrote :

I've uploaded this for Paride in the interest of getting it into the archive quickly (I assume he's in bed).

 $ dput finalrd_9_source.changes
Trying to upload package to ubuntu
Checking signature on .changes
gpg: /home/bdmurray/source-trees/finalrd/finalrd_9_source.changes: Valid signature from 1E918B66765B3E31
Checking signature on .dsc
gpg: /home/bdmurray/source-trees/finalrd/finalrd_9.dsc: Valid signature from 1E918B66765B3E31
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading finalrd_9.dsc: done.
  Uploading finalrd_9.tar.xz: done.
  Uploading finalrd_9_source.buildinfo: done.
  Uploading finalrd_9_source.changes: done.
Successfully uploaded packages.

Changed in finalrd (Ubuntu Jammy):
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package finalrd - 9

---------------
finalrd (9) jammy; urgency=medium

  * finalrd.service: set Before=shutdown.target.
    Eliminate race between finalrd and shutdown.target.
    See also: lintian's systemd-service-file-shutdown-problems.
    Thanks to Dimitri John Ledkov (LP: #1840122)
  * d/control: bump dh compat level to 11 (via debhelper-compat)
    Compat level 11 ensures compatibility with Bionic.
    Changes:
     - d/rules: switch to dh_installsystemd
  * d/control: set Rules-Requires-Root: no
  * d/control: bump Standards-Version to 4.6.0 (no changes needed)

 -- Paride Legovini <email address hidden> Thu, 09 Dec 2021 11:55:29 +0100
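
On an image that still ships an older finalrd, the same Before=shutdown.target ordering could be tried through a systemd drop-in instead of a rebuilt package. This is a minimal sketch for experimentation only: the drop-in file name before-shutdown.conf and the helper name are assumptions, and the packaged fix above changes finalrd.service itself rather than shipping a drop-in.

```shell
# Write a drop-in that orders finalrd.service before shutdown.target.
# $1 is the systemd unit directory, normally /etc/systemd/system.
# write_finalrd_dropin is a hypothetical helper name.
write_finalrd_dropin() {
    dir=$1/finalrd.service.d
    mkdir -p "$dir" || return 1
    printf '[Unit]\nBefore=shutdown.target\n' > "$dir/before-shutdown.conf"
}

# As root on the live session, then reload unit files:
#   write_finalrd_dropin /etc/systemd/system && systemctl daemon-reload
```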

Changed in finalrd (Ubuntu Jammy):
status: In Progress → Fix Released
Revision history for this message
Paride Legovini (paride) wrote :

I can confirm the Jammy arm64 images are rebooting fine.

Norbert (nrbrtx)
tags: added: jammy
tags: removed: rls-jj-incoming
Changed in finalrd (Ubuntu Focal):
milestone: none → ubuntu-20.04.4
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

What is the status of this for 20.04.4? Is this just a matter of backporting the finalrd change from 9 to impish and focal? Is there anything blocking this? Since I assume this is reproducible on focal as well, right?

Revision history for this message
Paride Legovini (paride) wrote :

Hi, interestingly this is _not_ reproducible on Focal. I started seeing the failures in Impish. On paper the issue should be present on Focal, but due to its racy nature it's difficult to tell exactly why it is not showing up.

I can quickly prepare an SRU upload with only the minimal Before= change, leaving out the packaging changes, but there won't be a real failure case to test against, since the failure is not reproducible on Focal. So I'm unsure what's best here.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Thank you for the explanation. I think it's good to have this in mind for the future (when someone notices this being broken there as well), but in this case I'm taking it off from the .4 milestone.

Changed in finalrd (Ubuntu Focal):
milestone: ubuntu-20.04.4 → none
Revision history for this message
Brian Murray (brian-murray) wrote :

Ubuntu 21.10 (Impish Indri) has reached end of life, so this bug will not be fixed for that specific release.

Changed in casper (Ubuntu Impish):
status: New → Won't Fix
Changed in finalrd (Ubuntu Impish):
status: New → Won't Fix
Changed in linux (Ubuntu Impish):
status: New → Won't Fix
Revision history for this message
Brian Murray (brian-murray) wrote :

Ubuntu 21.04 (Hirsute Hippo) has reached end of life, so this bug will not be fixed for that specific release.

Changed in casper (Ubuntu Hirsute):
status: New → Won't Fix
Changed in finalrd (Ubuntu Hirsute):
status: New → Won't Fix
Changed in linux (Ubuntu Hirsute):
status: New → Won't Fix
Dan Bungert (dbungert)
Changed in subiquity:
status: New → Invalid