Crash and failure installing focal

Bug #1862846 reported by Alberto Donato on 2020-02-11
This bug affects 1 person
Affects               Status         Importance   Assigned to                  Milestone
subiquity                            Undecided    Unassigned
curtin (Ubuntu)                      High         Ryan Harper
  Eoan                               Undecided    Unassigned
  Focal                              High         Ryan Harper
util-linux (Debian)   Fix Released   Unknown
util-linux (Ubuntu)                  Medium       Mauricio Faria de Oliveira
  Eoan                               Medium       Mauricio Faria de Oliveira
  Focal                              Medium       Mauricio Faria de Oliveira

Bug Description

[Impact]

 * lsblk no longer prints a partition's parent
   kernel device name (the wholedisk).
   (e.g., 'lsblk -no PKNAME /dev/partition')

 * Another impact is that the 'removable media'
   check always returns zero for partitions.
   (e.g., 'lsblk -no RM /dev/partition')

 * The regression was introduced in v2.34; only
   Eoan (v2.34) and later are affected.
   Disco (v2.33) and earlier are not affected.

 * The regression is fixed in v2.35, in commit
   e3bb9bfb76c1 ("lsblk: force to print PKNAME
   for partition"); fixes RM for partition too.

[Test Case]

 * $ lsblk -no PKNAME /dev/vda1 # partition

 * Expected output: vda # wholedisk

 * Current output: (nothing)

 * $ lsblk -no RM /dev/sdb1 # partition in removable disk

 * Expected output: 1 # removable media

 * Current output: 0 # not removable media
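
The PKNAME half of the test case above can be wrapped into a small pass/fail check. This is only a sketch: `check_pkname` is a hypothetical helper name, and the `LSBLK` variable (not part of util-linux) is an assumption that lets the command be overridden.

```shell
#!/bin/sh
# check_pkname is a hypothetical helper, not part of util-linux.
# LSBLK may name an alternative lsblk command; it defaults to "lsblk".
check_pkname() {
    part="$1"
    # -n: no header line, -o PKNAME: print only the parent kernel name
    parent=$("${LSBLK:-lsblk}" -no PKNAME "$part" 2>/dev/null | tr -d '[:space:]')
    if [ -z "$parent" ]; then
        # An empty result is the symptom described in [Impact].
        echo "FAIL: no PKNAME reported for $part" >&2
        return 1
    fi
    echo "$parent"
}
```

On an affected util-linux, `check_pkname /dev/vda1` would be expected to fail, because lsblk prints nothing for the partition.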

[Regression Potential]

 * Columns that depend on a partition device's
   parent device (i.e., seen as 'wholedisk')
   could in theory show incorrect values if
   another bug is present in v2.34 for that.

 * Other usages of 'parent' pointer in the
   function have been examined and reported
   (e.g. issue w/ removable media column),
   and others found to not have issues
   (e.g. --merge option, to group multiple
   parents of a device, as in RAID.)

[Other Info]

 * The impact on the curtin source package
   has been addressed another way: curtin no
   longer requires this util-linux feature
   (see comment #14).

 * util-linux github issue:
   https://github.com/karelzak/util-linux/issues/813

[Original Bug Description]

During an install of the daily live image for 20.04 Ubuntu Server, the installer first crashed and restarted itself, then failed to install the system.

Attached are the logs left on the install USB key.


Alberto Donato (ack) wrote :
  • logs (404.8 KiB, application/x-tar)
tags: added: champagne focal
Ryan Harper (raharper) wrote :

Hrm, this is a strange install.

The storage config has some strange settings. First, nothing is modified at all: all disks and partitions are marked preserve = true, as are all filesystems. There is also this strange mount:

   {
    "device": "format-0",
    "id": "mount-0",
    "path": "",
    "type": "mount"
   },

Here's the rootfs:

   {
    "device": "format-partition-sda3",
    "id": "mount-2",
    "path": "/",
    "type": "mount"
   },

And EFI

   {
    "device": "format-partition-sda1",
    "id": "mount-1",
    "path": "/boot/efi",
    "type": "mount"
   }

The failure appears here:

        + [ -f /boot/efi/EFI/ubuntu/grubx64.efi ]
        + [ -z ]
        + [ -f /boot/efi/EFI/ubuntu/shimx64.efi ]
        + break
        + echo /EFI/ubuntu/shimx64.efi
        + sed s|/|\\|g
        + loader=\EFI\ubuntu\shimx64.efi
        + efibootmgr --create --write-signature --label ubuntu --disk /dev/ --part 1 --loader \EFI\ubuntu\shimx64.efi
        efibootmgr: ** Warning ** : Boot0000 has same label ubuntu
        Could not prepare Boot variable: Success
        failed to install grub!

There's a bug in the install-grub helper in how it determines the disk device.

Ryan Harper (raharper) wrote :

@Lee

The efi_dev parsing code from the centos8 branch isn't happy:

Command: ['sh', '-c', 'exec "$0" "$@" 2>&1', 'install-grub', '--uefi', '--update-nvram', '--os-family=debian', '/target', '/dev/sda1']
Exit code: 1
Reason: -
Stdout: carryover command line params ''
        setting GRUB_CMDLINE_LINUX_DEFAULT to '' in etc/default/grub
        updated /target/etc/default/grub to set GRUB_CMDLINE_LINUX_DEFAULT=""
        curtin uefi: installing grub-efi-amd64 to: /boot/efi
        + echo before grub-install efiboot settings
        before grub-install efiboot settings
        + efibootmgr -v
        BootCurrent: 0007
        Timeout: 1 seconds
        BootOrder: 0005,0006,0007
        Boot0005* UEFI: IP4 Realtek PCIe GBE Family Controller PciRoot(0x0)/Pci(0x1c,0x4)/Pci(0x0,0x0)/MAC(902b346ca369,0)/IPv4(0.0.0.00.0.0.0,0,0)AMBO
        Boot0006* UEFI: IP6 Realtek PCIe GBE Family Controller PciRoot(0x0)/Pci(0x1c,0x4)/Pci(0x0,0x0)/MAC(902b346ca369,0)/IPv6([::]:<->[::]:,0,0)AMBO
        Boot0007* UEFI: SanDisk U3 Cruzer Micro 8.02 HD(1,MBR,0x7f4edb39,0x2d40,0x1f00)/File(\EFI\BOOT\BOOTX64.EFI)AMBO
        + bootid=ubuntu
        + efi_disk=/dev/
        + efi_part_num=1
        + grubpost=
        + grubcmd=grub-install
        + dpkg-reconfigure grub-efi-amd64

Ryan Harper (raharper) wrote :

We only see this failure on shim/secure-boot enabled setups.

Ryan Harper (raharper) on 2020-02-12
Changed in curtin (Ubuntu):
importance: Undecided → High
status: New → Triaged
Ryan Harper (raharper) wrote :

OK, I think I've found the bug; there are two issues:

if [ "${#grubdevs_new[@]}" -eq 1 ] && [ -f "${grubdevs_new[0]}" ]; then
    # Currently UEFI can only be pointed to one system partition. If
    # for some reason multiple install locations are given only use the
    # first.
    efi_dev="${grubdevs_new[0]}"
elif [ "${#grubdevs_new[@]}" -gt 1 ]; then
    error "Only one grub device supported on UEFI!"
    exit 1
else
    # If no storage configuration was given try to determine the system
    # partition.
    efi_dev=$(awk -v "MP=${mp}/boot/efi" '$2 == MP { print $1 }' /proc/mounts)
fi

The [ -f "${grubdevs_new[0]}" ] is checking if the target disk/partition is a file.
This fails as they are block devices; the check should be [ -b "${grubdevs_new[0]}" ].

Because this fails, we fall into the else clause, which is able to figure out from
/proc/mounts that the efi_dev is /dev/sda1.
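
The difference between the two tests can be seen with a small classifier. This is a sketch for illustration only; `path_kind` is a hypothetical helper and is not part of install-grub.

```shell
#!/bin/sh
# path_kind is a hypothetical helper used only for illustration.
# It reports which file-type test matches the given path.
path_kind() {
    if [ -b "$1" ]; then
        echo block      # block device, e.g. a disk or partition node
    elif [ -c "$1" ]; then
        echo char       # character device, e.g. /dev/null
    elif [ -f "$1" ]; then
        echo file       # regular file only; device nodes never match
    elif [ -d "$1" ]; then
        echo dir
    else
        echo other
    fi
}
```

A device node such as /dev/null is not reported as a regular file, so a `-f` guard is never true for it. The same applies to a partition node like /dev/sda1, which only `-b` (or the more general `-e`) would match.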

Now, further down when we convert the efi_dev into the disk and partition we run this code

        # The partition number of block device name need to be determined here
        # so both getting the UEFI device from Curtin config and discovering it
        # work.
        efi_part_num=$(cat /sys/class/block/$(basename $efi_dev)/partition)
        efi_disk="/dev/$(lsblk -no pkname $efi_dev)"

lsblk -no pkname $efi_dev

returns an empty string; that's because 'pkname' is an unknown column in lsblk,
rather the value should be 'kname'. This error results in efi_disk being set to "/dev/"

This isn't found on non-shim based installs as efi_disk variable is not used unless we are creating our own efibootmgr entry.
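
One way to avoid silently ending up with "/dev/" is to fail when the parent name comes back empty. A minimal sketch, assuming a hypothetical helper `parent_disk_path`; this is not the code used in install-grub.

```shell
#!/bin/sh
# parent_disk_path is a hypothetical helper, not the code in install-grub.
# $1: parent kernel device name as reported by a lookup (may be empty)
parent_disk_path() {
    pkname="$1"
    if [ -z "$pkname" ]; then
        # Refuse to build a bare "/dev/" path from an empty name.
        echo "could not determine parent disk" >&2
        return 1
    fi
    echo "/dev/$pkname"
}
```

It could be used as `efi_disk=$(parent_disk_path "$(lsblk -no pkname "$efi_dev")") || exit 1`, so the failure shows up before efibootmgr is called with a bogus --disk argument.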

Lee Trager (ltrager) wrote :

I agree a -b or -e should be used instead of -f. However pkname is a valid column in lsblk. From lsblk --help:

KNAME internal kernel device name
PKNAME internal parent kernel device name

pkname not working is a regression which was introduced in util-linux-2.34[1]. Upstream has fixed this[2].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1751290
[2] https://github.com/karelzak/util-linux/commit/e3bb9bfb76c17b1d05814436ced62c05c4011f48

Lee Trager (ltrager) wrote :

Upstream util-linux has fixed this in 2.34.1+

https://github.com/karelzak/util-linux/issues/813

Alberto Donato (ack) wrote :

@Ryan FWIW the reason the install doesn't mark anything as modified is that I was trying to keep the btrfs root partition (/dev/sda3) and just install there.

I thought the installer would work similarly to what the desktop installer does with btrfs, where it creates subvolumes for / and /home in the root partition (as @ and @home).

On my desktop, when I upgrade I just move @ out of the way (by renaming it) and the installer creates a new one.
This is kinda nice as you can easily keep/revert to the old rootfs by just changing which subvol is mounted at boot.

Is there any way to do that with subiquity?

Ryan Harper (raharper) on 2020-02-12
tags: added: rls-ff-incoming
Changed in curtin (Ubuntu Eoan):
status: New → Invalid
Ryan Harper (raharper) wrote :

@Lee Thanks for tracking down the util-linux bug.

Since this is broken in 2.34 (eoan/focal), I'm thinking we should use sysfs to find the parent by walking the device path.

Given a kname (nvme0n1p1) of the target partition:

# look up sysfs path from kname
% realpath /sys/class/block/nvme0n1p1
/sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0/nvme/nvme0/nvme0n1/nvme0n1p1

# check if it's a partition
% ls -al /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0/nvme/nvme0/nvme0n1/nvme0n1p1/partition
-r--r--r-- 1 root root 4096 Feb 12 08:08 /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0/nvme/nvme0/nvme0n1/nvme0n1p1/partition

# extract parent device path
% dirname /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
/sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0/nvme/nvme0/nvme0n1

# extract parent device major/minor
% cat /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0/nvme/nvme0/nvme0n1/dev
259:0

# udev symlinks /dev/block/$MAJOR:$MINOR
% ls -al /dev/block/259:0
lrwxrwxrwx 1 root root 10 Jan 18 00:16 /dev/block/259:0 -> ../nvme0n1
% realpath /dev/block/259:0
/dev/nvme0n1
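
The first steps of the walk above can be sketched as a shell function. This is an illustration under stated assumptions, not curtin's actual implementation: `parent_kname` is a hypothetical name, and the optional second argument (a sysfs root) exists only so the function can be tried against a fake tree.

```shell
#!/bin/sh
# parent_kname is a hypothetical helper; this is not curtin's actual code.
# $1: kernel name of a partition (e.g. nvme0n1p1)
# $2: sysfs root, defaults to /sys (overridable for testing)
parent_kname() {
    kname="$1"
    sysfs="${2:-/sys}"
    # look up the real sysfs path from the kname
    syspath=$(realpath "$sysfs/class/block/$kname") || return 1
    # only partitions carry a "partition" attribute
    if [ ! -e "$syspath/partition" ]; then
        echo "$kname is not a partition" >&2
        return 1
    fi
    # the parent (whole disk) is the containing directory
    basename "$(dirname "$syspath")"
}
```

This stops at the parent's kernel name. The walkthrough goes one step further and reads the parent's `dev` attribute to resolve the node through /dev/block/$MAJOR:$MINOR, which avoids assuming that the /dev node is named after the kname.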

Ryan Harper (raharper) wrote :

@Alberto

> I thought the installer would work similarly to what the desktop installer
> does with btrfs, where it creates subvolumes for / and /home in the root
> partition (as @ and @home).
>
> On my desktop, when I upgrade I just move @ out of the way (by renaming it)
> and the installer creates a new one.
> This is kinda nice as you can easily keep/revert to the old rootfs by
> just changing which subvol is mounted at boot.

Subiquity/curtin does not have support for btrfs subvolumes.

Changed in util-linux (Debian):
status: Unknown → New
Ryan Harper (raharper) on 2020-02-13
Changed in curtin (Ubuntu Focal):
assignee: nobody → Ryan Harper (raharper)
status: Triaged → In Progress
tags: removed: rls-ff-incoming

This bug is fixed with commit 82f23e3d to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=82f23e3d

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 19.3-26-g82f23e3d-0ubuntu1

---------------
curtin (19.3-26-g82f23e3d-0ubuntu1) focal; urgency=medium

  * New upstream snapshot.
    - install-grub: refactor uefi partition/disk searching (LP: #1862846)
    - doc: update Canonical contributors URL [Paul Tobias]
    - block-discover: detect additional "extended" partition types in MBR
      (LP: #1861251)
    - vmtests: skip focal bcache tests due to kernel bug
    - net/deps.py: detect openvswitch cfg and install openvswitch packages
    - vmtest: collection of vmtest related fixes to make things triple green
    - clear-holders: umap the parent mpath to wipe the underlying partitions
    - vmtests: bump fixby date out and fix false positive when date passes
      (LP: #1855148)
    - vmtests: drop disco tests using a tool to automate the process

 -- Ryan Harper <email address hidden> Thu, 13 Feb 2020 21:08:59 -0600

Changed in curtin (Ubuntu Focal):
status: In Progress → Fix Released

@ltrager
I'd be happy to handle the patch for util-linux on Ubuntu E/F if that helps; just let me know.

Lee Trager (ltrager) wrote :

Ryan has updated Curtin to no longer require that util-linux feature. I do think it would be good to carry that patch in Ubuntu, as other users will be affected by that regression.

@ltrager, Indeed, I see the refactor in curtin. :)

Absolutely agree w/ you, it's a regression in util-linux,
and after following up with you on IRC yesterday that it
is OK for me to submit the fix to Ubuntu, I worked on it.

I've tested the patch today, and it's currently building
on all architectures in a test PPA. If all goes well, it
should move forward to Focal and Eoan in the coming days.

Providing test steps in the next comment.
Attaching the debdiffs for reference.

cheers,
Mauricio

util-linux / test steps
===

Bionic
---

No regression: parent kernel device name

$ dpkg -s util-linux | grep ^Version:
Version: 2.31.1-0.4ubuntu3.5

$ lsblk -no pkname /dev/vda1
vda

Eoan
---

Before: empty string

$ dpkg -s util-linux | grep ^Version:
Version: 2.34-0.1ubuntu2.2

$ lsblk -no pkname /dev/vda1

$

After: parent kernel device name

$ dpkg -s util-linux | grep ^Version:
Version: 2.34-0.1ubuntu2.2+pkname1

$ lsblk -no pkname /dev/vda1
vda

Focal
---

Before: empty string

$ dpkg -s util-linux | grep ^Version:
Version: 2.34-0.1ubuntu6

$ lsblk -no pkname /dev/vda1

$

After: parent kernel device name

$ dpkg -s util-linux | grep ^Version:
Version: 2.34-0.1ubuntu6+pkname1

$ lsblk -no pkname /dev/vda1
vda

Changed in util-linux (Ubuntu Eoan):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
Changed in util-linux (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Mauricio Faria de Oliveira (mfo)
tags: added: sts-sponsor-mfo
tags: added: patch
tags: removed: champagne
Eric Desrochers (slashd) wrote :

util-linux uploaded in focal.

Thanks Mauricio !

@slashd, thanks for uploading util-linux to Focal.
I've added the SRU template and uploaded to Eoan.

description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package util-linux - 2.34-0.1ubuntu7

---------------
util-linux (2.34-0.1ubuntu7) focal; urgency=medium

  * d/p/lsblk-force-to-print-PKNAME-for-partition.patch: fix regression
    that lsblk doesn't print PKNAME column for partitions (LP: #1862846)

 -- Mauricio Faria de Oliveira <email address hidden> Thu, 20 Feb 2020 11:09:29 -0300

Changed in util-linux (Ubuntu Focal):
status: In Progress → Fix Released
Łukasz Zemczak (sil2100) wrote :

Hey! The package looks good so I'll accept it. In the impact you have mentioned that currently lsblk -no RM also doesn't work correctly, so maybe we should add it to the test case?

Changed in util-linux (Ubuntu Eoan):
status: In Progress → Fix Committed

Hello Alberto, or anyone else affected,

Accepted util-linux into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/util-linux/2.34-0.1ubuntu2.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Hi Lukasz, thanks! Sure thing, I'll add it.

$ lsb_release -cs
eoan

Test device: partition (sdb1) in removable / USB flash disk (sdb)

$ ls -1d /sys/block/sdb/sdb1
/sys/block/sdb/sdb1

$ cat /sys/block/sdb/removable
1

eoan-proposed:
---

- PKNAME shows partition's parent/wholedisk.
- RM shows 1 for removable disk's partition.

$ dpkg -s util-linux | grep ^Version:
Version: 2.34-0.1ubuntu2.3

$ lsblk -no PKNAME /dev/sdb1
sdb

$ lsblk -no RM /dev/sdb1
 1

eoan-updates:
---

- PKNAME shows nothing.
- RM shows 0 even though it is actually a removable disk's partition.

$ dpkg -s util-linux | grep ^Version:
Version: 2.34-0.1ubuntu2.2

$ lsblk -no PKNAME /dev/sdb1

$

$ lsblk -no RM /dev/sdb1
 0

description: updated
tags: added: verification-done-eoan
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package util-linux - 2.34-0.1ubuntu2.3

---------------
util-linux (2.34-0.1ubuntu2.3) eoan; urgency=medium

  * d/p/lsblk-force-to-print-PKNAME-for-partition.patch: fix regression
    that lsblk doesn't print PKNAME column for partitions (LP: #1862846)

 -- Mauricio Faria de Oliveira <email address hidden> Thu, 20 Feb 2020 11:13:53 -0300

Changed in util-linux (Ubuntu Eoan):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for util-linux has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in util-linux (Debian):
status: New → Confirmed
Changed in util-linux (Debian):
status: Confirmed → Fix Released
Changed in subiquity:
status: New → Fix Released
tags: removed: sts-sponsor-mfo