Several blockprobe errors if trying to install the groovy daily live on LPAR

Bug #1893818 reported by Frank Heimes
This bug affects 2 people
Affects                   Status        Importance  Assigned to                 Milestone
Ubuntu on IBM z Systems   Fix Released  High        Canonical Foundations Team
curtin                    Fix Released  Undecided   Unassigned
subiquity                 Triaged       Undecided   Unassigned
livecd-rootfs (Ubuntu)    Fix Released  Undecided   Unassigned
multipath-tools (Ubuntu)  Invalid       High        Unassigned
probert (Ubuntu)          Invalid       Undecided   Unassigned

Bug Description

When trying to do a zFCP installation with the groovy daily live image, I run into these blockprobe errors:

┌────────────────────────────────────────────────────────────────────────┐
│ │
│ Sorry, there was a problem examining the storage devices on this ▴ │
│ system. █ │
│ █ │
│ [ View full report ] █ │
│ █ │
│ If you want to help improve the installer, you can send an error █ │
│ report. █ │
│ █ │
│ [ Send to Canonical ] █ │
│ █ │
│ You can continue and the installer will just present the disks █ │
│ present in the system and not other block devices, or you may be █ │
│ able to fix the issue by switching to a shell and reconfiguring the │
│ system's block devices manually. │
│ ▾ │
│ │
└────────────────────────────────────────────────────────────────────────┘

2020-09-01 15:51:05,760 DEBUG subiquitycore.prober:35 Prober() init finished, data:None
2020-09-01 15:51:05,761 DEBUG subiquitycore.core:97 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')
2020-09-01 15:51:05,761 DEBUG asyncio:54 Using selector: EpollSelector
2020-09-01 15:51:05,762 DEBUG subiquity.signals:50 connect_signal: network-proxy-set -> <function Subiquity.__init__.<locals>.<lambda> at 0x3ffa73e56a8>
2020-09-01 15:51:05,762 DEBUG subiquity.signals:50 connect_signal: network-change -> Subiquity._network_change
2020-09-01 15:51:05,762 DEBUG subiquitycore.core:643 Application.run
2020-09-01 15:51:05,764 DEBUG curtin:89 Running command ['dpkg', '--print-architecture'] with allowed return codes [0] (capture=True)

I've attached a tgz with /var/crash and /var/log content.

Frank Heimes (fheimes) wrote :
Changed in ubuntu-z-systems:
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Frank Heimes (fheimes) wrote :

Btw. after clicking 'Continue' I am able to complete the installation, despite all the files/crashes in /var/crash.

Changed in ubuntu-z-systems:
importance: Undecided → Medium
Ryan Harper (raharper) wrote :

For some reason, udev has not yet created partition node entries for
all of the member paths.

So, /dev/sda and /dev/sdc are two paths to the same lun, and when we dump
udev, we get:

/dev/sda
/dev/sdc

but no /dev/sda1 or /dev/sdc1;

Now, the partition exists, /dev/dm-1; however, there should also be
the /dev/sda1 device node.

I'm not sure why it's not there.

The current probe code assumes that we'll see both a /dev/sda1 as well as
the device-mapper disk for multipath partitions.

It'll take some time to figure out if we can construct type:partitions entries
for multipath partitions if we're missing sda1 in the list of partitions
from udev.
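
The mismatch described above can be illustrated with a minimal sketch. This is not the probert or curtin code; the function name and its inputs are hypothetical, and it assumes sd-style naming where partition N of /dev/sda is /dev/sdaN. It only lists the partition nodes the probe would expect on each member path but that the udev dump does not contain.

```python
def missing_partition_nodes(udev_devnames, member_paths, partition_numbers):
    """Return partition nodes expected on the member paths but absent from udev.

    Hypothetical helper; assumes sd-style naming (/dev/sda -> /dev/sda1).
    """
    seen = set(udev_devnames)
    expected = [
        f"{path}{number}"
        for path in member_paths
        for number in partition_numbers
    ]
    return [node for node in expected if node not in seen]


# Device names as described in this comment: both paths and the dm
# partition are present, but there is no /dev/sda1 or /dev/sdc1.
udev_devnames = ["/dev/sda", "/dev/sdc", "/dev/dm-1"]
missing = missing_partition_nodes(udev_devnames, ["/dev/sda", "/dev/sdc"], [1])
print(missing)
```

With the device names from this comment, both /dev/sda1 and /dev/sdc1 are reported as missing, which is the situation the probe code does not currently handle.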

Frank Heimes (fheimes) wrote :

Just a side note: this doesn't happen with 20.04.1, only with a 20.10 installation.

description: updated
Frank Heimes (fheimes) wrote :

Even after upgrading to subiquity 20.09.1 (2027) I'm still running into this situation.
The installer crashes, but it allows me to 'Continue' and I can complete the installation.
I've attached the /var/crash and /var/log content again (this time as they are with 20.09.1).

Changed in ubuntu-z-systems:
importance: Medium → High
Frank Heimes (fheimes) wrote :

In addition to comment #3: this happens with zFCP/multipath only (normal DASD installations are fine, as far as I can tell).

Frank Heimes (fheimes) wrote :

It's still the case with the latest QA tracker image: Version: 20200927

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1893818

tags: added: iso-testing
Paride Legovini (paride)
tags: added: rls-gg-incoming
Frank Heimes (fheimes) wrote :

I was asked to try on 20.04.1 with the latest subiquity from edge and can confirm that it still works in this case (on 20.04.1):

Subiquity help menu: 20.07.1+git2.5de9df3e
root@ubuntu-server:/# snap list | grep subiquity
subiquity 20.07.1+git2.5de9df3e 1969 latest/stable/… canonical* classic
root@ubuntu-server:/# snap refresh --edge subiquity
subiquity (edge) 20.09.1+git84.399e711b from Canonical* refreshed
root@ubuntu-server:/# snap list | grep subiquity
subiquity 20.09.1+git84.399e711b 2082 latest/stable/… canonical* classic
root@ubuntu-server:/#
Subiquity help menu: 20.09.1+git84.399e711b

Frank Heimes (fheimes) wrote :

So I have now tried a 20.04.1 installation, but with multipath-tools (incl. dependencies) upgraded to the version from groovy.
It's obviously a bit ugly to upgrade to multipath-tools from groovy in a focal install, since further packages are pulled in and services need to be restarted (which didn't work on the first try), etc. Anyway, I was finally able to proceed with the installation, and it looks like it failed in a similar way to a groovy installation:
"2020-10-09 13:06:10,740 ERROR root:39 finish: subiquity/Filesystem/_probe/probe_once: FAIL: Invalid dep_id (partition-sda1) not in storage config"

For the steps I took, see the attached file; /var/log and /var/crash are in the tgz attached to the next comment.
(Please note that there are several ui crash files in /var/crash. This is only because I restarted the installer, but the block-probe-fail crash files are there, too.)
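
To make the "Invalid dep_id ... not in storage config" message easier to read, here is a minimal sketch of the kind of consistency check that could produce it. It is not the actual curtin validation code: the function name, the simplified config entries and the choice of `device` and `volume` as reference keys are assumptions for illustration only.

```python
def find_invalid_deps(storage_config, ref_keys=("device", "volume")):
    """Return (item id, dependency id) pairs whose dependency is not defined.

    Hypothetical helper: each config item is a dict with an "id", and may
    point at another item through one of the keys in ref_keys.
    """
    known_ids = {item["id"] for item in storage_config}
    invalid = []
    for item in storage_config:
        for key in ref_keys:
            dep_id = item.get(key)
            if dep_id is not None and dep_id not in known_ids:
                invalid.append((item["id"], dep_id))
    return invalid


# Simplified, made-up config: something refers to partition-sda1, but no
# entry with that id exists because the partition node was never probed.
config = [
    {"id": "disk-sda", "type": "disk"},
    {"id": "format-0", "type": "format", "volume": "partition-sda1"},
]
invalid = find_invalid_deps(config)
print(invalid)
```

In this made-up config the entry `format-0` points at `partition-sda1`, which was never added, so the pair is reported as invalid.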

Frank Heimes (fheimes) wrote :
Paride Legovini (paride)
Changed in multipath-tools (Ubuntu):
importance: Undecided → High
Paride Legovini (paride) wrote :

Subscribed ubuntu-server, as there is now an indication that this could be a regression in multipath-tools, and we need to discuss how to tackle it.

Michael Hudson-Doyle (mwhudson) wrote :

Some aspects of this don't make much sense to me. /dev/sda1 is not managed by udev; it's created by the kernel / devtmpfs. If it's not there on the filesystem, something really wacky is going on. But it pretty clearly isn't in the udev database, which is almost as strange...

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in multipath-tools (Ubuntu):
status: New → Confirmed
Changed in subiquity:
status: New → Triaged
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Confirmed
Michael Hudson-Doyle (mwhudson) wrote :

OK, I was wrong: /dev/sda1 (and similar) is really not present. If I run "partprobe /dev/sd*" things start behaving, but that should in no way be needed...

Michael Hudson-Doyle (mwhudson) wrote :

OK, some more clues: if I deactivate and reactivate the zdevs, the nodes for the partitions do not appear. If I remove multipath-tools and deactivate and reactivate the zdevs, the nodes for the partitions *do* appear. If I install multipath-tools, remove /lib/udev/rules.d/60-multipath.rules and deactivate and reactivate the zdevs, the nodes for the partitions still appear.

Michael Hudson-Doyle (mwhudson) wrote :

60-multipath.rules is the same between focal and groovy though. Good night for now!

Michael Hudson-Doyle (mwhudson) wrote :

Aaah, so it turns out there has always been a udev rule that should have removed the /dev/sda1 etc. nodes for a drive that is multipathed, but because of https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=959727 it didn't work. So I guess probert / curtin need updates to cope with this new reality, but given how close the release is, perhaps we just break the udev rule again for compatibility and fix this early next cycle?
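
As a rough illustration of what coping with this could involve, the sketch below checks whether a disk such as sda is held by a device-mapper device, using the kernel's `/sys/block/<disk>/holders` directory. This is only an assumption about one possible approach, not what probert or curtin actually do, and the function name and `sysfs_root` parameter are hypothetical.

```python
from pathlib import Path


def dm_holders(disk, sysfs_root="/sys"):
    """List device-mapper holders (dm-N) of a disk such as 'sda'.

    Hypothetical helper: reads <sysfs_root>/block/<disk>/holders and
    returns an empty list if the disk or its holders directory is absent.
    """
    holders_dir = Path(sysfs_root) / "block" / disk / "holders"
    if not holders_dir.is_dir():
        return []
    return sorted(p.name for p in holders_dir.iterdir() if p.name.startswith("dm-"))
```

A non-empty result only says that some device-mapper device sits on top of the disk. That could be LVM or dm-crypt as well as multipath, so a real probe would still need to check the type of the dm device before looking for partitions there instead of at /dev/sda1.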

Paride Legovini (paride)
tags: added: server-next
Paride Legovini (paride) wrote :

Good catch! Given where we are in the cycle, I'm definitely for disabling the rule again in Groovy and fixing this properly early in the HH cycle.

Paride Legovini (paride)
tags: removed: server-next
Changed in livecd-rootfs (Ubuntu):
milestone: none → ubuntu-20.10
Paride Legovini (paride) wrote :

Per IRC discussion, the plan is to work around the issue in Groovy by disabling the udev rule in the live installer environment *only*. This should minimize the overall risk of regressions.

This will require a change in the image build scripts.

tags: added: fr-732
Changed in curtin:
status: New → Confirmed
Changed in multipath-tools (Ubuntu):
status: Confirmed → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.691

---------------
livecd-rootfs (2.691) groovy; urgency=medium

  * Remove 68-del-part-nodes.rules from installer squashfs to work around it
    breaking curtin. (LP: #1893818)

 -- Michael Hudson-Doyle <email address hidden> Thu, 15 Oct 2020 08:56:11 +1300
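
The effect of this workaround can be sketched as follows. This is not the actual livecd-rootfs change (the image build scripts are not shown in this report); the function name is hypothetical, and the rules directories searched are an assumption based on the /lib/udev/rules.d path mentioned earlier in this thread.

```python
from pathlib import Path

# Assumed locations of udev rules inside an unpacked root filesystem.
RULES_DIRS = ("lib/udev/rules.d", "usr/lib/udev/rules.d")


def remove_udev_rule(rootfs, rule_name="68-del-part-nodes.rules", rules_dirs=RULES_DIRS):
    """Delete rule_name from the given rootfs tree and return the removed paths.

    Hypothetical helper; only touches the named rule file and leaves all
    other rules (such as 60-multipath.rules) in place.
    """
    removed = []
    for rel_dir in rules_dirs:
        rule = Path(rootfs) / rel_dir / rule_name
        if rule.is_file():
            rule.unlink()
            removed.append(rule)
    return removed
```

Calling `remove_udev_rule("/path/to/unpacked/rootfs")` on a tree without the rule simply returns an empty list, so the step is safe to repeat.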

Changed in livecd-rootfs (Ubuntu):
status: New → Fix Released
Frank Heimes (fheimes) wrote :

I can confirm that it's fixed with image:
http://cdimage.ubuntu.com/ubuntu-server/daily-live/20201014.1/groovy-live-server-s390x.iso
I did two installations - on z/VM and on LPAR (both of course zFCP) - and the blockprobe issues no longer occurred.

Changed in curtin:
status: Confirmed → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Confirmed → In Progress
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit e113b055 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=e113b055

Changed in curtin:
status: In Progress → Fix Committed
Frank Heimes (fheimes) wrote :
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Christian Ehrhardt (paelzer) wrote :

As discussed with fheimes, no extra task on probert is needed.
But maybe the curtin status "Fix Committed" should by now be "Fix Released", or needs to be re-checked?

Changed in probert (Ubuntu):
status: New → Invalid
Dan Bungert (dbungert) wrote : Fixed in curtin version 21.3.

This bug is believed to be fixed in curtin version 21.3. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released