two identical disks are only offered as a multipath device

Bug #1902855 reported by Matthias Klose
This bug affects 2 people
Affects                   Status     Importance  Assigned to    Milestone
curtin (Ubuntu)           Invalid    Undecided   Unassigned
multipath-tools (Ubuntu)  Invalid    Undecided   Unassigned
probert (Ubuntu)          Invalid    Undecided   Unassigned
subiquity (Ubuntu)        Confirmed  Medium      Olivier Gayot

Bug Description

Two identical SSDs are only offered as a multipath device in the server installer, not as individual devices. The install succeeds; however, the next boot falls back to the initramfs because the volume group cannot be found. Same hardware configuration as in LP: #1902845.

Blacklisting the dm_multipath module when booting the installer makes the two disks show up; however, the installer later fails because the multipath and multipathd commands fail.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

On bug LP: #1902845 you attached the install tgz, which contains the cloud-init/curtin/subiquity logs.

Could you attach the same here so that we can check why it behaves this way?
It is also needed to decide which component is actually at fault here.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Also, are these really two SSDs, or one SSD over two paths?
That might be exactly what some component of the install fails on. Do you happen to know what this box physically contains?

Matthias Klose (doko)
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With multipath running in the install environment, what does e.g. "sudo multipath -ll -v 3" show?
You would usually expect all multipath devices to be listed with their paths and ID, e.g.:

mpatha (36005076306ffd6b6000000000000240a) dm-3 IBM,2107900
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 0:0:0:1074413604 sdc 8:32 active ready running
  |- 0:0:1:1074413604 sdh 8:112 active ready running

I wonder if we can derive anything from this in your case.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Maybe also report:

/sys/block/sda/device/{vendor,model}

and

sudo hdparm -i /dev/sd*
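To collect those attributes for every block device in one go, a small Python sketch can walk sysfs. It assumes the usual /sys/block/<dev>/device/<attr> layout; the helper name read_block_device_attrs is hypothetical, and attributes a device does not expose are simply left out. The root parameter only exists so the function can be pointed at a copy of sysfs.

```python
from pathlib import Path


def read_block_device_attrs(root="/sys/block", attrs=("vendor", "model")):
    """Return {device: {attr: value}} for every block device under root.

    Hypothetical helper; assumes the /sys/block/<dev>/device/<attr> layout.
    Attributes that a device does not expose are omitted from its dict.
    """
    result = {}
    for dev_dir in sorted(Path(root).iterdir()):
        values = {}
        for attr in attrs:
            attr_file = dev_dir / "device" / attr
            if attr_file.is_file():
                # sysfs values are newline-terminated and often space-padded
                values[attr] = attr_file.read_text().strip()
        result[dev_dir.name] = values
    return result
```

Calling read_block_device_attrs() on the installer system and printing the result would list the model strings for nvme0n1 and nvme2n1 side by side.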

Revision history for this message
Matthias Klose (doko) wrote :
Revision history for this message
Matthias Klose (doko) wrote :

$ cat /sys/block/nvme0n1/device/model
Force MP600
$ cat /sys/block/nvme2n1/device/model
Force MP600

Revision history for this message
Matthias Klose (doko) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So here we see that multipath indeed considers them to be one disk:

mpatha (eui.6479a73730830210) dm-0 NVME,Force MP600
size=932G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 0:1:1:1 nvme0n1 259:0 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 2:1:1:1 nvme2n1 259:2 active undef running
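The key point in this output is that both NVMe namespaces appear as paths under the single map mpatha. When checking this on several machines, a minimal Python sketch can extract that grouping. The regular expressions are assumptions fitted to the `multipath -ll` samples in this thread, which use an "alias (wwid)" header, rather than a documented output grammar. Other multipath-tools versions or settings may need adjustments, and parse_multipath_ll is a hypothetical name.

```python
import re

# Assumed map header format: "<alias> (<wwid>) dm-N <vendor>,<product>"
HEADER_RE = re.compile(
    r"^(?P<alias>\S+) \((?P<wwid>[^)]+)\) (?P<dm>dm-\d+) (?P<vendprod>.+)$"
)
# Assumed path line format: "... H:C:T:L <dev> <major>:<minor> <states>"
PATH_RE = re.compile(r"(?P<hcil>\d+:\d+:\d+:\d+) (?P<dev>\S+) (?P<devt>\d+:\d+) ")


def parse_multipath_ll(text):
    """Map each multipath alias to its WWID, dm node and path devices."""
    maps = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {
                "wwid": header["wwid"],
                "dm": header["dm"],
                "vendor_product": header["vendprod"],
                "paths": [],
            }
            maps[header["alias"]] = current
            continue
        path = PATH_RE.search(line)
        if path and current is not None:
            # Path lines belong to the most recent map header
            current["paths"].append(path["dev"])
    return maps
```

Fed the output above, this sketch should report mpatha with the paths nvme0n1 and nvme2n1. A map with more than one path is what the installer then presents as a single disk.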

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Maybe this is the same as https://bugs.centos.org/view.php?id=16893?

There, people also used dm-multipath.blacklist=1 to work around it. As you mentioned, that works for you as well, right?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Maybe have a look with
  sudo apt install nvme-cli
and check whether you can read any "bad same ID" off these devices?

Revision history for this message
Matthias Klose (doko) wrote :

> There people also used dm-multipath.blacklist=1 to get around
> it as you mentioned that works for you as well right?

No. As mentioned in the bug description, the installer still seems to call the multipath and multipathd commands, which then fail.

Revision history for this message
Matthias Klose (doko) wrote :

$ sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     202382840001300528DE Force MP600                              1         1.00 TB / 1.00 TB          512 B + 0 B      EGFM11.3
/dev/nvme1n1     7QH007NY             Seagate FireCuda 520 SSD ZP2000GM30002   1         2.00 TB / 2.00 TB          512 B + 0 B      STNSC014
/dev/nvme2n1     202382840001300528EF Force MP600                              1         1.00 TB / 1.00 TB          512 B + 0 B      EGFM11.3

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I know I asked on IRC, but I want to make sure it isn't forgotten:
is this a system of yours, or is it in a computing center we might get access to for debugging?

Revision history for this message
Matthias Klose (doko) wrote :

no, personal machine

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So this is more of a bug in multipath-tools, then? I mean, subiquity _could_ grow a feature for disabling multipath, but I'd really rather not unless this breakage is common.

Revision history for this message
Matthias Klose (doko) wrote :

$ sudo nvme id-ns /dev/nvme0n1 | grep eui64
eui64 : 6479a73730830210
$ sudo nvme id-ns /dev/nvme2n1 | grep eui64
eui64 : 6479a73730830210

Revision history for this message
Matthias Klose (doko) wrote :

See also LP: #1871611: the install fails with multipath disabled.

There are several more reports of multipath being problematic, and usually you can work around the issue by disabling multipath support at boot, i.e. by not loading the module. However, multipath is hard-wired into the installer.

For now I'll remove the disk for the installation. However, this is not possible when you can't access the disks or when they are soldered on. Having two SSDs in a notebook is not uncommon.

tags: added: rls-hh-incoming
tags: added: fr-898
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

eui64 being identical sounds like a violation of NVMe spec!

Revision history for this message
Matthias Klose (doko) wrote :

> eui64 being identical sounds like a violation of NVMe spec!

pointer please?

tags: removed: rls-hh-incoming
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

With both drives connected, could you please paste the output of

$ cat /sys/block/nvme[0-9]n[0-9]/{nguid,eui}

https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4b-2020.09.21-Ratified.pdf

As per the ratified NVMe 1.4b specification, figure 249 (Identify Namespace Data Structure), the NGUID and/or EUI64 fields must be globally unique. Alternatively, they may be set to 0h to indicate that they are empty.

If you have two drives with two distinct namespaces that are not multipathed to access the same underlying disk, I expect the two drives to have _different_ nguid and eui values.
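The uniqueness rule above can be written as a small check. This Python sketch assumes the identifiers have already been collected as hex strings per namespace, for example from `nvme id-ns` or sysfs. It treats an all-zero value as "not set", as the specification allows, and returns every non-zero EUI-64 shared by more than one device. The function name duplicate_eui64 is hypothetical.

```python
from collections import defaultdict


def duplicate_eui64(eui_by_device):
    """Map each non-zero EUI-64 shared by 2+ devices to those devices.

    Hypothetical helper; accepts compact ("6479a737...") or
    space-separated ("64 79 a7 ...") hex strings.
    """
    devices_by_eui = defaultdict(list)
    for device, eui in eui_by_device.items():
        normalized = eui.replace(" ", "").lower()
        # An empty or all-zero identifier means "not set", so it cannot collide.
        if not normalized or int(normalized, 16) == 0:
            continue
        devices_by_eui[normalized].append(device)
    return {eui: sorted(devs) for eui, devs in devices_by_eui.items() if len(devs) > 1}
```

On a genuinely dual-ported drive, a shared EUI-64 is expected, and multipath derives its eui.* WWID from it. A hit is only suspicious when the devices are known to be physically separate, as with the two Force MP600 drives here, which report different serial numbers in the nvme list output.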

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 1902855] Re: two identical disks are only offered as a multipath device

On Thu, 5 Nov 2020 at 21:35, Matthias Klose <email address hidden>
wrote:

> see also LP: #1871611, failing the install with disabled multipath.
>

That bug looked very much like the kernel losing its mind. Are you saying
you can reproduce it?

> However multipath is hard wired into the installer.
>

The intent is that what multipath-tools does on boot is reflected in the
installer. If it is disabled on the kernel command line or whatever, the disks
should not be multipathed. It's possible this is not what happens, but it's
not by design.

Revision history for this message
Matthias Klose (doko) wrote :

$ cat /sys/block/nvme[02]n[0-9]/{nguid,eui}
cat: '/sys/block/nvme[02]n[0-9]/nguid': No such file or directory
64 79 a7 37 30 83 02 10
64 79 a7 37 30 83 02 10
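The sysfs eui attribute prints the eight bytes as space-separated hex pairs, as shown above. The following Python sketch reads it for every NVMe namespace and normalizes it to the compact form used by nvme id-ns and multipath (6479a73730830210). The /sys/block layout and the nvme*n* name filter are assumptions, and read_nvme_eui64 is a hypothetical name. Namespaces without an eui file are skipped, just as the nguid file was missing here.

```python
from pathlib import Path


def read_nvme_eui64(root="/sys/block"):
    """Return {namespace: compact lowercase hex EUI-64} read from sysfs.

    Hypothetical helper; assumes /sys/block/nvmeXnY/eui holds
    space-separated hex bytes such as "64 79 a7 37 30 83 02 10".
    """
    result = {}
    for ns_dir in sorted(Path(root).glob("nvme*n*")):
        eui_file = ns_dir / "eui"
        if not eui_file.is_file():
            continue  # namespace does not expose an EUI-64
        # "64 79 a7 37 30 83 02 10\n" -> "6479a73730830210"
        result[ns_dir.name] = "".join(eui_file.read_text().split()).lower()
    return result
```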

Revision history for this message
Matthias Klose (doko) wrote :

> The intent is that what multipath-tools does on boot is reflected
> in the installer. If it is disabled on the kernel command line or
> whatever, the disks should not be multipathed. It's possible this
> is not what happens, but it's not by design.

Yes, that's what is not working. With multipath disabled on the kernel command line, I see:

Nov 06 11:11:08 | libdevmapper version 1.02.167 (2019-11-30)
Nov 06 11:11:08 | DM multipath kernel driver not loaded
===== paths list =====
uuid hcil dev dev_t pri dm_st chk_st vend/prod/rev
     0:1:1:1 nvme0n1 259:0 -1 undef undef NVME,Force MP600
     1:1:1:1 nvme1n1 259:1 -1 undef undef NVME,Seagate FireCuda 520 SSD ZP20
     2:1:1:1 nvme2n1 259:2 -1 undef undef NVME,Force MP600
     1:0:0:0 sda 8:0 -1 undef undef ATA,WDC WD141KRYZ-01
     6:0:0:0 sdb 8:16 -1 undef undef ATA,WDC WD141KRYZ-01

As seen in the crash report, all multipath queries are answered with "no", and then the installer tries to run multipath ...

Revision history for this message
Matthias Klose (doko) wrote :

curtin has:

def multipath_assert_supported():
    """ Determine if the runtime system supports multipath.
    returns: True if system supports multipath
    raises: RuntimeError: if system does not support multipath
    """
    missing_progs = [p for p in ('multipath', 'multipathd')
                     if not util.which(p)]
    if missing_progs:
        raise RuntimeError(
            "Missing multipath utils: %s" % ','.join(missing_progs))

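That's apparently not good enough if multipath is disabled via the kernel command line. The multipath and multipathd binaries are still present, so this check passes even though the DM multipath kernel driver is not loaded.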
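One way to make such a check notice a disabled kernel driver, sketched below in Python, is to also require that dm_multipath is present in sysfs. This is the condition behind the "DM multipath kernel driver not loaded" message above. The helpers dm_multipath_loaded and multipath_usable are hypothetical and are not curtin's code. They assume /sys/module/dm_multipath exists whenever the driver is available, whether it is loaded as a module or built in with parameters.

```python
import os
import shutil


def dm_multipath_loaded(sys_module_root="/sys/module"):
    """True if the device-mapper multipath driver appears in sysfs.

    Assumption: the driver shows up as /sys/module/dm_multipath when loaded.
    """
    return os.path.isdir(os.path.join(sys_module_root, "dm_multipath"))


def multipath_usable(sys_module_root="/sys/module"):
    """Hypothetical stricter check: userspace tools *and* kernel driver."""
    missing_progs = [p for p in ("multipath", "multipathd") if shutil.which(p) is None]
    if missing_progs:
        return False
    return dm_multipath_loaded(sys_module_root)
```

Returning False instead of raising would let an installer fall back to treating the disks individually when multipath has been disabled. This matches the intent described earlier in the thread, that what multipath-tools does on boot should be reflected in the installer.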

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Based on the multipath and installer logs, things look OK-ish.

nvme0n1 and nvme2n1 are a multipathed Force MP600 NVMe drive, eui.6479a73730830210, which is assembled into the mpatha device (also dm-0).

nvme1n1 is another NVMe drive, a Seagate FireCuda 520 SSD ZP2000GM30002, eui.0024cf01540022a8.

sdc is the USB stick with the Ubuntu Server installer.

sda is WDC_WD141KRYZ-01, serial 9RJRELZC.

sdb is WDC_WD141KRYZ-01, serial 9RJRTXYC.

The probert data correctly shows the multipath device.

I would expect the installer to have offered the multipathed NVMe, the non-multipathed NVMe, and the sda/sdb drives for installation. Thus you should have seen the two EUIs and the two serial numbers as installation targets.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I think there is no doubt that Matthias's disks are bad and that the default behaviour is OK, but I would hope that disabling multipath via the kernel would allow the installation to proceed as desired. But it seems curtin doesn't quite work this way. Part of the issue, IIRC, is that curtin's multipath support doesn't assume multipathd is actually running, because on some versions of Ubuntu multipath-tools wasn't included in the environment curtin runs in. I don't think that's true for recent releases, but it might be true for xenial, which MAAS presumably still has to support installing...

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

*via the kernel command line

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So much time has passed...
@Michael, do you think this would work better nowadays (despite the bad devices having the same ID)?

I think I can drop the Hirsute tasks :-)
Also, it isn't a probert/curtin/multipath bug if the reported IDs are the same.
I'm keeping the subiquity task open, but only to consider hardening it so that it can somehow work around this.

no longer affects: subiquity (Ubuntu Hirsute)
no longer affects: probert (Ubuntu Hirsute)
no longer affects: multipath-tools (Ubuntu Hirsute)
no longer affects: curtin (Ubuntu Hirsute)
Changed in curtin (Ubuntu):
status: New → Invalid
Changed in multipath-tools (Ubuntu):
status: New → Invalid
Changed in probert (Ubuntu):
status: New → Invalid
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Nothing has changed here. I do have a branch in review that should help, but it broke other vmtests in obscure ways. Now that Jenkins is usable again, I guess I or someone else should take another look at it...

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in subiquity (Ubuntu):
status: New → Confirmed
Revision history for this message
Chee Yang Chau (chaucheeyang) wrote :

I am also facing this issue. I have reported it at

https://askubuntu.com/questions/1430273/ubuntu-22-04-1-installer-cannot-detect-dual-identical-nvme-devices

Is there any progress on this issue?

Simon Chopin (schopin)
Changed in subiquity (Ubuntu):
importance: Undecided → Medium
tags: added: foundations-triage-discuss
Dan Bungert (dbungert)
tags: added: foundations-todo
removed: foundations-triage-discuss
Dan Bungert (dbungert)
Changed in subiquity (Ubuntu):
assignee: nobody → Olivier Gayot (ogayot)
tags: removed: fr-898