Installation of Focal on a linux raid consistently fails

Bug #1876848 reported by Stian Brattland on 2020-05-05
This bug affects 1 person
Affects             Status        Importance  Assigned to  Milestone
curtin (Ubuntu)     Fix Released  High        Unassigned
subiquity (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

During installation of Focal, I chose to set up custom partitioning. I have two disks, and I set up these two disks in a Linux RAID.

In addition to the autogenerated GRUB partitions, which are created when I designate each of the two disks as a boot disk, I create an additional partition on each disk which takes part in the RAID device (md0).

Just after the installer attempts to begin installation, the installer fails. The most descriptive part of the crash report is:

comparing device lists: expected: ['/dev/sda2', '/dev/sda2'] found: ['/dev/sdb2', '/dev/sda2']
 An error occured handling 'raid-md127': ValueError - RAID array device list does not match. Missing: set() Extra: {'/dev/sdb2'}
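
The "Missing" and "Extra" sets in that message are consistent with a plain set difference between the expected and found device lists. A minimal sketch, assuming that is how the comparison works (the function name compare_device_lists is hypothetical and not taken from curtin):

```python
def compare_device_lists(expected, found):
    """Return (missing, extra) as sets, given two lists of device paths."""
    expected_set, found_set = set(expected), set(found)
    missing = expected_set - found_set  # expected but not present in the array
    extra = found_set - expected_set    # present in the array but not expected
    return missing, extra


# The two lists quoted in the crash report above.
missing, extra = compare_device_lists(
    ["/dev/sda2", "/dev/sda2"],
    ["/dev/sdb2", "/dev/sda2"],
)
print("Missing:", missing, "Extra:", extra)
```

Because the expected list names /dev/sda2 twice, the duplicate collapses into one entry and /dev/sdb2 ends up reported as "extra", even though it is a legitimate member of the array.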

In an attempt to work around this, I jumped into a shell just after the installer started. I then created a Linux RAID manually and, once the RAID was in place, proceeded with the installer. I then chose to install Ubuntu onto the existing RAID. The same error was given.

It seems that no matter what I try, the installer fails when I attempt to install onto a Linux RAID.

Installation on a plain partition works fine.

Stian Brattland (sbrattla) wrote :
Michael Hudson-Doyle (mwhudson) wrote :

Wow what a confusing error message, I'm sorry about this!

This seems to be caused by the fact that both disks in the curtin config have the same wwn:

   - {ptable: gpt, serial: Corsair_Force_GS_13207907000097410026, wwn: '0x0000000000000000',
     path: /dev/sda, preserve: true, name: '', grub_device: true, type: disk, id: disk-sda}
   - {ptable: gpt, serial: Corsair_Force_GS_1320790700009741003D, wwn: '0x0000000000000000',
     path: /dev/sdb, preserve: true, name: '', grub_device: true, type: disk, id: disk-sdb}

This seems to come straight from the udev database:

 P: /devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
 ...
 E: ID_WWN=0x0000000000000000
 E: ID_WWN_WITH_EXTENSION=0x0000000000000000

(and the same for sdb). This in turn seems to be because SCSI_IDENT_LUN_NAA_LOCAL is 0000000000000000 (from 55-scsi-sg3_id.rules) for both drives. I'm fairly sure that value is something sg_inq --export spits out, but now we're really running into the limits of my knowledge, I'm afraid. There's a bug somewhere, maybe in your drives' firmware or maybe in sg3-utils...
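
To illustrate how a shared WWN could produce the duplicated '/dev/sda2' in the expected list, here is a minimal sketch. It assumes a lookup that returns the first disk matching a given WWN; the DISKS table and the lookup_by_wwn and lookup_by_serial helpers are hypothetical and are not curtin's actual code. The serial and wwn values are the ones from the config above.

```python
# Disk identifiers as quoted in the curtin config in this report.
DISKS = [
    {"path": "/dev/sda",
     "serial": "Corsair_Force_GS_13207907000097410026",
     "wwn": "0x0000000000000000"},
    {"path": "/dev/sdb",
     "serial": "Corsair_Force_GS_1320790700009741003D",
     "wwn": "0x0000000000000000"},
]


def lookup_by_wwn(wwn, disks=DISKS):
    """Hypothetical lookup: path of the first disk carrying this WWN."""
    for disk in disks:
        if disk["wwn"] == wwn:
            return disk["path"]
    return None


def lookup_by_serial(serial, disks=DISKS):
    """Hypothetical lookup: path of the first disk carrying this serial."""
    for disk in disks:
        if disk["serial"] == serial:
            return disk["path"]
    return None


# Resolving each configured disk by WWN lands on the same device twice,
# while resolving by serial keeps the two disks apart.
resolved_by_wwn = [lookup_by_wwn(d["wwn"]) for d in DISKS]
resolved_by_serial = [lookup_by_serial(d["serial"]) for d in DISKS]
print(resolved_by_wwn)
print(resolved_by_serial)
```

Under that assumption, any identifier shared by both disks makes the second disk indistinguishable from the first, which matches the symptom seen here.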

Dimitri John Ledkov (xnox) wrote :

Is the drive actually Corsair_Force or some white-label OEM who bought bulk Corsair_Force drives and were supposed to configure the WWNs before shipping the drives?

I think we should reject WWNs of 0x0, or things like consecutive numbers, because I remember that in the whoopsie database we had lots of devices with the serial number 987654321.

Or at least we should ignore duplicate WWNs from non-multipathed drives, and say: well, this is bogus, let's try to identify these by path, or some such. I'm not sure whether we can implement that in probert, so that it never reports non-multipathed drives with duplicate WWNs, purges those WWNs, and emits warnings.
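
A rough sketch of the "ignore duplicate WWNs from non-multipathed drives" idea follows. It is only an illustration: the drop_duplicate_wwns function and the "multipath" flag on each disk record are assumptions made for this example and do not reflect probert's or curtin's real data model.

```python
from collections import Counter


def drop_duplicate_wwns(disks):
    """Return (cleaned_disks, warnings).

    A WWN seen on more than one non-multipath disk is treated as bogus and
    removed from those disks; multipath disks may legitimately share a WWN.
    """
    counts = Counter(
        d["wwn"] for d in disks if d.get("wwn") and not d.get("multipath")
    )
    cleaned, warnings = [], []
    for disk in disks:
        disk = dict(disk)  # copy, so the caller's records are not modified
        wwn = disk.get("wwn")
        if wwn and not disk.get("multipath") and counts[wwn] > 1:
            del disk["wwn"]
            warnings.append(
                "ignoring duplicate WWN %s on %s" % (wwn, disk["path"])
            )
        cleaned.append(disk)
    return cleaned, warnings


cleaned, warnings = drop_duplicate_wwns([
    {"path": "/dev/sda", "wwn": "0x0000000000000000"},
    {"path": "/dev/sdb", "wwn": "0x0000000000000000"},
])
print(cleaned)
print(warnings)
```

With the WWN dropped from both records, a later step would have to fall back to another identifier such as the serial or the path.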

Dimitri John Ledkov (xnox) wrote :

We are sorry we assumed a World Wide Name would be unique World Wide =/ but we also understand that "your drives are not unique enough" is not a good enough vendor response in such a situation.

Michael Hudson-Doyle (mwhudson) wrote :

> Or at least we should ignore duplicate WWNs from non-multipathed drives.

I think this makes sense, I'm not sure where to jam this in though. Probably curtin's extract_storage_config stuff? That doesn't really conceptually look at disks as a set but I'm sure we can hack it in.

Changed in subiquity (Ubuntu):
status: New → Triaged
Stian Brattland (sbrattla) wrote :

Hi all,

Thanks for following up!

The disks are very clearly labeled Corsair, so I believe they are original disks and not OEM. I have no idea why they do not have a WWN set. The disks were bought as a regular consumer product.

I don't really know what that number is, or how it is read from the drives, but could it have an impact that I've enabled UEFI with legacy boot on the system?

Ryan Harper (raharper) wrote :

@Stian

No worries; it's nothing you configured/changed as far as I can tell. We look up device paths to disks via serial or wwn. As it turns out, your two disks have different serials but a duplicate (and invalid) WWN, so we'll need to fix our code to ignore these invalid WWNs and not include them in the storage config. I suspect that if we removed the two 'wwn' keys from the storage config, all would work fine.

Thanks for the crash, we have enough data here to fix the issue.

@Michael,

In block-discover, we should not include the wwn key if the value is invalid (0x0000...).
If we omit the key, then we'll still have 'serial', which does differ, and we will find the correct block devices.
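
A minimal sketch of that suggestion, for illustration only: it treats an empty or all-zero WWN as invalid, which is an assumption based on the value seen in this report and not necessarily the criterion curtin ended up using. The is_bogus_wwn and disk_config names are hypothetical.

```python
def is_bogus_wwn(value):
    """Assumed rule: a WWN is bogus if it is empty or consists only of zeros."""
    if not value:
        return True
    digits = value.lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    return digits.strip("0") == ""


def disk_config(path, serial, wwn):
    """Build a storage-config style disk entry, omitting a bogus wwn key."""
    config = {"type": "disk", "path": path, "serial": serial}
    if not is_bogus_wwn(wwn):
        config["wwn"] = wwn
    return config


sda = disk_config("/dev/sda", "Corsair_Force_GS_13207907000097410026",
                  "0x0000000000000000")
sdb = disk_config("/dev/sdb", "Corsair_Force_GS_1320790700009741003D",
                  "0x0000000000000000")
print(sda)
print(sdb)
```

Both entries come out without a 'wwn' key, leaving the differing 'serial' values as the only identifiers used to find the block devices.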

Changed in curtin (Ubuntu):
importance: Undecided → High
status: New → Triaged
Michael Hudson-Doyle (mwhudson) wrote :

Yes, I had two questions really: 1) where to exclude "bogus" values (I think I agree that curtin's block-discover stuff is right) and 2) how to define "bogus" exactly. We could just exclude the values from this report and wait for the next bug report I guess...

Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 145e4939 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=145e4939

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 19.3-68-g6cbdc02d-0ubuntu1

---------------
curtin (19.3-68-g6cbdc02d-0ubuntu1) groovy; urgency=medium

  * New upstream snapshot.
    - Makefile: make adjustments to call lint/style tools via python module
    - block-discover: ignore invalid id_serial/id_wwn values (LP: #1876848)
    - Fix handing of reusing msdos partitions and flags (LP: #1875903)
    - block.detect_multipath: ignore fake "devices" from /proc/mounts
      [Michael Hudson-Doyle] (LP: #1876626)
    - udev: use shlex.quote when shlex.split errors on shell-escape chars
      (LP: #1875085)
    - lvm: don't use vgscan --mknodes
    - vmtest: rsync don't cross filesystem boundaries when copying
      (LP: #1873909)
    - vmtest: basic/basic_scsi adjust collect/tests for unstable device names
      (LP: #1874100)
    - Add unittests for partition_handler calc_[dm]_part_info and kpartx paths
    - multipath: attempt to enforce /dev/mapper/mpath files are symlinks
    - block-meta: device mapper partitions may be block devices not links
    - Default to dm_name being id if empty earlier in dm_crypt_handler()
      [Łukasz 'sil2100' Zemczak] (LP: #1874243)
    - storage: correct declared schema draft version for storage schema
    - test_clear_holders: add missing zfs mock
    - Mock out zfs_supported to prevent attempting to load kernel modules
    - block-meta: skip wipe device paths if not present (LP: #1869075)
    - unittest: do not allow util.subp by default (LP: #1873913)
    - curthooks: support multiple ESP on UEFI bootable systems
    - block-discover: handle missing multipath 'path' data, use DM_NAME
      (LP: #1873728)
    - lvm-over-multipath: handle lookups of multipath members (LP: #1869075)
    - block-meta: don't filter preserve=true devices, select by wipe
      (LP: #1837214)
    - vmtest: basic use dname to lookup disk with multiple partitions
    - block-meta: Don't check the ptable type of a disk with no ptable
    - curthooks: always use ChrootableTarget.subp when calling efibootmgr
    - storage: enable and use multipath during storage configuration
      (LP: #1869075)
    - block-discover: detect nvme multipath devices (LP: #1868109)
    - clear-holders: Tolerate vgchange errors during discovery (LP: #1870037)
    - block-meta: handle preserve with vtoc ptable (LP: #1871158)
    - vmtest: use -partition file for TestReuseRAIDMemberPartition class
    - format: extra_options should be a list type
    - tox: add pyflakes to the default tox run [Paride Legovini]
    - storage_config: Add 'extra_options' parameter to allow custom mkfs
      (LP: #1869069)
    - Add support for installing Ubuntu Core 20 images
    - tox.ini: Fix issues with newer tox on focal
    - vmtest: Fix test_basic.py to run on s390x (LP: #1866663)
    - vmtest: use util.load_file for loading collect files
    - block-meta: refactor storage_config preserve and wipe settings
      (LP: #1837214)
    - block-discover: skip 'multipath' key in blockdevice if mpath name is None
    - tox: all py27 environments should use the base py27 deps
    - uefi: refactor efibootmg handling to support removing duplicate entries
      (LP: #1864257)
    - tox: pi...

Changed in curtin (Ubuntu):
status: Triaged → Fix Released
Stian Brattland (sbrattla) wrote :

What's the process for getting this incorporated into subiquity, and consequently releasing it as part of the 20.04 installer?

Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 1876848] Re: Installation of Focal on a linux raid consistently fails

On Tue, 19 May 2020, 21:26 Stian Brattland, <email address hidden>
wrote:

> What's the process of getting this incorporated into subiquity, and
> consequently release it as part of the 20.04 installer?
>

You should be offered an upgrade during the install; if you say yes, this
bug should be fixed.

Stian Brattland (sbrattla) wrote :

Thanks, that worked!

Stian Brattland (sbrattla) wrote :

The installation now completes successfully with RAID1 on these disks with bogus/invalid WWNs.

Michael Hudson-Doyle (mwhudson) wrote :

Thanks for testing!

Changed in subiquity (Ubuntu):
status: Triaged → Fix Released