Installer crashes when setting up software RAID and disks have duplicate WWN
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
curtin | Fix Committed | Undecided | Unassigned |
Bug Description
When trying to set up software RAID on a 22.04 server whose physical disks share the same WWN, the installer crashes. This is because curtin looks up both disks using the same WWN, so follow-up operations (like running sgdisk) work on the same disk twice. This makes sgdisk fail, which leads to the installer crashing.
Here is some lsblk output from such a server showing the values:
```
$ lsblk -S -d -o TRAN,NAME,
TRAN NAME TYPE MODEL SERIAL SIZE WWN
sata sda disk M.2 (S80) 3ME4 YCA12009140310160 119.2G 0x502b2a201d1c1b1a
sata sdb disk M.2 (S80) 3ME4 YCA11905030450003 119.2G 0x502b2a201d1c1b1a
usb sr0 rom Virtual CDROM0 AAAABBBBCCCC1 1024M
usb sr1 rom Virtual CDROM1 AAAABBBBCCCC1 1024M
usb sr2 rom Virtual CDROM2 AAAABBBBCCCC1 1024M
usb sr3 rom Virtual CDROM3 AAAABBBBCCCC1 1024M
```
Here you can see that the serials are unique, but the WWNs are unfortunately the same.
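A quick way to check a machine for this condition is to ask lsblk for JSON output and group devices by a column. This is a minimal sketch, assuming the shape printed by `lsblk -J -d -o NAME,SERIAL,WWN` (a top-level "blockdevices" list whose entries use lowercase column names); `find_duplicates` is a hypothetical helper, not part of curtin.

```python
import json
from collections import defaultdict


def find_duplicates(lsblk_json, column="wwn"):
    """Return {value: [device names]} for every value of `column` shared by 2+ devices.

    Assumes the JSON shape printed by `lsblk -J -d -o NAME,SERIAL,WWN`:
    a top-level "blockdevices" list whose entries use lowercase column names.
    """
    by_value = defaultdict(list)
    for dev in json.loads(lsblk_json)["blockdevices"]:
        value = dev.get(column)
        if value:  # skip devices with no value here (e.g. null WWN on optical drives)
            by_value[value].append(dev["name"])
    return {value: names for value, names in by_value.items() if len(names) > 1}


# Hypothetical sample modelled on the lsblk output above.
sample = """{"blockdevices": [
  {"name": "sda", "serial": "YCA12009140310160", "wwn": "0x502b2a201d1c1b1a"},
  {"name": "sdb", "serial": "YCA11905030450003", "wwn": "0x502b2a201d1c1b1a"},
  {"name": "sr0", "serial": "AAAABBBBCCCC1", "wwn": null},
  {"name": "sr1", "serial": "AAAABBBBCCCC1", "wwn": null}
]}"""

print(find_duplicates(sample, "wwn"))     # {'0x502b2a201d1c1b1a': ['sda', 'sdb']}
print(find_duplicates(sample, "serial"))  # {'AAAABBBBCCCC1': ['sr0', 'sr1']}
```

On a live system the input would come from running lsblk with those options (for example via `subprocess.run(..., capture_output=True, text=True).stdout`). Note that the virtual CD-ROMs in the output above also share one serial, so serial alone is not guaranteed to be unique either.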
Here is the output from the installer crash report where the issue occurs (here the internal drives happen to be sdb and sdc instead of sda/sdb):
```
start: cmd-install/
get_path_
Processing serial 0x502b2a201d1c1b1a via udev to 0x502b2a201d1c1b1a
lookup_disks found: ['wwn-0x502b2a2
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sdb'] with allowed return codes [0] (capture=True)
/dev/sdb is multipath device? False
[...]
Preparing partition location on disk /dev/sdb
Wiping 1M on /dev/sdb at offset 1048576
Running command ['sgdisk', '--new', '1:2048:2203647', '--typecode=
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sdb'] with allowed return codes [0] (capture=True)
/dev/sdb is multipath device? False
```
You can see that the 'path' field is '/dev/sdc', but the follow-up commands operate on /dev/sdb instead.
Then a bit later:
```
start: cmd-install/
get_path_
Processing serial 0x502b2a201d1c1b1a via udev to 0x502b2a201d1c1b1a
lookup_disks found: ['wwn-0x502b2a2
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sdb'] with allowed return codes [0] (capture=True)
/dev/sdb is multipath device? False
[...]
Preparing partition location on disk /dev/sdb
Wiping 1M on /dev/sdb at offset 1048576
Running command ['sgdisk', '--new', '1:2048:2203647', '--typecode=
An error occured handling 'partition-1': ProcessExecutio
Command: ['sgdisk', '--new', '1:2048:2203647', '--typecode=
Exit code: 4
Reason: -
Stdout: ''
Stderr: Could not create partition 1 from 2048 to 2203647
Error encountered; not saving changes.
```
Above, the disk is supposed to be /dev/sdb this time, and the commands do operate on /dev/sdb, but it is too late: the same sgdisk command was already executed on this disk earlier, so this time it fails.
I tried building my own 22.04 installer with a patched curtin library: https:/
Here is the diff for future reference:
```
diff --git a/curtin/
index f3f19dc2..918acb34 100644
--- a/curtin/
+++ b/curtin/
@@ -455,7 +455,7 @@ def get_path_
# Get path to block device for disk. Device_id param should refer
# to id of device in storage config
- for disk_key in ['wwn', 'serial', 'device_id', 'path']:
+ for disk_key in ['serial', 'device_id', 'path']:
try:
if not vol_value:
```
Basically, I just skip the WWN lookup and go directly to the serial, and with this patch the installer succeeded. Here is the output from that successful run (here the internal drives end up as sda/sdb again):
```
start: cmd-install/
get_path_
Processing serial M.2_(S80)
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sdb'] with allowed return codes [0] (capture=True)
/dev/sdb is multipath device member? False
[...]
Preparing partition location on disk /dev/sdb
Wiping 1M on /dev/sdb at offset 1048576
Running command ['sgdisk', '--new', '1:2048:2203647', '--typecode=
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sdb'] with allowed return codes [0] (capture=True)
/dev/sdb is multipath device? False
```
And then later, for the second disk (this time operating on the expected disk):
```
start: cmd-install/
get_path_
Processing serial M.2_(S80)
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sda'] with allowed return codes [0] (capture=True)
/dev/sda is multipath device member? False
[...]
Preparing partition location on disk /dev/sda
Wiping 1M on /dev/sda at offset 1048576
Running command ['sgdisk', '--new', '1:2048:2203647', '--typecode=
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sda'] with allowed return codes [0] (capture=True)
/dev/sda is multipath device? False
```
From what I can tell, at least the following bugs are also caused by duplicate WWNs or serials:
https:/
https:/
It seems to me that the proper fix would be, at some stage in the code after the hardware data has been assembled, to iterate over the disks and mark disks with duplicate keys, e.g. in an additional field. With that information it would be easy to do something like "if disk_key in duplicate_
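The duplicate-marking idea can be sketched as follows. This is a simplified model, not curtin's code: the disk dicts are hypothetical stand-ins for storage-config entries, and `find_duplicate_values` / `pick_lookup_key` are hypothetical helpers. Only the key order comes from the original loop in the diff above.

```python
from collections import Counter

# Key order taken from the original loop in the diff above.
KEY_ORDER = ("wwn", "serial", "device_id", "path")


def find_duplicate_values(disks):
    """Map each key to the set of values that appear on more than one disk."""
    duplicates = {}
    for key in KEY_ORDER:
        counts = Counter(disk[key] for disk in disks if disk.get(key))
        duplicates[key] = {value for value, n in counts.items() if n > 1}
    return duplicates


def pick_lookup_key(disk, duplicates):
    """Return the first (key, value) pair in KEY_ORDER that is unique to this disk."""
    for key in KEY_ORDER:
        value = disk.get(key)
        if value and value not in duplicates[key]:
            return key, value
    raise ValueError(f"no unique identifier for disk {disk.get('id')!r}")


# Hypothetical, simplified disk entries; real storage config has more fields.
disks = [
    {"id": "disk-sda", "wwn": "0x502b2a201d1c1b1a", "serial": "YCA12009140310160", "path": "/dev/sda"},
    {"id": "disk-sdb", "wwn": "0x502b2a201d1c1b1a", "serial": "YCA11905030450003", "path": "/dev/sdb"},
]
duplicates = find_duplicate_values(disks)
for disk in disks:
    print(disk["id"], pick_lookup_key(disk, duplicates))
# disk-sda ('serial', 'YCA12009140310160')
# disk-sdb ('serial', 'YCA11905030450003')
```

Unlike the patch above, which drops WWN entirely, this keeps WWN as the preferred key on hardware where it is unique and only falls back to the next key when a collision is detected.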
Related branches
- Server Team CI bot: Approve (continuous-integration)
- Dan Bungert: Approve
- Diff: 553 lines (+384/-44), 7 files modified:
  - curtin/block/multipath.py (+4/-3)
  - curtin/commands/block_meta.py (+130/-39)
  - curtin/udev.py (+9/-0)
  - examples/tests/multipath-reuse.yaml (+1/-1)
  - examples/tests/multipath.yaml (+1/-1)
  - requirements.txt (+1/-0)
  - tests/unittests/test_commands_block_meta.py (+238/-0)
Changed in curtin: status: New → Fix Committed
Some additional information: curtin seems to look up disks in /dev/disk/by-id/ in its lookup_disk(serial) function: https://github.com/canonical/curtin/blob/b08eecd68cf5f1bccf4255b3d00a77af51c159f7/curtin/block/__init__.py#L921-L924
On a machine where the WWN is duplicated, links only exist for one of the disks:
- by-id/scsi-3502b2a201d1c1b1a -> ../../sdb
- by-id/scsi-3502b2a201d1c1b1a-part1 -> ../../sdb1
- by-id/scsi-3502b2a201d1c1b1a-part2 -> ../../sdb2
- by-id/wwn-0x502b2a201d1c1b1a -> ../../sdb
- by-id/wwn-0x502b2a201d1c1b1a-part1 -> ../../sdb1
- by-id/wwn-0x502b2a201d1c1b1a-part2 -> ../../sdb2
```
$ ls -l /dev/disk/by-id/* | grep 502b2a201d1c1b1a
lrwxrwxrwx 1 root root 9 Jan 20 12:49 /dev/disk/
lrwxrwxrwx 1 root root 10 Jan 20 12:49 /dev/disk/
lrwxrwxrwx 1 root root 10 Jan 20 12:49 /dev/disk/
lrwxrwxrwx 1 root root 9 Jan 20 12:49 /dev/disk/
lrwxrwxrwx 1 root root 10 Jan 20 12:49 /dev/disk/
lrwxrwxrwx 1 root root 10 Jan 20 12:49 /dev/disk/
```
So these links cannot be used to detect the duplication.
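Because the by-id links collapse to a single target, detecting the duplication needs per-device data instead. One option is to compare the udev properties that curtin already queries for each disk (the `udevadm info --query=property --export` calls in the logs above). This is a minimal sketch, assuming the `--export` format of one KEY='value' pair per line and that the disks carry `ID_WWN` and `ID_SERIAL_SHORT` properties; the helper names and the sample output are hypothetical.

```python
import shlex
from collections import defaultdict


def parse_udev_export(text):
    """Parse `udevadm info --query=property --export` output into a dict.

    Assumes one KEY='value' pair per line, as produced by --export.
    """
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        # shlex.split strips the shell quoting around the value.
        parts = shlex.split(value)
        props[key.strip()] = parts[0] if parts else ""
    return props


def devices_sharing(prop, exports):
    """Group device paths by `prop`, keeping only values shared by 2+ devices.

    `exports` maps a device path to its raw udevadm --export output.
    """
    groups = defaultdict(list)
    for devpath, text in exports.items():
        value = parse_udev_export(text).get(prop)
        if value:
            groups[value].append(devpath)
    return {value: sorted(paths) for value, paths in groups.items() if len(paths) > 1}


# Hypothetical udevadm output for the two disks in this report (trimmed to three properties).
exports = {
    "/dev/sda": "DEVNAME='/dev/sda'\nID_SERIAL_SHORT='YCA12009140310160'\nID_WWN='0x502b2a201d1c1b1a'\n",
    "/dev/sdb": "DEVNAME='/dev/sdb'\nID_SERIAL_SHORT='YCA11905030450003'\nID_WWN='0x502b2a201d1c1b1a'\n",
}
print(devices_sharing("ID_WWN", exports))           # {'0x502b2a201d1c1b1a': ['/dev/sda', '/dev/sdb']}
print(devices_sharing("ID_SERIAL_SHORT", exports))  # {}
```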