Ubuntu server LVM creates clone partitions and crash installation (invalid wwn is used by udev & probert as well)

Bug #1929213 reported by Bartosz Baranowski
This bug affects 4 people
Affects    Status         Importance  Assigned to           Milestone
curtin     Fix Committed  Undecided   Michael Hudson-Doyle
subiquity  Fix Committed  Undecided   Unassigned

Bug Description

I've already sent a crash report from the installer, but since I'm not quite sure how it's handled, I might as well follow up with this.

I've been tinkering with Ubuntu Server and some old hardware. Long story short: since I'm not used to a CLI/text-style install and missed that the installer reversed the order of the HDDs, I ended up with a disk setup where /boot and /boot/efi, along with the LVM holding root, were on sdb.
To be precise:
sdb:
/boot
/boot/efi
/lvm/root

sda:
/lvm/root

This failed at some point with a message referring to boot/grub not being found on /dev/sda1:

"could not get path to dev from kname sda1"

If I set up everything on sda, it works fine.

https://bugs.launchpad.net/ubuntu/+source/subiquity/+bug/1929213/comments/5

Tags: fr-1759 impish

Colin Watson (cjwatson)
affects: launchpad → ubuntu
Bartosz Baranowski (baranowb) wrote :

Upon further investigation, it might be connected to the BIOS screwing up the order. Part of the installer sees the wrong order, but the scripts most likely see it differently?

Chris Guiver (guiverc) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command only once, as it will automatically gather debugging information, in a terminal:

apport-collect 1929213

When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

(No release, ISO, etc. details have been provided.)

Bartosz Baranowski (baranowb) wrote :

I'm tinkering with Ubuntu Server 20.10 from a live USB.

I tried a single-drive install and it worked. Once it went live, I tried a minimal setup on two SSDs (/boot, /boot/efi, and encrypted LVM with the rest on both). Not sure if it's relevant, but I can see this in the crash report:
"adding partition 'partition-3' to disk 'disk-sdb' (ptable: 'gpt')
partnum: 1 offset_sectors: 2048 length_sectors 500113407
Preparing partition location on disk /dev/sda"

This fails with exit code: 4

For some reason I'm not able to authorize apport. I will try tinkering with a single-drive install and a two-drive one. Will the crash report be sufficient?

Paul White (paulw2u)
affects: ubuntu → subiquity (Ubuntu)
tags: added: groovy
Bartosz Baranowski (baranowb) wrote :

The original title may be wrong. I've reset the physical connections and the BIOS, but could not reproduce the ID switch; I might just have been tired and imagined it. Nevertheless, the installation still fails.
Through trial and error, one thing is clear: a plain HDD install (no LVM/crypto) works like a charm.
What I found odd about this is that the installer seems to see cloned partitions after a restart, while parted does not (txt and img in single_drive_encr_lvm). I'm fairly sure other iterations produced similar duplication, but I just brushed it off as leftovers from the last install. My bad?

Crash logs, in semi-descriptively named directories, are attached.

Bartosz Baranowski (baranowb) wrote :

Correction: that should have read "while parted does show it as well".

Out of curiosity, I gave Fedora Server a go and it seems to work (with one minor detail when it comes to partitioning).

I'm going to change the title, as it is wrong.

The general flaw seems to be LVM mixing up drives, or copying one drive's partition table onto another:

Headless Ubuntu LVM creation fails. HW setup:
2x SSD (sda, sdb)
2x HDD (sdc, sdd)

Test scenarios:
0 - /boot and /boot/efi on sda

1 (encrypted) vg0 on sda, sdb (/, /home, /etc, /var/log, /usr/local); vg1 on sdc, sdd (/srv)
2 (plain) vg0 on sda, sdb (/, /home, /etc, /var/log, /usr/local); vg1 on sdc, sdd (/srv)
3 (encrypted) vg0 on sda, sdb (/, /home, /etc, /var/log, /usr/local)
4 (plain) vg0 on sda, sdb (/, /home, /etc, /var/log, /usr/local)
5 (encrypted) vg0 on sda (/, /home, /etc, /var/log, /usr/local)
6 (plain) vg0 on sda (/, /home, /etc, /var/log, /usr/local)
7 (plain, no LVM) sda (/, /home, /etc, /var/log, /usr/local)

Only 7 worked. In all other cases the installation failed (crash logs for some of them are in the tar). One key observation: even when only a single drive was selected, the second drive (sdb) somehow had clones of the partitions from sda.

summary: - Ubuntu server fails to install if boot is not on sda1
+ Ubuntu server LVM creates clone partitions and crash installation
Bartosz Baranowski (baranowb) wrote : Re: Ubuntu server LVM creates clone partitions and crash installation

During installation, the version was bumped to 21.04.xx (as the dialog suggested), AFAIR.

description: updated
Michael Hudson-Doyle (mwhudson) wrote :

Argh, what is going on here is that you have two distinct drives with the same WWN. From the UdevDB in the crash report:

 P: /devices/pci0000:00/0000:00:11.0/ata1/host0/target0:0:0/0:0:0:0/block/sda
 N: sda
...
 E: ID_WWN=0x5000000000000000
...
 E: ID_SERIAL=SSDPR-CX400-256-G2_GXA062868

 P: /devices/pci0000:00/0000:00:11.0/ata2/host1/target1:0:0/1:0:0:0/block/sdb
 N: sdb
...
 E: ID_WWN=0x5000000000000000
...
 E: ID_SERIAL=SSDPR-CX400-256-G2_GXA062866

Distinct serial but same wwn. These values end up in the curtin config:

     - id: disk-sda
       path: /dev/sda
       ptable: gpt
       serial: SSDPR-CX400-256-G2_GXA062868
       type: disk
       wwn: '0x5000000000000000'
     - id: disk-sdb
       path: /dev/sdb
       ptable: gpt
       serial: SSDPR-CX400-256-G2_GXA062866
       type: disk
       wwn: '0x5000000000000000'

curtin looks up by wwn first (if present) so when processing the action for /dev/sda it actually (at least in the run I looked at) ended up clearing users from /dev/sdb instead. So the vg named "vg0" that was on sda didn't get cleared and then when it tried to create a new vg called vg0, things blew up.
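To make this failure mode concrete, here is a minimal Python sketch of a lookup that trusts only the first identifier an action provides. It is a hypothetical, simplified model, not curtin's code: `DISKS`, `build_index` and `resolve_first_key` are made-up names, and it assumes the WWN resolves through a single name-to-device mapping (similar in spirit to one `/dev/disk/by-id/wwn-*` symlink), so a duplicated WWN can only ever point at one of the two disks.

```python
from typing import Optional

# Hypothetical data taken from the udev excerpt above.
DISKS = [
    {"path": "/dev/sda", "serial": "SSDPR-CX400-256-G2_GXA062868", "wwn": "0x5000000000000000"},
    {"path": "/dev/sdb", "serial": "SSDPR-CX400-256-G2_GXA062866", "wwn": "0x5000000000000000"},
]


def build_index(disks: list, key: str) -> dict:
    """Map identifier -> device path. Each identifier can point at only one
    device, so a duplicate is silently overwritten by the last disk seen."""
    index = {}
    for disk in disks:
        index[disk[key]] = disk["path"]
    return index


def resolve_first_key(action: dict, disks: list) -> Optional[str]:
    """Resolve a disk action using only the first identifier it provides
    (wwn, then serial, then path), never cross-checking the others."""
    for key in ("wwn", "serial", "path"):
        if action.get(key):
            return build_index(disks, key).get(action[key])
    return None


action_sda = {"id": "disk-sda", "path": "/dev/sda",
              "serial": "SSDPR-CX400-256-G2_GXA062868", "wwn": "0x5000000000000000"}
print(resolve_first_key(action_sda, DISKS))  # /dev/sdb, not /dev/sda
```

In this model the action for disk-sda resolves to /dev/sdb, the same kind of mix-up as clearing users from the wrong disk.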

I'm not going to bother trying to think of why some of your attempts failed and some succeeded. Your comment "even when only single drive was selected, somehow second(sdb) had clones of partitions from sda." does make sense given this though!

What we should do is be more robust to this behaviour (it's not the first time we've seen it). Something needs to notice when drives with the same WWN have different serials and ignore the wwn if that happens, or something along those lines.
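One way to "notice when drives with the same WWN have different serials and ignore the wwn" is a pre-pass over the probed disks before any lookups happen. This is a hypothetical sketch: `drop_ambiguous_wwns` and the dict layout are illustrative assumptions, not subiquity or curtin API.

```python
from collections import defaultdict


def drop_ambiguous_wwns(disks: list) -> list:
    """Return copies of `disks` with `wwn` removed wherever the same WWN is
    reported by disks whose serials differ (the situation in this bug)."""
    serials_by_wwn = defaultdict(set)
    for disk in disks:
        if disk.get("wwn"):
            serials_by_wwn[disk["wwn"]].add(disk.get("serial"))

    cleaned = []
    for disk in disks:
        disk = dict(disk)  # copy so the caller's data is not mutated
        if disk.get("wwn") and len(serials_by_wwn[disk["wwn"]]) > 1:
            del disk["wwn"]  # ambiguous WWN: fall back to serial/path
        cleaned.append(disk)
    return cleaned


disks = [
    {"path": "/dev/sda", "serial": "SSDPR-CX400-256-G2_GXA062868", "wwn": "0x5000000000000000"},
    {"path": "/dev/sdb", "serial": "SSDPR-CX400-256-G2_GXA062866", "wwn": "0x5000000000000000"},
]
for disk in drop_ambiguous_wwns(disks):
    print(disk["path"], disk.get("wwn"))  # wwn is None for both disks
```

Requiring the serials to differ, rather than dropping every repeated WWN, keeps the WWN for the legitimate case of one disk seen through two paths (for example multipath), where both the WWN and the serial repeat.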

Michael Hudson-Doyle (mwhudson) wrote :

You might also want to yell at your disk vendor a bit I guess :)

Dimitri John Ledkov (xnox) wrote :

wwn: '0x5000000000000000' is not a valid wwn and should have been rejected by udev & probert. All zeros are not allowed, and 5 is just a prefix.
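As a rough illustration of the check described here, the sketch below drops the leading NAA nibble (the "5") and treats the WWN as bogus if nothing but zeros remains. It is a heuristic under that single assumption: `looks_like_bogus_wwn` is a hypothetical helper, and it neither implements full NAA format validation nor reflects what udev or probert actually do.

```python
HEX_DIGITS = "0123456789abcdef"


def looks_like_bogus_wwn(wwn: str) -> bool:
    """Heuristic: a WWN is bogus if it is not plain hex, or if everything
    after the leading NAA nibble is zero (e.g. 0x5000000000000000)."""
    text = wwn.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    if not text or any(c not in HEX_DIGITS for c in text):
        return True  # empty or not hexadecimal at all
    value = int(text, 16)
    # Mask off the leading nibble (the NAA type) and inspect the rest.
    payload_bits = len(text) * 4 - 4
    payload = value & ((1 << payload_bits) - 1)
    return payload == 0


print(looks_like_bogus_wwn("0x5000000000000000"))  # True
```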

I think this is an OEM / white-label drive, with the expectation that whoever ships these drives will use their own WWN prefix and flash their own WWN as they see fit... if they cared.

summary: Ubuntu server LVM creates clone partitions and crash installation
+ (invalid wwn is used by udev & probert as well)
Bartosz Baranowski (baranowb) wrote :

The only successful attempt was when I did not use LVM and set up everything on a single drive.

I got the drives from a retailer, sealed in hard plastic, so I doubt anyone except the factory had their hands on them. The second pair (HDDs) came sealed by the vendor with its sticker on. Those two have a WWN set.

It seems WWN = 0x50... is common among this type of drive. I just checked my desktop and it has at least one drive like that as well.

Michael Hudson-Doyle (mwhudson) wrote :

I think the reason LVM deploys fail is that there is already a VG with the name that is to be used, which should be deleted but isn't because the installer gets confused about which disk is which. In any case, the source of the problem is clear and the exact symptoms are not that important.

We should definitely handle this situation better. (I think the udev rules should arguably reject such a bogus wwn as well, but I don't know as much about how things work at that level.)

Disk actions can provide three bits of data to find the drive: wwn, serial and path. Currently the installer just uses the first of whichever of these is provided (so in this case, the bogus wwn), but Dimitri suggested that instead it should use all provided fields, which would work much better for this bug at least.
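A minimal sketch of that suggestion: accept a disk only if it agrees with every identifier the action provides, and refuse to guess when no disk or more than one disk matches. `find_disk_all_fields` and the data layout are hypothetical; the identifiers are the wwn, serial and path fields from the curtin config above.

```python
from typing import Optional

IDENTIFIERS = ("wwn", "serial", "path")

# Hypothetical data mirroring the curtin config above.
DISKS = [
    {"path": "/dev/sda", "serial": "SSDPR-CX400-256-G2_GXA062868", "wwn": "0x5000000000000000"},
    {"path": "/dev/sdb", "serial": "SSDPR-CX400-256-G2_GXA062866", "wwn": "0x5000000000000000"},
]


def find_disk_all_fields(action: dict, disks: list) -> Optional[dict]:
    """Return the one disk agreeing with every identifier the action provides,
    or None if no disk or more than one disk matches."""
    wanted = {key: action[key] for key in IDENTIFIERS if action.get(key)}
    if not wanted:
        return None  # nothing to match on
    matches = [disk for disk in disks
               if all(disk.get(key) == value for key, value in wanted.items())]
    # Refuse to guess: an ambiguous match is treated as no match.
    return matches[0] if len(matches) == 1 else None


action_sda = {"id": "disk-sda", "path": "/dev/sda",
              "serial": "SSDPR-CX400-256-G2_GXA062868", "wwn": "0x5000000000000000"}
print(find_disk_all_fields(action_sda, DISKS)["path"])  # /dev/sda
```

With all three fields, disk-sda resolves to /dev/sda, while an action carrying only the shared WWN matches both disks and yields None instead of silently picking one.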

Changed in subiquity (Ubuntu):
status: New → Triaged
affects: subiquity (Ubuntu) → subiquity
Changed in curtin:
status: New → Triaged
tags: added: impish
removed: groovy
Dan Bungert (dbungert)
tags: added: fr-1759
Changed in curtin:
status: Triaged → In Progress
assignee: nobody → Michael Hudson-Doyle (mwhudson)
Olivier Gayot (ogayot)
Changed in subiquity:
status: Triaged → Fix Committed
Changed in curtin:
status: In Progress → Fix Committed