vgcreate fails on /dev/disk/by-dname block devices

Bug #1878752 reported by James Page
This bug affects 1 person
Affects           Status    Importance  Assigned to  Milestone
Ceph OSD Charm    Invalid   Undecided   Unassigned
curtin (Ubuntu)   Invalid   Undecided   Unassigned
lvm2 (Ubuntu)     Invalid   Undecided   Unassigned

Bug Description

Ubuntu Focal, OpenStack Charmers Next Charms.

juju run-action --wait ceph-osd/0 add-disk osd-devices=/dev/disk/by-dname/bcache2

unit-ceph-osd-0:
  UnitId: ceph-osd/0
  id: "5"
  message: exit status 1
  results:
    ReturnCode: 1
    Stderr: |
      partx: /dev/disk/by-dname/bcache2: failed to read partition table
        Failed to find physical volume "/dev/bcache1".
        Failed to find physical volume "/dev/bcache1".
        Device /dev/disk/by-dname/bcache2 not found.
      Traceback (most recent call last):
        File "/var/lib/juju/agents/unit-ceph-osd-0/charm/actions/add-disk", line 79, in <module>
          request = add_device(request=request,
        File "/var/lib/juju/agents/unit-ceph-osd-0/charm/actions/add-disk", line 34, in add_device
          charms_ceph.utils.osdize(device_path, hookenv.config('osd-format'),
        File "lib/charms_ceph/utils.py", line 1497, in osdize
          osdize_dev(dev, osd_format, osd_journal,
        File "lib/charms_ceph/utils.py", line 1570, in osdize_dev
          cmd = _ceph_volume(dev,
        File "lib/charms_ceph/utils.py", line 1705, in _ceph_volume
          cmd.append(_allocate_logical_volume(dev=dev,
        File "lib/charms_ceph/utils.py", line 1965, in _allocate_logical_volume
          lvm.create_lvm_volume_group(vg_name, pv_dev)
        File "hooks/charmhelpers/contrib/storage/linux/lvm.py", line 104, in create_lvm_volume_group
          check_call(['vgcreate', volume_group, block_device])
        File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['vgcreate', 'ceph-911bc34b-4634-4ebd-a055-876b978d0b0a', '/dev/disk/by-dname/bcache2']' returned non-zero exit status 5.
    Stdout: |2
        Physical volume "/dev/disk/by-dname/bcache2" successfully created.
  status: failed
  timing:
    completed: 2020-05-15 06:04:41 +0000 UTC
    enqueued: 2020-05-15 06:04:30 +0000 UTC
    started: 2020-05-15 06:04:39 +0000 UTC

The same action on the /dev/bcacheX device succeeds - looks like some sort of behaviour break in Ubuntu.

James Page (james-page) wrote :

The by-dname entry disappears after the pv is created:

$ sudo pvs
  PV VG Fmt Attr PSize PFree
  /dev/bcache1 lvm2 --- 931.51g 931.51g
  /dev/bcache2 ceph-a7917224-a99d-4885-814d-c9376fa8436d lvm2 a-- 931.51g 0

$ ls -l /dev/disk/by-dname/
total 0
lrwxrwxrwx 1 root root 13 May 15 06:04 bcache1 -> ../../bcache2
lrwxrwxrwx 1 root root 13 May 15 06:04 bcache3 -> ../../bcache0
lrwxrwxrwx 1 root root 13 May 15 06:04 bcache43 -> ../../bcache3
lrwxrwxrwx 1 root root 13 May 15 06:04 nvme0n1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 May 15 06:04 nvme0n1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 May 15 06:04 nvme0n1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 9 May 15 06:04 sda -> ../../sda
lrwxrwxrwx 1 root root 10 May 15 06:04 sda-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 15 06:04 sda-part2 -> ../../sda2
lrwxrwxrwx 1 root root 9 May 15 06:04 sdb -> ../../sdb
lrwxrwxrwx 1 root root 9 May 15 06:04 sdc -> ../../sdc
lrwxrwxrwx 1 root root 9 May 15 06:04 sdd -> ../../sdd

description: updated
James Page (james-page) wrote :

Removing the pv from the device also removes the by-dname entry

James Page (james-page) wrote :

ubuntu@node-licetus:~$ sudo vgcreate ceph-911bc34b-4634-4ebd-a055-876b978d0b0a /dev/disk/by-dname/bcache2
  Physical volume "/dev/disk/by-dname/bcache2" successfully created.
  Volume group "ceph-911bc34b-4634-4ebd-a055-876b978d0b0a" successfully created
ubuntu@node-licetus:~$ ls -l /dev/disk/by-dname/
total 0
lrwxrwxrwx 1 root root 13 May 15 06:12 bcache1 -> ../../bcache2
lrwxrwxrwx 1 root root 13 May 15 06:12 bcache3 -> ../../bcache0
lrwxrwxrwx 1 root root 13 May 15 06:12 bcache43 -> ../../bcache3
lrwxrwxrwx 1 root root 13 May 15 06:12 nvme0n1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 May 15 06:12 nvme0n1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 May 15 06:12 nvme0n1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 9 May 15 06:12 sda -> ../../sda
lrwxrwxrwx 1 root root 10 May 15 06:12 sda-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 15 06:12 sda-part2 -> ../../sda2
lrwxrwxrwx 1 root root 9 May 15 06:12 sdb -> ../../sdb
lrwxrwxrwx 1 root root 9 May 15 06:12 sdc -> ../../sdc
lrwxrwxrwx 1 root root 9 May 15 06:12 sdd -> ../../sdd
ubuntu@node-licetus:~$ sudo udevadm trigger --subsystem-match=block --action=add
ubuntu@node-licetus:~$ ls -l /dev/disk/by-dname/
total 0
lrwxrwxrwx 1 root root 13 May 15 06:14 bcache1 -> ../../bcache2
lrwxrwxrwx 1 root root 13 May 15 06:14 bcache2 -> ../../bcache1
lrwxrwxrwx 1 root root 13 May 15 06:14 bcache3 -> ../../bcache0
lrwxrwxrwx 1 root root 13 May 15 06:14 bcache43 -> ../../bcache3
lrwxrwxrwx 1 root root 13 May 15 06:14 nvme0n1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 May 15 06:14 nvme0n1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 May 15 06:14 nvme0n1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 9 May 15 06:14 sda -> ../../sda
lrwxrwxrwx 1 root root 10 May 15 06:14 sda-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 15 06:14 sda-part2 -> ../../sda2
lrwxrwxrwx 1 root root 9 May 15 06:14 sdb -> ../../sdb
lrwxrwxrwx 1 root root 9 May 15 06:14 sdc -> ../../sdc
lrwxrwxrwx 1 root root 9 May 15 06:14 sdd -> ../../sdd

James Page (james-page) wrote :

OK, so it seems the pv* commands have a side effect of dropping the by-dname entries, so when the charm creates the PV and then tries to use it for the VG, the path has disappeared.

ubuntu@node-licetus:~$ sudo pvcreate /dev/disk/by-dname/bcache3
  Physical volume "/dev/disk/by-dname/bcache3" successfully created.
ubuntu@node-licetus:~$ ls -l /dev/disk/by-dname/
total 0
lrwxrwxrwx 1 root root 13 May 15 06:18 bcache1 -> ../../bcache2
lrwxrwxrwx 1 root root 13 May 15 06:18 bcache2 -> ../../bcache1
lrwxrwxrwx 1 root root 13 May 15 06:18 bcache43 -> ../../bcache3
lrwxrwxrwx 1 root root 13 May 15 06:18 nvme0n1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 May 15 06:18 nvme0n1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 May 15 06:18 nvme0n1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 9 May 15 06:18 sda -> ../../sda
lrwxrwxrwx 1 root root 10 May 15 06:18 sda-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 15 06:18 sda-part2 -> ../../sda2
lrwxrwxrwx 1 root root 9 May 15 06:18 sdb -> ../../sdb
lrwxrwxrwx 1 root root 9 May 15 06:18 sdc -> ../../sdc
lrwxrwxrwx 1 root root 9 May 15 06:18 sdd -> ../../sdd
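
Since the same action on the /dev/bcacheX device succeeds, one way to sidestep the vanishing symlink is to resolve the by-dname path to its kernel device node once, before any pv* command runs, and use that node for both steps. The following is a minimal sketch under that assumption, not the charm's actual code; the function names and the injectable `run` parameter are hypothetical and exist only to make the example testable.

```python
import os
import subprocess


def resolve_block_device(path):
    """Return the kernel device node that a udev symlink points at."""
    # realpath follows the symlink, e.g. /dev/disk/by-dname/bcache2 -> /dev/bcache1
    return os.path.realpath(path)


def create_pv_and_vg(vg_name, dname_path, run=subprocess.check_call):
    """Create a PV and a VG on the resolved node rather than the symlink.

    The symlink is resolved before pvcreate, so it no longer matters
    whether the by-dname entry disappears afterwards.
    """
    device = resolve_block_device(dname_path)
    run(['pvcreate', device])
    run(['vgcreate', vg_name, device])
    return device
```

Note that /dev/bcacheN names are not guaranteed to be stable across reboots, so the resolved node should only be used immediately and not stored in configuration.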

Dmitrii Shcherbakov (dmitriis) wrote :
Dimitri John Ledkov (xnox) wrote :

"dropping unrelated symlinks" sounds a lot like "udev drops bcache/by-uuid names when nobody asked it to".

I wonder if this issue is a duplicate of https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1861941

CC rharper somehow.

James Page (james-page) wrote :
James Page (james-page) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-ceph-osd (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/728488

OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-ceph-osd (master)

Reviewed: https://review.opendev.org/728488
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=b1aab5d0e12e433b714e39f78945baf16e508a41
Submitter: Zuul
Branch: master

commit b1aab5d0e12e433b714e39f78945baf16e508a41
Author: James Page <email address hidden>
Date: Fri May 15 17:00:25 2020 +0100

    Trigger udev rescan if pv_dev disappears

    Workaround for kernel bug in Ubuntu 20.04 LTS.

    When using by-dname device paths with MAAS and bcache, the pvcreate
    operation results in the by-dname entry for the block device being
    deleted. The subsequent vgcreate then fails as the path cannot
    be found.

    Trigger a rescan of block devices if the pv_dev path does not
    exist after the pvcreate operation.

    Change-Id: If7e11f6bd1effd2d5fc2dc5abbaba6865104006f
    Depends-On: Ifb16c47ae5ff316cbcfc3798de3446a3774fa012
    Related-Bug: 1878752
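
The behaviour described in the commit message can be sketched roughly as below. This is an illustration only, not the merged change: the function name and the injectable `run` and `exists` parameters are hypothetical, and the `udevadm settle` call is an extra step assumed here so that the caller does not race the udev event queue. The trigger command is the one used manually earlier in this report.

```python
import os
import subprocess


def rescan_if_missing(pv_dev, run=subprocess.check_call, exists=os.path.exists):
    """Re-trigger udev 'add' events for block devices if pv_dev is gone.

    Returns True when a rescan was triggered, False when the path was
    still present and nothing needed to be done.
    """
    if exists(pv_dev):
        return False
    # Same command as the manual workaround shown in this bug report.
    run(['udevadm', 'trigger', '--subsystem-match=block', '--action=add'])
    # Assumption of this sketch: wait for udev to finish processing events.
    run(['udevadm', 'settle'])
    return True
```

A caller would invoke this between pvcreate and vgcreate, passing the by-dname path that was given to pvcreate.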

Dan Watkins (oddbloke) wrote :

This doesn't seem to be a curtin bug to me; feel free to disagree (with reasoning!).

Changed in curtin (Ubuntu):
status: New → Invalid
Ryan Harper (raharper) wrote :

I very much agree with xnox (it's a duplicate) and with Dan (there is nothing for curtin to do).

The dname rules generated by curtin rely on pointing to a /dev/bcache/by-uuid/* symlink, which is currently broken per https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1861941. At this time that bug points to some issue in udev itself (the kernel emits all of the correct uevents we expect).

And as James' workaround shows, it's *not* always happening; a rescan can "restore" the links, but that's not 100% reliable.
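
Because a single rescan is not guaranteed to bring the link back, a caller may prefer a bounded retry that reports failure instead of assuming success. The sketch below is one hypothetical way to do that; `wait_for_path`, its parameters and its defaults are illustrative assumptions, and `rescan` is any callable that re-triggers udev.

```python
import os
import time


def wait_for_path(path, rescan, attempts=3, delay=1.0,
                  exists=os.path.exists, sleep=time.sleep):
    """Rescan up to `attempts` times until `path` exists.

    Returns True if the path is present, False if it is still missing
    after the final attempt.
    """
    for _ in range(attempts):
        if exists(path):
            return True
        rescan()      # ask udev to re-create the symlinks
        sleep(delay)  # give udev time to process the events
    # One last check after the final rescan.
    return exists(path)
```

Returning False lets the caller fall back to another strategy or fail with a clear error, rather than passing a missing path on to vgcreate.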

James Page (james-page)
Changed in charm-ceph-osd:
status: New → Invalid
Changed in lvm2 (Ubuntu):
status: New → Invalid