Action "add-disk" fails with disks which still have device-mapper leftovers

Bug #1930285 reported by Peter Sabaini
Affects          Status  Importance  Assigned to  Milestone
Ceph OSD Charm   New     Undecided   Unassigned

Bug Description

I've run the add-disk action on devices that still had device-mapper entries from a previous deployment. This fails with a stack trace.

This is because pvcreate can't open the block device exclusively:

     partx: /dev/disk/by-dname/bcache-sdb: failed to read partition table
        Failed to find physical volume "/dev/bcache1".
        Failed to find physical volume "/dev/bcache1".
        Can't open /dev/bcache1 exclusively. Mounted filesystem?
      Traceback (most recent call last):
        File "/var/lib/juju/agents/unit-ceph-osd-48/charm/actions/add-disk", line 67, in <module>
          bucket=hookenv.action_get("bucket"))
        File "/var/lib/juju/agents/unit-ceph-osd-48/charm/actions/add-disk", line 37, in add_device
          hookenv.config('osd-encrypt-keymanager'))
        File "lib/ceph/utils.py", line 1465, in osdize
          bluestore, key_manager)
        File "lib/ceph/utils.py", line 1540, in osdize_dev
          key_manager)
        File "lib/ceph/utils.py", line 1675, in _ceph_volume
          key_manager=key_manager))
        File "lib/ceph/utils.py", line 1925, in _allocate_logical_volume
          lvm.create_lvm_physical_volume(pv_dev)
        File "/usr/local/lib/python3.6/dist-packages/charmhelpers/contrib/storage/linux/lvm.py", line 92, in create_lvm_physical_volume
          check_call(['pvcreate', block_device])
        File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['pvcreate', '/dev/disk/by-dname/bcache-sdb']' returned non-zero exit status 5.

When inspecting the block device I can see entries for device-mapper:

# dmsetup ls
ceph--c18fd969--8e29--4d11--90b5--83de121f8f51-osd--journal--c18fd969--8e29--4d11--90b5--83de121f8f51 (253:4)
ceph--78b8d18a--b17f--4faf--a08d--9eaf89e779a8-osd--journal--78b8d18a--b17f--4faf--a08d--9eaf89e779a8 (253:2)
ceph--c18fd969--8e29--4d11--90b5--83de121f8f51-osd--data--c18fd969--8e29--4d11--90b5--83de121f8f51 (253:5)
ceph--78b8d18a--b17f--4faf--a08d--9eaf89e779a8-osd--data--78b8d18a--b17f--4faf--a08d--9eaf89e779a8 (253:3)
ceph--6b4f06b5--3264--44f7--8109--e6ddb14c7c2f-osd--data--6b4f06b5--3264--44f7--8109--e6ddb14c7c2f (253:1)
ceph--6b4f06b5--3264--44f7--8109--e6ddb14c7c2f-osd--journal--6b4f06b5--3264--44f7--8109--e6ddb14c7c2f (253:0)

After clearing them by hand via `dmsetup remove`, the add-disk action succeeds.

Probably the action should just clear those device-mapper entries itself.
Alternatively, it would be good to at least detect that situation and print a warning message.
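
Purely as an illustration of the first option, a minimal sketch of what such a cleanup step could look like if the action shells out to dmsetup (the helper name, the ceph-- name prefix filter and the dry_run flag are my assumptions, not existing charm code; dry_run roughly corresponds to the warning-only alternative):

    import subprocess

    def clear_stale_ceph_dm_mappings(dry_run=False):
        """Remove leftover ceph-* device-mapper entries (hypothetical helper)."""
        out = subprocess.check_output(['dmsetup', 'ls'], universal_newlines=True)
        for line in out.splitlines():
            if not line.strip() or line.startswith('No devices found'):
                continue
            # dmsetup ls prints "<name>\t(<major>:<minor>)"; keep only the name
            name = line.split()[0]
            if not name.startswith('ceph--'):
                continue
            if dry_run:
                print('Would remove stale dm mapping: %s' % name)
            else:
                subprocess.check_call(['dmsetup', 'remove', name])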

Alex Kavanagh (ajkavanagh) wrote:

Hi Peter

When you say "previous deployment", do you mean that the machine was not new/pristine when the charm was newly deployed to that machine?

Thanks.

Changed in charm-ceph-osd:
status: New → Incomplete
Peter Sabaini (peter-sabaini) wrote:

Hi,

Indeed, the machine previously had a ceph deployment. However, I had zapped the disks with the "zap-disk" action before running add-disk. So possibly the bug here is really about zap-disk not clearing the dmsetup entries?

cheers,
peter.

Alex Kavanagh (ajkavanagh) wrote:

> Indeed, the machine previously had a ceph deployment. However, I had zapped the disks with the "zap-disk" action before running add-disk. So possibly the bug here is really about zap-disk not clearing the dmsetup entries?

Yeah, I think we are in "undefined behaviour" territory if the machine is not pristine (from the ceph-osd charm's perspective) when the charm is installed; it wasn't designed for that, so we'd have to add some features to detect these types of anomalies (and then block). Not that that's undesirable, it would just be a new feature.
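
For what it's worth, a rough sketch of what such a detection could look like, checking sysfs for device-mapper holders before the charm touches the device (the function name and how it would be wired into the charm are assumptions on my part, not existing code):

    import os

    def device_holders(dev_path):
        """Return the kernel holders (e.g. dm-0) of a block device, if any (sketch)."""
        dev = os.path.basename(os.path.realpath(dev_path))  # e.g. bcache1
        holders_dir = '/sys/class/block/%s/holders' % dev
        if not os.path.isdir(holders_dir):
            return []
        return os.listdir(holders_dir)

    holders = device_holders('/dev/disk/by-dname/bcache-sdb')
    if holders:
        # Hypothetical: block or warn here instead of letting pvcreate
        # fail later with "Can't open ... exclusively".
        print('Device is still held by: %s' % ', '.join(holders))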

However, if you used zap-disk and it didn't remove the stale device-mapper entries, then that might well be a bug. I don't suppose you have any logs from the previous deployment for that unit?

Peter Sabaini (peter-sabaini) wrote:

Hey,

I have put up an sosreport here (Canonical only, sorry):

https://private-fileshare.canonical.com/~sabaini/17a8461e-f723-4a62-89c3-5be486f71302/

This should have logs from the previous deployment as well.

Changed in charm-ceph-osd:
status: Incomplete → New