zap-disk action should fail if target disk is actively used by LVM, or handle the LVM removal

Bug #1858519 reported by Paul Goins
Affects: Ceph OSD Charm
Status: Fix Released
Importance: Medium
Assigned to: nikhil kshirsagar
Milestone: 21.10

Bug Description

Today I ran zap-disk to try to reset a previously-deployed device so that I could re-run add-disk. This unfortunately failed, since zap-disk does not appear to handle the case where the target disk is hosting LVM volumes.

As the charm provisioned the LVM volumes upon add-disk, it feels like zap-disk (or some other action) should handle cleanup of those volumes. Alternatively, it should fail and alert the user that LVM volume cleanup needs to be done first.

I'm presently uncertain of the exact charm version involved; the cloud in question is currently undergoing maintenance which impacts Juju. I'm guessing this affects current trunk but am not certain; please dismiss this if it's already fixed.

In case someone else hits this issue: I resolved it via "pvremove --force <device>". Because zap-disk had already run against the device, LVM operations at the volume group or logical volume level no longer worked, which is why the forced PV-level removal was needed. This then allowed me to run add-disk successfully.
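
For anyone hitting the same state, the recovery amounts to wiping the stale PV label and re-adding the disk. A sketch of the sequence with illustrative device/unit names; pvremove --force is destructive, so verify the device first, and note that the add-disk parameter name (osd-devices) is my assumption rather than something stated in this report:

# zap-disk left stale LVM metadata on the device, so wipe the PV label first.
# DESTRUCTIVE: make sure /dev/vdc really is the disk you mean to reset.
pvremove --force /dev/vdc
# With the label gone, the charm action works again (parameter name assumed):
juju run-action --wait ceph-osd/1 add-disk osd-devices="/dev/vdc"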

Andrew McLeod (admcleod)
Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote: Fix proposed to charm-ceph-osd (master)
Changed in charm-ceph-osd:
status: Triaged → In Progress
nikhil kshirsagar (nkshirsagar) wrote:

If the disk being zapped is used by lvm2 (i.e. it contains the lvm2 label and hasn't been pvremove'd), it's safer to simply bail out than to attempt teardown through a forced pvremove etc., because the disk being zapped might even be in use by an LV.

Submitted https://review.opendev.org/c/openstack/charm-ceph-osd/+/804520
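
The patch itself is Python in the charm's zap-disk action; purely as a sketch of the bail-out logic (not the charm's actual code), the check amounts to probing for a PV label before destroying anything. In shell terms, relying on pvdisplay's non-zero exit status for a label-free device and using sgdisk to stand in for the real zap steps:

#!/bin/bash
# Sketch only -- the real guard lives in the charm's Python action.
DEV="$1"
if pvdisplay "$DEV" >/dev/null 2>&1; then
    # The device still carries an lvm2 PV label and may back an active LV,
    # so refuse to zap rather than attempt a forced pvremove.
    echo "Cannot zap a device used by lvm" >&2
    exit 1
fi
sgdisk --zap-all "$DEV"   # no PV label found; safe to destroy partition data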

Tested the patch. First pass: /dev/vdc carries no PV label (pvs lists only /dev/vdb), so the zap proceeds:
root@juju-779bb4-ceph-2:/var/lib/juju/agents/unit-ceph-osd-1/charm/actions# pvs
  PV         VG                                        Fmt  Attr PSize   PFree
  /dev/vdb   ceph-5c48b342-e3f2-4401-a2b8-c9c4f485d550 lvm2 a--  <10.00g     0

juju run-action --wait ceph-osd/1 zap-disk devices="/dev/vdc" i-really-mean-it=yes

root@juju-779bb4-ceph-2:/var/lib/juju/agents/unit-ceph-osd-1/charm# ./actions/zap-disk
  Failed to find physical volume "/dev/vdc".
  Failed to find physical volume "/dev/vdc".
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0116351 s, 90.1 MB/s
100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.0165592 s, 3.1 MB/s
root@juju-779bb4-ceph-2:/var/lib/juju/agents/unit-ceph-osd-1/charm#

Now set up /dev/vdc as a PV and check that zap-disk refuses to touch it:

root@juju-779bb4-ceph-2:/var/lib/juju/agents/unit-ceph-osd-1/charm/actions# pvcreate /dev/vdc
  Physical volume "/dev/vdc" successfully created.
root@juju-779bb4-ceph-2:/var/lib/juju/agents/unit-ceph-osd-1/charm/actions# pvs
  PV         VG                                        Fmt  Attr PSize   PFree
  /dev/vdb   ceph-5c48b342-e3f2-4401-a2b8-c9c4f485d550 lvm2 a--  <10.00g     0
  /dev/vdc                                             lvm2 ---    8.00g 8.00g

root@juju-779bb4-ceph-2:/var/lib/juju/agents/unit-ceph-osd-1/charm# ./actions/zap-disk
root@juju-779bb4-ceph-2:/var/lib/juju/agents/unit-ceph-osd-1/charm#

This time the action exits without zapping anything, and the unit log shows why:

2021-08-13 11:42:23 INFO unit.ceph-osd/1.juju-log server.go:314 Cannot zap a device used by lvm
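
With the guard in place, an operator can check up front whether a device will be refused; as the "Failed to find physical volume" lines above suggest, pvdisplay exits non-zero when no PV label is present (device name illustrative):

pvdisplay /dev/vdc >/dev/null 2>&1 \
  && echo "still an lvm2 PV: zap-disk will refuse" \
  || echo "no PV label: zap-disk can proceed"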

Changed in charm-ceph-osd:
assignee: nobody → nikhil kshirsagar (nkshirsagar)
Changed in charm-ceph-osd:
milestone: none → 21.10
OpenStack Infra (hudson-openstack) wrote: Fix merged to charm-ceph-osd (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-osd/+/804520
Committed: https://opendev.org/openstack/charm-ceph-osd/commit/489a4ede69c72ce930a0909af5fffe2a9faa8d1a
Submitter: "Zuul (22348)"
Branch: master

commit 489a4ede69c72ce930a0909af5fffe2a9faa8d1a
Author: Nikhil Kshirsagar <email address hidden>
Date: Fri Aug 13 16:59:42 2021 +0530

    Do not zap a disk if it is used by lvm2

    If the disk being zapped is used by lvm (i.e. it contains the
    lvm label and hasn't been pvremove'd) it's safer to simply
    bail out of zapping it than attempt teardown through a forced
    pvremove, because the disk being zapped might in fact be in
    use by some LV.

    Closes-Bug: 1858519
    Change-Id: I111475c5a4584a3e367c604ab51ce2ef3789ff7f

Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released