lvremove occasionally fails on nodes with multiple volumes and curtin does not catch the failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
curtin (Ubuntu) | Expired | Undecided | Unassigned |
linux (Ubuntu) | Expired | Undecided | Unassigned |
Bug Description
For example:
Wiping lvm logical volume: /dev/ceph-
wiping 1M on /dev/ceph-
using "lvremove" on ceph-db-
Running command ['lvremove', '--force', '--force', 'ceph-db-
device-mapper: remove ioctl on (253:14) failed: Device or resource busy
Logical volume "ceph-db-dev-sdi" successfully removed
On a node with 10 disks configured as follows:
/dev/sda2 /
/dev/sda1 /boot
/dev/sda3 /var/log
/dev/sda5 /var/crash
/dev/sda6 /var/lib/
/dev/sda7 /var
/dev/sdj1 /srv
sdb and sdc are used for BlueStore WAL and DB
sdd, sde, sdf: ceph OSDs, using sdb
sdg, sdh, sdi: ceph OSDs, using sdc
Across multiple servers this happens occasionally with various disks. It looks like a race condition, possibly in LVM itself, since curtin is wiping multiple volumes in succession before lvm fails.
Curtin currently uses two force options:
$ lvremove --force --force vg_lv_name
as indicated here: https://github.com/CanonicalLtd/curtin/blob/14c0560ed4482cb3b514fbec8d89118bd775652f/curtin/block/clear_holders.py#L136-L138
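As an illustration only (this is not curtin's actual code, whose subprocess wrapper and volume naming differ), a minimal Python sketch of the call pattern shows why the transient failure can go unnoticed: the device-mapper "busy" message arrives on stderr and, judging by the log above, the command can still report the LV as removed, so checking the exit status alone may not catch it. The volume name and the `udevadm settle` retry are assumptions about how the suspected race might be worked around, not current curtin behaviour.

```python
import subprocess
import time

def remove_lv(volume, attempts=3):
    """Remove an LV with the same two --force flags curtin uses, retrying
    when device-mapper reports the device as busy.  'volume' is a
    hypothetical "vg_name/lv_name" string used here for illustration."""
    for _ in range(attempts):
        result = subprocess.run(
            ["lvremove", "--force", "--force", volume],
            capture_output=True, text=True)
        busy = "Device or resource busy" in result.stderr
        if result.returncode == 0 and not busy:
            return
        # Assumption: the suspected race clears once udev/device-mapper
        # finish processing the previous removals.
        subprocess.run(["udevadm", "settle"], check=False)
        time.sleep(1)
    raise RuntimeError(
        "lvremove failed for %s after %d attempts: %s"
        % (volume, attempts, result.stderr.strip()))
```

Whether retrying after `udevadm settle` would actually avoid the failure is speculation; the sketch only mirrors the race hypothesis described above.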
# LVREMOVE(8) #
Confirmation will be requested before deactivating any active LV prior to removal. LVs cannot be deactivated or removed while they are open (e.g. if they contain a mounted filesystem). Removing an origin LV will also remove all dependent snapshots.
When a single force option is used, LVs are removed without confirmation, and the command will try to deactivate unused LVs.
To remove damaged LVs, two force options may be required (-ff).
-f|--force ...
Override various checks, confirmations and protections. Use with extreme caution.