Disks become unusable if the add-disk action fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | New | Undecided | Unassigned |
Bug Description
My Ceph deployment had a problem with a removed disk: although the OSD was no longer listed by Ceph, its auth key still existed. As a result, every attempt to add a new OSD to the deployment failed, because Ceph tried to reuse the same OSD number.
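For reference, the stale key can be confirmed and removed manually (a sketch using standard Ceph commands; replace <id> with the number of the removed OSD):

ceph osd tree | grep "osd.<id>"   # the OSD is gone from the CRUSH map...
ceph auth get osd.<id>            # ...but a stale auth entry still exists
ceph auth del osd.<id>            # removing it frees the id for reuse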
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.5M 1 loop /snap/core18/2074
loop1 7:1 0 55.3M 1 loop
loop2 7:2 0 70.6M 1 loop
loop3 7:3 0 55.5M 1 loop
loop4 7:4 0 32.3M 1 loop
loop5 7:5 0 70.3M 1 loop /snap/lxd/21029
loop6 7:6 0 32.3M 1 loop /snap/snapd/12883
loop7 7:7 0 67.6M 1 loop
loop8 7:8 0 55.4M 1 loop /snap/core18/2128
loop10 7:10 0 32.3M 1 loop /snap/snapd/13170
loop11 7:11 0 61.8M 1 loop /snap/core20/1081
loop12 7:12 0 67.3M 1 loop /snap/lxd/21545
sda 8:0 1 232.9G 0 disk
└─sda1 8:1 1 232.9G 0 part /
sdb 8:16 1 894.3G 0 disk
└─ceph-
juju run-action --wait ceph-osd/0 add-disk osd-devices=
unit-ceph-osd-0:
UnitId: ceph-osd/0
id: "1533"
message: exit status 1
results:
ReturnCode: 1
Stderr: |
partx: /dev/sdb: failed to read partition table
Failed to find physical volume "/dev/sdb".
Failed to find physical volume "/dev/sdb".
Can't open /dev/sdb exclusively. Mounted filesystem?
Can't open /dev/sdb exclusively. Mounted filesystem?
Traceback (most recent call last):
File "/var/lib/
request = add_device(
File "/var/lib/
File "/var/lib/
File "/var/lib/
cmd = _ceph_volume(dev,
File "/var/lib/
File "/var/lib/
File "/var/lib/
File "/usr/lib/
raise CalledProcessEr
subproces
status: failed
timing:
completed: 2021-10-02 00:44:47 +0000 UTC
enqueued: 2021-10-02 00:44:45 +0000 UTC
started: 2021-10-02 00:44:45 +0000 UTC
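The repeated "Can't open /dev/sdb exclusively" errors suggest the kernel still holds a device-mapper mapping on the disk. That can be confirmed with standard tools (a sketch; the names in the output will differ per deployment):

ls /sys/block/sdb/holders   # lists the dm-* device keeping sdb busy
dmsetup info                # maps that dm-* device back to the stale ceph-* name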
It's a deadlock situation for ceph-osd: neither the zap-disk nor the add-disk action works.
I discovered that the volume was still listed by lsblk, while vgs, pvs, and lvs all returned nothing. There was a backup of the VG, but vgcfgrestore refused to restore it as well:
vgcfgrestore ceph-359ab4d2-
Volume group ceph-359ab4d2-
WARNING: Found 1 active volume(s) in volume group "ceph-359ab4d2-
Restoring VG with active LVs, may cause mismatch with its metadata.
Do you really want to proceed with restore of volume group "ceph-359ab4d2-
WARNING: Couldn't find device with uuid 1E7aEI-
Cannot restore Volume Group ceph-359ab4d2-
Restore failed.
The solution is to find and remove the VG manually. After that, the zap-disk and add-disk actions start working again.
dmsetup info
dmsetup remove <failed vg name>
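Putting it together, the full recovery sequence looked roughly like this (a sketch: the device name is from this report, and the zap-disk parameter names reflect my understanding of the charm's action schema, so verify them before running):

dmsetup info                      # find the stale ceph-* mapping name
dmsetup remove <failed vg name>   # drop the leftover mapping
juju run-action --wait ceph-osd/0 zap-disk devices=/dev/sdb i-really-mean-it=true
juju run-action --wait ceph-osd/0 add-disk osd-devices=/dev/sdb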
These commands should be implemented in the ceph-osd charm, at least as an additional action, to clear stale volumes properly.
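Such an action could be roughly this (a hypothetical sketch, not existing charm code; it removes only ceph-* device-mapper entries whose VG no longer exists):

# dm doubles every literal '-' in VG/LV names, so undo that to recover the VG name
dmsetup ls | awk '{print $1}' | grep '^ceph--' | while read -r dm; do
    vg=$(echo "$dm" | sed -e 's/-osd--block.*//' -e 's/--/-/g')
    vgs "$vg" >/dev/null 2>&1 || dmsetup remove "$dm"
done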
We have also been observing this error and got to this bug. A related bug to this would be https://bugs.launchpad.net/charm-ceph-osd/+bug/1858519
juju version 2.9.44.1
ceph-osd charm: 15.2.17
channel: octopus/stable