Comment 3 for bug 1760065

Magnus Lööf (magnus-loof) wrote: Re: ceph luminous insufficient nova caps

So we need to add `allow command "osd blacklist"` to the mon caps of `client.cinder`. Otherwise, after a hypervisor crash, instances fail to boot with "INACCESSIBLE BOOT VOLUME".

Analysis:

When nova-compute boots an instance with a boot volume in Ceph, it places a lock on the volume to prevent data corruption.

If the hypervisor crashes, the lock is never released. When the instance is started again, the new client tries to blacklist the previous lock owner in order to "steal" the lock, which fails because `client.cinder` does not have that privilege.

Workaround:

1. Determine the volume ID using Horizon or the OpenStack CLI.
1. Ensure the instance is shut down rather than stuck in a reboot loop.
1. Open a shell in the Ceph monitor container: `sudo docker exec -it ceph_mon bash`
1. List the locks on the volume with `rbd lock ls --pool volumes volume-<VOLUME ID>` and note the client ID and the lock ID it shows.
1. Remove the lock with `rbd lock rm --pool volumes volume-<VOLUME ID> "<lock ID>" <client ID>`.
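The workaround steps can be scripted roughly as below. This is a sketch, not a tested tool: it assumes the `volumes` pool and `volume-<VOLUME ID>` naming used above, and it assumes the plain-text layout of `rbd lock ls` (a summary line, a `Locker ID Address` header, then one row per lock, where the lock ID, such as `auto 139643345791728`, may contain a space). The function `remove_volume_locks` and the `DRY_RUN` switch are hypothetical names introduced here. Run it inside the `ceph_mon` container, and only after the instance is shut down.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove stale RBD locks from one Cinder volume.
# Assumes pool "volumes", images named "volume-<VOLUME ID>", and the
# plain-text output of `rbd lock ls`. With DRY_RUN=1 it only prints
# the `rbd lock rm` commands instead of running them.

remove_volume_locks() {
    local image="volume-$1"

    # awk skips everything up to the "Locker ID Address" header, then emits
    # "<locker>|<lock ID>" per row. The lock ID can contain spaces, so it
    # joins every field between the first (locker) and the last (address).
    rbd lock ls --pool volumes "$image" |
        awk 'found && NF >= 3 {
                 id = $2
                 for (i = 3; i < NF; i++) id = id " " $i
                 print $1 "|" id
             }
             /^Locker/ { found = 1 }' |
        while IFS='|' read -r locker lock_id; do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                printf 'rbd lock rm --pool volumes %s "%s" %s\n' \
                    "$image" "$lock_id" "$locker"
            else
                rbd lock rm --pool volumes "$image" "$lock_id" "$locker"
            fi
        done
}
```

For example, `DRY_RUN=1 remove_volume_locks <VOLUME ID>` prints the `rbd lock rm` commands without running them; drop `DRY_RUN=1` to actually remove the locks.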

Fix:

`client.cinder` should be allowed to run the blacklist command. Follow this article https://access.redhat.com/solutions/3391211 to fix it:

1. Examine the current caps: `ceph auth list`
1. Export the current entry: `ceph auth export client.cinder -o client.cinder.export`
1. In the `client.cinder.export` file, set `caps mon = allow r, allow command "osd blacklist"`.
1. Import the edited file: `ceph auth import -i client.cinder.export`
1. Verify the new caps with `ceph auth list`.
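The export/edit/import steps of the fix can be sketched as two small shell functions. Assumptions: the exported file contains a single `caps mon = ...` line, as `ceph auth export` normally writes it, and the replacement line is exactly the one given in the list above; `has_blacklist_cap` and `set_mon_caps` are hypothetical helper names. Review the file before importing it, since `ceph auth import` applies whatever caps the file contains.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for editing an exported keyring such as
# client.cinder.export (as written by `ceph auth export`).

# Succeeds if the "caps mon" line already mentions "osd blacklist".
has_blacklist_cap() {
    grep -q '^[[:space:]]*caps mon[[:space:]]*=.*osd blacklist' "$1"
}

# Replaces the existing "caps mon" line with the value from the fix steps,
# keeping its indentation and leaving every other line untouched.
# If the file has no "caps mon" line, it is left unchanged.
set_mon_caps() {
    local file="$1" tmp
    tmp="$(mktemp)" || return 1
    if awk '
        /^[ \t]*caps mon[ \t]*=/ {
            match($0, /^[ \t]*/)
            print substr($0, 1, RLENGTH) "caps mon = allow r, allow command \"osd blacklist\""
            next
        }
        { print }
    ' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a host with Ceph admin access (not executed here):
#   ceph auth export client.cinder -o client.cinder.export
#   has_blacklist_cap client.cinder.export || set_mon_caps client.cinder.export
#   ceph auth import -i client.cinder.export
#   ceph auth get client.cinder
```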

Then restart the `cinder_volume` and `nova_compute` containers.
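Restarting the containers could look like the sketch below. It assumes the Kolla container names used above and treats `docker inspect -f '{{.State.Running}}'` as a basic health check; `restart_ceph_clients` is a hypothetical name. Because `cinder_volume` and `nova_compute` may live on different hosts, the function skips a container that does not exist locally, so it can be run on each host in turn.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the Kolla containers named above and
# report whether each one is running afterwards.
restart_ceph_clients() {
    local name status=0
    for name in cinder_volume nova_compute; do
        # Skip containers that do not exist on this host.
        if ! sudo docker inspect "$name" >/dev/null 2>&1; then
            echo "skip: $name not present on this host"
            continue
        fi
        sudo docker restart "$name" >/dev/null || status=1
        # Basic health check: is the container running after the restart?
        if [ "$(sudo docker inspect -f '{{.State.Running}}' "$name")" = true ]; then
            echo "ok: $name running"
        else
            echo "error: $name not running"
            status=1
        fi
    done
    return "$status"
}
```

The function returns non-zero if any restart or health check fails, which makes it easy to chain in an Ansible `shell` task or a loop over hosts.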

Permanent fix:

Kolla should grant `client.cinder` the blacklist permission as part of the deployment.