Ceph client.cinder insufficient privileges for blacklist op

Bug #1760065 reported by Tomasz Setkowski
This bug affects 5 people
Affects                 Status        Importance  Assigned to       Milestone
kolla-ansible           Fix Released  Medium      Michal Nasiadka
kolla-ansible (Train)   Fix Released  Medium      Michal Nasiadka

Bug Description

Ceph Luminous requires new caps for rbd access.
In particular

    caps mon = "allow r, allow command "osd blacklist""

See also issue http://tracker.ceph.com/issues/21353

The current setup creates the nova user with access
mon 'allow r'
osd 'allow class-read object_prefix rbd_children, allow rwx pool={{ ceph_cinder_hdd_pool_name }}' ...

This is fine in normal usage. However, if you kill the VM process with kill -9, or hard reset the hypervisor, a stale exclusive lock is left on the rbd image. When the VM is booted again it produces nasty IO errors and cannot actually write anything to the device.

The correct approach for Luminous, as suggested in the issue, is to use 'profile rbd' and 'profile rbd-read-only' for users that intend to access rbd volumes.
Such as

   client.nova mon 'profile rbd' osd 'profile rbd pool={{ ceph_cinder_hdd_pool_name }}'
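
For illustration only, applying caps like these to an existing user could look roughly like the following; the client and pool names are placeholders, not kolla's actual variables:

    # Sketch only: client and pool names are placeholders.
    # 'ceph auth caps' replaces all caps for the user, so list every cap needed.
    ceph auth caps client.nova \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms'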

Tags: ceph cinder
description: updated
Revision history for this message
George Zhao (georgezhao) wrote :

Having the same issue. Changing the client.nova auth caps only fixes the VMs that don't have attached volumes; client.cinder needs to be changed as well if there are volumes attached to the VM.

Revision history for this message
Benjamin Bendel (benvandamme) wrote :

Is there any quick workaround to release the lock?

Revision history for this message
Magnus Lööf (magnus-loof) wrote :

So we need to add `allow command "osd blacklist"` to `client.cinder`. Otherwise, after a hypervisor crash, instances will not boot and fail with "INACCESSIBLE BOOT VOLUME".

Analysis:

When nova-compute boots an instance with a boot volume in Ceph, it places a lock on the volume to prevent data corruption.

If the hypervisor crashes, that lock is never released. When the instance is started again, the new client tries to send a blacklist op to "steal" the lock, which fails because `client.cinder` does not have that privilege.
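
For reference, the operation that fails is roughly equivalent to running the following by hand with admin credentials; the address is a made-up placeholder in the form that shows up in the log entries quoted later in this report:

    # Illustrative only; the address stands in for the dead client's address.
    ceph osd blacklist add 10.0.0.2:0/3833716830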

Workaround:

1. Determine the volume ID using Horizon or the openstack CLI.
2. Ensure the instance is not in a reboot loop but shut down.
3. Enter a Ceph shell: `sudo docker exec -it ceph_mon bash`
4. List the locks on the volume: `rbd lock ls --pool volumes volume-<VOLUME ID>`. This will show the lock; take note of the client ID and the lock ID.
5. Remove the lock: `rbd lock rm --pool volumes volume-<VOLUME ID> "<lock ID>" <client ID>` (a worked example follows below).
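
Putting the two rbd commands together, a hypothetical session might look like this; the volume ID, lock ID and client ID below are made up:

    # Hypothetical example; substitute the real volume ID, lock ID and client ID.
    sudo docker exec -it ceph_mon bash
    rbd lock ls --pool volumes volume-3f2a9c10-1111-2222-3333-444455556666
    # The output lists the locker (e.g. client.84123) and the lock ID
    # (e.g. "auto 139643345791728"); pass both to 'rbd lock rm'.
    rbd lock rm --pool volumes volume-3f2a9c10-1111-2222-3333-444455556666 \
        "auto 139643345791728" client.84123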

Fix:
The client.cinder user should have the blacklist op. Follow this article https://access.redhat.com/solutions/3391211 to fix it:

1. Examine the current caps: `ceph auth list`
2. `ceph auth export client.cinder -o client.cinder.export`
3. Set `caps mon = "allow r, allow command "osd blacklist""` in the `client.cinder.export` file.
4. `ceph auth import -i client.cinder.export`
5. Verify with `ceph auth list`
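
As an alternative to the export/import round trip, a single `ceph auth caps` call can set the same caps. This is only a sketch: `ceph auth caps` replaces all existing caps, so the osd line below is a placeholder and must be copied from your current `ceph auth list` output:

    # Sketch only: re-specify the existing osd caps or they will be lost.
    ceph auth caps client.cinder \
        mon 'allow r, allow command "osd blacklist"' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes'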

Restart `cinder_volume` and `nova_compute` containers.

Permanent fix:
The client.cinder should have the blacklist op as part of Kolla deployment.
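
With cephx profiles, as suggested in the bug description, the caps could instead look roughly like this; the pool names are placeholders for whatever pools the deployment uses:

    # Sketch of profile-based caps; client and pool names are placeholders.
    ceph auth caps client.cinder \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=backups'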

tags: added: ceph
tags: added: cinder
Revision history for this message
Magnus Lööf (magnus-loof) wrote :

If this problem occurs, there are `Mar 22 15:55:24 dev-ceph-01 docker: 2018-03-22 15:55:24.018624 7f7b084aa700 0 log_channel(audit) log [INF] : from='client.? 10.0.0.1:0/2275353734' entity='client.openstack' cmd=[{"prefix": "osd blacklist", "blacklistop": "add", "addr": "10.0.0.2:0/3833716830"}]: access denied` entries in the `/var/lib/docker/volumes/kolla_logs/_data/ceph/ceph.log` file.

Revision history for this message
Magnus Lööf (magnus-loof) wrote :

(The log line is cut from the article above; it is correct except for the entity, which is `client.cinder` in this case.)

Revision history for this message
Magnus Lööf (magnus-loof) wrote :

Seeing this in Rocky btw

summary: - ceph luminous insufficient nova caps
+ Ceph client.cinder insufficient privileges for blacklist op
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/678296

Changed in kolla-ansible:
assignee: nobody → Gaëtan Trellu (goldyfruit)
status: New → In Progress
Changed in kolla:
assignee: nobody → Gaëtan Trellu (goldyfruit)
no longer affects: kolla
Changed in kolla-ansible:
status: In Progress → Triaged
importance: Undecided → Medium
milestone: none → 9.0.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/687544

Changed in kolla-ansible:
assignee: Gaëtan Trellu (goldyfruit) → Michal Nasiadka (mnasiadka)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/687544
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=bdc8df0c9066aadd442b6f63db5daec6dcda96f1
Submitter: Zuul
Branch: master

commit bdc8df0c9066aadd442b6f63db5daec6dcda96f1
Author: Michal Nasiadka <email address hidden>
Date: Wed Oct 9 14:17:03 2019 +0200

    Change ceph_client caps to use profile rbd

    Using profiles in cephx is the recommended way since Mimic,
    this also adds support for blacklist ops.

    Change-Id: Ib9f65644637a5761c6cd7ca8925afc6bb2b8d5f5
    Closes-Bug: #1760065

Changed in kolla-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 9.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by "Dr. Jens Harbott <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/678296
Reason: obsolete, ceph is no longer deployed by kolla-ansible
