rbd-target-api crashes with `blacklist removal failed`

Bug #1969775 reported by Liam Young
This bug affects 1 person
Affects: ceph-iscsi (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

[Impact]
 * ceph-iscsi on Focal talking to a Pacific or later Ceph cluster

 * rbd-target-api service fails to start if there is a blocklist
   entry for the unit.

 * When the rbd-target-api service starts, it checks whether any of
   the IP addresses on the machine it is running on are blocklisted.
   If there are entries, it tries to remove them. After issuing the
   removal command, it looks for the string `un-blacklisting` in the
   command's stdout. From Pacific onward, however, a successful
   removal prints `un-blocklisting` instead
   (https://github.com/ceph/ceph/commit/dfd01d765304ed8783cef613930e65980d9aee23),
   so the check fails and the service exits.
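
A minimal sketch of a check that tolerates both wordings, assuming the removal command's stdout is available as bytes or str. The helper name `removal_succeeded` and the sample outputs are hypothetical; this is not the ceph-iscsi source.

```python
# Hypothetical helper, not taken from ceph-iscsi: accept the success
# marker printed before Pacific as well as the one printed from Pacific on.
SUCCESS_MARKERS = ("un-blacklisting", "un-blocklisting")


def removal_succeeded(stdout):
    """Return True if the removal command's output reports success."""
    # Output captured via subprocess is bytes on Python 3, so decode
    # before comparing against str markers.
    if isinstance(stdout, bytes):
        stdout = stdout.decode("utf-8", errors="replace")
    return any(marker in stdout for marker in SUCCESS_MARKERS)


if __name__ == "__main__":
    # Illustrative outputs only; the text after the marker is an assumption.
    print(removal_succeeded(b"un-blocklisting 172.20.0.135:0/1"))
    print(removal_succeeded(b"un-blacklisting 172.20.0.135:0/1"))
```

Keeping the old marker alongside the new one means the same check would work against both older and newer clusters.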

[Test Plan]

 If an existing ceph-iscsi deployment is available then skip to
 step 3.

 1) Deploy the bundle below (tested with OpenStack provider).

series: focal
applications:
  ceph-iscsi:
    charm: cs:ceph-iscsi
    num_units: 2
  ceph-osd:
    charm: ch:ceph-osd
    num_units: 3
    storage:
      osd-devices: 'cinder,10G'
    options:
      osd-devices: '/dev/test-non-existent'
      source: yoga
    channel: latest/edge
  ceph-mon:
    charm: ch:ceph-mon
    num_units: 3
    options:
      monitor-count: '3'
      source: yoga
    channel: latest/edge
relations:
  - - 'ceph-mon:client'
    - 'ceph-iscsi:ceph-client'
  - - 'ceph-osd:mon'
    - 'ceph-mon:osd'

 2) Connect to ceph-iscsi unit:

juju ssh -m zaza-a1d88053ab85 ceph-iscsi/0

 3) Stop rbd-target-api via systemd to make test case clearer:

sudo systemctl stop rbd-target-api

 4) Add two blocklist entries for this unit. Because of a separate issue, the ordering of the output from `osd blacklist ls` matters, which can make reproduction of this bug intermittent. Adding two entries ensures there is always an entry for this node in the list of blocklist entries to be removed.

sudo ceph -n client.ceph-iscsi --conf /etc/ceph/iscsi/ceph.conf osd blacklist add $(hostname --all-ip-addresses | awk '{print $1}'):0/1
sudo ceph -n client.ceph-iscsi --conf /etc/ceph/iscsi/ceph.conf osd blacklist add $(hostname --all-ip-addresses | awk '{print $1}'):0/2
sudo ceph -n client.ceph-iscsi --conf /etc/ceph/iscsi/ceph.conf osd blacklist ls
  listed 2 entries
  172.20.0.135:0/2 2022-02-23T11:14:54.850352+0000
  172.20.0.135:0/1 2022-02-23T11:14:52.502592+0000

 5) Attempt to start service:

sudo /usr/bin/python3 /usr/bin/rbd-target-api

At this point the process should be running in the foreground but instead
it will die. The log from the service will have an entry like:

2022-04-21 12:35:21,695 CRITICAL [gateway.py:51:ceph_rm_blacklist()] - blacklist removal failed. Run 'ceph -n client.ceph-iscsi --conf /etc/ceph/iscsi/ceph.conf osd blacklist rm 172.20.0.156:0/1'

[Where problems could occur]

 * Problems could occur with the service starting as this blocklist check is done at startup.

 * Blocklist entries could fail to be removed.

This issue is very similar to Bug #1883112

Tags: patch
Liam Young (gnuoy)
description: updated
Luciano Lo Giudice (lmlogiudice) wrote :

The fix proposed in https://bugs.launchpad.net/ubuntu/+source/ceph-iscsi/+bug/1883112 provides a partial solution (changing the test to use bytes instead of strings). However, that alone is insufficient, since it only avoids TypeError exceptions. An additional fix is needed so that the messages being checked for are changed from `blacklist` to `blocklist` (to be safe, the new strings could be added alongside the old ones rather than replacing them).
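
To illustrate the distinction, here is a small sketch of the three cases under stated assumptions: the sample output is made up for illustration, and the function names are hypothetical rather than taken from ceph-iscsi.

```python
# Illustrative Pacific-style output; the exact text after the marker
# is an assumption.
PACIFIC_OUTPUT = b"un-blocklisting 172.20.0.135:0/1"


def str_check(output):
    """str needle against bytes output: the case the bytes fix addresses."""
    return "un-blacklisting" in output  # raises TypeError when output is bytes


def bytes_check(output):
    """Bytes needle: no TypeError, but only the pre-Pacific wording."""
    return b"un-blacklisting" in output


def widened_check(output):
    """Bytes needles covering both the old and the new wording."""
    return any(m in output for m in (b"un-blacklisting", b"un-blocklisting"))


if __name__ == "__main__":
    try:
        str_check(PACIFIC_OUTPUT)
    except TypeError as exc:
        print("str check raised:", exc)
    print("bytes check:", bytes_check(PACIFIC_OUTPUT))
    print("widened check:", widened_check(PACIFIC_OUTPUT))
```

With the bytes needle alone the comparison no longer raises, but it still reports failure on the new wording, which is why both changes are needed.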

Chris MacNaughton (chris.macnaughton) wrote :

Attached is a debdiff adding support for the newer "blocklist" syntax

tags: added: patch
James Page (james-page) wrote :

@chris.macnaughton

looked at your proposed patch and have questions.

Do we not need more of:

   https://github.com/ceph/ceph-iscsi/commit/4d04457ddaa9103cafccdad239b0dc5f1cbd0530

as there seem to be changes in both the CLI and the return values of the command?

That commit is not a clean cherry-pick to 3.4, but it's not far off.

If not all of that is needed, could you please annotate your patch with headers so that we have documentation of the approach and a note that a different fix can be found in later releases of Ceph?

Thanks!

Chris MacNaughton (chris.macnaughton) wrote :

I've attached an updated debdiff with more commentary about the fix.

The TL;DR is that the CLI still supports the "blacklist" syntax in addition to the "blocklist" syntax in newer Ceph releases, but the output is always "blocklist". If we want, I can cherry-pick the entire upstream commit related to the syntax change, but a more minimal change seems more appropriate for an SRU (the upstream change also seems to change detection of elements in the block/black-list).

Robie Basak (racb) wrote :

What's the status of this bug in Kinetic, please?

Robie Basak (racb)
Changed in ceph-iscsi (Ubuntu):
status: New → Incomplete
Robie Basak (racb) wrote : Proposed package upload rejected

An upload of ceph-iscsi to focal-proposed has been rejected from the upload queue for the following reason: "Needs fixing in the development release first".

Launchpad Janitor (janitor) wrote :

[Expired for ceph-iscsi (Ubuntu) because there has been no activity for 60 days.]

Changed in ceph-iscsi (Ubuntu):
status: Incomplete → Expired