test_configure_compression fails with 'Services not running that should be: rbd-target-api'

Bug #1960622 reported by Aurelien Lourot
Affects                      Status        Importance  Assigned to   Milestone
Ceph Monitor Charm           Fix Released  High        Corey Bryant
OpenStack Ceph iSCSI Charm   Fix Released  Undecided   Unassigned

Bug Description

Seen in the gate on focal:
https://review.opendev.org/c/openstack/charm-ceph-iscsi/+/828160
https://openstack-ci-reports.ubuntu.com/artifacts/29a/828160/3/check/ceph-iscsi-focal-octopus/29ab038/job-output.txt

test_configure_compression (zaza.openstack.charm_tests.ceph.tests.BlueStoreCompressionCharmOperation)
Enable compression and validate properties flush through to pool.
 ...
Checking Ceph pool compression_mode prior to change
property does not exist on pool, which is OK.
Changing "bluestore-compression-mode" to "force" on ceph-iscsi
Waiting for at least one unit with agent status "executing"
Waiting for application states to reach targeted states.
Waiting for an application to be present
Now checking workload status and status messages
Application mysql-innodb-cluster is ready.
Application vault-mysql-router is ready.
Application vault is ready.
Application ceph-osd is ready.
Application ubuntu is ready.
Application ceph-mon is ready.
TIMEOUT: Workloads didn't reach acceptable status:
Timed out waiting for 'ceph-iscsi/0'. The workload status is 'blocked' which is not one of '['active']'
Timed out waiting for 'ceph-iscsi/0'. The workload status message is 'Services not running that should be: rbd-target-api' which is not one of '['ready', 'Ready', 'Unit is ready']'
ERROR

Corey Bryant (corey.bryant) wrote :

ubuntu@juju-7d8e6f-zaza-ff9976283e4c-1:~$ journalctl -xe

Feb 16 20:52:56 juju-7d8e6f-zaza-ff9976283e4c-1 rbd-target-api[13633]: Invaid cluster_client_name or setting in /etc/ceph/ceph.conf - [errno 2] RADOS object not found (error calling conf_read_file)

ubuntu@juju-7d8e6f-zaza-ff9976283e4c-0:/var/lib/juju/agents/unit-ceph-iscsi-0/charm$ ls -al /etc/ceph
total 12
drwxr-xr-x 2 root root 4096 Feb 16 20:53 .
drwxr-xr-x 103 root root 4096 Feb 16 20:52 ..
lrwxrwxrwx 1 root root 27 Feb 16 20:53 ceph.conf -> /etc/alternatives/ceph.conf
-rw-r--r-- 1 root root 92 Dec 8 09:31 rbdmap

There should be an /etc/ceph/iscsi-gateway.cfg that gets rendered into /etc/ceph, and that will contain "cluster_client_name = client.ceph-iscsi".
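
One way to check a unit for the expected setting is to parse the file and look up the key. This is a minimal sketch, assuming the file uses an INI layout with a [config] section; the helper name and the sample contents are hypothetical and not taken from the charm.

```python
import configparser


def cluster_client_name(cfg_text):
    """Return cluster_client_name from iscsi-gateway.cfg text, or None if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Assumption: settings live in a [config] section.
    return parser.get("config", "cluster_client_name", fallback=None)


# Hypothetical file contents, for illustration only.
sample = """
[config]
cluster_client_name = client.ceph-iscsi
"""

print(cluster_client_name(sample))        # client.ceph-iscsi
print(cluster_client_name("[config]\n"))  # None
```

On a real unit the text would come from reading /etc/ceph/iscsi-gateway.cfg; in the failing case above the file does not exist at all, so the read itself would fail first.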

Corey Bryant (corey.bryant) wrote :

/etc/ceph/ceph.conf from a failing unit

Corey Bryant (corey.bryant) wrote :

Tests were successful here in Sept: https://review.opendev.org/c/openstack/charm-ceph-iscsi/+/806947

I'm running a local test with this commit ^ to see if it's still successful.

Corey Bryant (corey.bryant) wrote :

I've confirmed this is limited to the new bundle changes in: https://review.opendev.org/c/openstack/charm-ceph-iscsi/+/828160

Specifically, a charm built from the commit "Add support for Ceph dashboard support" passes when run with the focal.yaml from that commit, but fails with this one: https://review.opendev.org/c/openstack/charm-ceph-iscsi/+/828160/3/tests/bundles/focal.yaml

ceph-iscsi units seem to go from "1 is an invalid unit count" to "Services not running that should be: rbd-target-api".

Corey Bryant (corey.bryant) wrote :

This appears to be happening due to this change in ceph-mon:
https://opendev.org/openstack/charm-ceph-mon/commit/da798bdd95f8fa19ce42363be6bfb268d62fe690

Running with the prior commit in the ceph-mon git tree is successful (commit 05a03bd10d885d161b07e6acf47d030549562768).

Corey Bryant (corey.bryant) wrote :

I'm seeing errors like this in the ceph-mon/leader juju log:

2022-02-17 21:22:36 INFO unit.ceph-mon/1.juju-log server.go:327 osd:7: Creating pool 'iscsi-foo-metadata' (replicas=3)
2022-02-17 21:22:36 ERROR unit.ceph-mon/1.juju-log server.go:327 osd:7: Failed to discover crush profile named None
2022-02-17 21:22:36 ERROR unit.ceph-mon/1.juju-log server.go:327 osd:7: Failed to discover crush profile named None

Corey Bryant (corey.bryant) wrote :

The problem is this line in class ReplicatedPool's __init__():

self.profile_name = op.get('crush-profile', profile_name)

Here the value of op['crush-profile'] is None, so the default profile_name is never used. If I force the line to the following, it works:

self.profile_name = profile_name

Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Corey Bryant (corey.bryant)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)
Changed in charm-ceph-mon:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/829970
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/d695ab315f7808f90eedad36a194f3d29d66eb4d
Submitter: "Zuul (22348)"
Branch: master

commit d695ab315f7808f90eedad36a194f3d29d66eb4d
Author: Corey Bryant <email address hidden>
Date: Fri Feb 18 12:25:37 2022 -0500

    Fix handling of profile-name

    The current code:
    self.profile_name = op.get('crush-profile', profile_name)

    will only default to profile_name if the 'crush-profile' key
    doesn't exist in the op dictionary. If the 'crush-profile' key
    exists and is set to None, the default profile_name is not used.

    This change will use the default profile_name in both cases.

    A full charm-helpers sync is done here.

    Closes-Bug: #1960622
    Change-Id: If9749e16eadfab5523d06c82f3899a83b8c6fdc1
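
The behaviour described in the commit message can be reproduced in isolation. This is a minimal sketch, not the charm-helpers code itself: the function names and the default value are hypothetical, and the "or" form is just one common way to get the fallback in both cases.

```python
DEFAULT_PROFILE = "replicated_rule"  # hypothetical default, for illustration


def profile_name_buggy(op, profile_name=DEFAULT_PROFILE):
    # dict.get() only falls back when the key is missing entirely.
    return op.get("crush-profile", profile_name)


def profile_name_fixed(op, profile_name=DEFAULT_PROFILE):
    # "or" also falls back when the key is present but None (or empty).
    return op.get("crush-profile") or profile_name


missing_key = {}
explicit_none = {"crush-profile": None}

print(profile_name_buggy(missing_key))    # replicated_rule
print(profile_name_buggy(explicit_none))  # None
print(profile_name_fixed(explicit_none))  # replicated_rule
```

The second call corresponds to the "Failed to discover crush profile named None" errors in the ceph-mon log: the key is present with a None value, so the default passed to get() is ignored.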

Changed in charm-ceph-mon:
status: In Progress → Fix Committed
Corey Bryant (corey.bryant) wrote :

I'm moving charm-ceph-iscsi to Fix Released since the fix has landed in charm-ceph-mon.

Changed in charm-ceph-iscsi:
status: New → Fix Released
Changed in charm-ceph-mon:
milestone: none → 22.04
Changed in charm-ceph-mon:
status: Fix Committed → Fix Released