Cinder-ceph stuck waiting: Ceph broker request incomplete

Bug #1976390 reported by Bas de Bruijne
This bug affects 5 people
Affects: OpenStack Cinder-Ceph charm (Status: New, Importance: Undecided, Assigned to: Unassigned)
Affects: OpenStack Glance Charm (Status: New, Importance: Undecided, Assigned to: Unassigned)

Bug Description

In testrun: https://solutions.qa.canonical.com/testruns/testRun/4fca98e1-e308-4f82-b802-74c491ccfb10

Cinder-ceph gets stuck in the waiting state with the message "Ceph broker request incomplete":
```
cinder/0 active idle 3/lxd/2 10.246.166.187 8776/tcp Unit is ready
  cinder-ceph/2 active idle 10.246.166.187 Unit is ready
  cinder-mysql-router/2 active idle 10.246.166.187 Unit is ready
  filebeat/55 active idle 10.246.166.187 Filebeat ready.
  hacluster-cinder/2 active idle 10.246.166.187 Unit is ready and clustered
  landscape-client/55 maintenance idle 10.246.166.187 Need computer-title and juju-info to proceed
  logrotated/50 active idle 10.246.166.187 Unit is ready.
  nrpe/61 active idle 10.246.166.187 icmp,5666/tcp Ready
  public-policy-routing/32 active idle 10.246.166.187 Unit is ready
  telegraf/54 active idle 10.246.166.187 9103/tcp Monitoring cinder/0 (source version/commit cc7fa21)
cinder/1* active idle 4/lxd/2 10.246.167.106 8776/tcp Unit is ready
  cinder-ceph/0* waiting idle 10.246.167.106 Ceph broker request incomplete
  cinder-mysql-router/0* active idle 10.246.167.106 Unit is ready
  filebeat/42 active idle 10.246.167.106 Filebeat ready.
  hacluster-cinder/0* active idle 10.246.167.106 Unit is ready and clustered
  landscape-client/42 maintenance idle 10.246.167.106 Need computer-title and juju-info to proceed
  logrotated/36 active idle 10.246.167.106 Unit is ready.
  nrpe/46 active idle 10.246.167.106 icmp,5666/tcp Ready
  public-policy-routing/24 active idle 10.246.167.106 Unit is ready
  telegraf/42 active idle 10.246.167.106 9103/tcp Monitoring cinder/1 (source version/commit cc7fa21)
cinder/2 active idle 5/lxd/2 10.246.166.236 8776/tcp Unit is ready
  cinder-ceph/1 active idle 10.246.166.236 Unit is ready
  cinder-mysql-router/1 active idle 10.246.166.236 Unit is ready
  filebeat/48 active idle 10.246.166.236 Filebeat ready.
  hacluster-cinder/1 active idle 10.246.166.236 Unit is ready and clustered
  landscape-client/48 maintenance idle 10.246.166.236 Need computer-title and juju-info to proceed
  logrotated/42 active idle 10.246.166.236 Unit is ready.
  nrpe/52 active idle 10.246.166.236 icmp,5666/tcp Ready
  public-policy-routing/27 active idle 10.246.166.236 Unit is ready
  telegraf/48 active idle 10.246.166.236 9103/tcp Monitoring cinder/2 (source version/commit cc7fa21)
```

The relation with ceph is rendered correctly, and the only message in the logs that hints at a problem is:
```
2022-05-31 12:00:59 DEBUG unit.cinder-ceph/0.juju-log server.go:327 Ignoring legacy broker_rsp without unit key as remote service supports unit specific replies
```
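
For anyone debugging the same state, one way to see what ceph-mon has (or has not) published back on the relation is to dump the relation data from the stuck unit. This is a minimal sketch, assuming the cinder-ceph relation endpoint is named "ceph" and Juju 2.9 syntax (`juju exec --unit` on Juju 3.x); the `<id>` placeholder comes from the first command:
```
# From the stuck unit, find the id of its relation to ceph-mon
juju run -u cinder-ceph/0 'relation-ids ceph'
# Dump everything ceph-mon/0 has published on that relation; a completed
# broker request should include a unit-specific broker-rsp-cinder-ceph-0 key,
# not just the legacy broker_rsp mentioned in the log line above
juju run -u cinder-ceph/0 'relation-get -r ceph:<id> - ceph-mon/0'
```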

Link to crashdumps:
https://oil-jenkins.canonical.com/artifacts/4fca98e1-e308-4f82-b802-74c491ccfb10/index.html

Marcus Boden (marcusboden) wrote:

Hi,
I ran into this issue as well. In my case, for some reason the ceph config on the faulty cinder-ceph unit wasn't generated correctly and was lacking the necessary entries to connect to ceph.
I fixed that by manually running the config-changed hook (juju run -u cinder-ceph/0 hooks/config-changed) on the affected unit.
That generated the config (I restarted all cinder services on the unit as well) but the unit was still stuck in waiting.
My guess from skimming the code: it is waiting for a response from ceph that was never actually sent. I commented out the check for whether the response had already been sent (https://opendev.org/openstack/charm-cinder-ceph/src/commit/a973d9351ed6123d2be4dce909acca91bcca245d/charmhelpers/contrib/storage/linux/ceph.py#L2220), forcing it to create a new request when I manually ran the ceph-relation-changed hook. I think just removing the "broker-rsp-cinder-ceph-0" relation data on the ceph-mon side might also have worked, without hacking the code.
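
Putting that together, a rough sketch of the two workarounds (unit names are taken from the status output above; the relation id is a placeholder, and clearing the response key on the ceph-mon side instead of patching charmhelpers is the untested alternative):
```
# Workaround 1: re-render the ceph config on the subordinate and restart the
# cinder services on its host (cinder-ceph/0 sits on cinder/1 above)
juju run -u cinder-ceph/0 hooks/config-changed
juju ssh cinder/1 'sudo systemctl restart cinder-volume'

# Workaround 2 (untested alternative to hacking the code): remove the stale
# per-unit broker response on the ceph-mon side (setting a relation key to an
# empty value deletes it), then re-run the relation hook so a new broker
# request is generated
juju run -u ceph-mon/0 'relation-set -r <relation-id> broker-rsp-cinder-ceph-0='
juju run -u cinder-ceph/0 hooks/ceph-relation-changed
```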

Anyway, I hope this helps any future travelers coming across this issue.

Bas de Bruijne (basdbruijne) wrote:

This is still a relevant bug; subscribing Marosg for follow-up questions.

macchese (max-liccardo) wrote:

This bug affects me too, after I deleted a mon unit with juju.
The unit was removed, but the mon still remained in the ceph cluster and in the glance and cinder-ceph ceph.conf files.
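
A quick way to spot such a stale mon is to compare the cluster's mon map with the addresses rendered into the client configs; a sketch, where the config paths are assumptions (cinder-ceph may keep its own copy under /var/lib/charm/cinder-ceph/):
```
# Mons the cluster actually knows about
juju ssh ceph-mon/0 'sudo ceph mon dump'
# Mon addresses the clients were rendered with (paths are assumptions)
juju ssh glance/0 'grep -i host /etc/ceph/ceph.conf'
juju ssh cinder/1 'grep -ri host /var/lib/charm/cinder-ceph/'
```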
