ceph-access relation not always completing

Bug #1711642 reported by Frode Nordahl on 2017-08-18
This bug affects 3 people
Affects: OpenStack cinder-ceph charm
Importance: Critical
Assigned to: Chris MacNaughton

Bug Description

For OpenStack Ocata and newer, a ceph-access relation between cinder-ceph and nova-compute is required for Nova's use of RBD for Boot from Volume, among other things.

However, the relation does not always complete. As a consequence, nova-compute never receives the information it needs to set up the libvirt secrets on the nova-compute units.
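For context, this is roughly what nova-compute has to do with the relation data once it arrives. The sketch below is hedged: it assumes the ceph-access relation carries a base64 Ceph key under 'key' and a libvirt secret UUID under 'secret-uuid' (field names not confirmed in this report), and it only builds the libvirt secret XML and the virsh commands documented for Ceph RBD with libvirt, without running them. setup_from_relation is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def secret_xml(secret_uuid: str, client_name: str) -> str:
    """Build a libvirt <secret> definition for a Ceph client key."""
    secret = ET.Element("secret", ephemeral="no", private="no")
    ET.SubElement(secret, "uuid").text = secret_uuid
    usage = ET.SubElement(secret, "usage", type="ceph")
    # Matches the "ceph client.<name> secret" usage shown by virsh secret-list.
    ET.SubElement(usage, "name").text = "client.{} secret".format(client_name)
    return ET.tostring(secret, encoding="unicode")


def virsh_commands(secret_uuid: str, key: str, xml_path: str) -> List[List[str]]:
    """Return (but do not run) the commands that register the secret."""
    return [
        ["virsh", "secret-define", "--file", xml_path],
        ["virsh", "secret-set-value", "--secret", secret_uuid, "--base64", key],
    ]


def setup_from_relation(data: dict, client_name: str,
                        xml_path: str = "/tmp/secret.xml"
                        ) -> Optional[Tuple[str, List[List[str]]]]:
    """Hypothetical handler for ceph-access relation data.

    The field names 'key' and 'secret-uuid' are assumptions. Returns None
    while the remote side has not published them, which is the state the
    relation gets stuck in.
    """
    key = data.get("key")
    secret_uuid = data.get("secret-uuid")
    if not key or not secret_uuid:
        return None
    return (secret_xml(secret_uuid, client_name),
            virsh_commands(secret_uuid, key, xml_path))
```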

I am currently trying to figure out what leads to this situation and will follow up with a remedy when I find it.

Some log excerpts from /var/log/juju/unit-cinder-ceph-0.log:
2017-08-18 13:46:54 INFO juju-log ceph:24: Making dir /var/lib/charm/cinder-ceph root:root 555
2017-08-18 13:46:54 INFO juju-log ceph:24: Making dir /etc/ceph root:root 555
2017-08-18 13:46:55 INFO juju-log ceph:24: Registered config file: /var/lib/charm/cinder-ceph/ceph.conf
2017-08-18 13:47:00 DEBUG juju-log ceph:24: Generating template context for ceph
2017-08-18 13:47:02 INFO juju-log ceph:24: Missing required data: key auth
2017-08-18 13:47:03 INFO juju-log ceph:24: ceph relation's interface, ceph, is related awaiting the following data from the relationship: key, auth.
2017-08-18 13:47:15 INFO juju-log ceph-access:57: Making dir /var/lib/charm/cinder-ceph root:root 555
2017-08-18 13:47:15 INFO juju-log ceph-access:57: Making dir /etc/ceph root:root 555
2017-08-18 13:47:16 INFO juju-log ceph-access:57: Registered config file: /var/lib/charm/cinder-ceph/ceph.conf
2017-08-18 13:47:16 INFO juju-log ceph-access:57: Unknown hook ceph-access-relation-changed - skipping.
2017-08-18 13:47:20 DEBUG juju-log ceph-access:57: Generating template context for ceph
2017-08-18 13:47:23 INFO juju-log ceph-access:57: Missing required data: key auth
2017-08-18 13:47:23 INFO juju-log ceph-access:57: ceph relation's interface, ceph, is related awaiting the following data from the relationship: key, auth.
2017-08-18 13:47:38 INFO juju-log ceph-access:57: Making dir /var/lib/charm/cinder-ceph root:root 555
2017-08-18 13:47:38 INFO juju-log ceph-access:57: Making dir /etc/ceph root:root 555
2017-08-18 13:47:38 INFO juju-log ceph-access:57: Registered config file: /var/lib/charm/cinder-ceph/ceph.conf
2017-08-18 13:47:38 DEBUG juju-log ceph-access:57: Generating template context for ceph
2017-08-18 13:47:41 INFO juju-log ceph-access:57: Missing required data: key auth
2017-08-18 13:47:42 INFO juju-log ceph-access:57: Deferring key provision until ceph relation complete
2017-08-18 13:47:48 DEBUG juju-log ceph-access:57: Generating template context for ceph
2017-08-18 13:47:48 INFO juju-log ceph-access:57: Missing required data: key auth
2017-08-18 13:47:48 INFO juju-log ceph-access:57: ceph relation's interface, ceph, is related awaiting the following data from the relationship: key, auth.
2017-08-18 13:48:03 INFO juju-log ceph-access:57: Making dir /var/lib/charm/cinder-ceph root:root 555
2017-08-18 13:48:03 INFO juju-log ceph-access:57: Making dir /etc/ceph root:root 555
2017-08-18 13:48:03 INFO juju-log ceph-access:57: Registered config file: /var/lib/charm/cinder-ceph/ceph.conf
2017-08-18 13:48:04 INFO juju-log ceph-access:57: Unknown hook ceph-access-relation-changed - skipping.
2017-08-18 13:48:09 DEBUG juju-log ceph-access:57: Generating template context for ceph
2017-08-18 13:48:11 INFO juju-log ceph-access:57: Missing required data: key auth
2017-08-18 13:48:11 INFO juju-log ceph-access:57: ceph relation's interface, ceph, is related awaiting the following data from the relationship: key, auth.
2017-08-18 13:48:30 INFO juju-log ceph:24: Making dir /var/lib/charm/cinder-ceph root:root 555
2017-08-18 13:48:30 INFO juju-log ceph:24: Making dir /etc/ceph root:root 555
2017-08-18 13:48:30 INFO juju-log ceph:24: Registered config file: /var/lib/charm/cinder-ceph/ceph.conf
2017-08-18 13:48:30 DEBUG juju-log ceph:24: Generating template context for ceph
2017-08-18 13:48:33 INFO juju-log ceph:24: Missing required data: key auth
2017-08-18 13:48:33 INFO juju-log ceph:24: ceph relation incomplete. Peer not ready?
2017-08-18 13:48:38 DEBUG juju-log ceph:24: Generating template context for ceph
2017-08-18 13:48:38 INFO juju-log ceph:24: Missing required data: key auth
2017-08-18 13:48:39 INFO juju-log ceph:24: ceph relation's interface, ceph, is related awaiting the following data from the relationship: key, auth.
2017-08-18 13:48:43 INFO juju-log ceph:24: Making dir /var/lib/charm/cinder-ceph root:root 555
2017-08-18 13:48:44 INFO juju-log ceph:24: Making dir /etc/ceph root:root 555
2017-08-18 13:48:44 INFO juju-log ceph:24: Registered config file: /var/lib/charm/cinder-ceph/ceph.conf
2017-08-18 13:48:51 DEBUG juju-log ceph:24: Generating template context for ceph
2017-08-18 13:49:00 INFO juju-log ceph:24: Unit is ready
2017-08-18 13:49:31 INFO juju-log ceph:24: Making dir /var/lib/charm/cinder-ceph root:root 555
2017-08-18 13:49:31 INFO juju-log ceph:24: Making dir /etc/ceph root:root 555
2017-08-18 13:49:32 INFO juju-log ceph:24: Registered config file: /var/lib/charm/cinder-ceph/ceph.conf
2017-08-18 13:49:32 DEBUG juju-log ceph:24: Generating template context for ceph
2017-08-18 13:49:41 INFO ceph-relation-changed creating /etc/ceph/ceph.client.cinder-ceph.keyring
2017-08-18 13:49:41 INFO ceph-relation-changed added entity client.cinder-ceph auth auth(auid = 18446744073709551615 key=AQA/75ZZNLsPORAAQ7Gp+3xtI8dCUTKQRwOLVw== with 0 caps)
2017-08-18 13:49:41 DEBUG juju-log ceph:24: Created new ceph keyring at /etc/ceph/ceph.client.cinder-ceph.keyring.
2017-08-18 13:49:42 DEBUG juju-log ceph:24: Sending request 15936915-841c-11e7-864d-00163e35c28c
2017-08-18 13:49:49 INFO juju-log ceph:24: Unit is ready

After the ceph relation is complete and the auth and key are in place, no further attempt is ever made to complete the ceph-access relation.
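The ordering problem can be reproduced with a toy model of the hook sequence in the log: ceph-access-relation-joined runs while the ceph relation still lacks key and auth, so it defers; ceph-access-relation-changed has no handler; and the later ceph-relation-changed that completes the ceph relation never looks at ceph-access again. The class and method names are illustrative only and are not taken from cinder_hooks.py.

```python
from typing import Callable, Dict, Optional


class CinderCephModel:
    """Toy model of the hook order in the log above; not the charm's code."""

    def __init__(self) -> None:
        self.ceph_complete = False
        self.access_provisioned = False
        # Only these hooks have handlers. Any other hook name is skipped,
        # like "Unknown hook ceph-access-relation-changed - skipping." above.
        self.handlers: Dict[str, Callable[[bool], Optional[str]]] = {
            "ceph-relation-changed": self.ceph_changed,
            "ceph-access-relation-joined": self.ceph_access_joined,
        }

    def ceph_changed(self, key_available: bool) -> Optional[str]:
        self.ceph_complete = key_available
        # Nothing here revisits the ceph-access relation.
        return None

    def ceph_access_joined(self, _unused: bool = False) -> Optional[str]:
        if not self.ceph_complete:
            return "Deferring key provision until ceph relation complete"
        self.access_provisioned = True
        return None

    def fire(self, hook: str, arg: bool = False) -> Optional[str]:
        handler = self.handlers.get(hook)
        if handler is None:
            return "Unknown hook {} - skipping.".format(hook)
        return handler(arg)


model = CinderCephModel()
model.fire("ceph-relation-changed", False)   # key/auth not yet available
model.fire("ceph-access-relation-joined")    # deferred
model.fire("ceph-access-relation-changed")   # no handler, skipped
model.fire("ceph-relation-changed", True)    # "Unit is ready"
print(model.access_provisioned)              # False: nothing retries
```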

Frode Nordahl (fnordahl) on 2017-08-18
tags: added: backport-potential sts
summary: - ceph-access relation not allways completing
+ ceph-access relation not always completing
Changed in charm-cinder-ceph:
milestone: none → 17.08
importance: Undecided → Critical
status: New → Triaged
Dmitrii Shcherbakov (dmitriis) wrote :

I can confirm this (Ocata, 17.02 charms). One of the consequences is that an incorrect secret may be passed to QEMU.

As a result, the launch fails with "process exited while connecting to monitor".

2017-08-18 13:02:47.123 13832 ERROR nova.compute.manager [instance: cd682722-6417-4df1-92b5-dd4c5215e6c7] libvirtError: internal error: process exited while connecting to monitor: 2017-08-18T13:02:46.126047Z qemu-system-x86_64: -drive file=rbd:cinder-ceph/volume-2aabe89d-6feb-44f5-8bf0-85f270546be9:id=cinder-ceph:key=AQBfjZVZ+BtmKBAAsNrqkZfDBSc/GdoufBtHPA==:auth_supported=cephx\;none:mon_host=10.44.91.10\:6789\;10.44.91.23\:6789\;10.44.91.26\:6789,format=raw,if=none,id=drive-virtio

As a workaround, I had to remove and re-add the relation between nova-compute-kvm and cinder-ceph.

The secret was simply incorrect.
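When the secret is wrong, the rbd: drive specification in the QEMU error shows which client id and key were actually handed over. A small sketch for pulling it apart, assuming the colon-separated, backslash-escaped layout visible in the log line above; the function names are made up for this example.

```python
from typing import Dict, List, Tuple


def split_unescaped(text: str, sep: str) -> List[str]:
    """Split on sep, treating a backslash as an escape for the next char."""
    parts: List[str] = []
    buf: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i + 1])  # keep the escaped character, drop "\"
            i += 2
            continue
        if ch == sep:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


def parse_rbd_drive(drive: str) -> Tuple[str, Dict[str, str]]:
    """Return (pool/image, options) from a QEMU '-drive file=rbd:...' value."""
    # Assumption: the file= value ends at the first comma (QEMU's ",,"
    # escape for literal commas is not handled here).
    file_value = drive.split(",", 1)[0]
    if file_value.startswith("file="):
        file_value = file_value[len("file="):]
    fields = split_unescaped(file_value, ":")
    if fields[0] != "rbd" or len(fields) < 2:
        raise ValueError("not an rbd drive spec: {!r}".format(drive))
    options: Dict[str, str] = {}
    for field in fields[2:]:
        name, _, value = field.partition("=")  # base64 '=' padding survives
        options[name] = value
    return fields[1], options
```

On the drive string above it returns id cinder-ceph plus the inline key, which can then be compared with the output of ceph auth get-key client.cinder-ceph on a monitor.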

tags: added: cpec
Dmitrii Shcherbakov (dmitriis) wrote :

I think this can be reproduced by completely removing ceph-mon from the model, adding it again and re-creating the necessary relations (the OSDs may need to be removed as well, since they do not seem to be added back to the tree after the ceph-mon relation is removed and re-added).

Can you please upload the bundle you used to deploy this? I want to confirm that it still behaves this way on master.

Frode Nordahl (fnordahl) wrote :

The last deployment was driven by a tool, so I do not have the actual bundle file, but I can show you the versions:

App               Version  Status  Scale  Charm         Store       Rev  OS
cinder-ceph       10.0.2   active  1      cinder-ceph   jujucharms  241  ubuntu
nova-compute-kvm  15.0.5   active  3      nova-compute  jujucharms  327  ubuntu

I can also add that registering the 'ceph-access-relation-changed' hook in cinder_hooks.py, then removing the relation, waiting a bit, and re-adding it can be used to unwedge the incomplete relation.
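The "Unknown hook ceph-access-relation-changed - skipping." lines in the log come from hook dispatch: a hook name with no registered handler is skipped. The sketch below is a self-contained stand-in modelled on the charmhelpers Hooks registration pattern (it is not the charmhelpers implementation) and shows the change described above: registering one handler under both the joined and the changed hook names.

```python
from typing import Callable, Dict


class UnregisteredHookError(Exception):
    """Raised when no handler is registered for a hook name."""


class Hooks:
    """Minimal stand-in for a hook-name -> handler registry."""

    def __init__(self) -> None:
        self._hooks: Dict[str, Callable[[], str]] = {}

    def hook(self, *hook_names: str):
        """Decorator registering one handler under several hook names."""
        def wrapper(fn: Callable[[], str]) -> Callable[[], str]:
            for name in hook_names:
                self._hooks[name] = fn
            return fn
        return wrapper

    def execute(self, hook_name: str) -> str:
        if hook_name not in self._hooks:
            raise UnregisteredHookError(hook_name)
        return self._hooks[hook_name]()


hooks = Hooks()


@hooks.hook("ceph-access-relation-joined", "ceph-access-relation-changed")
def ceph_access_joined() -> str:
    # A real handler would check that the ceph relation is complete and
    # then publish the key; here it only reports that it ran.
    return "ceph-access handler ran"


def main(hook_name: str) -> str:
    try:
        return hooks.execute(hook_name)
    except UnregisteredHookError as err:
        return "Unknown hook {} - skipping.".format(err)
```

On its own this only helps if some ceph-access-relation-changed event fires after the ceph relation has completed, which is why the relation still had to be removed and re-added.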

I will also try to reproduce this with a standard bundle and report back and upload it.

Frode Nordahl (fnordahl) wrote :

Here is a consistent reproducer for this issue using multiple bundles simulating a staged deployment pattern.

How to reproduce:
1) juju deploy bundle-ocata-1711642.yaml

2) Wait for deployment activity to subside/complete as far as it can get without relations

3) juju deploy bundle-ocata-1711642-relations-1.yaml

4) Wait for deployment activity to subside/complete as far as it can get without the remaining
   relations

5) Verify that the ceph-access relation is incomplete:
 juju run --unit nova-compute/0 relation-ids ceph-access
 juju run --unit cinder-ceph/0 -- relation-get -r ceph-access:N - nova-compute/0 # Replace N

6) juju deploy bundle-ocata-1711642-relations-2.yaml

7) Wait for deployment activity to complete

8) Verify again that the ceph-access relation is still incomplete:
 juju run --unit nova-compute/0 relation-ids ceph-access
 juju run --unit cinder-ceph/0 -- relation-get -r ceph-access:N - nova-compute/0 # Replace N

9) Finally, verify that the cinder-ceph secret has not been added to the libvirt secret store on the
   nova-compute units:
 juju ssh nova-compute/0 -- sudo virsh secret-list
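Steps 5 and 8 can be scripted. The sketch below builds the same juju run command used above and checks the relation-get output for the settings nova-compute needs. It assumes a complete relation carries key and secret-uuid (an assumption, not something shown in this report) and that relation-get prints one "name: value" line per setting; the helper names are hypothetical, and only check() actually calls juju.

```python
import subprocess
from typing import Iterable, List


def relation_get_cmd(unit: str, relation_id: str, remote_unit: str) -> List[str]:
    """The step 5/8 relation-get command, as an argument list."""
    return ["juju", "run", "--unit", unit, "--",
            "relation-get", "-r", relation_id, "-", remote_unit]


def parse_relation_ids(output: str) -> List[str]:
    """relation-ids prints one id (e.g. 'ceph-access:57') per line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def missing_keys(relation_get_output: str,
                 required: Iterable[str] = ("key", "secret-uuid")) -> List[str]:
    """Return required settings absent from relation-get output.

    The default required names are assumptions about the ceph-access data.
    """
    present = set()
    for line in relation_get_output.splitlines():
        name, sep, _value = line.partition(":")
        if sep:
            present.add(name.strip())
    return [name for name in required if name not in present]


def check(unit: str, relation_id: str, remote_unit: str) -> List[str]:
    """Run the command (needs a juju client) and report missing settings."""
    out = subprocess.run(relation_get_cmd(unit, relation_id, remote_unit),
                         capture_output=True, text=True, check=True).stdout
    return missing_keys(out)
```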

Frode Nordahl (fnordahl) wrote :

If the ceph cluster is not bootstrapped when the relation is added, we never try again, unless the leader settings change: https://github.com/openstack/charm-cinder-ceph/blob/master/hooks/cinder_hooks.py#L190
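The missing retry path can be sketched as follows: once the ceph relation becomes complete, the ceph-relation-changed handler walks every existing ceph-access relation and runs the ceph-access provisioning for it. This is a hedged illustration of the idea, not the code from the fix commit; in a charm the relation ids would come from charmhelpers' relation_ids('ceph-access'), and the class here is an illustrative stand-in.

```python
from typing import Iterable, Set


class CinderCephSketch:
    """Illustrative stand-in for the charm state; not cinder_hooks.py."""

    def __init__(self, access_relation_ids: Iterable[str]) -> None:
        self.access_relation_ids = list(access_relation_ids)
        self.ceph_complete = False
        self.provisioned: Set[str] = set()

    def ceph_access_joined(self, relation_id: str) -> bool:
        """Publish the key on one ceph-access relation, or defer."""
        if not self.ceph_complete:
            return False  # "Deferring key provision until ceph relation complete"
        self.provisioned.add(relation_id)  # a set keeps reruns harmless
        return True

    def ceph_changed(self, key_available: bool) -> None:
        self.ceph_complete = key_available
        if self.ceph_complete:
            # The retry path: revisit every ceph-access relation that may
            # have been deferred earlier.
            for relation_id in self.access_relation_ids:
                self.ceph_access_joined(relation_id)
```

Because ceph-relation-changed can run many times, the provisioning step has to be safe to repeat; the set above stands in for that.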

Changed in charm-cinder-ceph:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Frode Nordahl (fnordahl) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/496230

Changed in charm-cinder-ceph:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/496230
Committed: https://git.openstack.org/cgit/openstack/charm-cinder-ceph/commit/?id=b928e65019aaa792a6caede409d96b41b2eba10b
Submitter: Jenkins
Branch: master

commit b928e65019aaa792a6caede409d96b41b2eba10b
Author: Chris MacNaughton <email address hidden>
Date: Tue Aug 22 15:30:15 2017 +0200

    Ensure we setup the ceph-access relation

    Change-Id: I0d95523fd84338ca55e141c12231f82bf99056df
    Closes-bug: #1711642

Changed in charm-cinder-ceph:
status: In Progress → Fix Committed
James Page (james-page) on 2017-09-12
Changed in charm-cinder-ceph:
status: Fix Committed → Fix Released
Ben (bjenkins-x) wrote :

This may still be an issue with nova-compute (revision 273 in the charm store) and cinder-ceph (revision 227 in the charm store) when using Pike as the source.

The MON cluster logs the following during a VM launch attempt:
cephx server client.cinder: unexpected key: req.key=a880107da8ddc92a expected_key=1e592938b7df5bf7

The compute node seems to use only the client.cinder key and not the custom application names, as explained below.

I use a custom application name as follows:
juju deploy --config config.yaml cinder-ceph cinder-ceph-storage1

On the MON, running ceph auth list shows the expected keys:

...
client.cinder
        key: ***************************************
        caps: [mon] allow r
        caps: [osd] allow rwx
client.cinder-backup
        key: ***************************************
        caps: [mon] allow r
        caps: [osd] allow rwx
client.cinder-ceph-storage1
        key: ***************************************
        caps: [mon] allow r
        caps: [osd] allow rwx
client.glance
        key: ***************************************
        caps: [mon] allow r
        caps: [osd] allow rwx
client.nova-compute
        key: ***************************************
        caps: [mon] allow r
        caps: [osd] allow rwx
...

The cinder-ceph-storage1 key is copied correctly to the compute nodes; however, the client.cinder key is never copied to them.

On a compute node:

root@compute-001:/etc/ceph# virsh secret-list
 UUID Usage
--------------------------------------------------------------------------------
 ************************************ ceph client.nova-compute secret
 ************************************ ceph client.cinder-ceph-storage1 secret
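A quick way to compare the secrets libvirt holds with the Ceph clients the instances actually use is to parse the virsh secret-list output shown above. A minimal sketch, assuming the "<uuid> ceph client.<name> secret" row layout seen here; the function names and the expected-client set are illustrative.

```python
from typing import Set


def ceph_clients_with_secrets(secret_list_output: str) -> Set[str]:
    """Return the Ceph client names that have a libvirt secret."""
    clients = set()
    for line in secret_list_output.splitlines():
        # Assumed row layout: uuid, usage type, usage name.
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[1] != "ceph":
            continue  # header, separator, or non-Ceph secret
        usage = parts[2].strip()
        if usage.startswith("client.") and usage.endswith(" secret"):
            clients.add(usage[len("client."):-len(" secret")])
    return clients


def missing_secrets(secret_list_output: str, expected: Set[str]) -> Set[str]:
    """Expected client names with no matching libvirt secret."""
    return expected - ceph_clients_with_secrets(secret_list_output)
```

Applied to the listing above with nova-compute, cinder-ceph-storage1 and cinder as the expected clients, it reports cinder as missing, consistent with the compute node having no secret for client.cinder.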

#################### Begin from syslog #############################

Sep 22 18:02:43 compute-001 qemu-system-x86_64: 2017-09-22 18:02:43.693555 7fe420734c00 -1 asok(0x55f6ea217760) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/rbd-client-18701.asok': (13) Permission denied

Sep 22 18:02:43 compute-001 qemu-system-x86_64: 2017-09-22 18:02:43.697032 7fe420734c00 0 librados: client.cinder authentication error (1) Operation not permitted

...

Sep 22 18:02:43 compute-001 libvirtd[4148]: 2017-09-22 18:02:43.806+0000: 4148: error : qemuMonitorIORead:595 : Unable to read from monitor: Connection reset by peer

Sep 22 18:02:43 compute-001 libvirtd[4148]: 2017-09-22 18:02:43.807+0000: 4148: error : qemuProcessReportLogError:1859 : internal error: qemu unexpectedly closed the monitor:

...

Sep 22 18:02:43 compute-001 libvirtd[4148]: 2017-09-22T18:02:43.660188Z qemu-system-x86_64: -drive file=rbd:cinder/volume-94ebc6b4-3e50-44fd-b7f8-bea63f6d2fbb:id=cinder:auth_supported=cephx\;none:mon_host=10.5.0.5\:6789\;10.5.0.6\:6789\;10.5.0.7\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,serial=94ebc6b4-3e50-44fd-b7f8-bea63f6d2fbb,cache=none,discard=unmap: 'serial' is deprecated, please use the corresponding option of '-device' instead

Sep 22 18:02:43 compute-001 libvirtd[4148]: 2017-09-22T18:02:43.698865Z qemu-system-x86_64: -drive file=rbd:cinder...

