charms: nova/cinder/ceph rbd integration broken on Ocata

Bug #1671422 reported by James Page on 2017-03-09
Affects / Importance / Assigned to:
- OpenStack Charm Guide: High, James Page
- OpenStack cinder charm: Critical, Liam Young
- OpenStack cinder-ceph charm: Critical, James Page
- OpenStack nova-compute charm: Critical, James Page

Bug Description

https://github.com/openstack/nova/commit/b89efa3ef611a1932df0c2d6e6f30315b5111a57 introduced a change in Ocata where any data provided by cinder for rbd block devices is preferred over any local libvirt sectional configuration for rbd (which was used in preference in the past).

As a result, it's not possible to attach ceph block devices to instances in a charm-deployed Ocata cloud: the secret_uuid configuration is not populated in the cinder configuration file, and in any case the username on the compute units won't match the ceph username used on the cinder units (compute and cinder units get different keys created), so I don't think the key created on the compute units will actually work with the username provided by cinder.

I'm not 100% convinced this is a great change in behaviour; the cinder and nova keys have much the same permissions for correct operation (rwx on the images, volumes and vms pools), however it does mean that the nova-compute units have to have the same keys as the cinder units. A key disclosure/compromise on a cinder unit would require a revoke and re-issue across a large number of units (compute units are likely to number in the 100-1000's, whereas the number of cinder units will be minimal).
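A simplified sketch of the preference change described above (illustrative only, not nova's actual code; the user names are hypothetical stand-ins for what the charms create):

```python
# Illustrative sketch (not nova's actual code) of the Ocata behaviour
# change: rbd auth details supplied by cinder in the volume's connection
# info are now preferred over the local [libvirt] rbd_user setting.

def effective_rbd_user(connection_info, local_rbd_user):
    # Pre-Ocata the charms relied on the local setting always winning;
    # post-Ocata cinder's value wins whenever it is present.
    return connection_info.get('auth_username') or local_rbd_user

# Charm deployment: cinder units use their own cephx user, while the
# compute units were configured with a different one (names hypothetical).
user = effective_rbd_user({'auth_username': 'cinder-ceph'}, 'nova-compute')
print(user)  # 'cinder-ceph' -- but the compute host only holds its own key
```

Since the compute host has no key matching the username cinder hands over, the attach fails.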

James Page (james-page) on 2017-03-09
Changed in charm-cinder-ceph:
importance: Undecided → Critical
Changed in charm-nova-compute:
importance: Undecided → Critical
Changed in charm-cinder-ceph:
status: New → Triaged
Changed in charm-nova-compute:
status: New → Triaged
summary: - nova/cinder/ceph rbd integration broken on Ocata
+ charms: nova/cinder/ceph rbd integration broken on Ocata
James Page (james-page) on 2017-03-09
description: updated
James Page (james-page) wrote :

As a quick fix I've tried adding the uuid for the nova-compute created secret to cinder (this is a global constant for the charms):

<disk type="network" device="disk">
  <driver name="qemu" type="raw" cache="none"/>
  <source protocol="rbd" name="cinder-ceph/volume-bdff2036-c0da-438d-aa95-d882d408df92">
    <host name="10.5.25.226" port="6789"/>
    <host name="10.5.25.227" port="6789"/>
    <host name="10.5.25.229" port="6789"/>
  </source>
  <auth username="cinder-ceph">
    <secret type="ceph" uuid="514c9fca-8cbe-11e2-9c52-3bc8c7819472"/>
  </auth>
  <target bus="virtio" dev="vdb"/>
  <serial>bdff2036-c0da-438d-aa95-d882d408df92</serial>
</disk>

This results in the correct XML; however, the username mismatches the key, so the attach fails.

James Page (james-page) wrote :

Error from a boot from volume check:

Details: {'message': 'internal error: process exited while connecting to monitor: 2017-03-09T10:45:23.019332Z qemu-system-x86_64: -drive file=rbd:cinder-ceph/volume-05e31583-5888-4a24-b92f-7477b3a398e7:id=cinder-ceph:key=AQASQMBYbtROJRAAtDafRDTQkq6oiybMpeMHEw==:auth_supported=', 'code': 500, 'created': '2017-03-09T10:45:28Z'

James Page (james-page) wrote :

(note this is from a transient test environment so keys not sensitive)

James Page (james-page) wrote :

cross referencing with bug 1635008

James Page (james-page) wrote :

Resolution of this in the charms might look something like this:

1) addition of new relation between cinder-ceph and nova-compute

cinder-ceph will need to provide the cephx key + UUID that it will use in its configuration files for this purpose; this cannot be a fixed value (as it is in nova-compute) as it's possible multiple backends will be in use, so the UUID must be specific to the backend (which introduces some complexity in HA deployments with regards to which unit will generate the UUID and how the other units will observe and consume it - via leader storage).

2) updates to nova-compute

consumption of the new interface, and storage of the secret in libvirt using the UUID provided by the cinder-ceph charm.

I'm still not hugely keen on having the compute units share keys with the cinder-ceph units.
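The leader-storage part of (1) could be sketched as follows. This is a hedged stand-in: a plain dict replaces Juju leader storage, which a real charm would reach through charmhelpers' leader_get/leader_set, and the function name is hypothetical.

```python
import uuid

# Hedged sketch of per-backend secret UUID handling via leader storage.
# A plain dict stands in for Juju leader storage here; a real charm
# would use charmhelpers.core.hookenv.leader_get/leader_set instead.

def get_or_create_secret_uuid(backend, leader_storage, is_leader):
    """The lead unit mints the libvirt secret UUID for a backend once;
    non-leader units only ever read the stored value."""
    key = 'secret-uuid-{}'.format(backend)
    if leader_storage.get(key) is None and is_leader:
        leader_storage[key] = str(uuid.uuid4())
    return leader_storage.get(key)

storage = {}
# The leader generates the UUID; peer units observe the same value, so
# every unit publishes an identical UUID for the same backend.
u1 = get_or_create_secret_uuid('cinder-ceph', storage, is_leader=True)
u2 = get_or_create_secret_uuid('cinder-ceph', storage, is_leader=False)
print(u1 == u2)  # True
```

Keying the storage by backend name is what allows multiple cinder-ceph backends to carry distinct UUIDs.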

James Page (james-page) wrote :

(and to confirm - updating the secret on compute nodes to use the cinder-ceph key results in a functioning cloud - but that's not a fix - just to confirm username and key must match).

description: updated
Changed in charm-cinder-ceph:
milestone: none → 17.05
Changed in charm-nova-compute:
milestone: none → 17.05
James Page (james-page) on 2017-03-09
Changed in charm-cinder-ceph:
assignee: nobody → James Page (james-page)
Changed in charm-nova-compute:
assignee: nobody → James Page (james-page)
Changed in charm-cinder-ceph:
status: Triaged → In Progress
Changed in charm-nova-compute:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/443609
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=1467cbb1b3883cfe7c47ac484b2da96d13fa9e13
Submitter: Jenkins
Branch: master

commit 1467cbb1b3883cfe7c47ac484b2da96d13fa9e13
Author: James Page <email address hidden>
Date: Thu Mar 9 12:51:25 2017 +0000

    Fix support for cinder ceph rbd in Ocata

    As of Ocata, the ceph key used to access a specific Cinder
    Ceph backend must match the name of the key used by cinder,
    with an appropriate secret configured for libvirt use with
    the cephx key used by the cinder-ceph charm.

    Add support for the new ceph-access relation to allow
    nova-compute units to communicate with multiple ceph
    backends using different cephx keys and user names.

    The side effect of this change is that nova-compute will
    have a key for use with its own ephemeral backend ceph
    access, and a key for each cinder ceph backend configured
    in the deployment.

    Change-Id: I638473fc46c99a8bfe301f9a0c844de9efd47a2a
    Closes-Bug: 1671422

Changed in charm-nova-compute:
status: In Progress → Fix Committed
James Page (james-page) on 2017-03-14
Changed in charm-guide:
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Page (james-page)
milestone: none → 17.05

Reviewed: https://review.openstack.org/443612
Committed: https://git.openstack.org/cgit/openstack/charm-cinder-ceph/commit/?id=62613456e7a04ac348d4589145a550323dfa5a55
Submitter: Jenkins
Branch: master

commit 62613456e7a04ac348d4589145a550323dfa5a55
Author: James Page <email address hidden>
Date: Thu Mar 9 12:59:06 2017 +0000

    Fix support for cinder ceph rbd on Ocata

    As of Ocata, the ceph key used to access a specific Cinder
    Ceph backend must match the name of the key used by cinder,
    with an appropriate secret configured for libvirt use with
    the cephx key used by the cinder-ceph charm.

    Add support for the new ceph-access relation to allow
    nova-compute units to communicate with multiple ceph
    backends using different cephx keys and user names.

    The lead cinder-ceph unit will generate a UUID for use in
    the cinder configuration file, and for use by the remote
    nova-compute units when configuring libvirt secrets,
    ensuring that both ends of the integration match up.

    The side effect of this change is that nova-compute will
    have a key for use with its own ephemeral backend ceph
    access, and a key for each cinder ceph backend configured
    in the deployment.

    Change-Id: I974ecb39132feddfffabd6dcef401e91b5548d05
    Closes-Bug: 1671422

Changed in charm-cinder-ceph:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/445356
Committed: https://git.openstack.org/cgit/openstack/charm-guide/commit/?id=bb9cac0a1c2529274e43717ee8439d7f17792bec
Submitter: Jenkins
Branch: master

commit bb9cac0a1c2529274e43717ee8439d7f17792bec
Author: James Page <email address hidden>
Date: Tue Mar 14 08:29:44 2017 +0000

    Add additional release note for cinder-ceph storage

    A new relation is required to support key sharing between
    the cinder-ceph and nova-compute charms, providing better
    support for use of multiple storage backends.

    Add a release note to this effect.

    Change-Id: Idc32c75593c0ac90b4e2bff1c79d9a4d3486aa95
    Closes-Bug: 1671422

Changed in charm-guide:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/445398
Committed: https://git.openstack.org/cgit/openstack/charm-cinder-ceph/commit/?id=fcd1afbe8b8d0c9721f0090faec565925cbda692
Submitter: Jenkins
Branch: stable/17.02

commit fcd1afbe8b8d0c9721f0090faec565925cbda692
Author: James Page <email address hidden>
Date: Thu Mar 9 12:59:06 2017 +0000

    Fix support for cinder ceph rbd on Ocata

    As of Ocata, the ceph key used to access a specific Cinder
    Ceph backend must match the name of the key used by cinder,
    with an appropriate secret configured for libvirt use with
    the cephx key used by the cinder-ceph charm.

    Add support for the new ceph-access relation to allow
    nova-compute units to communicate with multiple ceph
    backends using different cephx keys and user names.

    The lead cinder-ceph unit will generate a UUID for use in
    the cinder configuration file, and for use by the remote
    nova-compute units when configuring libvirt secrets,
    ensuring that both ends of the integration match up.

    The side effect of this change is that nova-compute will
    have a key for use with its own ephemeral backend ceph
    access, and a key for each cinder ceph backend configured
    in the deployment.

    Change-Id: I974ecb39132feddfffabd6dcef401e91b5548d05
    Closes-Bug: 1671422
    (cherry picked from commit 62613456e7a04ac348d4589145a550323dfa5a55)

Reviewed: https://review.openstack.org/445354
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=e0c187cb7aa87fa91671910f2b228024f64a0763
Submitter: Jenkins
Branch: stable/17.02

commit e0c187cb7aa87fa91671910f2b228024f64a0763
Author: James Page <email address hidden>
Date: Thu Mar 9 12:51:25 2017 +0000

    Fix support for cinder ceph rbd in Ocata

    As of Ocata, the ceph key used to access a specific Cinder
    Ceph backend must match the name of the key used by cinder,
    with an appropriate secret configured for libvirt use with
    the cephx key used by the cinder-ceph charm.

    Add support for the new ceph-access relation to allow
    nova-compute units to communicate with multiple ceph
    backends using different cephx keys and user names.

    The side effect of this change is that nova-compute will
    have a key for use with its own ephemeral backend ceph
    access, and a key for each cinder ceph backend configured
    in the deployment.

    Change-Id: I638473fc46c99a8bfe301f9a0c844de9efd47a2a
    Closes-Bug: 1671422
    (cherry picked from commit 1467cbb1b3883cfe7c47ac484b2da96d13fa9e13)

James Page (james-page) on 2017-03-15
Changed in charm-cinder-ceph:
status: Fix Committed → Fix Released
Changed in charm-nova-compute:
status: Fix Committed → Fix Released
Sean Dague (sdague) on 2017-04-17
no longer affects: nova
Ryan Beisner (1chb1n) wrote :

Need to revisit this for the scenario where the cinder-ceph subordinate is not in use, and the cinder charm is used with the ceph* charms directly.

Darin Arrick (darinavbt) wrote :

I think I've also run into this. I deployed a MAAS+Autopilot cloud a couple of weeks ago. Everything seems to work but attaching volumes. Speaking with David B. from Canonical, he suggested I post here, as well.

Environment: new deployment, based on https://www.ubuntu.com/download/cloud/autopilot
"juju status" on controller: https://pastebin.com/Rk6pAGFG
nova-compute.log from the compute node in question: https://pastebin.com/XHsZkVxG

Two things:
1) How do I prove that my issue is this bug? The lack of rbd_secret_uuid somewhere?
2) What's the workaround/fix? My deployment is new and strictly for testing at this point, so I can do whatever is needed.

Nobuto Murata (nobuto) wrote :

It would be nice if the openstack-base bundle in the charm store had the "ceph-access" relation added in this bug, as a reference for everyone.

The current revision is Newton, not Ocata.
https://api.jujucharms.com/charmstore/v5/openstack-base/archive/bundle.yaml

However, the development one with Ocata does not have the newly added relation yet:
https://github.com/openstack-charmers/openstack-bundles/blob/master/development/openstack-base-xenial-ocata/bundle.yaml

Frode Nordahl (fnordahl) wrote :

PR already up here: https://github.com/openstack-charmers/openstack-bundles/pull/32

This will currently only work with the next charms for cinder-ceph and nova-compute. The necessary commits are subject for release in the upcoming charm release.

Nobuto Murata (nobuto) wrote :

> PR already up here: https://github.com/openstack-charmers/openstack-bundles/pull/32

Nice!

> This will currently only work with the next charms for cinder-ceph and nova-compute. The necessary commits are subject for release in the upcoming charm release.

Hmm, stable charms have "ceph-access" relations already? Looks like the fix has been backported to 17.02 branch.
https://api.jujucharms.com/charmstore/v5/nova-compute/archive/metadata.yaml
https://api.jujucharms.com/charmstore/v5/cinder-ceph/archive/metadata.yaml

Frode Nordahl (fnordahl) wrote :

The relation is there, but last I checked it did not contain the required data for it to work. If it was intended to be backported I'll check again and try to track down what's missing if it does not work.

Frode Nordahl (fnordahl) wrote :

This is indeed backported to stable, but in some circumstances the ceph-access relation never completes, both in stable and in master.

I have filed bug 1711642.

James Page (james-page) on 2017-09-21
Changed in charm-cinder:
status: New → Won't Fix
Chris Sanders (chris.sanders) wrote :

Subscribed field-critical.

The 'Won't Fix' for the cinder charm seems in conflict with https://bugs.launchpad.net/charm-cinder-ceph/+bug/1727184

A cloud running the cinder charm and then upgraded to Ocata does not currently appear to have a way to use its cinder volumes. No known workaround currently.

David Ames (thedac) on 2018-12-20
Changed in charm-cinder:
status: Won't Fix → Confirmed
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 19.04
David Ames (thedac) wrote :

Updating this bug. We may decide to move this elsewhere at some point.

We have a deployment that was upgraded through to pike at which point it was noticed that nova instances with ceph backed volumes would not start.

The cinder key was manually added to the nova-compute nodes in /etc/ceph and with:
sudo virsh secret-define --file /tmp/cinder.secret

However, this did not resolve the problem. It appeared libvirt was trying to use a mixed pair of usernames and keys. It was using the cinder username but the nova-compute key.

Looking at nova's code, it falls back to nova.conf when it does not have a secret_uuid from cinder, but it was not setting the username correctly.
https://github.com/openstack/nova/blob/stable/pike/nova/virt/libvirt/volume/net.py#L74

The following seems to mitigate this as a temporary fix on nova-compute until we can come up with a complete plan:

https://pastebin.ubuntu.com/p/tGm7C7fpXT/

diff --git a/nova/virt/libvirt/volume/net.py b/nova/virt/libvirt/volume/net.py
index cec43ce93b..8b0148df0b 100644
--- a/nova/virt/libvirt/volume/net.py
+++ b/nova/virt/libvirt/volume/net.py
@@ -71,6 +71,7 @@ class LibvirtNetVolumeDriver(libvirt_volume.LibvirtBaseVolumeDriver):
             else:
                 LOG.debug('Falling back to Nova configuration for RBD auth '
                           'secret_uuid value.')
+                conf.auth_username = CONF.libvirt.rbd_user
                 conf.auth_secret_uuid = CONF.libvirt.rbd_secret_uuid
             # secret_type is always hard-coded to 'ceph' in cinder
             conf.auth_secret_type = netdisk_properties['secret_type']

Apply to /usr/lib/python2.7/dist-packages/nova/virt/libvirt/volume/net.py
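The effect of the one-line patch can be seen in a standalone sketch (illustrative only, not nova's actual code; the UUID value is hypothetical): without it, the username and the secret UUID fall back independently and libvirt ends up with a mixed pair.

```python
# Illustrative sketch (not nova's actual code) of the fallback behaviour
# with and without the one-line patch above.

def resolve_auth(netdisk_properties, rbd_user, rbd_secret_uuid, patched):
    """Pick the (auth_username, auth_secret_uuid) pair handed to libvirt."""
    username = netdisk_properties.get('auth_username')
    secret_uuid = netdisk_properties.get('secret_uuid')
    if not secret_uuid:
        # Falling back to nova.conf for the secret...
        if patched:
            # ...the patch makes the username fall back with it.
            username = rbd_user
        secret_uuid = rbd_secret_uuid
    return username, secret_uuid

# Volume data as seen after the upgrade: cinder set the username but
# never populated secret_uuid.
props = {'auth_username': 'cinder', 'secret_uuid': None}

print(resolve_auth(props, 'nova-compute', 'abc-123', patched=False))
# ('cinder', 'abc-123') -- mixed pair, attach fails
print(resolve_auth(props, 'nova-compute', 'abc-123', patched=True))
# ('nova-compute', 'abc-123') -- consistent pair
```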

We still need a migration plan to get from the topology with nova-compute directly related to ceph to the topology with cinder-ceph related to nova-compute via ceph-access, which would populate cinder's secret_uuid.
It is possible we will need to carry the patch for existing instances. It may be worth getting it upstream, as master has the same problem.

Frode Nordahl (fnordahl) wrote :

Referencing some other issues that makes the cinder -> cinder-ceph migration more complicated:
- bug 1727184
- bug 1768922
- bug 1773800

Corey Bryant (corey.bryant) wrote :

@thedac, I've created bug 1809454 to track your fix from comment 25.

Frode Nordahl (fnordahl) wrote :

As for the charm migration path, we need to provide the means for an administrator to morph an existing model without the `cinder-ceph` subordinate into a model with the `cinder-ceph` subordinate.

What I would propose we do is to document the existence and use of the `rename-volume-host` action that was introduced on the `cinder` charm in bug 1665272.

In addition to that, we would need either to have the proposed Nova fallback fix land, or to find another way to have Nova update the block_device_mapping.connection_info of existing instances with the correct ceph username and the libvirt secret_uuid from the `cinder-ceph` - `nova-compute` `ceph-access` relation.

Frode Nordahl (fnordahl) wrote :

FWIW; here is a bundle useful for testing the scenario: https://pastebin.ubuntu.com/p/qdd78M96CQ/

David Ames (thedac) wrote :

It would seem the upgrade to Ocata changes the auth_username to cinder in the database and leaves secret_uuid Null. This may be because cinder did not already have a rbd_secret_uuid set during the upgrade. Adding cinder-ceph to the equation adds this but does not on its own update the nova DB. (more testing needed)

The patch [0] and the package updates [1] will be required for the fall back to nova's rbd_username and rbd_secret_uuid for existing volume backed instances.

The path forward:

1. When the packages from [1] are available, update the packages on the nodes. This will handle all existing instances.
2. Add cinder-ceph to the model. New instances will use the cinder-ceph credentials.

Needs further testing:
- Remove the relation between cinder and ceph-mon
- Test non-nova cinder volumes after the topology change
- To future-proof against the fallback being removed, either update the DB similar to [2], or create an action similar to [3] that does this for us.

[0] https://review.openstack.org/#/c/626897/
[1] https://bugs.launchpad.net/nova/+bug/1809454
[2] https://pastebin.canonical.com/p/4ZdVnzpSp8/
[3] https://github.com/openstack/charm-cinder/blob/stable/18.11/actions.yaml#L14
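One way the DB update in the last bullet could look, sketched with sqlite3 purely for illustration (nova's real database is MySQL, the column layout is simplified, and the exact contents of [2] are not reproduced here); the replacement username and UUID are hypothetical values that would come from the ceph-access relation:

```python
import json
import sqlite3

# Illustrative sketch of rewriting auth_username/secret_uuid inside
# nova's block_device_mapping.connection_info. sqlite3 stands in for
# nova's real MySQL database; the schema is heavily simplified.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE block_device_mapping '
             '(id INTEGER PRIMARY KEY, connection_info TEXT)')
conn.execute('INSERT INTO block_device_mapping VALUES (1, ?)',
             (json.dumps({'data': {'auth_username': 'cinder',
                                   'secret_uuid': None}}),))

# Values that would come from the cinder-ceph / ceph-access relation
# (hypothetical here).
new_username = 'cinder-ceph'
new_secret_uuid = '514c9fca-8cbe-11e2-9c52-3bc8c7819472'

rows = conn.execute(
    'SELECT id, connection_info FROM block_device_mapping').fetchall()
for row_id, info in rows:
    ci = json.loads(info)
    ci['data']['auth_username'] = new_username
    ci['data']['secret_uuid'] = new_secret_uuid
    conn.execute('UPDATE block_device_mapping SET connection_info=? '
                 'WHERE id=?', (json.dumps(ci), row_id))
conn.commit()
```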

Changed in charm-cinder:
assignee: David Ames (thedac) → nobody
Xav Paice (xavpaice) on 2018-12-22
tags: added: canonical-bootstack
David Ames (thedac) wrote :

Packages from https://bugs.launchpad.net/nova/+bug/1809454 are now available in the cloud:xenial-queens/proposed pocket. Note the "proposed" pocket. The version of the nova packages is 17.0.7-0ubuntu2~cloud0.

This should enable the upgrade from Pike (with the cowboy patch) to Queens proposed with the fix in the packages.

juju config nova-cloud-controller openstack-origin=cloud:xenial-queens/proposed
juju config nova-compute openstack-origin=cloud:xenial-queens/proposed

Then run the openstack-upgrade action. Note: the rest of the cloud can use cloud:xenial-queens and will need to be upgraded as well.

I have run through a quick and dirty upgrade test (nova only) from newton to queens. Confirmed the problem in ocata and pike and the fix in queens proposed.

The fix was introduced in Stein and therefore will be available for the foreseeable future. This means the fallback to the nova-configured ceph authentication will be available until we can confirm a complete migration path from the nova-compute<->ceph-mon topology to the cinder<->cinder-ceph<->ceph-mon topology.

David Ames (thedac) wrote :

WARNING
Live migrations and snapshots are not working; this potentially started as soon as the upgrade from newton to ocata occurred, and was confirmed after the topology change below.

Existing instances can be stopped and started, but not snapshotted or migrated.
New instances can do all of the above.

Topology migration path
juju deploy cs:cinder-ceph
juju add-relation cinder cinder-ceph
juju add-relation cinder-ceph ceph-mon
juju remove-relation cinder ceph-mon
juju add-relation cinder-ceph nova-compute
# If a [CEPH] block is still in /etc/cinder/cinder.conf, trigger a rewrite
# with any configuration change, e.g.:
juju config cinder debug=True

New instances will have auth_username cinder-ceph and secret-uuid populated.

David Ames (thedac) wrote :

The problem with live migration et al. seems to occur at the time of the upgrade from newton to ocata, not necessarily with the cinder-ceph topology change:

For example an instance created at newton, then the cloud is upgraded to ocata. Attempting to live migrate:

Source host:
2019-01-14 19:18:01.258 4742 ERROR nova.compute.manager [instance: f4ff161f-7c51-4d2a-a97d-c8cff12d5651]
2019-01-14 19:18:01.775 4742 ERROR root [req-5e9c7fd1-7072-4878-8a97-6751581fbba4 16d150ce78794d3eba5cafa0f6e83b36 dbdbac7a5d09477b891321ef11690f03 - - -] Original exception being dropped: ['Traceback (most recent call last):
', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5440, in _do_live_migration
    block_migration, disk, dest, migrate_data)
', ' File "/usr/lib/python2.7/dist-packages/nova/compute/rpcapi.py", line 723, in pre_live_migration
    disk=disk, migrate_data=migrate_data)
', ' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
    retry=self.retry)
', ' File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 97, in _send
    timeout=timeout, retry=retry)
', ' File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in send
    retry=retry)
', ' File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 449, in _send
    raise result
', 'RemoteError: Remote error: ClientException Internal Server Error (HTTP 500)
[u\'Traceback (most recent call last):\
\', u\' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming\
    res = self.dispatcher.dispatch(message)\
\', u\' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch\
    return self._do_dispatch(endpoint, method, ctxt, args)\
\', u\' File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch\
    result = func(ctxt, **new_args)\
\', u\' File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 75, in wrapped\
    function_name, call_dict, binary)\
\', u\' File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__\
    self.force_reraise()\
\', u\' File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise\
    six.reraise(self.type_, self.value, self.tb)\
\', u\' File "/usr/lib/python2.7/dist-packages/nova/exception_wrapper.py", line 66, in wrapped\
    return f(self, context, *args, **kw)\
\', u\' File "/usr/lib/python2.7/dist-packages/nova/compute/utils.py", line 686, in decorated_function\
    return function(self, context, *args, **kwargs)\
\', u\' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 216, in decorated_function\
    kwargs[\\\'instance\\\'], e, sys.exc_info())\
\', u\' File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__\
    self.force_reraise()\
\', u\' File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise\
    six.reraise(self.type_, self.value, self.tb)\
\', u\' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 204, in d...

Liam Young (gnuoy) on 2019-01-21
Changed in charm-cinder:
assignee: nobody → Liam Young (gnuoy)
Liam Young (gnuoy) wrote :

The charm guide now contains instructions for migrating to the cinder-ceph charm (https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-upgrade-openstack.html#cinder-ceph-topology-change-upgrading-from-newton-to-ocata ) I believe this covers the last critical issue in this bug.

Changed in charm-cinder:
status: Confirmed → Invalid
Liam Young (gnuoy) wrote :

Spoke with xavpaice and chris.sanders and they agreed that the field-crit tag can be removed.
