RBD driver option rbd_user is confusing

Bug #1083540 reported by Florian Haas
This bug affects 5 people
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: Josh Durgin

Bug Description

"rados lspools" in RBDDriver's check_for_setup_error() does not correctly set the Cephx client identity. Instead, it defaults to the "client.admin" identity. In a Ceph cluster where Cephx authentication is disabled, this does not matter -- client authentication succeeds no matter which client identity is set.

In a cluster where authentication is enabled and the Cinder host does not have the client.admin Cephx key, "rados lspools" fails because it cannot open a keyring for client.admin. This causes cinder-volume to fail on startup:

2012-11-27 11:51:25 DEBUG cinder.utils [req-30c51712-e631-4034-98b3-686fb29705fd None None] Running cmd (subprocess): rados lspools execute /usr/lib/python2.7/dist-packages/cinder/utils.py:156
2012-11-27 11:51:25 DEBUG cinder.utils [req-30c51712-e631-4034-98b3-686fb29705fd None None] Result was 254 execute /usr/lib/python2.7/dist-packages/cinder/utils.py:172
2012-11-27 11:51:25 22750 CRITICAL cinder [-] Unexpected error while running command.
Command: rados lspools
Exit code: 254
Stdout: ''
Stderr: "2012-11-27 11:51:25.025430 7f18f3413780 -1 auth: failed to open keyring from /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin\n2012-11-27 11:51:25.025447 7f18f3413780 -1 monclient(hunting): failed to open keyring: (2) No such file or directory\n2012-11-27 11:51:25.025506 7f18f3413780 0 librados: client.admin initialization error (2) No such file or directory\ncouldn't connect to cluster! error -2\n"
2012-11-27 11:51:25 22750 TRACE cinder Traceback (most recent call last):
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/bin/cinder-volume", line 48, in <module>
2012-11-27 11:51:25 22750 TRACE cinder service.wait()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 422, in wait
2012-11-27 11:51:25 22750 TRACE cinder _launcher.wait()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 127, in wait
2012-11-27 11:51:25 22750 TRACE cinder service.wait()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
2012-11-27 11:51:25 22750 TRACE cinder return self._exit_event.wait()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2012-11-27 11:51:25 22750 TRACE cinder return hubs.get_hub().switch()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
2012-11-27 11:51:25 22750 TRACE cinder return self.greenlet.switch()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
2012-11-27 11:51:25 22750 TRACE cinder result = function(*args, **kwargs)
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 88, in run_server
2012-11-27 11:51:25 22750 TRACE cinder server.start()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 159, in start
2012-11-27 11:51:25 22750 TRACE cinder self.manager.init_host()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 95, in init_host
2012-11-27 11:51:25 22750 TRACE cinder self.driver.check_for_setup_error()
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/volume/driver.py", line 614, in check_for_setup_error
2012-11-27 11:51:25 22750 TRACE cinder (stdout, stderr) = self._execute('rados', 'lspools')
2012-11-27 11:51:25 22750 TRACE cinder File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 179, in execute
2012-11-27 11:51:25 22750 TRACE cinder cmd=' '.join(cmd))
2012-11-27 11:51:25 22750 TRACE cinder ProcessExecutionError: Unexpected error while running command.

A workaround is to make the client.admin key available on the Cinder host, and make it readable to the "cinder" group (or whichever gid the cinder-volume process runs under). But the proper fix is for the "rados lspools" invocation to pass the configured client identity with the "-n" flag.
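To illustrate the kind of fix meant here, the following is a minimal sketch of building the "rados lspools" command line so that it carries the client identity. It is not the driver's actual code: the function name rados_lspools_cmd and the user name "volumes" are made up for illustration. The only CLI options used are the common Ceph options "-n" (full client name, such as client.volumes) and "--conf".

```python
def rados_lspools_cmd(rbd_user=None, ceph_conf=None):
    """Build the argument list for 'rados lspools'.

    Hypothetical helper, for illustration only. When rbd_user is unset,
    the rados tool falls back to its default identity (client.admin),
    which is the behaviour described in this report.
    """
    cmd = ["rados"]
    if rbd_user:
        # -n expects the full Cephx name, i.e. "client.<id>"
        cmd += ["-n", "client." + rbd_user]
    if ceph_conf:
        cmd += ["--conf", ceph_conf]
    cmd.append("lspools")
    return cmd


# "volumes" is an assumed user name, not one taken from this report.
print(" ".join(rados_lspools_cmd("volumes", "/etc/ceph/ceph.conf")))
```

With a helper like this, every "rados" or "rbd" invocation in a driver could share the same identity arguments instead of each call site building its own command.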

Tags: drivers
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/16962

Changed in cinder:
assignee: nobody → Florian Haas (fghaas)
status: New → In Progress
Florian Haas (fghaas) wrote : Re: RBD driver ignores rbd_user

This thing is much worse than I thought. Every single "rados" or "rbd" command in RBDDriver ignores rbd_user.

summary: - RBD driver does not correctly check for pool's existence when rbd_user
- != "admin"
+ RBD driver ignores rbd_user
Changed in cinder:
importance: Undecided → High
milestone: none → grizzly-2
Josh Durgin (jdurgin)
summary: - RBD driver ignores rbd_user
+ RBD driver option rbd_user is confusing
Josh Durgin (jdurgin) wrote :

Renamed the bug, since the real issue is that cinder requires some extra configuration besides cinder.conf to use rbd with cephx authentication, and the rbd_user option is just passed from cinder to nova.

As http://ceph.com/docs/master/rbd/rbd-openstack/#configure-openstack-to-use-ceph mentions, you currently need to set the CEPH_ARGS environment variable for cinder-volume to specify the cephx user the rbd driver uses. The rbd_user configuration option is currently passed to nova as part of the information needed to connect to a volume.
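As a rough sketch of what the CEPH_ARGS approach amounts to, the snippet below prepares an environment in which Ceph command line tools pick up a non-admin Cephx user. The helper name env_with_ceph_user and the user name "volumes" are assumptions for illustration; how the variable is actually exported for cinder-volume (init script, upstart job, and so on) depends on the deployment.

```python
import os


def env_with_ceph_user(user, base_env=None):
    """Return a copy of the environment with CEPH_ARGS selecting a Cephx user.

    Hypothetical helper, for illustration only. Ceph command line tools
    read extra arguments from CEPH_ARGS, so "--id <user>" makes them
    authenticate as client.<user> instead of client.admin.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CEPH_ARGS"] = "--id " + user
    return env


# "volumes" is an assumed user name.
env = env_with_ceph_user("volumes")
print(env["CEPH_ARGS"])
# The dict could then be passed as env= to subprocess calls that run
# "rados" or "rbd".
```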

There are a few cleanups for the rbd driver related to this:

1) let nova set rbd user and secret info, don't worry about it in cinder (completed in https://bugs.launchpad.net/cinder/+bug/1065883)
2) pass monitor addresses to nova so it doesn't need ceph.conf (https://bugs.launchpad.net/cinder/+bug/1077817)
3) configure all the necessary ceph parameters via cinder options so no ceph.conf or CEPH_ARGS are required (this bug)

Florian Haas (fghaas) wrote :

Josh, regarding (1), how would Cinder not have to worry about it at all, specifically in conjunction with image creation? I think it would be perfectly reasonable to expect to be able to set a pool name and client id, just as in glance's RBD store.

Regarding (2), why? If that is the case you'd also have to be able to specify keyring locations, log locations etc. from nova. Why not simply rely on ceph.conf, and if anything, make the client id, the pool name, and ceph.conf's path nova config options?

Josh Durgin (jdurgin) wrote :

I was unclear in (1) - I meant that the user and key used by nova-compute can be configured by nova, and don't need to be passed from cinder to nova-compute. Cinder would need its own configuration for user and key to use, as well as pool and monitor addresses.

(2) allows you to use more than one ceph cluster from the same compute host, and lets configuration be simpler for compute hosts. It would still be possible to use ceph.conf (qemu is actually the one reading it via librados), but it would not be required. Monitor addresses and pool really should be passed from cinder, since they are part of identifying the volume uniquely (the namespaces being cluster (determined by monitor addresses), pool, and image).
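The namespace argument in (2) can be sketched as a small value type: a volume is only identified uniquely by the combination of cluster (via its monitor addresses), pool, and image. The class name RBDVolumeRef, the addresses, and the image name below are all made up for illustration and are not part of cinder or nova.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RBDVolumeRef:
    """Hypothetical identifier for an RBD volume (illustration only)."""
    monitors: FrozenSet[str]  # monitor addresses determine the cluster
    pool: str
    image: str


# Same pool and image name, but on two different clusters.
a = RBDVolumeRef(frozenset({"192.0.2.10:6789"}), "volumes", "volume-0001")
b = RBDVolumeRef(frozenset({"198.51.100.20:6789"}), "volumes", "volume-0001")

# Without the monitor addresses these two would be indistinguishable.
print(a == b)
```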

Changed in cinder:
milestone: grizzly-2 → none
Mike Perez (thingee)
tags: added: drivers
Florian Haas (fghaas)
Changed in cinder:
assignee: Florian Haas (fghaas) → nobody
Josh Durgin (jdurgin)
Changed in cinder:
assignee: nobody → Josh Durgin (jdurgin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/30792

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/30792
Committed: http://github.com/openstack/cinder/commit/e2d0e1f479a56d60dc09ae913ab6625660ed0961
Submitter: Jenkins
Branch: master

commit e2d0e1f479a56d60dc09ae913ab6625660ed0961
Author: Josh Durgin <email address hidden>
Date: Tue May 21 17:49:02 2013 -0700

    rbd: simplify configuration and use librbd and librados

    Add an rbd_ceph_conf options to mirror glance configuration, and use
    the existing rbd_user option to choose how to connect to the cluster
    instead of relying on an environment variable. Use these settings
    when running command line programs and when connecting via librados.

    Use absolute imports so that importing the python librbd bindings
    via 'import rbd' does not try to import cinder.drivers.rbd again.

    Create some convenience wrappers to simplify librbd and librados
    error handling and cleanup. Using these everywhere also simplifies
    testing. Mock out all the librados and librbd calls in the tests
    so these libraries don't need to be installed.

    Remove the local_path() method since it's never used. It was
    left over from nova-volume.

    There are only three things still relying on the command line:
    - importing an image
    - exporting to an image
    - getting monitor addresses

    Importing and exporting on the command line include zero-detection
    that would be little benefit to replicate here. librados and librbd
    don't have a simple interface to obtain the monitor addresses, so
    leave that to a command line tool as well.

    Fixes: bug 1083540
    Signed-off-by: Josh Durgin <email address hidden>

    Change-Id: I32d059c5e460c2dd8423119b3dbe4a9921f5e907
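
The "convenience wrappers" idea from the commit message can be sketched roughly as follows. This is not the code from the commit, only an assumed shape: a context manager that connects, opens an I/O context on a pool, and always cleans up. The name rados_ioctx and the make_client parameter are hypothetical; passing the client in as a factory is one way to let tests substitute a fake so that librados does not need to be installed.

```python
from contextlib import contextmanager


@contextmanager
def rados_ioctx(make_client, pool):
    """Yield an open ioctx for `pool`, closing everything afterwards.

    Hypothetical wrapper, for illustration only. `make_client` is any
    callable returning an object with connect(), open_ioctx(pool) and
    shutdown(), such as the Rados class from the python-rados bindings.
    """
    client = make_client()
    try:
        client.connect()
        ioctx = client.open_ioctx(pool)
    except Exception:
        # Connection or pool lookup failed: release the client and re-raise.
        client.shutdown()
        raise
    try:
        yield ioctx
    finally:
        ioctx.close()
        client.shutdown()


# Possible usage with the real bindings (user, path and pool are assumed):
#
#   import rados
#   factory = lambda: rados.Rados(rados_id="volumes",
#                                 conffile="/etc/ceph/ceph.conf")
#   with rados_ioctx(factory, "volumes") as ioctx:
#       ...  # use ioctx with librbd
```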

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → havana-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: havana-2 → 2013.2
Jeremy Deininger (jeremydei) wrote :

Never mind my previous comment; I encountered the same error because I was attempting to use glance ephemeral storage with ceph. Once these options were removed from glance and nova, I no longer got this error on launch.
