Volume stuck in 'creating' following VolumeBackendAPIException

Bug #1211839 reported by Edward Hope-Morley
Affects: Cinder
Status: Fix Released
Importance: Medium
Assigned to: Edward Hope-Morley
Milestone: 2014.2

Bug Description

With the RBDDriver, if Cinder is unable to connect to the Ceph cluster, the following exception is raised but the volume stays in the 'creating' state:

2013-08-13 16:33:29.009 ERROR cinder.service [req-38634e4f-1e8a-499e-88d2-014731209952 None None] Unhandled exception
2013-08-13 16:33:29.009 TRACE cinder.service Traceback (most recent call last):
2013-08-13 16:33:29.009 TRACE cinder.service File "/opt/stack/cinder/cinder/service.py", line 228, in _start_child
2013-08-13 16:33:29.009 TRACE cinder.service self._child_process(wrap.server)
2013-08-13 16:33:29.009 TRACE cinder.service File "/opt/stack/cinder/cinder/service.py", line 205, in _child_process
2013-08-13 16:33:29.009 TRACE cinder.service launcher.run_server(server)
2013-08-13 16:33:29.009 TRACE cinder.service File "/opt/stack/cinder/cinder/service.py", line 96, in run_server
2013-08-13 16:33:29.009 TRACE cinder.service server.start()
2013-08-13 16:33:29.009 TRACE cinder.service File "/opt/stack/cinder/cinder/service.py", line 385, in start
2013-08-13 16:33:29.009 TRACE cinder.service self.manager.init_host()
2013-08-13 16:33:29.009 TRACE cinder.service File "/opt/stack/cinder/cinder/volume/manager.py", line 149, in init_host
2013-08-13 16:33:29.009 TRACE cinder.service self.driver.check_for_setup_error()
2013-08-13 16:33:29.009 TRACE cinder.service File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 262, in check_for_setup_error
2013-08-13 16:33:29.009 TRACE cinder.service raise exception.VolumeBackendAPIException(data=msg)
2013-08-13 16:33:29.009 TRACE cinder.service VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: error connecting to ceph cluster
2013-08-13 16:33:29.009 TRACE cinder.service
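
For context, here is a minimal sketch (not the actual driver code) of the connect-and-check pattern involved, assuming the python-rados bindings; the paths and helper name are illustrative:

    import rados

    from cinder import exception

    def _try_connect(conf_file='/etc/ceph/ceph.conf', user=None):
        # Illustrative only: the real driver reads rbd_ceph_conf / rbd_user
        # from cinder.conf.
        client = rados.Rados(conffile=conf_file, rados_id=user)
        client.connect()   # raises rados.Error if the cluster is unreachable
        return client

    def check_for_setup_error():
        # Simplified version of the check that produced the traceback above:
        # any rados failure is re-raised as VolumeBackendAPIException.
        try:
            client = _try_connect()
            client.shutdown()
        except rados.Error:
            msg = "error connecting to ceph cluster"
            raise exception.VolumeBackendAPIException(data=msg)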

Tags: ceph drivers rbd
Revision history for this message
Eric Harney (eharney) wrote :

Sounds like the same issue as bug 1242942.

tags: added: ceph rbd
Mike Perez (thingee)
Changed in cinder:
status: New → Triaged
Revision history for this message
Hua Zhang (zhhuabj) wrote :

I also hit the same problem today using the latest master branch code. Does anyone know the reason?

Revision history for this message
Hua Zhang (zhhuabj) wrote :

This is caused by the wrong user; changing to 'rbd_user = admin' works, per https://ceph.com/docs/master/rados/api/python/

Revision history for this message
Hua Zhang (zhhuabj) wrote :

Or make sure the right permissions are configured for non-admin users.
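
As a quick sanity check outside of Cinder, a small python-rados snippet like the one below can confirm that the configured user is actually able to reach the cluster; the conffile path and user name here are just examples:

    import rados

    # Example values; match these to rbd_ceph_conf and rbd_user in cinder.conf.
    client = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='admin')
    client.connect()             # fails if the user or keyring is wrong
    print(client.list_pools())   # e.g. ['rbd', 'volumes']
    client.shutdown()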

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Ok I've had another look at this (Juno). Some thoughts:

 * The error above, i.e. "error connecting to ceph cluster", is the result of the volume service being restarted and the rbd driver not being able to connect to Ceph. I notice that the rbd driver does not set a timeout value when calling rados.connect(), so the default (which appears to be 10 minutes) elapses before the error above is triggered. Although this is unrelated to a create operation, if a create happens to be issued before the volume driver finishes its initialisation, the volume will remain in the 'creating' state since the scheduler has nowhere to send that operation.

* A slightly different case is where the rbd driver has previously been started/initialised successfully but the Ceph cluster subsequently becomes unreachable. When a create is issued it does reach the driver, but when a connection is attempted the timeout (10 minutes by default) elapses before a rados.TimedOut is raised. I then see that the volume stays in 'creating' until get_volume_stats() is called, at which point it finally goes to 'error'.

So, a couple of improvements I think we could make here:

1. Allow the rados connection timeout to be configurable. This is useful for testing and makes timeouts predictable and controllable.
2. Make sure the create fails properly when the timeout occurs, i.e. volume status -> 'error'. A rough sketch of both changes follows below.
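
A rough sketch of what (1) and (2) could look like, assuming the python-rados bindings; the timeout value, helper name and option plumbing are illustrative, not the final patch:

    import rados

    from cinder import exception

    def _connect_to_rados(timeout=None, conf_file='/etc/ceph/ceph.conf',
                          user=None):
        client = rados.Rados(conffile=conf_file, rados_id=user)
        if timeout is not None:
            # (1) pass a configurable timeout instead of relying on the
            # long librados default
            client.connect(timeout=timeout)
        else:
            client.connect()
        return client

    def create_volume(volume):
        try:
            client = _connect_to_rados(timeout=30)  # value from config
            # ... create the RBD image for 'volume' here ...
            client.shutdown()
        except rados.Error:
            # (2) surface the failure so the volume manager can move the
            # volume to 'error' instead of leaving it in 'creating'
            raise exception.VolumeBackendAPIException(
                data="error connecting to ceph cluster")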

Changed in cinder:
status: Triaged → Confirmed
importance: Undecided → Medium
assignee: nobody → Edward Hope-Morley (hopem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/103424

Changed in cinder:
status: Confirmed → In Progress
tags: added: drivers
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/103424
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=02e4afbd8108ca26af09ccaebb8b1c8b72ea3f3f
Submitter: Jenkins
Branch: master

commit 02e4afbd8108ca26af09ccaebb8b1c8b72ea3f3f
Author: Edward Hope-Morley <email address hidden>
Date: Sun Jun 29 19:08:46 2014 +0100

    Ensure rbd connect exception is properly caught

    If the rbd driver fails to connect to Ceph the exception
    was not being properly caught resulting in the volume
    remaining in the 'creating' state until the corresponding
    task eventually times out (on top of the time it took
    for the connect to fail).

    Also added config option for rados connect timeout.

    DocImpact: new config option 'rados_connect_timeout'
    Closes-Bug: 1211839
    Change-Id: I5e6eaaaf6bed3e139ff476ecf9510ebe214a83f9
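
For anyone looking for the resulting option, it is set per backend in cinder.conf; the section name, user and timeout value below are only examples (a negative value leaves the librados behaviour unchanged):

    [ceph-rbd]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_user = cinder
    rbd_ceph_conf = /etc/ceph/ceph.conf
    # Timeout in seconds for connecting to the Ceph cluster; fail fast
    # instead of waiting on the librados default.
    rados_connect_timeout = 5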

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-2 → 2014.2