Glance hangs from a missing connection timeout in rbd driver

Bug #1469246 reported by Mike Fedosin
This bug affects 3 people
Affects: glance_store
Status: Fix Released
Importance: High
Assigned to: Mike Fedosin
Milestone: 0.9.0

Bug Description

If the rbd driver fails to connect to Ceph, glance-api hangs until it is restarted. There are no errors in the log (http://paste.openstack.org/show/321409/), but glance is unreachable from outside.

The reason is that we don't pass a connection timeout parameter in https://github.com/openstack/glance_store/blob/stable/kilo/glance_store/_drivers/rbd.py#L340, which causes glance to wait for a response forever.
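
To illustrate the missing timeout, here is a minimal sketch, not the actual glance_store patch. It assumes a client object that exposes `connect(timeout=...)`, as the `rados.Rados` class in the Ceph Python bindings does. The helper name `connect_with_timeout`, the exception `RbdConnectError`, and the 30-second default are hypothetical and chosen only for this example.

```python
class RbdConnectError(Exception):
    """Raised when the cluster connection cannot be established in time."""


def connect_with_timeout(client, timeout_seconds=30):
    """Connect a rados-like client, bounding how long the call may block.

    `client` is assumed to expose connect(timeout=...). The default of
    30 seconds is an arbitrary illustrative value, not a recommendation.
    """
    try:
        # Passing an explicit timeout makes the binding give up instead of
        # waiting indefinitely for an unreachable cluster.
        client.connect(timeout=timeout_seconds)
    except Exception as exc:
        # Surface the failure so it is logged and the request can fail fast.
        raise RbdConnectError(
            "could not connect to Ceph within %s seconds" % timeout_seconds
        ) from exc
    return client


# Hypothetical usage with the real bindings (conffile path and rados_id are
# assumptions for illustration):
#
#   import rados
#   client = rados.Rados(conffile="/etc/ceph/ceph.conf", rados_id="glance")
#   connect_with_timeout(client, timeout_seconds=30)
```

With a bounded connect, a Ceph outage shows up as an error for the affected request rather than as a silent hang of the whole API process.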

Another reason is that RBD is a Python binding for librados, which eventlet does not patch. This means glance cannot switch green threads during its I/O operations, so if one thread hangs, the whole service hangs.

Changed in glance-store:
assignee: nobody → jelly (coding1314)
Mike Fedosin (mfedosin) wrote :

Sorry, jelly :( I'm currently working on a fix for this issue.

Changed in glance-store:
assignee: jelly (coding1314) → Mike Fedosin (mfedosin)
Xavier (xavier-l) wrote :

I have this issue on Icehouse 2014.1.5.

Mike Fedosin (mfedosin) wrote :

Steps to reproduce in HA mode, which I got from the QA team:

OS: Ubuntu
Network: Neutron VLAN
1. Deploy 1 controller and 2 compute + Ceph nodes, with Ceph used for all storage.
2. When the cluster is ready, delete the 2 compute + Ceph nodes (before deleting the Ceph nodes you need to remove their OSDs manually, see http://ceph.com/docs/master/rados/operations/add-or-rm-osds/), add 2 new Ceph nodes, and redeploy the environment.
3. When the cluster is ready, add 1 compute node and 2 controllers, and redeploy the environment.
4. When the cluster is ready, glance won't respond after an image-create call.

I think there are easier ways to reproduce the bug, but this one works each time.

Changed in glance-store:
importance: Undecided → High
status: New → Fix Committed
Vincent Legoll (vincent-legoll) wrote :
Changed in glance-store:
milestone: none → 0.9.0
status: Fix Committed → Fix Released