Glance hangs from a missing connection timeout in rbd driver

Bug #1469246 reported by Mike Fedosin on 2015-06-26
This bug affects 3 people
Affects: glance_store
Importance: High
Assigned to: Mike Fedosin

Bug Description

If the rbd driver fails to connect to Ceph, glance-api hangs until it's restarted. The log shows no errors (http://paste.openstack.org/show/321409/), but glance is unreachable from outside.

The reason is that we don't pass a connection timeout parameter in https://github.com/openstack/glance_store/blob/stable/kilo/glance_store/_drivers/rbd.py#L340, which causes glance to wait for a response forever.
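To illustrate the idea, here is a minimal sketch of a connection helper that always passes a timeout. It is not the actual glance_store code: `get_connection`, `BackendUnavailable`, `make_client` and `FakeClient` are hypothetical names, and the sketch assumes the client object offers `connect(timeout=...)` and `shutdown()`, as the python-rados `Rados` class does.

```python
import contextlib


class BackendUnavailable(Exception):
    """Hypothetical error raised when the storage backend cannot be reached."""


@contextlib.contextmanager
def get_connection(make_client, connect_timeout):
    """Yield a connected client, giving up after connect_timeout seconds.

    make_client is any zero-argument callable returning an object with
    connect(timeout=...) and shutdown() methods (assumed interface).
    """
    client = make_client()
    try:
        # Without an explicit timeout the caller may wait forever.
        client.connect(timeout=connect_timeout)
    except Exception as exc:
        raise BackendUnavailable(
            "could not connect within %s seconds" % connect_timeout) from exc
    try:
        yield client
    finally:
        # Always release the connection, even if the caller raised.
        client.shutdown()


class FakeClient:
    """Stand-in for a real cluster client so the sketch runs without Ceph."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.timeout = None
        self.closed = False

    def connect(self, timeout=0):
        self.timeout = timeout
        if not self.reachable:
            raise TimeoutError("no response from cluster")

    def shutdown(self):
        self.closed = True
```

With a real cluster, `make_client` could be something like `lambda: rados.Rados(conffile="/etc/ceph/ceph.conf", rados_id="glance")`; the config path and user id here are assumptions, not values taken from this report.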

Another reason is that the rbd module is a Python binding for native Ceph libraries (librbd on top of librados), which eventlet cannot patch. This means glance won't switch green threads during these I/O operations, so if one thread hangs, the whole service hangs.
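The effect of an unpatched blocking call on a cooperative scheduler can be shown with a small stand-alone sketch. It uses asyncio from the standard library instead of eventlet, purely as an analogy, and `time.sleep` stands in for the native library call; all function names are made up for the illustration.

```python
import asyncio
import time


def blocking_call(seconds):
    # Stand-in for a native library call that never yields to the scheduler.
    time.sleep(seconds)


async def count_ticks(run_call):
    """Count how often a background task runs while run_call() is in progress."""
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    await run_call()
    task.cancel()
    return ticks


async def direct():
    # Runs inside the scheduler thread: nothing else can run meanwhile.
    blocking_call(0.2)


async def offloaded():
    # Runs in a worker thread: the scheduler stays responsive.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blocking_call, 0.2)


def measure():
    blocked = asyncio.run(count_ticks(direct))
    free = asyncio.run(count_ticks(offloaded))
    return blocked, free
```

Calling `measure()` returns the heartbeat counts for both cases; the direct call starves the heartbeat, while the offloaded call does not. In eventlet the comparable tool for offloading is `eventlet.tpool.execute`, though whether that fits the rbd driver is a separate question not settled in this report.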

Changed in glance-store:
assignee: nobody → jelly (coding1314)
Mike Fedosin (mfedosin) wrote :

Sorry, jelly :( I'm currently working on a fix for this issue.

Changed in glance-store:
assignee: jelly (coding1314) → Mike Fedosin (mfedosin)
Xavier (xavier-l) wrote :

I have this issue on Icehouse 2014.1.5.

Mike Fedosin (mfedosin) wrote :

Steps to reproduce in HA mode, which I got from the QA team:

OS: Ubuntu
Network: Neutron VLAN
1. Deploy 1 controller and 2 compute + Ceph nodes, with Ceph used for all storage.
2. When the cluster is ready, delete the 2 compute nodes with Ceph (before deleting the Ceph nodes you need to remove the OSDs manually: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/), add 2 new Ceph nodes, and redeploy the environment.
3. When the cluster is ready, add 1 compute node and 2 controllers, and redeploy the environment.
4. When the cluster is ready, glance won't respond after an image-create call.

I think there are easier ways to reproduce the bug, but this one works each time.

Changed in glance-store:
importance: Undecided → High
status: New → Fix Committed
Changed in glance-store:
milestone: none → 0.9.0
status: Fix Committed → Fix Released