Glance hangs from a missing connection timeout in rbd driver

Bug #1469246 reported by Mike Fedosin on 2015-06-26
This bug affects 3 people
Affects: glance-store | Status: Fix Released | Importance: High | Assigned to: Mike Fedosin | Milestone: 0.9.0

Bug Description

If the rbd driver fails to connect to Ceph, glance-api hangs until it is restarted. There are no errors in the log, but glance is unavailable from outside.

The reason is that we don't provide a connection timeout parameter when connecting to the cluster, which causes glance to wait for a response forever.
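A minimal sketch of the missing piece, assuming the python-rados binding's `Rados(conffile=..., rados_id=...)` constructor and a `timeout` argument on the client's `connect()`. `StoreConfig`, `open_cluster`, the option name `rados_connect_timeout` and its default of 30 seconds are hypothetical illustration, not glance_store's actual code. The client factory is injected so the logic can be exercised without a Ceph cluster.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class RadosClient(Protocol):
    """Assumed subset of a rados.Rados-like client used by this sketch."""

    def connect(self, timeout: int = 0) -> None: ...

    def shutdown(self) -> None: ...


@dataclass
class StoreConfig:
    conffile: str = "/etc/ceph/ceph.conf"
    user: str = "glance"
    # Hypothetical option name and default. 0 would mean "pass no explicit
    # timeout", i.e. the missing parameter this bug describes.
    rados_connect_timeout: int = 30


def open_cluster(cfg: StoreConfig,
                 factory: Callable[..., RadosClient]) -> RadosClient:
    """Create a client and connect with an explicit timeout in seconds."""
    if cfg.rados_connect_timeout <= 0:
        # Design choice of this sketch: refuse to connect without a bound.
        raise ValueError("rados_connect_timeout must be positive")
    client = factory(conffile=cfg.conffile, rados_id=cfg.user)
    try:
        client.connect(timeout=cfg.rados_connect_timeout)
    except Exception:
        # Release the handle so a failed attempt does not leak resources.
        client.shutdown()
        raise
    return client
```

Against a real cluster the call would look like `open_cluster(StoreConfig(), rados.Rados)`. A timeout turns a silent hang into an exception the driver can log and report, but on its own it does not address the green-thread issue.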

Another reason is that RBD is a Python binding for librados, which isn't patched by eventlet. This means that glance won't switch green threads on I/O operations, so if one thread hangs, the whole service hangs.
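The effect of an unpatched blocking call can be illustrated without eventlet or Ceph. eventlet is not in the standard library, so this sketch swaps in asyncio as a stand-in cooperative scheduler: the hypothetical `blocking_io` plays the role of a librados call, and `heartbeat` plays another request handler. Calling the blocking function directly freezes every other task; moving it to a native OS thread with `asyncio.to_thread` (Python 3.9+) keeps the loop responsive. In eventlet, the analogous escape hatch is `eventlet.tpool.execute`, which runs a function in a pool of native threads.

```python
import asyncio
import time


def blocking_io(seconds: float) -> str:
    # Stands in for a C-level librados call: asyncio patches nothing,
    # so time.sleep blocks whichever thread calls it.
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Another "request handler": records a tick each time it gets to run.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def measure_ticks(offload: bool, seconds: float = 0.2) -> int:
    """Return how many heartbeat ticks happen while the blocking call runs."""
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if offload:
        # Runs in a native OS thread; the event loop keeps scheduling tasks.
        await asyncio.to_thread(blocking_io, seconds)
    else:
        # Runs on the event loop's own thread; every other task is frozen.
        blocking_io(seconds)
    stop.set()
    await hb
    return len(ticks)
```

`asyncio.run(measure_ticks(offload=False))` records only the single tick taken before the call blocks, while `asyncio.run(measure_ticks(offload=True))` records many more, because the heartbeat keeps running during the 0.2 s call.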

Changed in glance-store:
assignee: nobody → jelly (coding1314)
Mike Fedosin (mfedosin) wrote :

Sorry, jelly :( I'm currently working on a fix for this issue.

Changed in glance-store:
assignee: jelly (coding1314) → Mike Fedosin (mfedosin)
Xavier (xavier-l) wrote :

I have this issue on Icehouse 2014.1.5.

Mike Fedosin (mfedosin) wrote :

Here are the steps to reproduce in HA mode that I got from the QA team:

OS: Ubuntu
Network: Neutron VLAN
1. Deploy 1 controller and 2 compute + Ceph nodes, with Ceph used for all storage.
2. When the cluster is ready, delete the 2 compute + Ceph nodes (before deleting the Ceph nodes, you need to delete their OSDs manually), add 2 new Ceph nodes, and redeploy the environment.
3. When the cluster is ready, add 1 compute node and 2 controllers and redeploy the environment.
4. When the cluster is ready, glance doesn't respond after an image-create call.

I think there are easier ways to reproduce the bug, but this one works every time.

Changed in glance-store:
importance: Undecided → High
status: New → Fix Committed
Changed in glance-store:
milestone: none → 0.9.0
status: Fix Committed → Fix Released