2016-07-28 17:15:51 |
Roman Podoliaka |
description |
While executing a call to librbd, nova-compute may hang for a while and eventually show up as down in nova service-list output.
strace'ing shows that a process is stuck on acquiring a mutex:
root@node-153:~# strace -p 16675
Process 16675 attached
futex(0x7fff084ce36c, FUTEX_WAIT_PRIVATE, 1, NULL
gdb shows the traceback:
http://paste.openstack.org/show/542534/
^ which basically means that calls to librbd (a C library) are not monkey-patched, so they cannot yield the execution context to another green thread in an eventlet-based process.
To avoid blocking the whole nova-compute process on calls to librbd, we should wrap them with tpool.execute() (http://eventlet.net/doc/threading.html#eventlet.tpool.execute) |
While executing a call to librbd, nova-compute may hang for a while (it looks like at least some calls can take a really long time depending on the health of the Ceph cluster, and calls like http://docs.ceph.com/docs/master/rbd/librbdpy/#rbd.RBD.list inherently slow down as the number of entities to be listed grows) and eventually show up as down in nova service-list output.
strace'ing shows that a process is stuck on acquiring a mutex:
root@node-153:~# strace -p 16675
Process 16675 attached
futex(0x7fff084ce36c, FUTEX_WAIT_PRIVATE, 1, NULL
gdb shows the traceback:
http://paste.openstack.org/show/542534/
^ which basically means that calls to librbd (a C library) are not monkey-patched, so they cannot yield the execution context to another green thread in an eventlet-based process.
To avoid blocking the whole nova-compute process on calls to librbd, we should wrap them with tpool.execute() (http://eventlet.net/doc/threading.html#eventlet.tpool.execute) |
|
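A minimal sketch of the proposed wrapping, not the actual nova change. The names offload, slow_list and the execute parameter are hypothetical and exist only for illustration; slow_list stands in for a blocking librbd call such as rbd.RBD().list(). The only real API assumed here is eventlet.tpool.execute(func, *args, **kwargs), which runs func in a native OS thread and suspends only the calling green thread.

```python
import functools
import time


def offload(func, execute=None):
    """Return a wrapper that runs func through execute(func, *args, **kwargs).

    When execute is None, eventlet.tpool.execute is used, so the blocking
    call runs in a native thread and other green threads keep running.
    The execute parameter is a hypothetical hook for swapping the runner.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        runner = execute
        if runner is None:
            # Imported lazily: eventlet is needed only when no runner is given.
            from eventlet import tpool
            runner = tpool.execute
        return runner(func, *args, **kwargs)
    return wrapper


def slow_list(pool_name):
    """Hypothetical stand-in for a blocking librbd call."""
    time.sleep(0.1)  # simulates time spent inside the C library
    return ["%s/volume-1" % pool_name]


# Usage inside an eventlet-based service (requires eventlet):
#     list_volumes = offload(slow_list)
#     volumes = list_volumes("volumes")
```

Wrapping at the call site keeps the change local: only the calls known to enter librbd are pushed to the thread pool, and the rest of the process keeps using green threads as before.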