Comment 0 for bug 1898918

Takashi Kajinami (kajinamit) wrote : Creating a cloned volume causes missing heartbeat to RabbitMQ

When a cloned volume is created, cinder-volume is disconnected by RabbitMQ.

2020-10-06 15:19:05.697 55 DEBUG oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:765
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured
...
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 371, in _send_loop
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit return send_method(data, *args)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit
2020-10-06 15:19:05.698 55 ERROR oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] AMQP server on overcloud-controller-0:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer

We observe "missed heartbeats from client, timeout: 60s" in the RabbitMQ log, so the disconnection appears to be caused by cinder-volume missing heartbeats to RabbitMQ while flatten is running.

The same issue was reported for volume-from-snapshot[1], and the fix for that issue made the flatten call use tpool.Proxy[2].
 [1] https://bugs.launchpad.net/cinder/+bug/1658037
 [2] https://review.opendev.org/#/c/423184/2/cinder/volume/drivers/rbd.py

However, the flatten call in create_cloned_volume is still executed in the same thread[3]. We need the same fix for this flatten call.
 [3] https://github.com/openstack/cinder/blob/6ad1ab0c7298f0d2647ec42e2f7bb101af135af7/cinder/volume/drivers/rbd.py#L743
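
To illustrate why the thread matters, here is a minimal, self-contained sketch. It is not Cinder code and does not use eventlet or librbd: it swaps in the standard library's asyncio, with run_in_executor playing a role loosely analogous to eventlet's tpool. The names blocking_flatten, heartbeat and run are hypothetical, and the sleep durations are arbitrary. A blocking call made directly on the cooperative scheduler's thread starves the heartbeat task, while offloading it to a worker thread lets heartbeats continue.

```python
import asyncio
import time


def blocking_flatten(seconds):
    """Hypothetical stand-in for a long, blocking flatten call."""
    time.sleep(seconds)  # blocks the calling OS thread
    return "flattened"


async def heartbeat(interval, beats):
    """Record a timestamp every `interval` seconds until cancelled."""
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(interval)


async def run(offload, flatten_seconds=0.5, interval=0.05):
    """Return how many heartbeats fired while the flatten call was running."""
    beats = []
    task = asyncio.create_task(heartbeat(interval, beats))
    await asyncio.sleep(0)  # give the heartbeat task a chance to start

    start = time.monotonic()
    if offload:
        # Run the blocking call in a worker thread; the event loop stays free.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_flatten, flatten_seconds)
    else:
        # Run the blocking call on the event loop thread; nothing else can run.
        blocking_flatten(flatten_seconds)
    end = time.monotonic()

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    # Count only heartbeats recorded while the flatten call was in progress.
    return sum(1 for t in beats if start < t <= end)


if __name__ == "__main__":
    print("heartbeats during in-thread flatten:", asyncio.run(run(offload=False)))
    print("heartbeats during offloaded flatten:", asyncio.run(run(offload=True)))
```

In this sketch the in-thread variant should record no heartbeats during the blocking call, while the offloaded variant should record several. If the in-thread variant ran longer than the broker's heartbeat timeout, the peer would see the kind of missed heartbeats reported above.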

This issue was initially reported in a downstream bug and reproduced with Queens.
 https://bugzilla.redhat.com/show_bug.cgi?id=1885734
However, the flatten call in create_cloned_volume has not yet been updated, even in master.