Activity log for bug #1898918

Date Who What changed Old value New value Message
2020-10-07 17:49:44 Takashi Kajinami bug added bug
2020-10-07 17:51:14 Takashi Kajinami summary Creating a cloned volume causes missing heartbeat to RabbitMQ rbd: creating a cloned volume causes missing heartbeat to RabbitMQ
2020-10-07 17:51:30 Takashi Kajinami description

Old value:

When creating a cloned volume, cinder-volume is disconnected by RabbitMQ.

2020-10-06 15:19:05.697 55 DEBUG oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:765
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured
...
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 371, in _send_loop
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit return send_method(data, *args)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit
2020-10-06 15:19:05.698 55 ERROR oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] AMQP server on overcloud-controller-0:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer

We observe "missed heartbeats from client, timeout: 60s" in the RabbitMQ log, and it seems that this disconnection is caused by the client missing heartbeats while flatten is executed. The same issue was reported for volume-from-snapshot[1], and the fix for that issue made the flatten call use tpool.Proxy[2].

[1] https://bugs.launchpad.net/cinder/+bug/1658037
[2] https://review.opendev.org/#/c/423184/2/cinder/volume/drivers/rbd.py

However, the flatten call in create_cloned_volume is still executed in the same thread[3]. We need the same fix for this flatten call.

[3] https://github.com/openstack/cinder/blob/6ad1ab0c7298f0d2647ec42e2f7bb101af135af7/cinder/volume/drivers/rbd.py#L743

This issue was initially reported in a downstream bug and reproduced with Queens.
https://bugzilla.redhat.com/show_bug.cgi?id=1885734
However, the flatten call in create_cloned_volume has not yet been updated, even in master.

New value:

When rbd is used as the backend, creating a cloned volume causes cinder-volume to be disconnected by RabbitMQ.

2020-10-06 15:19:05.697 55 DEBUG oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:765
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured
...
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 371, in _send_loop
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit return send_method(data, *args)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit
2020-10-06 15:19:05.698 55 ERROR oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] AMQP server on overcloud-controller-0:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer

We observe "missed heartbeats from client, timeout: 60s" in the RabbitMQ log, and it seems that this disconnection is caused by the client missing heartbeats while flatten is executed. The same issue was reported for volume-from-snapshot[1], and the fix for that issue made the flatten call use tpool.Proxy[2].

[1] https://bugs.launchpad.net/cinder/+bug/1658037
[2] https://review.opendev.org/#/c/423184/2/cinder/volume/drivers/rbd.py

However, the flatten call in create_cloned_volume is still executed in the same thread[3]. We need the same fix for this flatten call.

[3] https://github.com/openstack/cinder/blob/6ad1ab0c7298f0d2647ec42e2f7bb101af135af7/cinder/volume/drivers/rbd.py#L743

This issue was initially reported in a downstream bug and reproduced with Queens.
https://bugzilla.redhat.com/show_bug.cgi?id=1885734
However, the flatten call in create_cloned_volume has not yet been updated, even in master.
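The mechanism described in the report (a long blocking call running in the same thread as the heartbeat sender, and the fix of moving that call to a native thread) can be illustrated with a small standalone sketch. This is not Cinder's code: it uses the standard library's asyncio and run_in_executor as stand-ins for eventlet and tpool.Proxy, and the names blocking_flatten, heartbeat, and count_heartbeats are hypothetical. It only counts how many heartbeats manage to run while the blocking work is in progress.

```python
import asyncio
import time


def blocking_flatten(seconds):
    # Hypothetical stand-in for a long, blocking native call (such as an RBD
    # flatten). time.sleep never yields control back to the event loop.
    time.sleep(seconds)


async def heartbeat(interval, beats):
    # Appends a timestamp whenever the event loop gets a chance to run it.
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(interval)


async def count_heartbeats(offload, work_seconds=0.5, interval=0.05):
    """Return how many heartbeats fired while the blocking work was running."""
    beats = []
    task = asyncio.create_task(heartbeat(interval, beats))
    await asyncio.sleep(0)  # let the heartbeat task start before measuring

    start = time.monotonic()
    if offload:
        # Assumed analogue of the tpool.Proxy fix: run the blocking call in a
        # worker thread so the loop stays free to send heartbeats.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_flatten, work_seconds)
    else:
        # Analogue of the reported bug: the call runs in the loop's own thread.
        blocking_flatten(work_seconds)
    end = time.monotonic()

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sum(1 for t in beats if start < t <= end)


if __name__ == "__main__":
    print("heartbeats while blocking inline:", asyncio.run(count_heartbeats(False)))
    print("heartbeats while offloaded:", asyncio.run(count_heartbeats(True)))
```

With the inline call no heartbeat runs during the blocking work, which matches the "missed heartbeats from client" symptom; with the offloaded call the heartbeat task keeps firing. The timings here are arbitrary and much shorter than the 60s timeout in the RabbitMQ log.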
2020-10-07 18:10:17 OpenStack Infra cinder: status New In Progress
2020-10-07 18:10:17 OpenStack Infra cinder: assignee Takashi Kajinami (kajinamit)
2020-10-14 04:33:57 OpenStack Infra cinder: status In Progress Fix Released
2020-10-17 07:34:21 OpenStack Infra tags in-stable-victoria
2020-10-19 21:56:39 OpenStack Infra tags in-stable-victoria in-stable-ussuri in-stable-victoria
2020-11-14 20:11:13 OpenStack Infra tags in-stable-ussuri in-stable-victoria in-stable-train in-stable-ussuri in-stable-victoria