rbd: creating a cloned volume causes missing heartbeat to RabbitMQ

Bug #1898918 reported by Takashi Kajinami
This bug affects 1 person
Affects | Status       | Importance | Assigned to      | Milestone
Cinder  | Fix Released | Undecided  | Takashi Kajinami |

Bug Description

When RBD is used as the backend, creating a cloned volume causes cinder-volume to be disconnected from RabbitMQ.

2020-10-06 15:19:05.697 55 DEBUG oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:765
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured
...
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 371, in _send_loop
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit return send_method(data, *args)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit
2020-10-06 15:19:05.698 55 ERROR oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] AMQP server on overcloud-controller-0:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer

The RabbitMQ log shows "missed heartbeats from client, timeout: 60s", so the disconnection appears to be caused by cinder-volume failing to send heartbeats while the rbd image is being flattened.
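The mechanism can be illustrated with a small stand-alone analogue. cinder-volume runs on eventlet green threads, and a long native librbd call such as flatten does not yield to the eventlet hub, so (as this report suggests) the green thread that sends AMQP heartbeats cannot run until flatten returns. The sketch below swaps eventlet for the standard-library asyncio event loop, which behaves the same way for a blocking call. `blocking_flatten`, `heartbeat` and `run` are hypothetical names, `time.sleep` stands in for flatten, and the timings are illustrative only. It reports the largest gap between heartbeats when the call runs inline versus in a worker thread.

```python
# asyncio analogue of the problem; this is NOT eventlet or oslo.messaging code.
import asyncio
import time

HEARTBEAT_INTERVAL = 0.05  # seconds between simulated heartbeats (illustrative)
FLATTEN_SECONDS = 0.5      # duration of the simulated flatten (illustrative)


def blocking_flatten():
    """Hypothetical stand-in for a long native call like flatten; never yields."""
    time.sleep(FLATTEN_SECONDS)


async def heartbeat(stamps, stop):
    """Record a timestamp every HEARTBEAT_INTERVAL until stop is set."""
    while not stop.is_set():
        stamps.append(time.monotonic())
        await asyncio.sleep(HEARTBEAT_INTERVAL)


async def run(offload):
    """Run the flatten inline (blocking the loop) or in a worker thread.

    Returns the largest gap, in seconds, between consecutive heartbeats.
    """
    stamps = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(stamps, stop))
    await asyncio.sleep(HEARTBEAT_INTERVAL)  # let the heartbeat start first
    if offload:
        # The event loop keeps scheduling the heartbeat while a thread works.
        await asyncio.to_thread(blocking_flatten)
    else:
        # The event loop is frozen until this returns: no heartbeats.
        blocking_flatten()
    await asyncio.sleep(HEARTBEAT_INTERVAL)
    stop.set()
    await hb
    return max(b - a for a, b in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    print("inline:    max heartbeat gap %.2fs" % asyncio.run(run(offload=False)))
    print("offloaded: max heartbeat gap %.2fs" % asyncio.run(run(offload=True)))
```

With inline execution the largest gap is at least the full duration of the simulated flatten; with the call moved to a worker thread the heartbeats keep arriving at roughly the configured interval. In the real service the gap would be the whole flatten time, which for a large image can exceed the 60 s heartbeat timeout seen in the RabbitMQ log.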

The same issue was reported for creating a volume from a snapshot [1], and the fix for that issue made the flatten call go through tpool.Proxy [2].
 [1] https://bugs.launchpad.net/cinder/+bug/1658037
 [2] https://review.opendev.org/#/c/423184/2/cinder/volume/drivers/rbd.py
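eventlet's tpool.Proxy wraps an object so that its method calls are executed in a pool of native OS threads while the calling green thread waits cooperatively. The following sketch mimics that idea with asyncio instead of eventlet. `ThreadProxy`, `FakeImage` and `clone_then_flatten` are hypothetical names, and none of this is the actual cinder or eventlet code; it only shows that routing flatten through a proxy moves the blocking work off the event-loop thread.

```python
# Hypothetical asyncio analogue of wrapping an image in tpool.Proxy.
import asyncio
import threading


class ThreadProxy:
    """Forward attribute access to the wrapped object.

    Callable attributes are wrapped so that calling them runs the real
    method in a worker thread (via asyncio.to_thread) and returns an
    awaitable, leaving the event loop free for other tasks.
    """

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr

        async def call_in_thread(*args, **kwargs):
            return await asyncio.to_thread(attr, *args, **kwargs)

        return call_in_thread


class FakeImage:
    """Hypothetical stand-in for an rbd image; records which thread flattened it."""

    def __init__(self, name):
        self.name = name
        self.flattened_in = None

    def flatten(self):
        self.flattened_in = threading.current_thread().name
        return "flattened " + self.name


async def clone_then_flatten(image):
    # Calling image.flatten() directly would run on the event-loop thread;
    # going through the proxy runs it in a worker thread instead.
    return await ThreadProxy(image).flatten()


if __name__ == "__main__":
    img = FakeImage("volume-clone")
    print(asyncio.run(clone_then_flatten(img)))
    print("flatten ran in:", img.flattened_in)
```

Unlike eventlet's tpool.Proxy, where the proxied call looks like an ordinary blocking call to the green thread, the asyncio version has to be awaited; the underlying idea of moving the blocking call off the loop's thread is the same.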

However, the flatten call in create_cloned_volume is still executed in the same thread [3]. This flatten call needs the same fix.
 [3] https://github.com/openstack/cinder/blob/6ad1ab0c7298f0d2647ec42e2f7bb101af135af7/cinder/volume/drivers/rbd.py#L743

This issue was initially reported in a downstream bug and reproduced with Queens:
 https://bugzilla.redhat.com/show_bug.cgi?id=1885734
However, the flatten call in create_cloned_volume has not been updated, even in master.

summary: - Creating a cloned volume causes missing heartbeat to RabbitMQ
+ rbd: creating a cloned volume causes missing heartbeat to RabbitMQ
description: updated
Changed in cinder:
assignee: nobody → Takashi Kajinami (kajinamit)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/756416
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=bafe440b9418a1e3acd00456872d4d97a9c64c61
Submitter: Zuul
Branch: master

commit bafe440b9418a1e3acd00456872d4d97a9c64c61
Author: Takashi Kajinami <email address hidden>
Date: Wed Oct 7 09:55:32 2020 +0900

    RBD: Run flatten in a different thread when cloning a volume

    The current implementation of create_cloned_volume calls flatten
    directly, and this makes whole thread of cinder-volume blocked by that
    flatten call. This causes heartbeat timeout in RabbitMQ when cloning
    a volume with rbd backend.

    This patch makes sure that flatten is executed in a different thread,
    to allow heartbeat thread to run while flattening a rbd image.

    Closes-Bug: #1898918
    Change-Id: I9f28260008117abcebfc96dbe69bf892f5cd14fe

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/758003

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/758440

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/758003
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=f1521d82d941e711f6ef248a7eb81f096de8135f
Submitter: Zuul
Branch: stable/victoria

commit f1521d82d941e711f6ef248a7eb81f096de8135f
Author: Takashi Kajinami <email address hidden>
Date: Wed Oct 7 09:55:32 2020 +0900

    RBD: Run flatten in a different thread when cloning a volume

    The current implementation of create_cloned_volume calls flatten
    directly, and this makes whole thread of cinder-volume blocked by that
    flatten call. This causes heartbeat timeout in RabbitMQ when cloning
    a volume with rbd backend.

    This patch makes sure that flatten is executed in a different thread,
    to allow heartbeat thread to run while flattening a rbd image.

    Closes-Bug: #1898918
    Change-Id: I9f28260008117abcebfc96dbe69bf892f5cd14fe
    (cherry picked from commit bafe440b9418a1e3acd00456872d4d97a9c64c61)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/ussuri)

Reviewed: https://review.opendev.org/758440
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=223b3c7b0f614fc4a5e0e8e9612f10bb0de0ba55
Submitter: Zuul
Branch: stable/ussuri

commit 223b3c7b0f614fc4a5e0e8e9612f10bb0de0ba55
Author: Takashi Kajinami <email address hidden>
Date: Wed Oct 7 09:55:32 2020 +0900

    RBD: Run flatten in a different thread when cloning a volume

    The current implementation of create_cloned_volume calls flatten
    directly, and this makes whole thread of cinder-volume blocked by that
    flatten call. This causes heartbeat timeout in RabbitMQ when cloning
    a volume with rbd backend.

    This patch makes sure that flatten is executed in a different thread,
    to allow heartbeat thread to run while flattening a rbd image.

    Closes-Bug: #1898918
    Change-Id: I9f28260008117abcebfc96dbe69bf892f5cd14fe
    (cherry picked from commit bafe440b9418a1e3acd00456872d4d97a9c64c61)
    (cherry picked from commit f1521d82d941e711f6ef248a7eb81f096de8135f)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/759186

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/train)

Reviewed: https://review.opendev.org/759186
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c92c15b59ae21fdd3bf4f4db2a5aef45615a6266
Submitter: Zuul
Branch: stable/train

commit c92c15b59ae21fdd3bf4f4db2a5aef45615a6266
Author: Takashi Kajinami <email address hidden>
Date: Wed Oct 7 09:55:32 2020 +0900

    RBD: Run flatten in a different thread when cloning a volume

    The current implementation of create_cloned_volume calls flatten
    directly, and this makes whole thread of cinder-volume blocked by that
    flatten call. This causes heartbeat timeout in RabbitMQ when cloning
    a volume with rbd backend.

    This patch makes sure that flatten is executed in a different thread,
    to allow heartbeat thread to run while flattening a rbd image.

    Closes-Bug: #1898918
    Change-Id: I9f28260008117abcebfc96dbe69bf892f5cd14fe
    (cherry picked from commit bafe440b9418a1e3acd00456872d4d97a9c64c61)
    (cherry picked from commit f1521d82d941e711f6ef248a7eb81f096de8135f)
    (cherry picked from commit 223b3c7b0f614fc4a5e0e8e9612f10bb0de0ba55)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 17.0.1

This issue was fixed in the openstack/cinder 17.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 16.2.1

This issue was fixed in the openstack/cinder 16.2.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 15.4.1

This issue was fixed in the openstack/cinder 15.4.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.0.0.0b1

This issue was fixed in the openstack/cinder 18.0.0.0b1 development milestone.
