Volume can't be detached if the attachment delete API call fails with 504 Gateway Timeout

Bug #1978444 reported by Takashi Kajinami
Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Takashi Kajinami
Train                     In Progress   Undecided   Unassigned
Ussuri                    In Progress   Undecided   Unassigned
Victoria                  Fix Released  Undecided   Unassigned
Wallaby                   Fix Released  Undecided   Unassigned
Xena                      Fix Released  Undecided   Unassigned
Yoga                      Fix Released  Undecided   Unassigned

Bug Description

Description
===========
When cinder-api runs behind a load balancer such as haproxy, the load balancer can return 504 if it does not receive a response from cinder-api within its timeout.
If this timeout occurs while a volume is being detached, the volume is left in a state where it can no longer be detached.

 - nova-compute calls the attachment delete API in Cinder
 - haproxy detects a server timeout and returns 504
 - Cinder continues processing the request and removes the attachment
 - nova-compute immediately aborts the volume detachment and leaves the bdm in place
 - when a client tries to detach the volume again, the detachment fails because the attachment no longer exists in Cinder

For details, see https://bugzilla.redhat.com/show_bug.cgi?id=2002643
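
The sequence in the list above can be modelled with a small simulation. This is only a sketch: FakeCinder, FakeNova, GatewayTimeout, NotFound and run_scenario are hypothetical stand-ins invented for illustration and are not nova or cinder code. It shows why the first detach attempt sees a 504, the second sees a 404, and the bdm is never removed.

```python
class GatewayTimeout(Exception):
    """Hypothetical stand-in for an HTTP 504 returned by the load balancer."""


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 returned by cinder-api."""


class FakeCinder:
    """Toy model of cinder-api behind a load balancer with a short timeout."""

    def __init__(self, attachments):
        self.attachments = set(attachments)

    def attachment_delete(self, attachment_id):
        if attachment_id not in self.attachments:
            raise NotFound(attachment_id)
        # Cinder completes the delete, but the reply arrives too late,
        # so the caller only sees the load balancer's 504.
        self.attachments.discard(attachment_id)
        raise GatewayTimeout(attachment_id)


class FakeNova:
    """Toy model of the pre-fix behaviour: any error aborts the detach."""

    def __init__(self, cinder, bdms):
        self.cinder = cinder
        self.bdms = dict(bdms)  # volume id -> attachment id

    def detach(self, volume_id):
        self.cinder.attachment_delete(self.bdms[volume_id])  # may raise
        del self.bdms[volume_id]  # only reached when the call succeeds


def run_scenario():
    """Attempt the same detach twice and record what the caller observes."""
    nova = FakeNova(FakeCinder({"att-1"}), {"vol-1": "att-1"})
    outcomes = []
    for _ in range(2):
        try:
            nova.detach("vol-1")
            outcomes.append("detached")
        except GatewayTimeout:
            outcomes.append("504")
        except NotFound:
            outcomes.append("404")
    return outcomes, nova.bdms


if __name__ == "__main__":
    print(run_scenario())
```

In this model the bdm for vol-1 survives both attempts, which corresponds to the un-detachable volume described in this report.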

Steps to reproduce
==================
* Stop cinder-volume
* Detach a volume from an instance
* Start cinder-volume
* Detach the volume again

Expected result
===============
* Volume can be detached after cinder-volume is recovered

Actual result
=============
* Volume can't be detached

Environment
===========
* The issue was initially found in stable/train

Logs & Configs
==============
* See https://bugzilla.redhat.com/show_bug.cgi?id=2002643#c1

Changed in nova:
assignee: nobody → Takashi Kajinami (kajinamit)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/845543

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/845543
Committed: https://opendev.org/openstack/nova/commit/8f4b740ca5292556f8e953a30f2a11ed4fbc2945
Submitter: "Zuul (22348)"
Branch: master

commit 8f4b740ca5292556f8e953a30f2a11ed4fbc2945
Author: Takashi Kajinami <email address hidden>
Date: Mon Jun 13 14:48:24 2022 +0900

    Retry attachment delete API call for 504 Gateway Timeout

    When cinder-api runs behind a load balancer (e.g. haproxy), the load
    balancer can return 504 Gateway Timeout when cinder-api does not
    respond within timeout. This change ensures nova retries deleting
    a volume attachment in that case.

    Also this change makes nova ignore 404 in the API call. This is
    required because cinder might continue deleting the attachment even if
    the load balancer returns 504. This also helps us in the situation
    where the volume attachment was accidentally removed by users.

    Closes-Bug: #1978444
    Change-Id: I593011d9f4c43cdae7a3d53b556c6e2a2b939989
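
The behaviour described in this commit message (retry on 504, treat 404 as success) could be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged into nova: delete_attachment_with_retry, make_slow_backend, the exception classes, the attempt count and the delay are all hypothetical names and values chosen for the example.

```python
import time


class GatewayTimeout(Exception):
    """Hypothetical stand-in for an HTTP 504 from the load balancer."""


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 from cinder-api."""


def delete_attachment_with_retry(delete_call, attachment_id,
                                 max_attempts=5, delay=0.0,
                                 sleep=time.sleep):
    """Call delete_call(attachment_id), retrying on 504 and ignoring 404.

    Returns "deleted" when the call succeeds and "already_gone" when the
    attachment is missing, for example because an earlier request that
    timed out at the load balancer was still completed by the backend.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            delete_call(attachment_id)
            return "deleted"
        except NotFound:
            # The attachment is gone, which is the state we wanted.
            return "already_gone"
        except GatewayTimeout:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(delay)


def make_slow_backend(attachments):
    """Build a fake delete call that succeeds but always reports 504."""
    def delete_call(attachment_id):
        if attachment_id not in attachments:
            raise NotFound(attachment_id)
        attachments.discard(attachment_id)
        raise GatewayTimeout(attachment_id)
    return delete_call


if __name__ == "__main__":
    remaining = {"att-1"}
    print(delete_attachment_with_retry(make_slow_backend(remaining), "att-1"))
```

With the fake backend, the first call times out after the attachment has already been removed, the retry receives a 404, and the helper reports the attachment as already gone instead of aborting the detach.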

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/849212

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.0.0.0rc1

This issue was fixed in the openstack/nova 26.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/866083

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/866085

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/866087

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/866089

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/866091

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/849212
Committed: https://opendev.org/openstack/nova/commit/b94ffb1123b1a6cf0a8675e0d6f1072e9625f570
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit b94ffb1123b1a6cf0a8675e0d6f1072e9625f570
Author: Takashi Kajinami <email address hidden>
Date: Mon Jun 13 14:48:24 2022 +0900

    Retry attachment delete API call for 504 Gateway Timeout

    When cinder-api runs behind a load balancer (e.g. haproxy), the load
    balancer can return 504 Gateway Timeout when cinder-api does not
    respond within timeout. This change ensures nova retries deleting
    a volume attachment in that case.

    Also this change makes nova ignore 404 in the API call. This is
    required because cinder might continue deleting the attachment even if
    the load balancer returns 504. This also helps us in the situation
    where the volume attachment was accidentally removed by users.

    Closes-Bug: #1978444
    Change-Id: I593011d9f4c43cdae7a3d53b556c6e2a2b939989
    (cherry picked from commit 8f4b740ca5292556f8e953a30f2a11ed4fbc2945)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/866083
Committed: https://opendev.org/openstack/nova/commit/14f9b7627e8a48f546e2f1c79d4336e1e4923501
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 14f9b7627e8a48f546e2f1c79d4336e1e4923501
Author: Takashi Kajinami <email address hidden>
Date: Mon Jun 13 14:48:24 2022 +0900

    Retry attachment delete API call for 504 Gateway Timeout

    When cinder-api runs behind a load balancer (e.g. haproxy), the load
    balancer can return 504 Gateway Timeout when cinder-api does not
    respond within timeout. This change ensures nova retries deleting
    a volume attachment in that case.

    Also this change makes nova ignore 404 in the API call. This is
    required because cinder might continue deleting the attachment even if
    the load balancer returns 504. This also helps us in the situation
    where the volume attachment was accidentally removed by users.

    Closes-Bug: #1978444
    Change-Id: I593011d9f4c43cdae7a3d53b556c6e2a2b939989
    (cherry picked from commit 8f4b740ca5292556f8e953a30f2a11ed4fbc2945)
    (cherry picked from commit b94ffb1123b1a6cf0a8675e0d6f1072e9625f570)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/866085
Committed: https://opendev.org/openstack/nova/commit/9b1c078112f11eafbd8e174efbd0e0f9d2c951ee
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 9b1c078112f11eafbd8e174efbd0e0f9d2c951ee
Author: Takashi Kajinami <email address hidden>
Date: Mon Jun 13 14:48:24 2022 +0900

    Retry attachment delete API call for 504 Gateway Timeout

    When cinder-api runs behind a load balancer (e.g. haproxy), the load
    balancer can return 504 Gateway Timeout when cinder-api does not
    respond within timeout. This change ensures nova retries deleting
    a volume attachment in that case.

    Also this change makes nova ignore 404 in the API call. This is
    required because cinder might continue deleting the attachment even if
    the load balancer returns 504. This also helps us in the situation
    where the volume attachment was accidentally removed by users.

    Closes-Bug: #1978444
    Change-Id: I593011d9f4c43cdae7a3d53b556c6e2a2b939989
    (cherry picked from commit 8f4b740ca5292556f8e953a30f2a11ed4fbc2945)
    (cherry picked from commit b94ffb1123b1a6cf0a8675e0d6f1072e9625f570)
    (cherry picked from commit 14f9b7627e8a48f546e2f1c79d4336e1e4923501)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.2.0

This issue was fixed in the openstack/nova 24.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.1.0

This issue was fixed in the openstack/nova 25.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/nova/+/866087
Committed: https://opendev.org/openstack/nova/commit/3cb1e35b5e3a3f8949bb0fd31fb8a246c5346703
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 3cb1e35b5e3a3f8949bb0fd31fb8a246c5346703
Author: Takashi Kajinami <email address hidden>
Date: Mon Jun 13 14:48:24 2022 +0900

    Retry attachment delete API call for 504 Gateway Timeout

    When cinder-api runs behind a load balancer (e.g. haproxy), the load
    balancer can return 504 Gateway Timeout when cinder-api does not
    respond within timeout. This change ensures nova retries deleting
    a volume attachment in that case.

    Also this change makes nova ignore 404 in the API call. This is
    required because cinder might continue deleting the attachment even if
    the load balancer returns 504. This also helps us in the situation
    where the volume attachment was accidentally removed by users.

    Conflicts:
        nova/tests/unit/volume/test_cinder.py
        nova/volume/cinder.py

    NOTE(melwitt): The conflicts are due to the following changes not in
    Victoria:

      * I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e
        (Remove six.text_type (1/2))

      * I779bd1446dc1f070fa5100ccccda7881fa508d79
        (Remove six.text_type (2/2))

    Closes-Bug: #1978444
    Change-Id: I593011d9f4c43cdae7a3d53b556c6e2a2b939989
    (cherry picked from commit 8f4b740ca5292556f8e953a30f2a11ed4fbc2945)
    (cherry picked from commit b94ffb1123b1a6cf0a8675e0d6f1072e9625f570)
    (cherry picked from commit 14f9b7627e8a48f546e2f1c79d4336e1e4923501)
    (cherry picked from commit 9b1c078112f11eafbd8e174efbd0e0f9d2c951ee)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/866091
Reason: The stable/train branch of the nova project has been tagged as End of Life. All open patches have to be abandoned so that the branch can be deleted.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/866089
Reason: stable/ussuri branch of openstack/nova transitioned to End of Life and is about to be deleted. To be able to do that, all open patches need to be abandoned.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova victoria-eom

This issue was fixed in the openstack/nova victoria-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova wallaby-eom

This issue was fixed in the openstack/nova wallaby-eom release.
