[stable-only] openstack overcloud delete stalls forever if nested heat stack delete fails

Bug #1877258 reported by Rabi Mishra
This bug affects 1 person
Affects: tripleo
Status: Triaged
Importance: Medium
Assigned to: Rabi Mishra
Milestone: (none set)

Bug Description

Description of problem:

The command "openstack overcloud delete overcloud" stalls forever if the nested "openstack stack delete overcloud" fails.

We appear to keep retrying the stack listing while the stack_status is DELETE_FAILED. That seems incorrect: we are not retrying the stack deletion, so listing the stack again cannot change its status.
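The difference between the two retry conditions can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, the "before" condition (keep polling while the stack is still listed) is a guess at the faulty behaviour described above, and the real check lives in a Mistral workflow, not in Python.

```python
# Hypothetical sketch: both helpers take the parsed output of a stack listing
# (a list of dicts with "stack_name" and "stack_status") and return True when
# the caller should list the stacks again.

def continue_while_listed(stacks, name):
    """Assumed faulty condition: keep polling as long as the stack exists."""
    return any(s["stack_name"] == name for s in stacks)


def continue_while_deleting(stacks, name):
    """Condition from the fix: the stack exists AND is DELETE_IN_PROGRESS."""
    return any(
        s["stack_name"] == name and s["stack_status"] == "DELETE_IN_PROGRESS"
        for s in stacks
    )


failed = [{"stack_name": "overcloud", "stack_status": "DELETE_FAILED"}]
print(continue_while_listed(failed, "overcloud"))    # keeps polling forever
print(continue_while_deleting(failed, "overcloud"))  # stops polling
```

With a DELETE_FAILED stack the first condition never becomes false, which matches the stall shown in the output that follows.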

(undercloud) [stack@undercloud ~]$ openstack overcloud delete overcloud
Are you sure you want to delete this overcloud [y/N]? y
Undeploying stack overcloud...
Waiting for messages on queue 'tripleo' with no timeout.

The command stalls here indefinitely. Checking the stack shows:

(undercloud) [stack@undercloud ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+---------------+----------------------+----------------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+----------------------+
| 402af324-9154-4407-94c3-b9455bcaf64d | overcloud | 2545877eca2c41f797499311045a3566 | DELETE_FAILED | 2020-02-18T12:51:22Z | 2020-02-18T13:40:32Z |
+--------------------------------------+------------+----------------------------------+---------------+----------------------+----------------------+
(undercloud) [stack@undercloud ~]$ openstack stack delete overcloud
Are you sure you want to delete this stack(s) [y/N]? y
(undercloud) [stack@undercloud ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
| 402af324-9154-4407-94c3-b9455bcaf64d | overcloud | 2545877eca2c41f797499311045a3566 | DELETE_IN_PROGRESS | 2020-02-18T12:51:22Z | 2020-02-18T14:06:52Z |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+

If the manual stack delete succeeds, the first command continues with:
Deleting plan overcloud...
Success.
(undercloud) [stack@undercloud ~]$

How reproducible:
Always, when the nested stack delete fails.

Steps to Reproduce:
1. openstack overcloud delete overcloud
2. the nested stack delete fails (as described in bug 1804256)
3. the first command stalls (probably forever; I gave up after 30 minutes)

Rabi Mishra (rabi)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Rabi Mishra (rabi)
summary: - [stable-only] openstack overcloud delete overcloud stalls forever if
- nested heat stack delete fails
+ [stable-only] openstack overcloud delete stalls forever if nested heat
+ stack delete fails
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/726066

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/train)

Reviewed: https://review.opendev.org/726066
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=183c206df432cda24160acbd9a4a4a70602ff1b8
Submitter: Zuul
Branch: stable/train

commit 183c206df432cda24160acbd9a4a4a70602ff1b8
Author: Rabi Mishra <email address hidden>
Date: Thu May 7 13:52:00 2020 +0530

    [stable-only] check for stack status IN_PROGRESS to retry

    Looks like we're retrying listing of stacks when the stack_status
    is DELETE_FAILED. As we're not retrying stack deletion again listing
    the stack again would not change the stack_status automagically,
    after it's marked DELETE_FAILED.

    stack delete call won't return before it's marked DELETE_IN_PROGRESS.
    We can check for stack exists and stack_status DELETE_IN_PROGRESS to
    retry.

    Mistral 'continue-on':
    Defines an expression that will continue iteration loop if it evaluates
    to ‘true’.

    Closes-Bug: #1877258
    Change-Id: I4885e820868f746e793dacc7eb0c2155f3836d5c
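
The retry behaviour the commit message describes can be sketched as a small polling loop. This is a minimal sketch, not the tripleo-common implementation: `wait_for_delete`, its parameters, and the returned strings are all hypothetical, and `list_stacks` stands in for whatever call returns the current stack listing. The loop plays the role of Mistral's 'continue-on': it keeps iterating only while the stack exists and is DELETE_IN_PROGRESS.

```python
import time


def find_stack(stacks, name):
    """Return the listing entry for `name`, or None if it is not listed."""
    for stack in stacks:
        if stack.get("stack_name") == name:
            return stack
    return None


def wait_for_delete(list_stacks, name, delay=15.0, max_attempts=40,
                    sleep=time.sleep):
    """Poll until the stack is gone, leaves DELETE_IN_PROGRESS, or we give up.

    `list_stacks` is a caller-supplied callable (an assumption of this
    sketch) returning a list of dicts with "stack_name" and "stack_status".
    """
    for _ in range(max_attempts):
        stack = find_stack(list_stacks(), name)
        if stack is None:
            return "DELETED"
        if stack["stack_status"] != "DELETE_IN_PROGRESS":
            # e.g. DELETE_FAILED: polling again would not change the status
            return stack["stack_status"]
        sleep(delay)
    return "TIMEOUT"
```

A caller could treat any result other than "DELETED" as an error and report it instead of waiting on the queue indefinitely.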

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/731435

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/stein)

Reviewed: https://review.opendev.org/731435
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c8e0b809205c4c2bbc78b7f1e4935239cf61a8d9
Submitter: Zuul
Branch: stable/stein

commit c8e0b809205c4c2bbc78b7f1e4935239cf61a8d9
Author: Rabi Mishra <email address hidden>
Date: Thu May 7 13:52:00 2020 +0530

    [stable-only] check for stack status IN_PROGRESS to retry

    Looks like we're retrying listing of stacks when the stack_status
    is DELETE_FAILED. As we're not retrying stack deletion again listing
    the stack again would not change the stack_status automagically,
    after it's marked DELETE_FAILED.

    stack delete call won't return before it's marked DELETE_IN_PROGRESS.
    We can check for stack exists and stack_status DELETE_IN_PROGRESS to
    retry.

    Mistral 'continue-on':
    Defines an expression that will continue iteration loop if it evaluates
    to ‘true’.

    Closes-Bug: #1877258
    Change-Id: I4885e820868f746e793dacc7eb0c2155f3836d5c
    (cherry picked from commit 183c206df432cda24160acbd9a4a4a70602ff1b8)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common stein-eol

This issue was fixed in the openstack/tripleo-common stein-eol release.
