Parallel Stack Life-cycle in separate-heat-stacks

Bug #1852314 reported by James Slagle
This bug affects 1 person
Affects | Status       | Importance | Assigned to  | Milestone
tripleo | Fix Released | Medium     | James Slagle |

Bug Description

Description of problem:
It is not possible to run more than one overcloud update at the same time. The limitation comes from Mistral, not from Heat.

###########
2019-11-05 16:45:36Z [AllNodesDeploySteps]: UPDATE_COMPLETE state changed
2019-11-05 16:45:38Z [edge0-compute]: UPDATE_COMPLETE Stack UPDATE completed successfully

 Stack edge0-compute/d03b5b8d-e26c-472e-9c57-dd4bd8b348eb UPDATE_COMPLETE

Deploying overcloud configuration
Enabling ssh admin (tripleo-admin) for hosts:
192.168.111.213
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
Inserting TripleO short term key for 192.168.111.213
Warning: Permanently added '192.168.111.213' (ECDSA) to the list of known hosts.
Starting ssh admin enablement workflow
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - COMPLETE.
Removing TripleO short term key from 192.168.111.213
Warning: Permanently added '192.168.111.213' (ECDSA) to the list of known hosts.
Removing short term keys locally
Enabling ssh admin - COMPLETE.
Waiting for messages on queue 'tripleo' with no timeout.
Deployment already in progress with execution 535264f4-d735-4491-bc4f-37356b249622
Overcloud configuration failed.
###########

This issue seriously limits the usefulness of a single director managing multiple stacks. Customers have already asked for roughly 30 to 40 edge locations managed by a single control plane (hence a single director). In that context, concurrent life-cycle operations could well happen.

Version-Release number of selected component (if applicable):
OSP15

How reproducible:
Deploy central zone, and then two edge zones in parallel.

Actual results:
Mistral fails to run config-download

Expected results:
Mistral runs both config-download executions in parallel

Alternative approach:
To solve this problem, the deployment workflow would be structured differently:
 - Mistral is executed only to create, and if necessary update, the plan and the stack. Mistral, in turn, calls Heat. Once the stack operation has ended, the workflow is complete.
 - Ansible (config-download) is then executed directly, without any Mistral interaction.
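
The two-phase flow described above can be sketched roughly as follows. This is only an illustration of the ordering, not TripleO code: `update_stack` and `run_config_download` are hypothetical callables standing in for "the Mistral workflow that calls Heat" and "the direct Ansible run", and the thread pool is just one assumed way of running several stacks side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_stack(stack, update_stack, run_config_download):
    """Run both phases for a single stack, strictly in order."""
    # Phase 1 (hypothetical callable): create/update the plan and stack.
    update_stack(stack)
    # Phase 2 (hypothetical callable): config-download, no Mistral involved.
    run_config_download(stack)
    return stack


def deploy_all(stacks, update_stack, run_config_download):
    """Deploy several stacks concurrently, one worker per stack."""
    with ThreadPoolExecutor(max_workers=max(1, len(stacks))) as pool:
        futures = [
            pool.submit(deploy_stack, s, update_stack, run_config_download)
            for s in stacks
        ]
        # Results come back in the order the stacks were submitted.
        return [f.result() for f in futures]
```

Because each stack gets its own independent phase 2, nothing in this sketch serialises config-download across stacks; any per-stack locking would have to be added separately.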

Changed in tripleo:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → James Slagle (james-slagle)
milestone: none → ussuri-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/693870

Changed in tripleo:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.opendev.org/693871

Changed in tripleo:
milestone: ussuri-1 → ussuri-2
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/693870
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=72f5762e45284543227246c19309e147114e8c07
Submitter: Zuul
Branch: master

commit 72f5762e45284543227246c19309e147114e8c07
Author: James Slagle <email address hidden>
Date: Tue Nov 12 10:59:59 2019 -0500

    Handle config-download in progress in tripleoclient

    This patch adds handling and checking of any instances of the workflow
    tripleo.deployment.v1.config_download_deploy already in progress for the
    current stack. It will prevent duplicate instances of the same workflow
    being started and running at the same time.

    It will allow for multiple instances of the workflow running at the same
    time as long as they are for different stacks.

    Change-Id: Ic8dbf28b5796ff998165b6b73b941f21c65f1dfa
    Closes-Bug: #1852314
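
The per-stack check described in this commit message can be sketched as below. This is a minimal illustration, not the code from the patch: executions are modelled as plain dicts rather than Mistral client objects, the `plan_name` input key is an assumption, and `edge1-compute` is a made-up second stack name.

```python
import json

WORKFLOW = "tripleo.deployment.v1.config_download_deploy"


def running_for_stack(executions, stack_name, input_key="plan_name"):
    """Return IDs of RUNNING config-download executions for stack_name.

    Each execution is assumed to be a dict with 'id', 'workflow_name',
    'state' and 'input' (a JSON string holding the workflow inputs).
    """
    matches = []
    for ex in executions:
        if ex["workflow_name"] != WORKFLOW or ex["state"] != "RUNNING":
            continue
        inputs = json.loads(ex["input"])
        # Only executions started for the same stack count as duplicates.
        if inputs.get(input_key) == stack_name:
            matches.append(ex["id"])
    return matches


executions = [
    {"id": "a1", "workflow_name": WORKFLOW, "state": "RUNNING",
     "input": json.dumps({"plan_name": "edge0-compute"})},
    {"id": "b2", "workflow_name": WORKFLOW, "state": "SUCCESS",
     "input": json.dumps({"plan_name": "edge1-compute"})},
]

print(running_for_stack(executions, "edge0-compute"))  # ['a1']
print(running_for_stack(executions, "edge1-compute"))  # []
```

A caller would refuse to start a new config-download only when the returned list is non-empty, so a run in progress for `edge0-compute` does not block one for another stack.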

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/693871
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=101bff0021c2f772af3a996ef61d3299e4082605
Submitter: Zuul
Branch: master

commit 101bff0021c2f772af3a996ef61d3299e4082605
Author: James Slagle <email address hidden>
Date: Tue Nov 12 11:03:11 2019 -0500

    Remove checking for already running config_download_deploy

    This patch removes the checking for already running instances of the
    tripleo.deployment.v1.config_download_deploy workflow from the workflow
    itself as it's been moved to tripleoclient.

    It is better handled in tripleoclient where the workflow inputs can be
    checked to see if any of the other running workflows are also for the
    current stack. That functionality would have required a custom action to
    do in the workflow itself.

    Change-Id: I6195068a42bfc2469a0b8b006e339e3ca5056dff
    Partial-Bug: #1852314
    Depends-On: Ic8dbf28b5796ff998165b6b73b941f21c65f1dfa

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/701592

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/701593

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/701592
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=6367d58c248dc0884ba8cfe586fd0c973b46206c
Submitter: Zuul
Branch: stable/train

commit 6367d58c248dc0884ba8cfe586fd0c973b46206c
Author: James Slagle <email address hidden>
Date: Tue Nov 12 10:59:59 2019 -0500

    Handle config-download in progress in tripleoclient

    This patch adds handling and checking of any instances of the workflow
    tripleo.deployment.v1.config_download_deploy already in progress for the
    current stack. It will prevent duplicate instances of the same workflow
    being started and running at the same time.

    It will allow for multiple instances of the workflow running at the same
    time as long as they are for different stacks.

    Change-Id: Ic8dbf28b5796ff998165b6b73b941f21c65f1dfa
    Closes-Bug: #1852314
    (cherry picked from commit 72f5762e45284543227246c19309e147114e8c07)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/train)

Reviewed: https://review.opendev.org/701593
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=74159dc11daa24fe66e6b74c253083da52a56732
Submitter: Zuul
Branch: stable/train

commit 74159dc11daa24fe66e6b74c253083da52a56732
Author: James Slagle <email address hidden>
Date: Tue Nov 12 11:03:11 2019 -0500

    Remove checking for already running config_download_deploy

    This patch removes the checking for already running instances of the
    tripleo.deployment.v1.config_download_deploy workflow from the workflow
    itself as it's been moved to tripleoclient.

    It is better handled in tripleoclient where the workflow inputs can be
    checked to see if any of the other running workflows are also for the
    current stack. That functionality would have required a custom action to
    do in the workflow itself.

    Change-Id: I6195068a42bfc2469a0b8b006e339e3ca5056dff
    Partial-Bug: #1852314
    Depends-On: Ic8dbf28b5796ff998165b6b73b941f21c65f1dfa
    (cherry picked from commit 101bff0021c2f772af3a996ef61d3299e4082605)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 13.1.0

This issue was fixed in the openstack/python-tripleoclient 13.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 12.4.0

This issue was fixed in the openstack/python-tripleoclient 12.4.0 release.
