nova_wait_for_compute_service can be run before nova_api is up on controllers

Bug #1842948 reported by Emilien Macchi
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Oliver Walsh
Milestone: (not shown)

Bug Description

Initially reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1749443

Both the nova_api and nova_wait_for_compute_service containers are started at step 4.
In a deployment with both controllers and computes, there is no way to make nova_wait_for_compute_service wait for nova_api to be up. That is why some retries were implemented in nova_wait_for_compute_service.py, but they are not reliable enough (see the BZ).

We need to check whether we can move nova_wait_for_compute_service to step 5.
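
To illustrate why a fixed retry budget is fragile here, below is a minimal Python sketch of a bounded retry loop. It is not the code from nova_wait_for_compute_service.py; the function name `wait_with_retries`, the `probe` callable, and the default values are all hypothetical. The point is only that the total wait is capped at roughly `retries * delay`, so if nova_api takes longer than that to come up, the wait fails even though nothing is actually broken.

```python
import time
from typing import Callable


def wait_with_retries(
    probe: Callable[[], bool],
    retries: int = 10,        # hypothetical default, not taken from the bug
    delay: float = 30.0,      # hypothetical default, not taken from the bug
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `retries` times, sleeping `delay` seconds between tries.

    Returns True on the first successful probe, False once the budget is spent.
    """
    for attempt in range(retries):
        if probe():
            return True
        # Do not sleep after the final attempt; the budget is already exhausted.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

With these illustrative defaults the caller gives up after about 4.5 minutes of sleeping, regardless of why the API is late.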

tags: added: stein-backport-potential
Oliver Walsh (owalsh) wrote :

> We need to check whether we can move nova_wait_for_compute_service to step 5.

IIRC it's there to ensure the nova_compute has initialised before the cell_v2 host discovery in step 5...

Maybe add a step 6 (just for docker, not puppet).

Oliver Walsh (owalsh) wrote :

But would that still race? Isn't paunch just launching the docker/podman command and returning immediately? So at the end of step 4 we can't assume nova_api is running - podman/docker could still be pulling the image.

Oliver Walsh (owalsh)
Changed in tripleo:
assignee: nobody → Oliver Walsh (owalsh)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/681042

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: train-3 → ussuri-1
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/681042
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=8a87cbcc349feb9cbd710e91d9805b0db2b8aba9
Submitter: Zuul
Branch: master

commit 8a87cbcc349feb9cbd710e91d9805b0db2b8aba9
Author: Oliver Walsh <email address hidden>
Date: Mon Sep 9 15:48:23 2019 +0100

    Ensure nova-api is running before starting nova-compute containers

    If nova-api is delayed starting then the nova_wait_for_compute_service
    can timeout. A deployment using a slow/busy remote container repository is
    particularly susceptible to this issue. To resolve this nova_compute and
    nova_wait_for_compute_service have been postponed to step_5 and a task
    has been added to step_4 to ensure nova_api is active before proceeding.

    Change-Id: I6fcbc5cb5d4f3cbb618d9661d2a36c868e18b3d6
    Closes-bug: #1842948
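
The ordering described in this commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual tripleo-heat-templates change (which is expressed in the templates, not in Python): `wait_until_active`, `deploy_steps_4_and_5`, `GateTimeout`, and the `start_container` / `is_active` callables are hypothetical names, and the timeout values are made up. It shows the idea of an explicit gate at the end of step 4 that blocks until nova_api is active, so that nothing scheduled for step 5 can start early.

```python
import time
from typing import Callable


class GateTimeout(RuntimeError):
    """Raised when the gated service never becomes active."""


def wait_until_active(
    is_active: Callable[[], bool],
    timeout: float = 600.0,    # hypothetical value
    interval: float = 5.0,     # hypothetical value
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until is_active() returns True, or raise GateTimeout."""
    deadline = clock() + timeout
    while not is_active():
        if clock() >= deadline:
            raise GateTimeout("service did not become active in time")
        sleep(interval)


def deploy_steps_4_and_5(
    start_container: Callable[[str], None],
    is_active: Callable[[str], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # Step 4: launch nova_api, then gate on it actually being active.
    # Launching alone is not enough: the launcher may return while the
    # image is still being pulled.
    start_container("nova_api")
    wait_until_active(lambda: is_active("nova_api"), sleep=sleep)

    # Step 5: only reached once the gate above has passed.
    start_container("nova_compute")
    start_container("nova_wait_for_compute_service")
```

Raising on timeout, rather than returning a flag, means a failed gate stops the deployment at step 4 instead of letting step 5 run against an API that is not there.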

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/688349

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.3.0

This issue was fixed in the openstack/tripleo-heat-templates 11.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/688399
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d80d948fe706420f79783180ef5972fa31844ce6
Submitter: Zuul
Branch: master

commit d80d948fe706420f79783180ef5972fa31844ce6
Author: Martin Schuppert <email address hidden>
Date: Mon Oct 14 14:43:29 2019 +0200

    Fix placement_wait_for_service

    This fixes the indentation and volumes of placement_wait_for_service
    and the corresponding placement_wait_for_service.py so that they use
    the config of the extracted placement service.

    It also:
    * sets placement::keystone::authtoken::auth_url instead of
      placement::keystone::authtoken::auth_uri, as auth_uri is
      deprecated and not supported by placement::keystone::authtoken.
    * sets placement::keystone::authtoken::region_name

    Related-Bug: 1842948

    Change-Id: Ic24cf646efdd70ba1dbca42d3408847fe09a6e49

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/690795

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/688349
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=20b485fe8e58110116cee38cb45e58360caabec0
Submitter: Zuul
Branch: stable/stein

commit 20b485fe8e58110116cee38cb45e58360caabec0
Author: Oliver Walsh <email address hidden>
Date: Mon Sep 9 15:48:23 2019 +0100

    Ensure nova-api is running before starting nova-compute containers

    If nova-api is delayed starting then the nova_wait_for_compute_service
    can timeout. A deployment using a slow/busy remote container repository is
    particularly susceptible to this issue. To resolve this nova_compute and
    nova_wait_for_compute_service have been postponed to step_5 and a task
    has been added to step_4 to ensure nova_api is active before proceeding.

    Conflicts:
      deployment/nova/nova-compute-container-puppet.yaml
      deployment/placement/placement-api-container-puppet.yaml

    Note: Since this is not a direct cherry-pick due to the placement
    extraction in the train release, this backport also includes the
    needed changes from https://review.opendev.org/688399.

    Change-Id: I6fcbc5cb5d4f3cbb618d9661d2a36c868e18b3d6
    Closes-bug: #1842948
    (cherry picked from commit 8a87cbcc349feb9cbd710e91d9805b0db2b8aba9)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.6.2

This issue was fixed in the openstack/tripleo-heat-templates 10.6.2 release.
