nova_wait_for_compute_service can be run before nova_api is up on controllers

Bug #1842948 reported by Emilien Macchi on 2019-09-05
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Oliver Walsh

Bug Description

Initially reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1749443

Both the nova_api and nova_wait_for_compute_service containers are started at step 4.
If we have controllers and computes, we have no way to tell nova_wait_for_compute_service to wait for nova_api to be up (which is why some retries were implemented in nova_wait_for_compute_service.py; but this is not reliable enough, see BZ).

We need to look if we can move nova_wait_for_compute_service to the step 5.
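
To illustrate why a fixed number of retries is not reliable here, the following is a minimal sketch, not the actual nova_wait_for_compute_service.py code. The names wait_for and make_probe are hypothetical, and the probe is a stand-in for "is nova_api answering yet?". The loop only succeeds if the dependency comes up within retries * delay seconds, so a slow start on the controller side still fails.

```python
import time
from typing import Callable


def wait_for(probe: Callable[[], bool], retries: int, delay: float,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() up to `retries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False


def make_probe(ready_after: int) -> Callable[[], bool]:
    """Hypothetical stand-in for an API check: succeeds from the Nth call onwards."""
    calls = {"n": 0}

    def probe() -> bool:
        calls["n"] += 1
        return calls["n"] >= ready_after

    return probe


def no_sleep(seconds: float) -> None:
    """Skip real sleeping so the example runs instantly."""


# The service is up on the 3rd check: 5 retries are enough.
fast = wait_for(make_probe(3), retries=5, delay=1.0, sleep=no_sleep)
# The service is up on the 10th check: the same 5 retries give up too early.
slow = wait_for(make_probe(10), retries=5, delay=1.0, sleep=no_sleep)
```

Raising the retry count only moves the cutoff; it does not remove the dependency on how long the other container takes to start.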

tags: added: stein-backport-potential
Oliver Walsh (owalsh) wrote :

> We need to look if we can move nova_wait_for_compute_service to the step 5.

IIRC it's there to ensure the nova_compute has initialised before the cell_v2 host discovery in step 5...

Maybe add a step 6 (just for docker, not puppet).

Oliver Walsh (owalsh) wrote :

But would that still race? Isn't paunch just launching the docker/podman command and returning immediately? So at the end of step 4 we can't assume nova_api is running - podman/docker could still be pulling the image.
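
The race described above can be sketched as follows. This is a simulation under stated assumptions, not paunch, docker or podman code: launch and end_of_step are hypothetical names, and the sleep stands in for image pull plus service start. It shows that returning from the launch call says nothing about readiness, and that the step needs an explicit gate.

```python
import threading
import time


def launch(startup_seconds: float) -> threading.Event:
    """Start a simulated service and return at once, like a detached container launch."""
    ready = threading.Event()

    def run() -> None:
        time.sleep(startup_seconds)  # stands in for image pull + service start
        ready.set()

    threading.Thread(target=run, daemon=True).start()
    return ready


def end_of_step(ready: threading.Event, timeout: float) -> bool:
    """Explicit gate: block until the service reports ready or the timeout expires."""
    return ready.wait(timeout)


service = launch(0.2)
ready_right_after_launch = service.is_set()   # launch returned, service not up yet
ready_after_gate = end_of_step(service, 2.0)  # gate waits for the service
```

Without the gate, whatever runs next sees the state captured in ready_right_after_launch, which is the situation at the end of step 4 in this bug.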

Oliver Walsh (owalsh) on 2019-09-06
Changed in tripleo:
assignee: nobody → Oliver Walsh (owalsh)

Fix proposed to branch: master
Review: https://review.opendev.org/681042

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: train-3 → ussuri-1

Reviewed: https://review.opendev.org/681042
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=8a87cbcc349feb9cbd710e91d9805b0db2b8aba9
Submitter: Zuul
Branch: master

commit 8a87cbcc349feb9cbd710e91d9805b0db2b8aba9
Author: Oliver Walsh <email address hidden>
Date: Mon Sep 9 15:48:23 2019 +0100

    Ensure nova-api is running before starting nova-compute containers

    If nova-api is delayed starting then the nova_wait_for_compute_service
    can timeout. A deployment using a slow/busy remote container repository is
    particularly susceptible to this issue. To resolve this nova_compute and
    nova_wait_for_compute_service have been postponed to step_5 and a task
    has been added to step_4 to ensure nova_api is active before proceeding.

    Change-Id: I6fcbc5cb5d4f3cbb618d9661d2a36c868e18b3d6
    Closes-bug: #1842948

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/tripleo-heat-templates 11.3.0 release.

Reviewed: https://review.opendev.org/688399
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d80d948fe706420f79783180ef5972fa31844ce6
Submitter: Zuul
Branch: master

commit d80d948fe706420f79783180ef5972fa31844ce6
Author: Martin Schuppert <email address hidden>
Date: Mon Oct 14 14:43:29 2019 +0200

    Fix placement_wait_for_service

    This fix the indent and volumes of the placement_wait_for_service
    and the corresponding placement_wait_for_service.py to use the
    config of the extracted placement service.

    It also
    * changes to set placement::keystone::authtoken::auth_url
    instead of placement::keystone::authtoken::auth_uri as auth_uri is
    deprecated and not supported by placement::keystone::authtoken.
    * sets placement::keystone::authtoken::region_name

    Related-Bug: 1842948

    Change-Id: Ic24cf646efdd70ba1dbca42d3408847fe09a6e49

Reviewed: https://review.opendev.org/688349
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=20b485fe8e58110116cee38cb45e58360caabec0
Submitter: Zuul
Branch: stable/stein

commit 20b485fe8e58110116cee38cb45e58360caabec0
Author: Oliver Walsh <email address hidden>
Date: Mon Sep 9 15:48:23 2019 +0100

    Ensure nova-api is running before starting nova-compute containers

    If nova-api is delayed starting then the nova_wait_for_compute_service
    can timeout. A deployment using a slow/busy remote container repository is
    particularly susceptible to this issue. To resolve this nova_compute and
    nova_wait_for_compute_service have been postponed to step_5 and a task
    has been added to step_4 to ensure nova_api is active before proceeding.

    Conflicts:
      deployment/nova/nova-compute-container-puppet.yaml
      deployment/placement/placement-api-container-puppet.yaml

    Note: Since this is not a direct cherry-pick due to the placement
    extraction in train release, this backport also includes needed
    changed from https://review.opendev.org/688399.

    Change-Id: I6fcbc5cb5d4f3cbb618d9661d2a36c868e18b3d6
    Closes-bug: #1842948
    (cherry picked from commit 8a87cbcc349feb9cbd710e91d9805b0db2b8aba9)

tags: added: in-stable-stein