healthcheck can fail if podman inspect was too slow

Bug #1878063 reported by Emilien Macchi
Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   Medium       Emilien Macchi

Bug Description

When deploying Keystone or Nova services, the deploy tasks can fail with this error:

fatal: [undercloud]: FAILED! => {
    "msg": "The conditional check ''healthy' not in keystone_infos.containers.0.Healthcheck.Status' failed. The error was: error while evaluating conditional ('healthy' not in keystone_infos.containers.0.Healthcheck.Status): Unable to look up a name or access an attribute in template string ({% if 'healthy' not in keystone_infos.containers.0.Healthcheck.Status %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'AnsibleUndefined' is not iterable"
}

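When the deployment is slow, podman inspect can fail to complete on the first try. The registered fact then has no Healthcheck.Status attribute, and evaluating the conditional raises this error instead of letting the task retry.
The fix adds a new condition to failed_when: so that we first check whether keystone_infos.containers.0.Healthcheck.Status is defined before testing its value.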
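
The guard described above could look roughly like the following task. This is a minimal sketch, not the merged change: the module name (containers.podman.podman_container_info), the task name, and the container name "keystone" are assumptions for illustration; only the registered variable and the two conditions come from this report.

```yaml
# Illustrative sketch only; not copied from the merged change.
- name: Check Keystone container status        # hypothetical task name
  containers.podman.podman_container_info:     # assumed module
    name: keystone                             # assumed container name
  register: keystone_infos
  # A list under failed_when is combined with an implicit "and":
  # the task fails only if Status exists AND does not contain 'healthy'.
  failed_when:
    - keystone_infos.containers.0.Healthcheck.Status is defined
    - "'healthy' not in keystone_infos.containers.0.Healthcheck.Status"
```

With the "is defined" check listed first, an attempt where podman inspect has not yet returned a status no longer raises the templating error; the second condition is only meaningful once the attribute exists.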

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/726913

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/726913
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=21d1f773c72423765b8ec322751f4ab38b1a81a4
Submitter: Zuul
Branch: master

commit 21d1f773c72423765b8ec322751f4ab38b1a81a4
Author: Emilien Macchi <email address hidden>
Date: Mon May 11 13:39:06 2020 -0400

    healthchecks: check if fact is defined before checking its value

    When checking if keystone/nova healthchecks are healthy, make sure the
    registered fact is set (which can slip to a further retry if podman
    inspect took too much time to execute).

    That way, we process the retries without an error like found in the bug
    report.

    Change-Id: I9f5063c9c3b598afd5bd01447f00a1146a20f4c3
    Closes-Bug: #1878063

Changed in tripleo:
status: In Progress → Fix Released