healthcheck can fail if podman inspect was too slow

Bug #1878063 reported by Emilien Macchi
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
tripleo   Fix Released   Medium       Emilien Macchi

Bug Description

When deploying Keystone or Nova services, the deploy tasks can fail with this error:

fatal: [undercloud]: FAILED! => {
    "msg": "The conditional check ''healthy' not in keystone_infos.containers.0.Healthcheck.Status' failed. The error was: error while evaluating conditional ('healthy' not in keystone_infos.containers.0.Healthcheck.Status): Unable to look up a name or access an attribute in template string ({% if 'healthy' not in keystone_infos.containers.0.Healthcheck.Status %} True {% else %} False {% endif %}).\nMake sure your variable name does not contain invalid characters like '-': argument of type 'AnsibleUndefined' is not iterable"
}

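When the deployment is slow, podman inspect may not complete on the first try. In that case keystone_infos.containers.0.Healthcheck.Status is still undefined when the conditional is evaluated, which produces the error above.
The fix is to add a condition to failed_when that first checks whether keystone_infos.containers.0.Healthcheck.Status is defined.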
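
For illustration, the guarded check could look roughly like the task below. This is only a sketch and not the merged change: the module name (containers.podman.podman_container_info), the task name, and the container name "keystone" are assumptions, and any retry settings around the task are left out. Only the registered variable and the two conditions come from this report.

```yaml
# Sketch only. Module, task name and container name are assumptions,
# not copied from the tripleo-heat-templates fix.
- name: Check Keystone container health
  containers.podman.podman_container_info:
    name: keystone
  register: keystone_infos
  # Entries in a failed_when list are combined with a logical AND, so the
  # task fails only when Status is defined and does not contain 'healthy'.
  failed_when:
    - keystone_infos.containers.0.Healthcheck.Status is defined
    - "'healthy' not in keystone_infos.containers.0.Healthcheck.Status"
```

With the "is defined" entry first, a run in which the fact is not yet set no longer raises the "AnsibleUndefined is not iterable" error, because the membership test is only reached once Status exists.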

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/726913

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/726913
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=21d1f773c72423765b8ec322751f4ab38b1a81a4
Submitter: Zuul
Branch: master

commit 21d1f773c72423765b8ec322751f4ab38b1a81a4
Author: Emilien Macchi <email address hidden>
Date: Mon May 11 13:39:06 2020 -0400

    healthchecks: check if fact is defined before checking its value

    When checking if keystone/nova healthchecks are healthy, make sure the
    registered fact is set (which can slip to a further retry if podman
    inspect took too much time to execute).

    That way, we process the retries without an error like found in the bug
    report.

    Change-Id: I9f5063c9c3b598afd5bd01447f00a1146a20f4c3
    Closes-Bug: #1878063

Changed in tripleo:
status: In Progress → Fix Released