Host errors are not fatal

Bug #1833737 reported by Mark Goddard
This bug affects 1 person
Affects        Status        Importance  Assigned to   Milestone
kolla-ansible  Fix Released  High        Mark Goddard
Antelope       Confirmed     High        Unassigned
Bobcat         Fix Released  High        Mark Goddard
Yoga           New           High        Unassigned
Zed            Confirmed     High        Unassigned

Bug Description

Currently we are using the default value of 'false' for
'any_errors_fatal'. This means that a play runs until it reaches the end
or until all hosts have failed: when one host fails, the others continue.
This can lead to odd situations and makes errors harder to diagnose.
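
The difference between the two settings can be illustrated with a toy model. This is a minimal sketch, not Ansible's implementation: `run_play`, the host names, and the task list are hypothetical, and the model only assumes that with 'any_errors_fatal' enabled the failing task still finishes on the current hosts before the play stops.

```python
def run_play(hosts, tasks, any_errors_fatal=False):
    """Toy model of how a play treats per-host failures.

    `tasks` is a list of (task_name, hosts_that_fail) pairs. This structure
    is an assumption made for illustration, not an Ansible data type.
    Returns (surviving_hosts, failed_hosts, tasks_run).
    """
    surviving = list(hosts)
    failed = []
    tasks_run = []
    for name, failing in tasks:
        if not surviving:
            # Every host has failed, so there is nothing left to run on.
            break
        tasks_run.append(name)
        newly_failed = [h for h in surviving if h in failing]
        failed.extend(newly_failed)
        surviving = [h for h in surviving if h not in failing]
        if newly_failed and any_errors_fatal:
            # Any failure stops the whole play after the current task.
            break
    return surviving, failed, tasks_run


# Hypothetical three-node cluster where one node fails on the first task.
hosts = ["ctl1", "ctl2", "ctl3"]
tasks = [
    ("stop service", {"ctl2"}),
    ("upgrade service", set()),
    ("start service", set()),
]

default_result = run_play(hosts, tasks, any_errors_fatal=False)
fatal_result = run_play(hosts, tasks, any_errors_fatal=True)
```

In the default case all three tasks run on ctl1 and ctl3 while ctl2 is left behind, which is the partially upgraded cluster described in the next paragraph. With the flag enabled, only the first task runs before the play stops.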

One specific example of where this can be an issue is during upgrade. If
one host fails early on, then it will not be upgraded. For clustered
services such as RabbitMQ or MariaDB this can cause problems. Many
OpenStack services have specific instructions for rolling upgrades which
would not work well if one or more hosts are not involved.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/666876

Changed in kolla-ansible:
assignee: nobody → Mark Goddard (mgoddard)
status: New → In Progress
Radosław Piliszek (yoctozepto) wrote :

Set any_errors_fatal true for gather-facts

https://review.opendev.org/c/openstack/kolla-ansible/+/805174

    Kolla Ansible will now fail execution early if fact collection fails
    on any of the hosts to avoid late failures due to missing facts
    (especially cross-host). This is done by setting ``any_errors_fatal: true``.
    Do note this still supports host fact caching and it will not affect
    scenarios with all facts cached (as there is no task to fail).

Changed in kolla-ansible:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by "Mark Goddard <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/666876
Reason: This approach is better: https://review.opendev.org/c/openstack/kolla-ansible/+/805598

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/805174
Committed: https://opendev.org/openstack/kolla-ansible/commit/5b431f0f7f932293362a01703673ee05d0a5bd8d
Submitter: "Zuul (22348)"
Branch: master

commit 5b431f0f7f932293362a01703673ee05d0a5bd8d
Author: Radosław Piliszek <email address hidden>
Date: Thu Aug 19 09:37:46 2021 +0000

    Allow setting any_errors_fatal true for gather-facts

    Kolla Ansible now supports failing execution early if fact collection
    fails on any of the hosts. This is to avoid late failures due to missing
    facts (especially cross-host).

    Change-Id: I7a74b937ded0b9da0621cf413f3a5d0d13a2cd68
    Partial-Bug: #1833737

German E (gespinozat) wrote :

Are you planning to backport this patch?

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/805598
Committed: https://opendev.org/openstack/kolla-ansible/commit/af6e1ca4fd3366c1ae936d225b2835e95a4191fc
Submitter: "Zuul (22348)"
Branch: master

commit af6e1ca4fd3366c1ae936d225b2835e95a4191fc
Author: Mark Goddard <email address hidden>
Date: Mon Aug 16 17:03:22 2021 +0100

    Support Ansible max_fail_percentage

    This allows us to continue execution until a certain proportion of hosts
    have failed. This can be useful at scale, where failures are common, and
    restarting a deployment is time-consuming.

    The default max failure percentage is 100, keeping the default
    behaviour. A global max failure percentage may be set via
    kolla_max_fail_percentage, and individual services may define a max
    failure percentage via <service>_max_fail_percentage.

    Note that all hosts in the inventory must be reachable for fact
    gathering, even those not included in a --limit.

    Closes-Bug: #1833737
    Change-Id: I808474a75c0f0e8b539dc0421374b06cea44be4f
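
The variable lookup and the abort condition described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the kolla-ansible implementation: the function names are hypothetical, and the precedence (per-service value first, then the global value, then 100) is inferred from the commit message. The strict "greater than" comparison follows Ansible's documented rule that the percentage must be exceeded, not equalled.

```python
DEFAULT_MAX_FAIL_PERCENTAGE = 100


def resolve_max_fail_percentage(service, variables):
    """Pick the threshold for a service (hypothetical helper).

    Assumed precedence: <service>_max_fail_percentage, then
    kolla_max_fail_percentage, then the default of 100.
    """
    service_key = f"{service}_max_fail_percentage"
    if service_key in variables:
        return variables[service_key]
    return variables.get("kolla_max_fail_percentage", DEFAULT_MAX_FAIL_PERCENTAGE)


def play_aborts(failed_hosts, total_hosts, max_fail_percentage):
    """Return True if the failure rate exceeds the threshold.

    The threshold must be exceeded, not merely reached, so a value
    of 100 can never trigger an abort through this check.
    """
    if total_hosts == 0:
        return False
    return failed_hosts * 100 / total_hosts > max_fail_percentage


# Hypothetical settings: a global threshold plus an override for one service.
settings = {"kolla_max_fail_percentage": 50, "nova_max_fail_percentage": 25}
nova_threshold = resolve_max_fail_percentage("nova", settings)
glance_threshold = resolve_max_fail_percentage("glance", settings)
```

With these settings, one failure out of four nova hosts reaches 25% without exceeding it, so the play continues; a second failure pushes the rate past the threshold and the play would abort.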

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 18.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 18.0.0.0rc1 release candidate.
