Some delegated tasks fail when the deployer is not a node that is part of the deployment (e.g. a controller node)

Bug #1762742 reported by Kevin Lefevre
This bug affects 2 people
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Jean-Philippe Evrard

Bug Description

With this os_nova patch : https://github.com/openstack/openstack-ansible-os_nova/commit/c581c2af7b2bfdf60fb4467d7d5b17bed32fc85f

I noticed that the task https://github.com/openstack/openstack-ansible-os_nova/blob/stable/pike/tasks/main.yml#L132 fails when the OSA deployer is a remote host.

OSA often assumes that the deployer is, for example, a controller. When this is the case, /etc/hosts is populated by the OSA playbooks, so the task works.

When the deployer is an "outside" host, /etc/hosts is not populated and the previous task fails.

Steps to reproduce the issue.

- Deploy OSA from a remote host: the task fails.
- Copy the /etc/hosts file from a controller to the remote deployer and add its entries to /etc/hosts.
- Re-run the playbook: the task is successful.

I'm not sure what the best way to fix this would be.

Revision history for this message
Corey Wright (coreywright) wrote :

tl;dr The problem is that Ansible, at least in 2.3.3.0, lazily evaluates the templated random_conductor variable in the delegate_to field twice when running os_nova's "Perform a cell_v2 discover" task. Because random_conductor uses the "random" filter, the two evaluations can differ. When they do, Ansible can't use the remote address stored in inventory for the delegated host and falls back to using just the inventory name for ssh, which fails if run on a non-openstack-ansible-deployed host, where openstack-ansible hasn't added the inventoried hosts to /etc/hosts.

Steps to reproduce:
1. Create an openstack-ansible deployment environment that contains multiple nova-conductor containers (ie multiple infra hosts).
2. Deploy openstack-ansible (ie "openstack-ansible setup-openstack.yml") from a host that openstack-ansible is not deploying to (ie one that doesn't have the openstack_hosts role applied or isn't in inventory at all; eg *not* an infra node, which seems to be a common use-case).
3. The failure will occur with (N-1)/N probability where N is the number of nova-conductor containers (because that's the odds of randomly selecting a different nova-conductor than the first one selected).

Disclaimer: The failure only occurs with (N-1)/N probability, where N is the number of nova-conductor containers.
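The (N-1)/N figure can be sanity-checked with a small simulation. This is only a sketch: it assumes each of the two evaluations picks uniformly and independently from N conductors, and the container names are hypothetical placeholders, not real inventory entries.

```python
import random


def mismatch_rate(n_conductors, trials=100_000, seed=0):
    """Estimate how often two independent random picks from N hosts differ."""
    rng = random.Random(seed)
    # Hypothetical container names, used only for illustration.
    hosts = [f"conductor{i}" for i in range(n_conductors)]
    mismatches = sum(
        rng.choice(hosts) != rng.choice(hosts) for _ in range(trials)
    )
    return mismatches / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        expected = (n - 1) / n
        print(f"N={n}: simulated={mismatch_rate(n):.3f} expected={expected:.3f}")
```

With a single conductor the two picks can never differ, which matches the observation that the failure needs multiple nova-conductor containers.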

Symptoms:
1. os_nova's "Perform a cell_v2 discover" task will fail.
2. A nova-compute container will be reported by Ansible as "unreachable".
3. With "-vvvv", "ssh: Could not resolve hostname <insert nova-conductor container name here>: Name or service not known" is reported by Ansible during os_nova's "Perform a cell_v2 discover" task.
4. With ANSIBLE_DEBUG=1, "no remote address found for delegated host <insert nova-conductor container name here>\nusing its name, so success depends on DNS resolution" is reported by Ansible during os_nova's "Perform a cell_v2 discover" task.

Work-around:
1. Change `delegate_to: "{{ random_conductor }}"` to `delegate_to: "{{ hostvars[random_conductor]['ansible_host'] }}"`.
2. Deploy openstack-ansible (ie "openstack-ansible setup-openstack.yml") from an openstack-ansible deployed node (eg an infra node).
3. Populate /etc/hosts with inventoried nodes (eg run /var/tmp/openstack-host-hostfile-setup.sh)

Explanation:
1. random_conductor is templated as "{{ groups[nova_services['nova-conductor']['group']] | random }}".
2. Ansible lazily evaluates this twice:
2a. When fetching the host vars for the delegate_to host; see "delegated_host_name = templar.template(task.delegate_to, fail_on_undefined=False)" in ansible/vars/__init__.py, _get_delegated_vars().
2b. When setting the delegated host for the task; see "delegated_host_name = templar.template(task.delegate_to)" in ansible/playbook/play_context.py, set_task_and_variable_override().
3. The lazy evaluation of the "random" filter introduces a 1/N probability that the delegated host vars match the delegated host.
4. When Ansible looks for the delegated host's remote address (ie ansible_ssh_host or ansible_host) in the play's delegated host vars and doesn't find it (because Ansible only populated the delegated host vars with the previously lazily evaluated inventory name),...
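The explanation above can be modelled roughly in a few lines of Python. This is a simplified sketch of the described behaviour, not Ansible's actual code: the inventory contents, addresses, and function names are all hypothetical, and the only thing it tries to capture is that the host vars are gathered for the first pick while the connection uses the second.

```python
import random

# Hypothetical inventory: container names and addresses are made up.
INVENTORY = {
    "infra1_nova_conductor": {"ansible_host": "172.29.236.11"},
    "infra2_nova_conductor": {"ansible_host": "172.29.236.12"},
    "infra3_nova_conductor": {"ansible_host": "172.29.236.13"},
}


def random_conductor(rng):
    """Stand-in for the lazily templated variable: re-rolled on every use."""
    return rng.choice(sorted(INVENTORY))


def resolve_delegate(rng):
    """Return (first pick, second pick, address that ssh would be given)."""
    # Evaluation 1 (step 2a): gather host vars for the delegated host.
    first = random_conductor(rng)
    delegated_vars = {first: INVENTORY[first]}

    # Evaluation 2 (step 2b): decide which host the task connects to.
    second = random_conductor(rng)

    # The address is looked up only in the vars gathered for the first pick;
    # if the picks differ, fall back to the bare inventory name.
    address = delegated_vars.get(second, {}).get("ansible_host", second)
    return first, second, address


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        first, second, address = resolve_delegate(rng)
        outcome = "IP from inventory" if first == second else "name only"
        print(f"{first} / {second} -> {address} ({outcome})")
```

Whenever the "name only" case occurs, ssh has to resolve the container name itself, which only works where /etc/hosts or DNS knows about it.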


Changed in openstack-ansible:
assignee: nobody → Corey Wright (coreywright)
Revision history for this message
Corey Wright (coreywright) wrote :

Adding test case that succinctly demonstrates the problem by executing "ANSIBLE_DEBUG=1 openstack-ansible -vvvv -i inventory/dynamic_inventory.py test_case.yaml".

Revision history for this message
Corey Wright (coreywright) wrote :

So one quick-and-dirty solution is to resolve the delegate host to an IP address within the templated delegate_to field, because Ansible can't do so itself: the template resolves the IP using all available variables, while Ansible only resolves delegate hosts using pre-selected "delegated" host vars.

I call it dirty because it duplicates the Ansible functionality in the template, ie looking up the IP address, and Ansible still evaluates the template twice, but this way it evaluates it to two different IP addresses instead of two different inventory names.

Revision history for this message
Corey Wright (coreywright) wrote :

To clarify on my last comment, "Ansible still evaluates the template twice": Ansible will always lazily evaluate the delegate_to field twice, but what I meant is that there's still the probability that it is evaluated twice *to two separate values* (inventory names in the original code, but IP addresses in the patch).
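A rough way to see why the patched template still works even though it is evaluated twice: whichever value the second evaluation yields is already an IP address, so no name lookup is needed. The sketch below is an illustration under that assumption only; the host vars and helper names are hypothetical and it does not reproduce Ansible internals.

```python
import ipaddress
import random

# Hypothetical host vars: names and addresses are made up.
HOSTVARS = {
    "infra1_nova_conductor": {"ansible_host": "172.29.236.11"},
    "infra2_nova_conductor": {"ansible_host": "172.29.236.12"},
    "infra3_nova_conductor": {"ansible_host": "172.29.236.13"},
}


def delegate_to_by_name(rng):
    """Original template analogue: evaluates to an inventory name."""
    return rng.choice(sorted(HOSTVARS))


def delegate_to_by_address(rng):
    """Patched template analogue: evaluates straight to an IP address."""
    name = rng.choice(sorted(HOSTVARS))
    return HOSTVARS[name]["ansible_host"]


def needs_name_resolution(target):
    """True if the connection target is not a literal IP address."""
    try:
        ipaddress.ip_address(target)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(3)
    first, second = delegate_to_by_address(rng), delegate_to_by_address(rng)
    print(first, second, needs_name_resolution(second))
```

The two evaluations may still disagree, but either result is directly usable by ssh, which is why the comment above calls the approach dirty rather than wrong.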

Revision history for this message
Corey Wright (coreywright) wrote :

Here's a more representative test case / set of files that mimics os_nova's main.yml and its included nova_db_post_setup.yml: test_case.yaml & test_case-include.yaml.

Revision history for this message
Corey Wright (coreywright) wrote :

After adding debug logging to Ansible to make obvious what it chooses for "delegated_host_name" (in both cases: when storing host vars in ansible_delegated_vars and when connecting to the delegated host), I found that moving random_conductor from a task var to a fact evaluates it once, when storing it as a fact, and then uses the evaluated value from then on, so it never changes when referenced.
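The difference between a lazily evaluated task var and a stored fact can be illustrated with a short Python analogy. It is only an analogy under stated assumptions: the conductor names are hypothetical, a function call stands in for re-templating the variable, and a dictionary entry stands in for the stored fact.

```python
import random

# Hypothetical conductor container names.
CONDUCTORS = [
    "infra1_nova_conductor",
    "infra2_nova_conductor",
    "infra3_nova_conductor",
]

rng = random.Random(0)


def lazy_random_conductor():
    """Task-var analogue: the expression is re-evaluated on every reference."""
    return rng.choice(CONDUCTORS)


# Fact analogue: evaluate once, store the result, reference the stored value.
facts = {"random_conductor": lazy_random_conductor()}


def fact_random_conductor():
    return facts["random_conductor"]


def distinct_values(reference, uses=1000):
    """Collect the distinct values seen over repeated references."""
    return {reference() for _ in range(uses)}


if __name__ == "__main__":
    print("task var:", len(distinct_values(lazy_random_conductor)), "distinct values")
    print("fact:    ", len(distinct_values(fact_random_conductor)), "distinct value")
```

Because the stored value never changes, both of Ansible's evaluations of delegate_to would see the same inventory name, so the delegated host vars and the connection target agree.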

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/560395

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

Given that the task is now a run_once task and not run for every host, I think we can simplify the delegation to just be done to the first member of the conductor group. The randomisation is confusing things here and creating another layer of abstraction.

Revision history for this message
Corey Wright (coreywright) wrote :

I've unassigned myself from this bug and will defer to https://review.openstack.org/560395.

Changed in openstack-ansible:
assignee: Corey Wright (coreywright) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/560920

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/560923

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/560924

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_nova (master)

Reviewed: https://review.openstack.org/560395
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_nova/commit/?id=db5a6181c357c4af629450fa1f48773144be17af
Submitter: Zuul
Branch: master

commit db5a6181c357c4af629450fa1f48773144be17af
Author: Maxime Guyot <email address hidden>
Date: Wed Apr 11 13:59:12 2018 +0200

    Use first conductor for nova-manage cell_v2 discover_hosts

    Using a random conductor leads to unpredictable results with lazy eval

    Change-Id: I846f78c80e4b5eb8421dbdebf3fc943be0bc84df
    Closed-Bug: 1762742

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_nova (stable/pike)

Reviewed: https://review.openstack.org/560920
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_nova/commit/?id=aa5446fd26b542937b7a4ed64188281c3321fe2c
Submitter: Zuul
Branch: stable/pike

commit aa5446fd26b542937b7a4ed64188281c3321fe2c
Author: Maxime Guyot <email address hidden>
Date: Wed Apr 11 13:59:12 2018 +0200

    Use first conductor for nova-manage cell_v2 discover_hosts

    Using a random conductor leads to unpredictable results with lazy eval

    Change-Id: I846f78c80e4b5eb8421dbdebf3fc943be0bc84df
    Closed-Bug: 1762742
    (cherry picked from commit db5a6181c357c4af629450fa1f48773144be17af)

tags: added: in-stable-pike
tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_nova (stable/queens)

Reviewed: https://review.openstack.org/560924
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_nova/commit/?id=ad9bcb08150db26c09150f0ed5442522ad4de53b
Submitter: Zuul
Branch: stable/queens

commit ad9bcb08150db26c09150f0ed5442522ad4de53b
Author: Maxime Guyot <email address hidden>
Date: Wed Apr 11 13:59:12 2018 +0200

    Use first conductor for nova-manage cell_v2 discover_hosts

    Using a random conductor leads to unpredictable results with lazy eval

    Change-Id: I846f78c80e4b5eb8421dbdebf3fc943be0bc84df
    Closed-Bug: 1762742
    (cherry picked from commit db5a6181c357c4af629450fa1f48773144be17af)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_nova (stable/ocata)

Reviewed: https://review.openstack.org/560923
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_nova/commit/?id=7a860832b0a731838f0b3413736460f80fb76395
Submitter: Zuul
Branch: stable/ocata

commit 7a860832b0a731838f0b3413736460f80fb76395
Author: Maxime Guyot <email address hidden>
Date: Wed Apr 11 13:59:12 2018 +0200

    Use first conductor for nova-manage cell_v2 discover_hosts

    Using a random conductor leads to unpredictable results with lazy eval

    Change-Id: I846f78c80e4b5eb8421dbdebf3fc943be0bc84df
    Closed-Bug: 1762742
    (cherry picked from commit db5a6181c357c4af629450fa1f48773144be17af)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/561167

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/561170

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/queens)

Reviewed: https://review.openstack.org/561162
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=1753c12bd1329eea528409293df8f20192441288
Submitter: Zuul
Branch: stable/queens

commit 1753c12bd1329eea528409293df8f20192441288
Author: Maxime Guyot <email address hidden>
Date: Fri Apr 13 10:50:42 2018 +0200

    Update os_nova SHA to fix nova-manage cell_v2 discover_hosts

    Closes-Bug: 1762742
    Change-Id: I75301666defda70e7102a3f10bf6210aabf9a156

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/ocata)

Reviewed: https://review.openstack.org/561170
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=087766bf968781ecfcfe5c0dab630eda978b484b
Submitter: Zuul
Branch: stable/ocata

commit 087766bf968781ecfcfe5c0dab630eda978b484b
Author: Maxime Guyot <email address hidden>
Date: Fri Apr 13 11:06:22 2018 +0200

    Update os_nova SHA to fix nova-manage cell_v2 discover_hosts

    Change-Id: I9337fe6d5b4f27456415c1bb72f52bab99a00588
    Closes-Bug: 1762742

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/pike)

Reviewed: https://review.openstack.org/561167
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=d8202e8726ca190f35682a426d830eb82da1192d
Submitter: Zuul
Branch: stable/pike

commit d8202e8726ca190f35682a426d830eb82da1192d
Author: Maxime Guyot <email address hidden>
Date: Fri Apr 13 11:04:51 2018 +0200

    Update os_nova SHA to fix nova-manage cell_v2 discover_hosts

    Change-Id: I585bf653e112429ab71930d5b0d648dad0b63b59
    Closes-Bug: 1762742

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 17.0.2

This issue was fixed in the openstack/openstack-ansible 17.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 15.1.19

This issue was fixed in the openstack/openstack-ansible 15.1.19 release.

Changed in openstack-ansible:
status: New → Fix Released
Changed in openstack-ansible:
assignee: nobody → Jean-Philippe Evrard (jean-philippe-evrard)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 16.0.12

This issue was fixed in the openstack/openstack-ansible 16.0.12 release.
