Some delegated tasks are failing when the deployer is not a node that is part of the deployment (eg. a controller node)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack-Ansible | Fix Released | Undecided | Jean-Philippe Evrard | | |
Bug Description
With this os_nova patch : https:/
I noticed that the task : https:/
OSA often assumes that the deployer is a controller, for example. When this is the case, /etc/hosts is populated by the OSA playbooks, so the task works.
When the deployer is an "outside" host, /etc/hosts is not populated and the task fails.
Steps to reproduce the issue:
- Deploy OSA from a remote host : the task fails
- Copy the /etc/hosts file from a controller to the remote deployer and add its entries to the deployer's /etc/hosts
- Re-run the playbook : the task is successful.
I'm not sure what the best way to fix this would be.
Changed in openstack-ansible: | |
status: | New → Fix Released |
Changed in openstack-ansible: | |
assignee: | nobody → Jean-Philippe Evrard (jean-philippe-evrard) |
tl;dr Ansible, at least in 2.3.3.0, lazily evaluates the templated random_conductor variable in the delegate_to field twice when running os_nova's "Perform a cell_v2 discover" task. Because random_conductor uses the "random" filter, the two evaluations can differ. When they do, Ansible can't use the remote address stored in inventory for the delegated host and falls back to using just the inventory name for ssh. That fails when run from a host that is not deployed by openstack-ansible, because openstack-ansible hasn't added the inventoried hosts to /etc/hosts there.
Steps to reproduce:
1. Create an openstack-ansible deployment environment that contains multiple nova-conductor containers (ie multiple infra hosts).
2. Deploy openstack-ansible (ie "openstack-ansible setup-openstack.yml") from a host that openstack-ansible is not deploying to (ie one that doesn't have the openstack_hosts role or isn't in inventory at all; eg *not* an infra node, which seems to be a common use-case).
3. The failure will occur with (N-1)/N probability where N is the number of nova-conductor containers (because that's the odds of randomly selecting a different nova-conductor than the first one selected).
Disclaimer: The failure only occurs with (N-1)/N probability, where N is the number of nova-conductor containers.
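The (N-1)/N figure can be checked with a short sketch. It assumes the two evaluations are independent uniform picks over the N nova-conductor containers. The function names and the conductor names are made up for illustration and are not part of openstack-ansible or Ansible.

```python
import random
from fractions import Fraction


def mismatch_probability(n):
    """Exact probability that two independent uniform picks from n hosts differ."""
    return Fraction(n - 1, n)


def simulate_mismatch_rate(n, trials=100_000, seed=0):
    """Estimate the same probability by drawing two random hosts per trial."""
    rng = random.Random(seed)
    hosts = [f"conductor{i}" for i in range(n)]  # hypothetical container names
    mismatches = sum(rng.choice(hosts) != rng.choice(hosts) for _ in range(trials))
    return mismatches / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, mismatch_probability(n), round(simulate_mismatch_rate(n), 3))
```

With a single nova-conductor container the two picks can never differ, which is consistent with the failure needing multiple infra hosts.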
Symptoms:
1. os_nova's "Perform a cell_v2 discover" task will fail.
2. A nova-compute container will be reported by Ansible as "unreachable".
3. With "-vvvv", "ssh: Could not resolve hostname <insert nova-conductor container name here>: Name or service not known" is reported by Ansible during os_nova's "Perform a cell_v2 discover" task.
4. With ANSIBLE_DEBUG=1, "no remote address found for delegated host <insert nova-conductor container name here>\nusing its name, so success depends on DNS resolution" is reported by Ansible during os_nova's "Perform a cell_v2 discover" task.
Work-around:
1. Change `delegate_to: "{{ random_conductor }}"` to `delegate_to: "{{ hostvars[random_conductor]['ansible_host'] }}"`.
2. Deploy openstack-ansible (ie "openstack-ansible setup-openstack.yml") from an openstack-ansible deployed node (eg an infra node).
3. Populate /etc/hosts with inventoried nodes (eg run /var/tmp/openstack-host-hostfile-setup.sh)
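One plausible reading of why delegating to the conductor's ansible_host value helps: the templated result is then an address rather than an inventory name, so a fallback to "just use the templated value for ssh" no longer depends on DNS or /etc/hosts. The sketch below only models that idea. The container names and addresses are hypothetical, and the helper functions are illustrative stand-ins, not Ansible APIs.

```python
import ipaddress
import random

# Hypothetical inventory data: container names and addresses are made up.
HOSTVARS = {
    "infra1_nova_conductor_container": {"ansible_host": "172.29.236.11"},
    "infra2_nova_conductor_container": {"ansible_host": "172.29.236.12"},
    "infra3_nova_conductor_container": {"ansible_host": "172.29.236.13"},
}


def delegate_by_name(rng):
    """Models delegate_to: "{{ random_conductor }}" (yields an inventory name)."""
    return rng.choice(sorted(HOSTVARS))


def delegate_by_address(rng):
    """Models delegate_to: "{{ hostvars[random_conductor]['ansible_host'] }}"."""
    return HOSTVARS[rng.choice(sorted(HOSTVARS))]["ansible_host"]


def needs_name_resolution(target):
    """Return True when target is not a literal IP address."""
    try:
        ipaddress.ip_address(target)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(0)
    print(needs_name_resolution(delegate_by_name(rng)))     # True
    print(needs_name_resolution(delegate_by_address(rng)))  # False
```

In this model the random pick can still differ between evaluations, but every possible result is directly reachable by address.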
Explanation:
1. random_conductor is templated as "{{ groups[nova_services['nova-conductor']['group']] | random }}".
2. Ansible lazily evaluates this twice:
2a. When fetching the host vars for the delegate_to host; see "delegated_host_name = templar.template(task.delegate_to, fail_on_undefined=False)" in ansible/vars/__init__.py, _get_delegated_vars().
2b. When setting the delegated host for the task; see "delegated_host_name = templar.template(task.delegate_to)" in ansible/playbook/play_context.py, set_task_and_variable_override().
3. The lazy evaluation of the "random" filter introduces a 1/N probability that the delegated host vars match the delegated host.
4. When Ansible looks for the delegated host's remote address (ie ansible_ssh_host or ansible_host) in the play's delegated host vars and doesn't find it (because Ansible only populated the delegated host vars with the previously lazily evaluated inventory name),...
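The double evaluation described above can be modelled in a few lines. This is a simplified sketch, not Ansible's actual code: `template_delegate_to` stands in for templating the delegate_to expression, `resolve_delegated_connection` is a hypothetical name, and the inventory names and addresses are invented.

```python
import random

# Hypothetical inventory: container name -> remote address (made-up values).
INVENTORY = {
    "infra1_nova_conductor_container": "172.29.236.11",
    "infra2_nova_conductor_container": "172.29.236.12",
    "infra3_nova_conductor_container": "172.29.236.13",
}


def template_delegate_to(rng):
    """Stand-in for templating "{{ groups[...] | random }}": a fresh pick per call."""
    return rng.choice(sorted(INVENTORY))


def resolve_delegated_connection(rng):
    """Return (ssh_target, used_inventory_address) for one delegated task run."""
    # Evaluation 2a: delegated host vars are stored under the first pick only.
    first = template_delegate_to(rng)
    delegated_vars = {first: {"ansible_host": INVENTORY[first]}}

    # Evaluation 2b: the delegated host for the task is templated again.
    second = template_delegate_to(rng)
    address = delegated_vars.get(second, {}).get("ansible_host")
    if address is None:
        # No remote address found: fall back to the bare inventory name,
        # which only works if DNS or /etc/hosts can resolve it.
        return second, False
    return address, True


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(resolve_delegated_connection(rng))
```

Whenever the second pick differs from the first, the lookup misses and the model returns the bare name, which matches the "no remote address found for delegated host" debug message quoted in the symptoms.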