Kolla Ansible installation fails when the rabbitmq instance is not co-resident with the nova controller.

Bug #2020805 reported by Jay Rhine
This bug affects 3 people
Affects        Status  Importance  Assigned to  Milestone
kolla-ansible  New     Undecided   Unassigned

Bug Description

** Summary **
The change introduced by https://review.opendev.org/c/openstack/kolla-ansible/+/817190 to fix https://bugs.launchpad.net/kolla-ansible/+bug/1946506 breaks Kolla Ansible installations where the rabbitmq instance is not co-resident with the nova controller.

Specifically, the "{{ project_name }} | Ensure RabbitMQ vhosts exist" task in "kolla-ansible/ansible/roles/service-rabbitmq/tasks/main.yml" is delegated to one of the rabbitmq hosts, but the node parameter is set, via ansible_facts.hostname, to the host name of the first node of the "nova_cell_conductor_group" group. In the default multi-node configuration provided by kolla-ansible, the rabbitmq and nova control hosts are the same servers, so the delegation works because the host name is the same. However, in a more distributed configuration where the rabbitmq and nova hosts differ, the host name passed is that of the "original" server, NOT the delegated server.
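
The behaviour described above can be reproduced outside kolla-ansible with a small playbook. This is only a sketch: the group names "nova_hosts" and "rabbit_hosts" and the variable "rabbit_delegate" are hypothetical stand-ins, not names used by kolla-ansible. It relies on the fact that, under delegate_to, ansible_facts still refers to the host the play is targeting, while hostvars[...] can be used to read the delegated host's facts.

```yaml
# Sketch only: "nova_hosts", "rabbit_hosts" and "rabbit_delegate" are
# hypothetical names chosen for illustration.
- name: Gather facts everywhere so hostvars is populated
  hosts: all
  gather_facts: true

- name: Compare the two ways of building the RabbitMQ node name
  hosts: nova_hosts
  gather_facts: false
  vars:
    # First host of the (hypothetical) RabbitMQ group acts as the delegate.
    rabbit_delegate: "{{ groups['rabbit_hosts'][0] }}"
  tasks:
    - name: Show which hostname each expression yields
      # debug itself runs on the controller; delegate_to is set only to
      # mirror the shape of the delegated task discussed above.
      ansible.builtin.debug:
        msg:
          - "ansible_facts.hostname -> rabbit@{{ ansible_facts.hostname }}"
          - "hostvars lookup -> rabbit@{{ hostvars[rabbit_delegate]['ansible_facts']['hostname'] }}"
      delegate_to: "{{ rabbit_delegate }}"
      run_once: true
```

When the two groups contain the same servers, both expressions should agree. When they are disjoint, the first expression should name the nova host and only the second should name the RabbitMQ host.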

This results in the same error, 'Could not determine the version of the RabbitMQ server', that was seen in the original bug (https://bugs.launchpad.net/kolla-ansible/+bug/1946506) with debug logging enabled. My understanding of the original bug is that the Ansible module responsible for adding the RabbitMQ user would sometimes default to the wrong node name, and specifying the hostname via ansible_facts.hostname eliminated that problem. We first attempted to work around the issue by removing the one line introduced by 817190. This was successful in our environment, but it would restore the original problem, so we have also tried a more robust alternative: setting the node to the hostname of the delegated server. See below for the minor change.

--- /opt/kolla/venv/share/kolla-ansible/ansible/roles/service-rabbitmq/tasks/main.yml.orig 2023-05-18 19:04:13.120194010 +0000
+++ /opt/kolla/venv/share/kolla-ansible/ansible/roles/service-rabbitmq/tasks/main.yml 2023-05-25 16:54:40.514191582 +0000
@@ -20,7 +20,7 @@
         module_args:
           user: "{{ item.user }}"
           password: "{{ item.password }}"
-          node: "rabbit@{{ ansible_facts.hostname }}"
+          node: "rabbit@{{ hostvars[service_rabbitmq_delegate_host]['ansible_facts']['hostname'] }}"
           update_password: always
           vhost: "{{ item.vhost }}"
           configure_priv: ".*"

Jay Rhine (fredomlover23) wrote :

Verified that this issue still exists in the latest master branch (antelope) as of 7/8/2023
