Problem in the openstack-ansible-rabbitmq_server role when trying to cluster RabbitMQ

Bug #1658670 reported by Nagaraj Hegde
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

The issues are as follows:

1. The variable ansible_host on line 47 of the file rabbitmq_pre_install.yml is wrong; there is no variable with that name in this context, and we see that it should be inventory_hostname (a minimal sketch of the corrected condition follows the scenario descriptions below).

2. We tried RabbitMQ clustering with 3 nodes under the following scenarios:

Scenario 1: When installing RabbitMQ on its own for clustering, the key and certificate are generated on the first node (from the inventory file) by rabbitmq_ssl_self_signed.yml of the "openstack-ansible-rabbitmq_server" role, and the same key and certificate are then distributed to the other nodes by rabbitmq_ssl_key_distribute.yml of the same role. Clustering completes successfully.

Scenario 2: When installing RabbitMQ with clustering in an integrated environment, in the order nginx, keepalived, and then rabbitmq, instead of picking the first node (from the inventory file) the role picks a node at random, and while generating the key and certificate the conditions in rabbitmq_ssl_self_signed.yml of the "openstack-ansible-rabbitmq_server" role fail. It then goes directly to distributing the key and certificate to the other nodes (which do not even have a key and certificate generated) using rabbitmq_ssl_key_distribute.yml of the same role. This behaviour is wrong and should not happen.

Expected behaviour: the key and certificate should be generated on whichever node is picked first, and the same key and certificate should then be distributed to the other nodes.
So the logic in rabbitmq_ssl_self_signed.yml and rabbitmq_ssl_key_distribute.yml of the "openstack-ansible-rabbitmq_server" role has to be changed to fix this issue (one possible shape for such a change is sketched at the end of this description).
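
For the first issue, a minimal sketch of the corrected condition. The task name and module here are hypothetical; only the when: clause carries the proposed fix, since inventory_hostname is the variable Ansible actually defines for the current host:

- name: Example task gated on the first RabbitMQ node (illustrative)
  debug:
    msg: "Running on the first RabbitMQ node"
  # ansible_host is a connection variable and may not be defined;
  # inventory_hostname is always defined for the host being processed.
  when: inventory_hostname == groups[rabbitmq_host_group][0]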

Please let us know for any clarification.
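
One possible shape for the suggested logic change, as a hedged sketch: generate the key pair exactly once on whichever host the play reaches first, then distribute the result to every node, so inventory order no longer matters. The variable names rabbitmq_ssl_key, rabbitmq_ssl_cert, and rabbitmq_ssl_self_signed_subject appear to match the role's defaults but should be treated as assumptions; the tasks themselves are illustrative, not the role's actual implementation:

# Generate the self-signed pair on exactly one host of the play;
# run_once selects the first host the play actually reaches, so
# inventory order is irrelevant.
- name: Create self-signed ssl key and cert (illustrative)
  command: >
    openssl req -new -nodes -sha256 -x509 -days 3650
    -subj "{{ rabbitmq_ssl_self_signed_subject }}"
    -keyout "{{ rabbitmq_ssl_key }}"
    -out "{{ rabbitmq_ssl_cert }}"
  args:
    creates: "{{ rabbitmq_ssl_cert }}"
  run_once: true

# Read the key back on that same host; a result registered by a
# run_once task is visible to every host in the play.
- name: Read the generated key (illustrative)
  slurp:
    src: "{{ rabbitmq_ssl_key }}"
  register: _rabbitmq_ssl_key_slurp
  run_once: true

# Write the same key on every node, including the generator.
- name: Distribute the key to all nodes (illustrative)
  copy:
    content: "{{ _rabbitmq_ssl_key_slurp.content | b64decode }}"
    dest: "{{ rabbitmq_ssl_key }}"
    mode: "0600"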

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

What version of ansible and openstack-ansible are you running? (what branch?).

On what kind of host are you running ansible? What is the OS of your target nodes?

What is the "integrated environment" you are talking about, is that an OSA environment, or your own? If the latter, what does your inventory look like?

Thanks for the clarifications.

Revision history for this message
Nagaraj Hegde (nhegde) wrote :

The Ansible version is 2.2.0.0.
The openstack-ansible role is from https://github.com/mayankdiatm/openstack-ansible-rabbitmq_server
Red Hat Enterprise Linux 7.3 is the platform on which we run Ansible.
This platform is brought up as an instance on an OpenStack cloud from our own image.
We were trying to install nginx, keepalived, and RabbitMQ, in that order, as part of a single script on the machine we brought up (on the platform mentioned above). We were installing these to achieve high availability of the RabbitMQ hosts through clustering.

While installing RabbitMQ we use the default "openstack-ansible-rabbitmq_server" role provided by openstack-ansible, and we see the issue described above in that role.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

This is not our code; our code is mirrored at
https://github.com/openstack/openstack-ansible-rabbitmq_server
(the source lives in the OpenStack git repos).

Thanks.

Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

My bad, I didn't see it was a fork.

Do you have the same issue with the latest version of https://github.com/openstack/openstack-ansible-rabbitmq_server ?

Revision history for this message
Nagaraj Hegde (nhegde) wrote :

Yes, we see the same issue with the latest version.

Revision history for this message
Jimmy McCrory (jimmy-mccrory) wrote :

Could you run ansible in verbose mode (-vvv) and provide the output of the particular tasks within the role that are failing?

Revision history for this message
Nagaraj Hegde (nhegde) wrote :

Please find below the error for point number 2 (scenario 2) mentioned above:

2017-01-18 04:34:19,483 p=32630 u=root | Using /tmp/ansible/ansible.cfg as config file
2017-01-18 04:34:19,489 p=32630 u=root | [WARNING]: provided hosts list is empty, only localhost is available

2017-01-18 04:34:19,585 p=32630 u=root | Loading callback plugin debug of type stdout, v2.0 from /usr/lib/python2.7/site-packages/ansible/plugins/callback/__init__.pyc
2017-01-18 04:34:19,650 p=32630 u=root | PLAYBOOK: cbnd-installer.yml ***************************************************
2017-01-18 04:34:19,650 p=32630 u=root | 2 plays in /tmp/ansible/cbnd-installer.yml
2017-01-18 04:34:19,655 p=32630 u=root | PLAY [add file based cbnd repo] ************************************************
2017-01-18 04:34:19,714 p=32630 u=root | TASK [setup] *******************************************************************
2017-01-18 04:34:19,995 p=32630 u=root | Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/system/setup.py
2017-01-18 04:34:20,725 p=32630 u=root | ok: [localhost]
2017-01-18 04:34:20,727 p=32630 u=root | TASK [install apache httpd] ****************************************************
2017-01-18 04:34:20,728 p=32630 u=root | task path: /tmp/ansible/cbnd-installer.yml:13
2017-01-18 04:34:20,927 p=32630 u=root | Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/packaging/os/yum.py
2017-01-18 04:34:21,456 p=32630 u=root | ok: [localhost] => {
    "changed": false,
    "invocation": {
        "module_args": {
            "conf_file": null,
            "disable_gpg_check": false,
            "disablerepo": null,
            "enablerepo": null,
            "exclude": null,
            "install_repoquery": true,
            "list": null,
            "name": [
                "httpd"
            ],
            "state": "present",
            "update_cache": false,
            "validate_certs": true
        },
        "module_name": "yum"
    },
    "rc": 0,
    "results": [
        "httpd-2.4.6-45.el7.x86_64 providing httpd is already installed"
    ]
}
2017-01-18 04:34:21,459 p=32630 u=root | TASK [change default port] *****************************************************
2017-01-18 04:34:21,459 p=32630 u=root | task path: /tmp/ansible/cbnd-installer.yml:17
2017-01-18 04:34:21,603 p=32630 u=root | Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/files/lineinfile.py
2017-01-18 04:34:21,847 p=32630 u=root | ok: [localhost] => {
    "backup": "",
    "changed": false,
    "diff": [
        {
            "after": "",
            "after_header": "/etc/httpd/conf/httpd.conf (content)",
            "before": "",
            "before_header": "/etc/httpd/conf/httpd.conf (content)"
        },
        {
            "after_header": "/etc/httpd/conf/httpd.conf (file attributes)",
            "before_header": "/etc/httpd/conf/httpd.conf (file attributes)"
        }
    ],
    "invocation": {
        "module_args": {
            "backrefs": false,
            "backup": false,
            "content": null,
            "create": false,
            "delimiter": null,
            ...

Changed in openstack-ansible:
status: New → Invalid
Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :
Revision history for this message
Nagaraj Hegde (nhegde) wrote :

Hi,
In the conversation it is mentioned that the variable is not set because of a role failure on groups[rabbitmq_host_group][0]. But what actually happened is this:
In the inventory we had three IPs, for example:
[amqp]
1.1.1.1
2.2.2.2
3.3.3.3

When we run only the rabbitmq Ansible playbook, the installation order is 1.1.1.1, 2.2.2.2, and 3.3.3.3, and installation is successful.

But when we try to install rabbitmq along with nginx and keepalived (order: nginx first, then keepalived, then rabbitmq), the node on which the rabbitmq installation starts first is random (sometimes it starts with 2.2.2.2 and then 1.1.1.1, or with 3.3.3.3, then 2.2.2.2, and then 1.1.1.1). When the order does not follow the inventory, the logic inside rabbitmq_ssl_self_signed.yml no longer holds.

The logic under rabbitmq_ssl_self_signed.yml is:

- include: rabbitmq_ssl_key_store.yml
  when: inventory_hostname == groups[rabbitmq_host_group][0]

This is index based, so it fails when the rabbitmq installation starts on node 3 (3.3.3.3) rather than following the inventory order. rabbitmq_ssl_key_store.yml is the file that sets the fact 'rabbitmq_ssl_key_fact'. Because the condition fails under the random order, the fact is never set, and rabbitmq_ssl_key_distribute.yml is invoked directly. That call fails because the fact is not set, and the fact is not set because the rabbitmq installation first started on node 3 instead of node 1 (that is, not on groups[rabbitmq_host_group][0]).
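
A condensed sketch of the control flow being described; the include file names come from the role, but the layout is an illustrative reduction, not the role's verbatim contents:

# rabbitmq_ssl_self_signed.yml (abridged, illustrative)
# Only the group's first host stores the fact...
- include: rabbitmq_ssl_key_store.yml       # sets rabbitmq_ssl_key_fact
  when: inventory_hostname == groups[rabbitmq_host_group][0]

# ...while the remaining hosts immediately try to consume it.
- include: rabbitmq_ssl_key_distribute.yml  # reads rabbitmq_ssl_key_fact
  when: inventory_hostname != groups[rabbitmq_host_group][0]

If the first host never executes the store step, for any reason, the distribute step dereferences a fact that was never set and fails.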

Please see the attached log for how the RabbitMQ installation was triggered, and search for the string "openstack-ansible-rabbitmq_server : Print host name": the first task starts on IP 172.29.39.239, but the inventory order is 172.29.39.180, 172.29.39.181, 172.29.39.239.
Please kindly check.

Nagaraj Hegde (nhegde)
Changed in openstack-ansible:
status: Invalid → Incomplete
Revision history for this message
Jimmy McCrory (jimmy-mccrory) wrote :

Hi Nagaraj,

The failure is within your keepalived role.

2017-01-18 04:35:28,926 p=317 u=root | TASK [Validate master node] ****************************************************
2017-01-18 04:35:28,926 p=317 u=root | task path: /tmp/ansible/keepalived.yml:56
2017-01-18 04:35:29,009 p=317 u=root | fatal: [172.29.39.180]: FAILED! => {
    "assertion": "systemctl_status.stdout | search(\"Entering MASTER STATE\")",
    "changed": false,
    "evaluated_to": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "that": "systemctl_status.stdout | search(\"Entering MASTER STATE\")"
        },
        "module_name": "assert"
    }
}

172.29.39.180 is groups[rabbitmq_host_group][0].
The assertion is not being met, causing a failure for 172.29.39.180 and removing it from the remainder of the ansible run.

These tasks store a 'rabbitmq_ssl_key_fact' host variable on groups[rabbitmq_host_group][0].
https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/master/tasks/rabbitmq_ssl_self_signed.yml#L18-L22
Since the 172.29.39.180 host is no longer available at that point in the run to perform those tasks, the variable is never set.
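
An illustrative reduction of why distribution then fails on the remaining hosts; the hostvars lookup and the destination variable are hypothetical stand-ins for whatever the linked distribute task actually does:

- name: Distribute the key from the first node (illustrative)
  copy:
    # If groups[rabbitmq_host_group][0] dropped out of the play before
    # setting the fact, this lookup is undefined and the task errors out.
    content: "{{ hostvars[groups[rabbitmq_host_group][0]]['rabbitmq_ssl_key_fact'] }}"
    dest: "{{ rabbitmq_ssl_key }}"
  when: inventory_hostname != groups[rabbitmq_host_group][0]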

Please look into and fix your keepalived role's 'Validate master node' task.
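
If keepalived simply needs more time to elect a master, a more tolerant variant of that assertion could poll instead of asserting once. This is a hedged sketch reusing the command and filter syntax visible in the log above; it is not your actual role:

- name: Wait for keepalived to enter MASTER state (illustrative)
  command: systemctl status keepalived
  register: systemctl_status
  changed_when: false
  # Retry the check instead of failing on the first look; the search
  # filter matches the Ansible 2.2 syntax shown in the failed assertion.
  until: systemctl_status.stdout | search("Entering MASTER STATE")
  retries: 5
  delay: 10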

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for openstack-ansible because there has been no activity for 60 days.]

Changed in openstack-ansible:
status: Incomplete → Expired