OpenStack-Ansible RabbitMQ server in OpenStack-Ansible: standalone module fails on CentOS7

Bug #1748485 reported by Marcin Dulak
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

I believe this is an actual bug and not a documentation problem, unless I am missing something basic.

Standalone (outside of openstack-ansible) deployment of
https://github.com/openstack/openstack-ansible-rabbitmq_server/commit/52f3b38b630b54eb45e81a8f0b5348f72ffa967d fails on CentOS7 with:

ansible# ansible-playbook -i /vagrant/hosts.yml /vagrant/install.yml

...
TASK [rabbitmq_server : include] ************************************************************************************************************************************************************
skipping: [rabbit3]
included: /vagrant/rabbitmq_server/tasks/rabbitmq_cluster_join.yml for rabbit2, rabbit1

TASK [rabbitmq_server : Check cluster status] ***********************************************************************************************************************************************
fatal: [rabbit1]: FAILED! => {"changed": false, "cmd": "rabbitmqctl -q cluster_status | grep '{cluster_name,<<\"rabbitmq_cluster1\">>}'", "delta": "0:00:00.711749", "end": "2018-02-09 16:26:33.491456", "msg": "non-zero return code", "rc": 1, "start": "2018-02-09 16:26:32.779707", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [rabbit2]: FAILED! => {"changed": false, "cmd": "rabbitmqctl -q cluster_status | grep '{cluster_name,<<\"rabbitmq_cluster1\">>}'", "delta": "0:00:00.724711", "end": "2018-02-09 16:26:33.508532", "msg": "non-zero return code", "rc": 1, "start": "2018-02-09 16:26:32.783821", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [rabbitmq_server : Join rabbitmq cluster] **********************************************************************************************************************************************
FAILED - RETRYING: Join rabbitmq cluster (50 retries left).
FAILED - RETRYING: Join rabbitmq cluster (50 retries left).
FAILED - RETRYING: Join rabbitmq cluster (49 retries left).
FAILED - RETRYING: Join rabbitmq cluster (49 retries left).
FAILED - RETRYING: Join rabbitmq cluster (48 retries left).
FAILED - RETRYING: Join rabbitmq cluster (48 retries left).
...
FAILED - RETRYING: Join rabbitmq cluster (2 retries left).
FAILED - RETRYING: Join rabbitmq cluster (1 retries left).
FAILED - RETRYING: Join rabbitmq cluster (1 retries left).
fatal: [rabbit1]: FAILED! => {"attempts": 50, "changed": true, "cmd": ["rabbitmqctl", "join_cluster", "rabbit@rabbit3"], "delta": "0:00:00.683890", "end": "2018-02-09 16:29:08.890972", "msg": "non-zero return code", "rc": 70, "start": "2018-02-09 16:29:08.207082", "stderr": "Error: Mnesia is still running on node rabbit@rabbit1.\n Please stop the node with rabbitmqctl stop_app first.", "stderr_lines": ["Error: Mnesia is still running on node rabbit@rabbit1.", " Please stop the node with rabbitmqctl stop_app first."], "stdout": "Clustering node rabbit@rabbit1 with rabbit@rabbit3", "stdout_lines": ["Clustering node rabbit@rabbit1 with rabbit@rabbit3"]}
fatal: [rabbit2]: FAILED! => {"attempts": 50, "changed": true, "cmd": ["rabbitmqctl", "join_cluster", "rabbit@rabbit3"], "delta": "0:00:00.684598", "end": "2018-02-09 16:29:08.915102", "msg": "non-zero return code", "rc": 70, "start": "2018-02-09 16:29:08.230504", "stderr": "Error: Mnesia is still running on node rabbit@rabbit2.\n Please stop the node with rabbitmqctl stop_app first.", "stderr_lines": ["Error: Mnesia is still running on node rabbit@rabbit2.", " Please stop the node with rabbitmqctl stop_app first."], "stdout": "Clustering node rabbit@rabbit2 with rabbit@rabbit3", "stdout_lines": ["Clustering node rabbit@rabbit2 with rabbit@rabbit3"]}

You can see that I've increased the number of retries in tasks/rabbitmq_cluster_join.yml to 50.

The nodes rabbit1 and rabbit2 do not join the cluster:

rabbit1# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1
[{nodes,[{disc,[rabbit@rabbit1]}]},
 {running_nodes,[rabbit@rabbit1]},
 {cluster_name,<<"rabbit@rabbit1">>},
 {partitions,[]},
 {alarms,[{rabbit@rabbit1,[]}]}]
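
The "Check cluster status" task in the log above pipes `rabbitmqctl -q cluster_status` into grep and looks for the expected cluster name, which is why it fails on a node that still reports its own name, as rabbit1 does here. A minimal sketch of that check as a reusable shell function, assuming the status text arrives on stdin (the function name is hypothetical and not part of the role):

```shell
# Sketch: succeed only if the `rabbitmqctl cluster_status` output read from
# stdin reports the expected cluster name. The function name is hypothetical.
status_has_cluster_name() {
    expected_name="$1"
    # -F matches the Erlang term literally; -q only sets the exit status.
    grep -qF "{cluster_name,<<\"${expected_name}\">>}"
}
```

It could be used as `rabbitmqctl -q cluster_status | status_has_cluster_name rabbitmq_cluster1`; a non-zero exit status then means the node has not joined the named cluster.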

However, I can manually join the nodes to the cluster as described at https://www.rabbitmq.com/clustering.html#creating in negligible time:

rabbit1# rabbitmqctl stop_app && rabbitmqctl join_cluster rabbit@rabbit3 && rabbitmqctl start_app
rabbit2# rabbitmqctl stop_app && rabbitmqctl join_cluster rabbit@rabbit3 && rabbitmqctl start_app

rabbit1# rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1
[{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2,rabbit@rabbit3]}]},
 {running_nodes,[rabbit@rabbit2,rabbit@rabbit3,rabbit@rabbit1]},
 {cluster_name,<<"rabbitmq_cluster1">>},
 {partitions,[]},
 {alarms,[{rabbit@rabbit2,[]},{rabbit@rabbit3,[]},{rabbit@rabbit1,[]}]}]
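
The manual sequence above differs from the failing task in one respect: the RabbitMQ application is stopped before join_cluster is called, which is what the "Mnesia is still running" error asks for. A minimal shell sketch of the same three steps follows. It is not the role's actual task, and the RABBITMQCTL variable is a hypothetical override added here only so that the sequence can be dry-run.

```shell
# Sketch: join the local RabbitMQ node to an existing cluster.
# RABBITMQCTL is a hypothetical override (not a rabbitmqctl feature), so the
# sequence can be dry-run with e.g. RABBITMQCTL=echo.
join_rabbitmq_cluster() {
    target_node="$1"                      # e.g. rabbit@rabbit3
    ctl="${RABBITMQCTL:-rabbitmqctl}"

    # The application must be stopped first; otherwise join_cluster fails
    # with "Mnesia is still running on node ...".
    "$ctl" stop_app || return 1

    if ! "$ctl" join_cluster "$target_node"; then
        # Do not leave the node stopped when the join fails.
        "$ctl" start_app
        return 1
    fi

    "$ctl" start_app
}
```

Running `join_rabbitmq_cluster rabbit@rabbit3` on rabbit1 and rabbit2 would correspond to the manual commands shown above.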

The contents of hosts.yml:

rabbitmq_all:
  vars:
    ansible_port: 22
  hosts:
    rabbit1:
      ansible_host: 192.168.125.11
    rabbit2:
      ansible_host: 192.168.125.12
    rabbit3:
      ansible_host: 192.168.125.13

The contents of install.yml:

- hosts: rabbitmq_all
  gather_facts: true
  user: root

  tasks:

  - name: install dependencies
    package:
      name: "{{ item }}"
    with_items:
      - yum-utils

  - name: install rabbitmq cluster
    include_role:
      name: rabbitmq_server
      private: yes
    vars:
      rabbitmq_cookie_token: password
      rabbitmq_package_state: present
      rabbitmq_host_group: rabbitmq_all
      rabbitmq_cluster_name: rabbitmq_cluster1

Two more minor CentOS7-related problems:

- You can also see that I'm installing the "yum-utils" package myself, but this should be done by openstack-ansible-rabbitmq_server itself, since the role uses it.

- https://github.com/openstack/openstack-ansible-apt_package_pinning is also a dependency of openstack-ansible-rabbitmq_server, but it should not be needed on CentOS7.

-----------------------------------
Release: 17.0.0.0b4.dev1 on 2018-02-05 15:45
SHA: 52f3b38b630b54eb45e81a8f0b5348f72ffa967d
Source: https://git.openstack.org/cgit/openstack/openstack-ansible-rabbitmq_server/tree/doc/source/index.rst
URL: https://docs.openstack.org/openstack-ansible-rabbitmq_server/latest/

Marcin Dulak (marcin-dulak) wrote :

Vagrantfile environment to reproduce the problem.

I wonder why there is no standard method for submitting such bugs so that they can be reproduced with Ansible on VMs running on OpenStack?

Jean-Philippe Evrard (jean-philippe-evrard) wrote :

I am not sure about the last comment. We generally run run_tests.sh on the role (git clone the repo, then run ./run_tests.sh functional, for example).

A non-zero return code means a failure.
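
As a rough sketch of the workflow described in this comment, the two steps can be wrapped in a small shell function. The function name and the GIT override are hypothetical and exist only for illustration; the repository URL and working directory are passed in by the caller.

```shell
# Sketch: clone a role repository and run its functional tests.
# GIT is a hypothetical override for the git binary, used only in this sketch.
run_role_functional_tests() {
    repo_url="$1"    # e.g. https://github.com/openstack/openstack-ansible-rabbitmq_server
    workdir="$2"     # directory to clone into

    "${GIT:-git}" clone "$repo_url" "$workdir" || return 1

    # Run the script from inside the clone, in a subshell so the caller's
    # working directory is left unchanged. The exit status is that of
    # run_tests.sh, so a non-zero value indicates a failure.
    (cd "$workdir" && ./run_tests.sh functional)
}
```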

Jean-Philippe Evrard (jean-philippe-evrard) wrote :

I'd agree, this seems a lot harder than it should be to consume as standalone.

We have a spec for improvement, and any patch in that direction is welcome :)

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Low
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

We have marked this bug as low priority, because a workaround is provided, but we'll work on making this simpler.

Marcin Dulak (marcin-dulak) wrote :

I confirm that the fix works for me with the master hash f1ffba9e6f1b1da7262251d4a4bcd05935c0bd5f - the nodes join during the "Join rabbitmq cluster" task.

Jonathan Rosser (jrosser) wrote :
Changed in openstack-ansible:
status: Confirmed → Fix Released