Rabbitmq join cluster fail

Bug #1824857 reported by sajin
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am using OpenStack-Ansible version 18.1.6. During deployment, the setup-infrastructure.yml playbook fails: the RabbitMQ join-cluster task fails because of a hostname mismatch. Could someone help me fix this issue?

fatal: [controller02_rabbit_mq_container-cc0158f6]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["rabbitmqctl", "join_cluster", "rabbit@controller01-rabbit-mq-container-d01ff938"], "delta": "0:00:03.891195", "end": "2019-04-15 21:31:28.844595", "msg": "non-zero return code", "rc": 69, "start": "2019-04-15 21:31:24.953400", "stderr": "Error: unable to connect to nodes ['rabbit@controller01-rabbit-mq-container-d01ff938']: nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@controller01-rabbit-mq-container-d01ff938']\n\nrabbit@controller01-rabbit-mq-container-d01ff938:\n * unable to connect to epmd (port 4369) on controller01-rabbit-mq-container-d01ff938: address (cannot connect to host/port)\n\n\ncurrent node details:\n- node name: 'rabbitmq-cli-62@controller02-rabbit-mq-container-cc0158f6'\n- home dir: /var/lib/rabbitmq\n- cookie hash: rM9slGzmpwUR5dxWDb+W3Q==", "stderr_lines": ["Error: unable to connect to nodes ['rabbit@controller01-rabbit-mq-container-d01ff938']: nodedown"
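The diagnostics above report "unable to connect to epmd (port 4369)", which can mean either that the peer's hostname resolves to the wrong address or that the port is not reachable. A minimal sketch for telling the two apart from inside the failing container, assuming Python 3 is available there; the helper name `check_epmd` is hypothetical and not part of OpenStack-Ansible or RabbitMQ:

```python
import socket

EPMD_PORT = 4369  # epmd port named in the rabbitmqctl diagnostics


def check_epmd(host, port=EPMD_PORT, timeout=3.0):
    """Return (resolved_address_or_None, reachable) for a cluster peer.

    Hypothetical helper: it only resolves the name and attempts a TCP
    connection; it does not speak the epmd protocol.
    """
    try:
        # Uses the system resolver, so /etc/hosts entries are honoured.
        address = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return address, True
    except OSError:
        # Connection refused, timed out, or no route to the address.
        return address, False
```

Calling `check_epmd("controller01-rabbit-mq-container-d01ff938")` from the node that fails to join shows which address the name resolves to. If that address is not the one the first RabbitMQ container actually uses, the problem is name resolution rather than RabbitMQ itself.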

Jakob Erpf (jakoberpf) wrote :

I have the same issue with an Ubuntu-based deployment and OSA 22.0.0.

###

fatal: [infra3_rabbit_mq_container-b0997ccc]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["rabbitmqctl", "join_cluster", "rabbit@infra1-rabbit-mq-container-6164a74a"], "delta": "0:00:07.033895", "end": "2021-01-26 09:44:50.831131", "msg": "non-zero return code", "rc": 69, "start": "2021-01-26 09:44:43.797236", "stderr": "Error: unable to perform an operation on node 'rabbit@infra1-rabbit-mq-container-6164a74a'. Please see diagnostics information and suggestions below.\n\nMost common reasons for this are:\n\n * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)\n * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)\n * Target node is not running\n\nIn addition to the diagnostics info below:\n\n * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more\n * Consult server logs on node rabbit@infra1-rabbit-mq-container-6164a74a\n * If target node is configured to use long node names, don't forget to use --longnames with CLI tools\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@infra1-rabbit-mq-container-6164a74a']\n\nrabbit@infra1-rabbit-mq-container-6164a74a:\n * unable to connect to epmd (port 4369) on infra1-rabbit-mq-container-6164a74a: address (cannot connect to host/port)\n\n\nCurrent node details:\n * node name: 'rabbitmqcli-6827-rabbit@infra3-rabbit-mq-container-b0997ccc'\n * effective user's home directory: /var/lib/rabbitmq\n * Erlang cookie hash: dhlrREdv0veNngHERjALlQ==", "stderr_lines": ["Error: unable to perform an operation on node 'rabbit@infra1-rabbit-mq-container-6164a74a'. Please see diagnostics information and suggestions below.", "", "Most common reasons for this are:", "", " * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)", " * CLI tool fails to authenticate with the server (e.g. 
due to CLI tool's Erlang cookie not matching that of the server)", " * Target node is not running", "", "In addition to the diagnostics info below:", "", " * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more", " * Consult server logs on node rabbit@infra1-rabbit-mq-container-6164a74a", " * If target node is configured to use long node names, don't forget to use --longnames with CLI tools", "", "DIAGNOSTICS", "===========", "", "attempted to contact: ['rabbit@infra1-rabbit-mq-container-6164a74a']", "", "rabbit@infra1-rabbit-mq-container-6164a74a:", " * unable to connect to epmd (port 4369) on infra1-rabbit-mq-container-6164a74a: address (cannot connect to host/port)", "", "", "Current node details:", " * node name: 'rabbitmqcli-6827-rabbit@infra3-rabbit-mq-container-b0997ccc'", " * effective user's home directory: /var/lib/rabbitmq", " * Erlang cookie hash: dhlrREdv0veNngHERjALlQ=="], "stdout": "Clustering node rabbit@infra3-rabbit-mq-container-b0997ccc with rabbit@infra1-rabbit-mq-container-6164a74a", "stdout_lines": ["Clustering node rabbit@infra3-rabbit-mq-container-b0997ccc with rabbit@infra...


Dmitriy Rabotyagov (noonedeadpunk) wrote :

I've just tried to set up a RabbitMQ cluster on 3 nodes from 22.0.0 and was not able to reproduce the issue.
I think you might be hitting the case where the hosts file contains an extra record that makes RabbitMQ resolve hostnames incorrectly. We recently documented this in a release note: https://opendev.org/openstack/openstack-ansible/commit/09fa3b6613462a2b508865b96618241ee15f364c
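One way to look for such a record is to scan the hosts file for names that are listed with more than one address. The sketch below is only an illustration: the thread does not say what the extra record looked like, so it assumes the problem shows up as a hostname mapped to several addresses, and the function name and sample addresses are made up.

```python
from collections import defaultdict


def find_conflicting_hosts(hosts_text):
    """Map each hostname listed with more than one address to its addresses.

    Hypothetical helper; assumes the usual hosts-file layout of
    "<address> <name> [<alias> ...]" with optional "#" comments.
    """
    addresses = defaultdict(set)
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        entry = line.split("#", 1)[0].split()
        if len(entry) < 2:
            continue  # blank line, comment-only line, or malformed entry
        ip, names = entry[0], entry[1:]
        for name in names:
            addresses[name].add(ip)
    return {name: sorted(ips) for name, ips in addresses.items() if len(ips) > 1}


# Sample input with invented addresses, for illustration only.
sample = """\
127.0.0.1 localhost
172.29.236.10 infra1-rabbit-mq-container-6164a74a
10.0.3.15 infra1-rabbit-mq-container-6164a74a  # stale record
"""
print(find_conflicting_hosts(sample))
```

On a real system the input would be the contents of /etc/hosts inside the RabbitMQ container. Names that legitimately have both an IPv4 and an IPv6 entry, such as localhost, are reported as well and can be ignored.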

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Can you kindly check if it's the case for you and if removing that record helps?

Changed in openstack-ansible:
status: New → Incomplete
Jakob Erpf (jakoberpf) wrote :

Sorry for the late reply. It seems the hosts entries you mentioned were the issue, but I don't know how or why they were created. A clean new deployment worked fine. Thanks for the tip.

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Well, it's because the upgrade changed the way we handle hosts files, and we changed some other things that led to this result.

Changed in openstack-ansible:
status: Incomplete → Won't Fix