RabbitMQ: unable to connect to node: nodedown during `Enable queue mirroring`
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Fix Released | High | Unassigned |
Kilo | Fix Released | High | Unassigned |
Liberty | Fix Released | High | Unassigned |
Trunk | Fix Released | High | Unassigned |
Bug Description
There is a tendency to run into the following failure, which appears to be transient:
TASK: [Enable queue mirroring] *******
failed: [98ac7559-
stderr: Error: unable to connect to node 'rabbit@
DIAGNOSTICS
===========
attempted to contact: ['rabbit@
rabbit@
* connected to epmd (port 4369) on 98ac7559-
* epmd reports: node 'rabbit' not running at all
* suggestion: start the node
current node details:
- node name: 'rabbitmq-
- home dir: /var/lib/rabbitmq
- cookie hash: 8E8VlCxIkt05tZM
msg: Error:*
- home dir: /var/lib/rabbitmq
- cookie hash: 8E8VlCxIkt05tZM
Changed in openstack-ansible:
status: New → In Progress
assignee: nobody → Byron McCollum (byron-mccollum)
Changed in openstack-ansible:
status: In Progress → New
assignee: Byron McCollum (byron-mccollum) → nobody
After further digging, the problem isn't that RabbitMQ needs more time to start up, but rather a permission problem with some of the directories and files.
I have 3 rabbit nodes. Only 1 of the 3 is working; the other two will not start.
a6232de5-infra1_rabbit_mq_container-ce59fdbb: failed
a6232de5-infra1_rabbit_mq_container-5a485765: failed
a6232de5-infra1_rabbit_mq_container-654a51db: working
The startup_log of the failed nodes indicates a permission-related problem:
BOOT FAILED
===========
Error description:
   {cannot_read_enabled_plugins_file,"/etc/rabbitmq/enabled_plugins",
    {error,
     eacces}}
Log files (may contain more information):
   /var/log/rabbitmq/rabbit@a6232de5-infra1_rabbit_mq_container-ce59fdbb.log
   /var/log/rabbitmq/rabbit@a6232de5-infra1_rabbit_mq_container-ce59fdbb-sasl.log
Stack trace:
   [{rabbit_plugins,read_enabled,1,[]},
    {rabbit_plugins,setup,0,[]},
    {rabbit,broker_start,0,[]},
    {rabbit,start_it,1,[]},
    {init,start_it,1,[]},
    {init,start_em,1,[]}]
{"init terminating in do_boot",{error,{cannot_read_enabled_plugins_file,"/etc/rabbitmq/enabled_plugins",eacces}}}
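The `eacces` atom in the boot error is the Erlang spelling of the POSIX error EACCES (permission denied). As a small illustration, the sketch below maps such an atom to the matching errno number and system message using Python's standard library. The helper name `posix_errno` is hypothetical and is not part of RabbitMQ or OpenStack-Ansible.

```python
import errno
import os


def posix_errno(atom: str) -> int:
    """Map an Erlang-style POSIX error atom (e.g. 'eacces') to its errno number.

    Hypothetical helper for illustration only.
    """
    name = atom.strip().upper()
    if not hasattr(errno, name):
        raise ValueError(f"unknown POSIX error atom: {atom!r}")
    return getattr(errno, name)


if __name__ == "__main__":
    code = posix_errno("eacces")
    # Prints the numeric code and the platform's message for EACCES.
    print(code, os.strerror(code))
```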
Comparing the permissions of /etc/rabbitmq reveals the following:
a6232de5-infra1_rabbit_mq_container-654a51db | success | rc=0 >>
total 20
drwxr-xr-x 2 root root 4096 Nov 6 10:38 .
drwxr-xr-x 73 root root 4096 Nov 6 14:25 ..
-rw-r--r-- 1 root root 23 Nov 6 10:38 enabled_plugins
-rw-r--r-- 1 rabbitmq rabbitmq 1704 Nov 6 10:38 rabbitmq.key
-rw-r--r-- 1 rabbitmq rabbitmq 1363 Nov 6 10:38 rabbitmq.pem
a6232de5-infra1_rabbit_mq_container-5a485765 | success | rc=0 >>
total 20
drwxr-x--- 2 root root 4096 Nov 6 14:20 .
drwxr-xr-x 73 root root 4096 Nov 6 14:24 ..
-rw-r--r-- 1 root root 23 Nov 6 10:38 enabled_plugins
-rw-r----- 1 rabbitmq rabbitmq 1704 Nov 6 10:38 rabbitmq.key
-rw-r----- 1 rabbitmq rabbitmq 1363 Nov 6 10:38 rabbitmq.pem
a6232de5-infra1_rabbit_mq_container-ce59fdbb | success | rc=0 >>
total 20
drwxr-x--- 2 root root 4096 Nov 6 10:38 .
drwxr-xr-x 73 root root 4096 Nov 6 10:37 ..
-rw-r--r-- 1 root root 23 Nov 6 10:38 enabled_plugins
-rw-r----- 1 rabbitmq rabbitmq 1704 Nov 6 10:38 rabbitmq.key
-rw-r----- 1 rabbitmq rabbitmq 1363 Nov 6 10:38 rabbitmq.pem
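The permissions of /etc/rabbitmq differ between the working node (a6232de5-infra1_rabbit_mq_container-654a51db) and the failed nodes. On the working node the directory is drwxr-xr-x; on the failed nodes it is drwxr-x--- and owned by root:root, which would prevent the rabbitmq user from entering the directory to read enabled_plugins and is consistent with the eacces error. The rabbitmq.key and rabbitmq.pem files are also more restrictive on the failed nodes (-rw-r----- instead of -rw-r--r--). I can't understand how this came to be.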
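To make the comparison concrete, here is a minimal sketch that converts the `ls -l` permission strings from the listings above into mode bits and checks whether a user who is neither the owner nor in the group can enter the directory. It assumes the rabbitmq user is not a member of the root group (not confirmed in this report) and handles only plain `r`, `w`, `x` and `-` characters. The function names are hypothetical.

```python
import stat


def mode_from_ls(perm: str) -> int:
    """Convert an `ls -l` permission string such as 'drwxr-x---' to mode bits.

    Assumption: only 'r', 'w', 'x' and '-' appear after the file-type
    character (no setuid/setgid/sticky markers).
    """
    if len(perm) != 10:
        raise ValueError(f"expected 10 characters, got {perm!r}")
    bits = 0
    # perm[1:] holds nine flags: owner rwx, group rwx, other rwx.
    for i, ch in enumerate(perm[1:]):
        if ch != "-":
            bits |= 1 << (8 - i)
    return bits


def others_can_enter(perm: str) -> bool:
    """True if users outside the owner and group may traverse the directory."""
    return bool(mode_from_ls(perm) & stat.S_IXOTH)


if __name__ == "__main__":
    # Directory modes copied from the listings in this report.
    for label, perm in [("654a51db", "drwxr-xr-x"),
                        ("5a485765", "drwxr-x---"),
                        ("ce59fdbb", "drwxr-x---")]:
        print(label, oct(mode_from_ls(perm)), others_can_enter(perm))
```

Under that assumption, only the 654a51db node's directory mode lets the rabbitmq user reach /etc/rabbitmq/enabled_plugins, which lines up with it being the only node that starts.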