Rabbitmq Error: unable to connect to node 'rabbit@ostack-controller-01': nodedown

Bug #1804283 reported by Satish Patel
This bug affects 2 people
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Medium
Assigned to: kourosh vivan

Bug Description

I installed OSA Queens on a 200-node production cloud. Everything has worked for the last 3 months, but today while poking around I found that the RabbitMQ cluster isn't functional.

Here are the details:

[root@ostack-controller-01-rabbit-mq-container-1bf6ede2 ~]# rabbitmqctl status
Status of node 'rabbit@ostack-controller-01' ...
Error: unable to connect to node 'rabbit@ostack-controller-01': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@ostack-controller-01']

rabbit@ostack-controller-01:
  * unable to connect to epmd (port 4369) on ostack-controller-01: address (cannot connect to host/port)

current node details:
- node name: 'rabbitmq-cli-06@ostack-controller-01-rabbit-mq-container-1bf6ede2'
- home dir: /var/lib/rabbitmq
- cookie hash: SssFdXBI7wTevePuCt5d9w==

----------------------------

If you look at the output above, you will see it is talking to the wrong node, "ostack-controller-01" (this is my infra node). The question is: why is it talking to my infra node?

----------------------------

rabbitmq-server is running...

[root@ostack-controller-01-rabbit-mq-container-1bf6ede2 ~]# systemctl status rabbitmq-server.service
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/rabbitmq-server.service.d
           └─limits.conf, systemd-restart-on-failure.conf
   Active: active (running) since Tue 2018-11-20 12:24:20 EST; 52min ago

----------------------------

The question is: how did OpenStack survive this cluster failure without my noticing any issue? The OpenStack cloud is fully functional and has 315 active instances running.

Is this a bug, or is something missing?

Revision history for this message
Satish Patel (satish-txt) wrote :

When I try the following, no luck:

[root@ostack-controller-01-rabbit-mq-container-1bf6ede2 ~]# rabbitmqctl forget_cluster_node rabbit@ostack-controller-01
Removing node 'rabbit@ostack-controller-01' from cluster ...
Error: unable to connect to node 'rabbit@ostack-controller-01': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@ostack-controller-01']

rabbit@ostack-controller-01:
  * unable to connect to epmd (port 4369) on ostack-controller-01: address (cannot connect to host/port)

current node details:
- node name: 'rabbitmq-cli-39@ostack-controller-01-rabbit-mq-container-1bf6ede2'
- home dir: /var/lib/rabbitmq
- cookie hash: SssFdXBI7wTevePuCt5d9w==

Revision history for this message
Satish Patel (satish-txt) wrote :

I found that the host passes its environment variables into the container when the lxc-attach command is used.

Attaching with a cleared environment fixed the issue:

[root@ostack-controller-01 ~]# lxc-attach --clear-env --name ostack-controller-01_rabbit_mq_container-1bf6ede2

Looks like a CentOS-related issue. It would be great if we put this recipe in OSA for CentOS deployments so people won't get confused.
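
To check whether a shell inside the container has picked up the host's HOSTNAME, something like the following sketch may help. It assumes (as the thread suggests, but does not confirm) that the RabbitMQ CLI derives its default target node from the HOSTNAME environment variable. The function name `hostname_env_mismatch` is hypothetical and not part of OSA or RabbitMQ.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when an inherited HOSTNAME value
# differs from the hostname the container itself reports.
# Both values are compared by short name (the text before the first dot).
hostname_env_mismatch() {
    inherited="${1%%.*}"   # e.g. the value of $HOSTNAME in the current shell
    actual="${2%%.*}"      # e.g. the output of `hostname`
    [ -n "$inherited" ] && [ "$inherited" != "$actual" ]
}

# Usage inside the container: warn if the shell inherited the host's HOSTNAME.
if hostname_env_mismatch "${HOSTNAME:-}" "$(hostname 2>/dev/null || uname -n)"; then
    echo "HOSTNAME=${HOSTNAME} does not match this container; try lxc-attach --clear-env"
fi
```

If the warning prints, re-attaching with `lxc-attach --clear-env` as shown above should give the CLI a clean environment.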

Revision history for this message
Mohammed Naser (mnaser) wrote :

I run into this issue as well. I usually do this:

rabbitmqctl -n rabbit@`hostname` list_queues

I'm not sure how to best resolve this, honestly. Perhaps we should push a patch to rabbitmq_server.
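
A slightly more defensive form of that workaround could look like the sketch below. It assumes the broker uses the default `rabbit@<short hostname>` naming, which may not hold if the node name was customized. The helper name `local_rabbit_node` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the node name of the local broker from a
# hostname, ignoring whatever HOSTNAME the shell may have inherited.
# Assumes the default "rabbit@<short hostname>" naming scheme.
local_rabbit_node() {
    # Use the first argument if given, otherwise ask the system directly.
    host="${1:-$(hostname 2>/dev/null || uname -n)}"
    # Strip any domain part so an FQDN becomes a short name.
    printf 'rabbit@%s\n' "${host%%.*}"
}

# Usage (requires rabbitmqctl on the PATH):
#   rabbitmqctl -n "$(local_rabbit_node)" status
```

Passing `-n` explicitly sidesteps the inherited environment, but it has to be repeated on every rabbitmqctl invocation.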

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Jean-Philippe Evrard (jean-philippe-evrard) wrote :

A few questions for you:
- Which version of OSA do you run, or alternatively, which version of ansible do you run?
- What is the content of your bashrc/profile? Do you have anything like HOST or HOSTNAME in your environment variables?
- Could you paste the content of your /etc/hosts, please?

Thank you.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-rabbitmq_server (master)

Fix proposed to branch: master
Review: https://review.opendev.org/703577

Changed in openstack-ansible:
assignee: nobody → kourosh vivan (kourosh-vivan)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-rabbitmq_server (master)

Reviewed: https://review.opendev.org/703577
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-rabbitmq_server/commit/?id=00db9987f03c289e9904f9287dd551b08496e56e
Submitter: Zuul
Branch: master

commit 00db9987f03c289e9904f9287dd551b08496e56e
Author: Kourosh Vivan <email address hidden>
Date: Tue Jan 21 10:49:02 2020 +0100

    Add NODENAME in rabbitmq env

    NODENAME is needed in some deployment (Centos) for use rabbitmq cli

    Change-Id: I983021d5913b3e88fb7b934003583d48ee35b8d9
    Closes-bug: #1804283

Changed in openstack-ansible:
status: In Progress → Fix Released
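
For readers on releases without this fix, the effect of setting NODENAME can be approximated by hand. The fragment below is only a sketch: it assumes the environment file lives at /etc/rabbitmq/rabbitmq-env.conf and that the broker in the container from this report is named after the container's hostname. The actual template in the role may differ.

```shell
# /etc/rabbitmq/rabbitmq-env.conf (assumed location)
# Pin the node name so the CLI does not fall back to an inherited HOSTNAME.
# The value below is an assumption based on the container name in this
# report; replace it with the name your broker actually runs under.
NODENAME=rabbit@ostack-controller-01-rabbit-mq-container-1bf6ede2
```

Changing NODENAME on a broker that is already running under a different name effectively creates a new node, so verify the running node's name before applying this.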
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-rabbitmq_server (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/706948

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-rabbitmq_server (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/706949

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-rabbitmq_server (stable/train)

Reviewed: https://review.opendev.org/706949
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-rabbitmq_server/commit/?id=712e3557dfacc3863e07cce91209ed2fbb91210a
Submitter: Zuul
Branch: stable/train

commit 712e3557dfacc3863e07cce91209ed2fbb91210a
Author: Kourosh Vivan <email address hidden>
Date: Tue Jan 21 10:49:02 2020 +0100

    Add NODENAME in rabbitmq env

    NODENAME is needed in some deployment (Centos) for use rabbitmq cli

    Change-Id: I983021d5913b3e88fb7b934003583d48ee35b8d9
    Closes-bug: #1804283
    (cherry picked from commit 00db9987f03c289e9904f9287dd551b08496e56e)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-rabbitmq_server (stable/stein)

Reviewed: https://review.opendev.org/706948
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-rabbitmq_server/commit/?id=d75d33eec14a07a44c52f9375c895734253eedec
Submitter: Zuul
Branch: stable/stein

commit d75d33eec14a07a44c52f9375c895734253eedec
Author: Kourosh Vivan <email address hidden>
Date: Tue Jan 21 10:49:02 2020 +0100

    Add NODENAME in rabbitmq env

    NODENAME is needed in some deployment (Centos) for use rabbitmq cli

    Change-Id: I983021d5913b3e88fb7b934003583d48ee35b8d9
    Closes-bug: #1804283
    (cherry picked from commit 00db9987f03c289e9904f9287dd551b08496e56e)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-rabbitmq_server stein-eol

This issue was fixed in the openstack/openstack-ansible-rabbitmq_server stein-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-rabbitmq_server train-eol

This issue was fixed in the openstack/openstack-ansible-rabbitmq_server train-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-rabbitmq_server ussuri-eol

This issue was fixed in the openstack/openstack-ansible-rabbitmq_server ussuri-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-rabbitmq_server yoga-eom

This issue was fixed in the openstack/openstack-ansible-rabbitmq_server yoga-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-rabbitmq_server victoria-eom

This issue was fixed in the openstack/openstack-ansible-rabbitmq_server victoria-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-rabbitmq_server wallaby-eom

This issue was fixed in the openstack/openstack-ansible-rabbitmq_server wallaby-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-rabbitmq_server xena-eom

This issue was fixed in the openstack/openstack-ansible-rabbitmq_server xena-eom release.
