[library] Rabbitmq cluster is broken after deploying HA mode

Bug #1339080 reported by Aleksandr Didenko on 2014-07-08
This bug affects 3 people
Affects: Fuel for OpenStack
Importance: Critical
Assigned to: Vladimir Kuklin

Bug Description

Version:
{
    "api": "1.0",
    "astute_sha": "8e301abed327d11f7811f0296fab362769b497ec",
    "auth_required": false,
    "build_id": "2014-07-07_02-01-14",
    "build_number": "301",
    "feature_groups": [
        "mirantis"
    ],
    "fuellib_sha": "750503e1c96b7fa29a68e8a2f6fe88e3bfca3ef0",
    "fuelmain_sha": "044645f2c378db4c500560b9907ea42560231531",
    "nailgun_sha": "4e0c8bc20ba3c6dd7f5d61b5c09c549d06b70893",
    "ostf_sha": "f6f7cee46a85ca3e758f629c2df8b370e9de494a",
    "production": "docker",
    "release": "5.1"
}

Puppet manifests were uploaded from master at commit "37d912e4a064a3dbf5e0d5d55de7fde1f9eff0bd", plus the patch https://review.openstack.org/105433 (which should not affect the RabbitMQ OCF script).

Deployed with group system test "deploy_neutron_vlan_ha" (3 controllers, 2 compute, neutron with vlan segmentation).

Right after deployment, the controllers reported the following RabbitMQ cluster status. node-1 and node-4 see only each other, while node-3 runs alone:

[root@node-1 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-4']},
 {partitions,[]}]
...done.

[root@node-4 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-4' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-4']},
 {partitions,[]}]
...done.

[root@node-3 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-3','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-3']},
 {partitions,[]}]
...done.
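
The mismatch above can also be checked mechanically. The following is a minimal sketch, not part of Fuel or RabbitMQ: the helper name running_nodes is hypothetical, and it assumes the running_nodes list fits on one line, as it does in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the sorted running_nodes list found in
# `rabbitmqctl cluster_status` output read from stdin.
# Assumes the running_nodes tuple is on a single line.
running_nodes() {
    grep -o 'running_nodes,\[[^]]*]' |
        grep -o "rabbit@[A-Za-z0-9._-]*" |
        sort | tr '\n' ' ' | sed 's/ *$//'
}

# Sample outputs copied from the report above.
node1_status="[{nodes,[{disc,['rabbit@node-1','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-4']},
 {partitions,[]}]"
node3_status="[{nodes,[{disc,['rabbit@node-1','rabbit@node-3','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-3']},
 {partitions,[]}]"

view1=$(printf '%s\n' "$node1_status" | running_nodes)
view3=$(printf '%s\n' "$node3_status" | running_nodes)

# Nodes of a healthy cluster should all report the same running set.
if [ "$view1" != "$view3" ]; then
    echo "cluster views differ: [$view1] vs [$view3]"
fi
```

On a live controller the input would come from `rabbitmqctl cluster_status` on each node rather than from the pasted strings.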

Attaching the snapshot.

Aleksandr Didenko (adidenko) wrote :

Pacemaker logs

OSCI Robot (oscirobot) wrote :

Package rabbitmq-server has been built from changeset: http://gerrit.mirantis.com/18064
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-18064/ubuntu
You can build an ISO with this package:
make iso EXTRA_DEB_REPOS="http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-18064/ubuntu /"

Changed in fuel:
status: New → In Progress
OSCI Robot (oscirobot) wrote :

Package rabbitmq-server has been built from changeset: http://gerrit.mirantis.com/18066
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-18066/ubuntu
You can build an ISO with this package:
make iso EXTRA_DEB_REPOS="http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-18066/ubuntu /"

OSCI Robot (oscirobot) wrote :

Package rabbitmq-server has been built from changeset: http://gerrit.mirantis.com/18066
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable/ubuntu
You can build an ISO with this package:
make iso EXTRA_DEB_REPOS="http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable/ubuntu /"

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/106061
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=bdc3554878fff9be932ab493ff87f19ef6d71e10
Submitter: Jenkins
Branch: master

commit bdc3554878fff9be932ab493ff87f19ef6d71e10
Author: Vladimir Kuklin <email address hidden>
Date: Thu Jul 10 18:11:35 2014 +0400

    Remove trailing whitespaces from pcmk notify vars

    Ubuntu Pacemaker sent OCF_RESKEY_meta_notify_*
    vars with trailing whitespaces thus breaking
    RabbitMQ OCF script functionality.

    Change-Id: I3e5818bd1f6eed66a98665e519db9dcdca340db7
    Closes-bug: #1339080
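
To illustrate why trailing whitespace matters here, a small sketch follows. It does not reproduce the OCF script: the variable notify_start_uname and its value are stand-ins for a notify variable as Pacemaker might pass it, not data taken from the logs.

```shell
#!/bin/sh
# Stand-in for a Pacemaker notify variable; note the trailing space.
# The value is illustrative, not copied from the attached logs.
notify_start_uname="node-3 "
this_node="node-3"

# An exact string comparison fails because of the trailing space.
if [ "$notify_start_uname" = "$this_node" ]; then
    raw_match=yes
else
    raw_match=no
fi

# Strip trailing whitespace before comparing.
trimmed=$(printf '%s' "$notify_start_uname" | sed 's/[[:space:]]*$//')
if [ "$trimmed" = "$this_node" ]; then
    trimmed_match=yes
else
    trimmed_match=no
fi

echo "raw match: $raw_match, trimmed match: $trimmed_match"
```

Any logic that compares the raw value against a node name would silently take the wrong branch, which matches the symptom described in the commit message.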

Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
importance: Undecided → Critical

Fix proposed to branch: master
Review: https://review.openstack.org/106363

Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Vladimir Kuklin (vkuklin)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/106363
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=07af3410730b381490bc8ab4ef815c55e34ce697
Submitter: Jenkins
Branch: master

commit 07af3410730b381490bc8ab4ef815c55e34ce697
Author: Vladimir Kuklin <email address hidden>
Date: Fri Jul 11 16:42:52 2014 +0400

    Fix trim_var function in RabbitMQ script

    stupid mistake: do echo instead of return
    in trim_var() function

    Change-Id: I58f5250b92c0b09707d425c5db0a1e98aee22e53
    Closes-bug: #1339080
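
The difference between echo and return in a shell function can be sketched as follows. This is a hypothetical reconstruction, assuming the original function tried to hand the string back with return; the actual trim_var() in fuel-library may differ in detail.

```shell
#!/bin/sh
# Presumed shape of the mistake: `return` only sets a numeric exit
# status, so nothing reaches stdout for $(...) to capture.
trim_var_buggy() {
    trimmed=$(echo "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    return "$trimmed"
}

# Working variant: write the trimmed string to stdout.
trim_var() {
    echo "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# The buggy call fails with a shell error (silenced here) and yields
# an empty string; the fixed call yields the trimmed value.
buggy=$(trim_var_buggy " node-3 " 2>/dev/null) || true
fixed=$(trim_var " node-3 ")

echo "buggy: '$buggy' fixed: '$fixed'"
```

Callers that use command substitution therefore got an empty value from the first version, so the whitespace fix had no useful effect until this follow-up.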

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → In Progress

Related fix proposed to branch: master
Review: https://review.openstack.org/106504

Reviewed: https://review.openstack.org/106421
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=25ae968fc132ca5cb588ced1eef075d7f80a757c
Submitter: Jenkins
Branch: master

commit 25ae968fc132ca5cb588ced1eef075d7f80a757c
Author: Vladimir Kuklin <email address hidden>
Date: Fri Jul 11 19:28:56 2014 +0400

    Modify rabbitmq monitor function

    Return error if the node is running
    rabbit but not part of the cluster
    with current master.

    Change-Id: Iff7b88c36b72c81453f85f717365fd1e5db7686e
    Related-bug: #1339080
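
The monitor change described above could look roughly like this. It is a sketch under assumptions: the function name check_clustered_with_master is hypothetical, the running nodes are passed in as a space-separated string, and node-1 is assumed to be the master for the sake of the example (the report does not say which node was master).

```shell
#!/bin/sh
# Standard OCF return codes used by this sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical check: succeed only if the master's node name appears
# in this node's running_nodes list.
check_clustered_with_master() {
    master="$1"
    running="$2"
    for n in $running; do
        if [ "$n" = "$master" ]; then
            return $OCF_SUCCESS
        fi
    done
    return $OCF_ERR_GENERIC
}

# node-3's view from the report: it only sees itself.
rc_node3=0
check_clustered_with_master "rabbit@node-1" "rabbit@node-3" || rc_node3=$?

# node-4's view from the report: it sees node-1 and itself.
rc_node4=0
check_clustered_with_master "rabbit@node-1" "rabbit@node-1 rabbit@node-4" || rc_node4=$?

echo "node-3 monitor rc: $rc_node3, node-4 monitor rc: $rc_node4"
```

Returning an error from monitor lets Pacemaker treat the isolated node as failed and recover it, instead of reporting a running but unclustered RabbitMQ as healthy.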

Reviewed: https://review.openstack.org/106504
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=b2caaea7b0a39f40ff08240da4dab81aaba6e29c
Submitter: Jenkins
Branch: master

commit b2caaea7b0a39f40ff08240da4dab81aaba6e29c
Author: Vladimir Kuklin <email address hidden>
Date: Sat Jul 12 00:41:33 2014 +0400

    Modify rabbitmq ocf script logic

    1) Store attributes in CIB instead of files
    2) on master demote stop slave rabbitmq servers

    Change-Id: I066a090c8f4497777b34e473694206e9f2f93110
    Related-bug: #1339080

Reviewed: https://review.openstack.org/106865
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=33a9794bdf59aefb815137632b039c011095cfa3
Submitter: Jenkins
Branch: master

commit 33a9794bdf59aefb815137632b039c011095cfa3
Author: Vladimir Kuklin <email address hidden>
Date: Tue Jul 15 00:50:20 2014 +0400

    Refactoring of rabbitmq OCF script

    1) Store attributes in CIB instead of files
    2) Do not use ocf_run if command may fail
    3) Eliminate master_score race condition:
    set master_score to 1000 for the older nodes
    and do not forget to update their uptime value
    4) fix messed interleave/ordered settings
    5) set failure-timeout to 60 seconds to recover
    from RabbitMQ master node failure
    6) for slave nodes only run beam and
    start rabbitmq only if there is master promoted
    7) stop RMQ app on slaves in case of master demotion
    8) clean up other nodes master attribute in case
    of promotion
    9) fix exit codes for failed services start and cluster
    joining
    10) get running nodes into running_nodes variable
    11) apply timeout command to cluster_status function

    Closes-bug: #1339080
    Closes-bug: #1336777

    Change-Id: I271c6d7db4cf8fe4c9dfc7599954cb0ec8813293
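
Item 11 above (a timeout around cluster_status) can be sketched with GNU coreutils timeout. The wrapper name run_with_timeout and the limits are hypothetical, and sleep stands in for a hung `rabbitmqctl cluster_status` call; the real script may apply the timeout differently.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under `timeout` and pass its exit
# status through. GNU timeout exits with 124 when the limit is hit.
run_with_timeout() {
    limit="$1"
    shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "command timed out after ${limit}s" >&2
    fi
    return "$rc"
}

# `sleep 5` stands in for a rabbitmqctl call that never returns.
rc_hung=0
run_with_timeout 1 sleep 5 2>/dev/null || rc_hung=$?

# A command that finishes in time keeps its own exit status.
rc_ok=0
run_with_timeout 5 true || rc_ok=$?

echo "hung: $rc_hung, ok: $rc_ok"
```

Without such a limit, a hung rabbitmqctl could block the resource agent action until Pacemaker's own operation timeout fires.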

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Ilyin (idv1985) on 2014-07-16
summary: - Rabbitmq cluster is broken after deploying HA mode
+ [library] Rabbitmq cluster is broken after deploying HA mode
Bogdan Dobrelya (bogdando) wrote :

Reproduced at http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/

root@node-1:~# rabbitmqctl list_users
Listing users ...
nova [administrator]
...done.
root@node-1:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
 {running_nodes,['rabbit@node-2','rabbit@node-1']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
...done.

root@node-2:~# rabbitmqctl list_users
Listing users ...
nova [administrator]
...done.
root@node-2:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-2']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
...done.

root@node-3:~# rabbitmqctl list_users
Listing users ...
Error: {aborted,{no_exists,[rabbit_user,{internal_user,'_','_','_'}]}}
root@node-3:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-3']}]}]
...done.

Changed in fuel:
status: Fix Committed → In Progress
assignee: Vladimir Kuklin (vkuklin) → Fuel Library Team (fuel-library)
Bogdan Dobrelya (bogdando) wrote :

Since the fix for the related bug #1396946 should fix this one as well, and since this bug looks too generic, I will return it to Fix Committed.

no longer affects: fuel/5.1.x
no longer affects: fuel/6.0.x
Changed in fuel:
status: Incomplete → Fix Committed
milestone: 6.0 → 5.1.1
Changed in fuel:
milestone: 5.1.1 → none
milestone: none → 5.1.1
assignee: Fuel Library Team (fuel-library) → Vladimir Kuklin (vkuklin)
Dmitry Pyzhov (dpyzhov) on 2014-11-27
Changed in fuel:
milestone: 5.1.1 → 5.1
Denis Ipatov (dipatov) wrote :

Does this bug affect 5.0 too? I saw the same output on 5.0.
