[library] RabbitMQ doesn't assemble after controller reboot

Bug #1346540 reported by Ryan Moe
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Vladimir Kuklin
Milestone: 5.1

Bug Description

{"build_id": "2014-07-18_22-06-01", "ostf_sha": "9863db951a6e159f4fa6e6861c8331e1af069cf8", "build_number": "333", "auth_required": false, "api": "1.0", "nailgun_sha": "d7408251ec27dd65447ea2c0a96e5456697047c7", "production": "docker", "fuelmain_sha": "a379af4c5ca38fef7bf4d7b35abc45034a533791", "astute_sha": "9f1e69aa3a2fe7a6093fa50596d6931826d93a09", "feature_groups": ["mirantis"], "release": "5.1", "fuellib_sha": "7d7ec0d76eb97689717b266151e98d813f0acb8d"}

Periodically, after a controller is rebooted, only 2 of the 3 controllers rejoin the RabbitMQ cluster. In this case, after rebooting controller 2, only controllers 1 and 2 rejoined the cluster.

[root@node-1 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-2','rabbit@node-1']},
 {partitions,[]}]
...done.

[root@node-3 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-3']}]}]
...done.
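
The comparison made by eye above (members listed under nodes versus running_nodes) can be sketched in Python. This is only an illustration: the function names are hypothetical, and the regex-based parsing assumes the output format shown in this report rather than any stable rabbitmqctl interface.

```python
import re


def _atoms(text):
    """Return the set of single-quoted Erlang atoms found in text."""
    return set(re.findall(r"'([^']+)'", text))


def members_not_running(cluster_status):
    """Return cluster members that are absent from running_nodes.

    Assumes the `rabbitmqctl cluster_status` layout shown in this report.
    """
    # Everything between "{nodes," and "{running_nodes" (or end of text).
    nodes = re.search(r"\{nodes,(.*?)(?:\{running_nodes|\Z)", cluster_status, re.S)
    running = re.search(r"\{running_nodes,\[([^\]]*)\]", cluster_status)
    known = _atoms(nodes.group(1)) if nodes else set()
    up = _atoms(running.group(1)) if running else set()
    return known - up


# Output copied from node-1 in this report.
NODE1_OUTPUT = """Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-2','rabbit@node-1']},
 {partitions,[]}]
...done."""

print(sorted(members_not_running(NODE1_OUTPUT)))
```

For the node-1 output this reports rabbit@node-3 as a known member that is not running.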

node-3 doesn't join the cluster because the rabbit application is stopped on it; the running_applications list below contains no rabbit entry.

[root@node-3 ~]# rabbitmqctl status
Status of node 'rabbit@node-3' ...
[{pid,31333},
 {running_applications,[{xmerl,"XML parser","1.2.10"},
                        {sasl,"SASL CXC 138 11","2.1.10"},
                        {stdlib,"ERTS CXC 138 10","1.17.5"},
                        {kernel,"ERTS CXC 138 10","2.14.5"}]},
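
The same check (is rabbit among the running applications?) can be written as a small Python sketch. The helper names are hypothetical and the parsing assumes the `rabbitmqctl status` layout shown above; it is not an official parser for that output.

```python
import re


def running_applications(status_output):
    """Return application names listed under running_applications."""
    # Capture the list body up to the first "]}" that closes the section.
    section = re.search(r"\{running_applications,\[(.*?)\]\}", status_output, re.S)
    if section is None:
        return []
    # Each entry looks like {name,"description","version"}.
    return re.findall(r'\{(\w+),"', section.group(1))


def rabbit_app_running(status_output):
    """True if the rabbit application itself is among the running apps."""
    return "rabbit" in running_applications(status_output)


# Output copied from node-3 in this report (truncated as in the report).
NODE3_STATUS = """Status of node 'rabbit@node-3' ...
[{pid,31333},
 {running_applications,[{xmerl,"XML parser","1.2.10"},
                        {sasl,"SASL CXC 138 11","2.1.10"},
                        {stdlib,"ERTS CXC 138 10","1.17.5"},
                        {kernel,"ERTS CXC 138 10","2.14.5"}]},
"""

print(running_applications(NODE3_STATUS))
print(rabbit_app_running(NODE3_STATUS))
```

On the node-3 output only the Erlang base applications are listed, so the check reports that rabbit is not running.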

Ryan Moe (rmoe)
summary: - RabbitMQ doesn't assemble after master reboot
+ RabbitMQ doesn't assemble after controller reboot
Dmitry Ilyin (idv1985)
summary: - RabbitMQ doesn't assemble after controller reboot
+ [library] RabbitMQ doesn't assemble after controller reboot
tags: added: library rabbitmq
Changed in fuel:
assignee: nobody → Vladimir Kuklin (vkuklin)
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
milestone: none → 5.1
Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/109821
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=5b1dc4c091824bbc9b51a02f3f18e0a3bf524dad
Submitter: Jenkins
Branch: master

commit 5b1dc4c091824bbc9b51a02f3f18e0a3bf524dad
Author: Vladimir Kuklin <email address hidden>
Date: Sat Jul 26 23:04:51 2014 +0400

    Refactor rabbitmq OCF script

    This refactoring adds improvements and fixes several
    possible and already filed issues making rabbitmq
    cluster reassembling in case of partial or complete
    failure.

    1) Use mnesia low-level commands instead of
    status and cluster_status because these
    commands will not block in case of
    one of the nodes becoming inaccessible

    2) Block access to rabbitmq port while
    trying to start it to prevent interference
    with client applications

    3) Perform test of RMQ server start on promote

    4) Do not check if we want to join the cluster -
    simply join it - it is idempotent operation

    5) Fix my_host() function determining if
    our host is included into the list

    6) Fix trim_var function to strip the line
    instead of stripping the first argument

    7) Stop slave node in case we failed
    to join the master node. This will make
    slave restart again and try to join again

    8) Add debug option to monitor command

    9) Add debug to several misc. functions

    Closes-bug: #1346540

    Change-Id: If5df451a6e2d72bf50c47c28d8a36b46045dd5cd
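
Items 4 and 7 of the commit message (join unconditionally, and stop the node if the join fails so that it is restarted and retries) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the OCF script from the commit: the real script is a shell resource agent whose code is not shown here, the function names and the injectable runner are hypothetical, and the only external commands used are the standard rabbitmqctl subcommands stop_app, join_cluster, start_app and stop.

```python
import subprocess


def run(cmd):
    """Run a command and return its exit status."""
    return subprocess.call(list(cmd))


def join_or_stop(master, runner=run):
    """Try to join `master`; stop the local node if any step fails.

    Returns True on success. On failure the node is stopped, on the
    assumption that a cluster manager restarts it and the join is retried.
    """
    steps = [
        ["rabbitmqctl", "stop_app"],
        ["rabbitmqctl", "join_cluster", master],  # no pre-check, just join
        ["rabbitmqctl", "start_app"],
    ]
    for cmd in steps:
        if runner(cmd) != 0:
            # Item 7: give up and stop the node entirely.
            runner(["rabbitmqctl", "stop"])
            return False
    return True
```

The runner argument exists only so the control flow can be exercised without a live broker, for example by passing a function that records the commands and returns a chosen exit status.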

Changed in fuel:
status: In Progress → Fix Committed