RabbitMQ server failed to start after unexpected reboot and maintenance mode manipulation

Bug #1495885 reported by Tatyanka
This bug affects 1 person
Affects:     Fuel for OpenStack
Status:      Invalid
Importance:  Critical
Assigned to: Dmitry Ilyin
Milestone:   (not set)

Bug Description

https://product-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.cic_maintenance_mode/93/testReport/junit/%28root%29/auto_cic_maintenance_mode/auto_cic_maintenance_mode/

Steps to Reproduce:
1. Create cluster
2. Add 3 nodes with controller and mongo roles
3. Add 2 nodes with compute and cinder roles
4. Deploy the cluster
5. Run OSTF
6. Trigger an unexpected reboot of a controller
7. Wait until the controller switches into maintenance mode
8. Exit maintenance mode
9. Check that the controller becomes available
10. Run OSTF (steps 6-10 are sketched in shell below)
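
For reference, a rough shell sketch of steps 6-10 as driven from the Fuel master node. The sysrq-based reboot, the umm utility, and the node name are assumptions about how the system test exercises the scenario, not details taken from this report:

# Simulate an unexpected reboot of one controller (assumption: sysrq is enabled)
ssh node-1 'echo b > /proc/sysrq-trigger'

# The node is then expected to come back up in maintenance mode
# (assumption: Fuel's UMM utility is present on the controller)
ssh node-1 'umm status'
ssh node-1 'umm off'    # exit maintenance mode; the node reboots into normal mode

# Wait until pacemaker reports the node online again before re-running OSTF
until ssh node-1 'crm_mon -1' | grep -q 'Online:.*node-1'; do sleep 30; done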

Expected Result:
OSTF tests are passed

Actual:
 {
  "RabbitMQ availability (failure)": "Number of RabbitMQ nodes is not equal to number of cluster nodes."
 },
 {
  "RabbitMQ replication (failure)": "Failed to establish AMQP connection to 5673/tcp port on 10.109.2.6 from controller node! Please refer to OpenStack logs for more details."
 }
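
The failing check can be reproduced by hand from another controller. A minimal sketch, assuming netcat is installed and using the IP and port from the failure message above:

# Is anything listening on the inter-node AMQP port of the affected controller?
nc -zv 10.109.2.6 5673

# Local broker state on the affected controller itself
rabbitmqctl status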

I reverted the environment, waited about 20 minutes, then ran OSTF again and got the same results:
http://paste.openstack.org/show/462674/

Then crm_mon -1 shows that the rabbit master is not running:
Clone Set: clone_p_dns [p_dns]
     Started: [ node-1.test.domain.local node-3.test.domain.local node-4.test.domain.local ]
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     p_rabbitmq-server (ocf::fuel:rabbitmq-server): FAILED Master node-1.test.domain.local
     Slaves: [ node-3.test.domain.local node-4.test.domain.local ]
Also, there is no rabbitmq-server process running at all:

root@node-1:/var/log/rabbitmq# ps uax| grep rabbit
rabbitmq 5432 0.0 0.0 90308 2172 ? Ss 08:11 0:00 /usr/bin/python /usr/bin/rabbit-fence.py
rabbitmq 12676 0.2 0.0 8900 1976 ? S 08:14 0:09 /usr/lib/erlang/erts-5.10.4/bin/epmd -daemon
root 30719 0.0 0.0 10464 940 pts/0 S+ 09:31 0:00 grep --color=auto rabbit

At the same time it seems we try to start it, but the start fails; the last message in the log is:
Error: {could_not_start,rabbitmq_management,
           {{shutdown,
                {failed_to_start_child,rabbit_mgmt_sup,
                    {'EXIT',
                        {{shutdown,
                             [{{already_started,<5613.6927.0>},
                               {child,undefined,rabbit_mgmt_db,
                                   {rabbit_mgmt_db,start_link,[]},
                                   permanent,4294967295,worker,
                                   [rabbit_mgmt_db]}}]},
                         {gen_server2,call,
                             [<5200.4474.0>,
                              {init,<5200.4472.0>},
                              infinity]}}}}},
            {rabbit_mgmt_app,start,[normal,[]]}}}
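
A common next step here would be to clear the failed resource state in pacemaker so the OCF agent retries the start. A hedged sketch using crmsh (shipped on Fuel 7.0 controllers); this was not attempted as part of this report:

# Show the accumulated fail count for the rabbitmq resource on the broken node
crm resource failcount p_rabbitmq-server show node-1.test.domain.local

# Clear the failure history so pacemaker attempts the start again
crm resource cleanup master_p_rabbitmq-server

# Re-check the resource state after a couple of minutes
crm_mon -1 | grep -A 2 rabbitmq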

Also, the other two nodes lost the cluster:
from node-4:
[root@nailgun ~]# ssh node-4
Warning: Permanently added 'node-4' (RSA) to the list of known hosts.
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-64-generic x86_64)

 * Documentation: https://help.ubuntu.com/
root@node-4:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-4' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-3','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-4']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
root@node-4:~#

from node-3:
root@node-3:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-3']}]}]
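
node-3 has effectively dropped out of the cluster: it only lists itself as a disc node. A manual re-join would look roughly like the sequence below; this is the generic rabbitmqctl procedure, not what the OCF agent does, and node-1 as the join target is an assumption:

# On node-3: stop the broker application (the Erlang VM stays up), wipe local
# mnesia state, and re-join the cluster through a healthy node
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster rabbit@node-1
rabbitmqctl start_app
rabbitmqctl cluster_status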

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "295"
  build_id: "295"
  nailgun_sha: "16a39d40120dd4257698795f12de4ae8200b1778"
  python-fuelclient_sha: "2864459e27b0510a0f7aedac6cdf27901ef5c481"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "6c5b73f93e24cc781c809db9159927655ced5012"
  fuel-library_sha: "8e9a9ae51abbbd4edef1432809311004461eec94"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "6b83

Tags: rabbitmq
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Andrey Maximov (maximov)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Ilyin (idv1985)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

The description looks too vague. How long did you wait between:
9. Check that the controller becomes available
10. Run OSTF

You should wait at least 5 minutes *after* the controller became available (pacemaker with corosync started) *before* checking whether the rabbitmq cluster recovered. Did you forget about the failover time?
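
A minimal sketch of that kind of wait, polling pacemaker until a rabbitmq master is promoted before starting OSTF; the 5-minute budget comes from the comment above, the rest is an assumption:

# Wait up to ~5 minutes for a promoted rabbitmq master before running OSTF
for i in $(seq 1 30); do
    crm_mon -1 | grep -A 1 'master_p_rabbitmq-server' | grep -q 'Masters:' && break
    sleep 10
done
crm_mon -1 | grep -A 3 'master_p_rabbitmq-server'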

Changed in fuel:
status: New → Incomplete
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Bogdan, answering your question:
We wait more than 5 minutes: up to 60*10 seconds for the node to come back online, and then 1500 seconds for the OSTF HA run, so no, we have not forgotten about the failover time. Also, after the revert the failed environment sat for about an hour and the issue is still there.

Changed in fuel:
status: Incomplete → New
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

The root cause is incorrect behaviour of the OCF script, introduced as a workaround for bug #1472230. That whole change has since been rewritten except for the simple get_status command, which returns a generic error instead of not_running. As a result, pacemaker does not know what to do with the resource and never performs a fail-over.
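
For context, the distinction referred to here, shown as a trimmed, illustrative OCF monitor fragment (not the actual fuel-library code): returning OCF_NOT_RUNNING lets pacemaker recover the resource, while a generic error leaves it FAILED as seen in the crm_mon output above.

# Illustrative only: how an OCF monitor/status action can report a dead beam process.
# Returning "$OCF_ERR_GENERIC" (rc=1) here instead of "$OCF_NOT_RUNNING" (rc=7) leaves
# the resource stuck in the FAILED state with no fail-over attempted.
get_status() {
    if ! pgrep -f 'beam.*rabbit' > /dev/null; then
        return "$OCF_NOT_RUNNING"
    fi
    # ... further health checks ...
    return "$OCF_SUCCESS"
}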

Changed in fuel:
status: New → Confirmed
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Actually, this bug is a duplicate of https://bugs.launchpad.net/fuel/+bug/1484280

Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

According to the atop log, about 1 GB out of 3 GB is already swapped, while a mongo setup requires 6 gigabytes. Thus I am marking this bug as Invalid.

ATOP - node-3 2015/09/14 23:49:11 ------ 20s elapsed
PRC | sys 1.17s | user 5.82s | | #proc 207 | #trun 2 | #tslpi 462 | #tslpu 0 | #zombie 0 | clones 740 | | #exit 597 |
CPU | sys 6% | user 30% | irq 2% | | idle 62% | wait 0% | | steal 0% | guest 0% | curf 3.29GHz | curscal ?% |
CPL | avg1 0.67 | avg5 0.91 | | avg15 1.02 | | | csw 61818 | intr 30944 | | | numcpu 1 |
MEM | tot 2.9G | free 111.6M | cache 164.6M | dirty 1.0M | buff 15.0M | | slab 67.4M | | | | |
SWP | tot 3.0G | free 2.4G | | | | | | | | vmcom 6.3G | vmlim 4.5G |
PAG | scan 0 | | stall 0 | | | | | swin 16 | | | swout 0 |
LVM | mysql-root | busy 0% | read 2 | write 134 | KiB/r 4 | | KiB/w 8 | MBr/s 0.00 | MBw/s 0.05 | avq 2.57 | avio 0.21 ms |
LVM | logs-log | busy 0% | read 0 | write 70 | KiB/r 0 | | KiB/w 8 | MBr/s 0.00 | MBw/s 0.03 | avq 1
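
A quick way to confirm the memory pressure without atop, as a minimal sketch:

# Current memory and swap usage on the controller
free -m

# Committed virtual memory vs. the commit limit (the vmcom/vmlim figures in the atop header)
grep -E 'Committed_AS|CommitLimit' /proc/meminfo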

summary:
- Rabbit server failed to start after unexpected reboot and maintenance mode manipulation
+ RabbitMQ server failed to start after unexpected reboot and maintenance mode manipulation
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

@Vladimir, this bug looks identical to https://bugs.launchpad.net/bugs/1472230
