Issue:
- RabbitMQ is broken and won't start again; after a reboot it starts but remains non-operational.
- None of the OpenStack services start after a controller node reboot.
- 'heat list' hangs every time after RabbitMQ has been stopped for the first time.
[root@node-7 ~]# service rabbitmq-server stop
[root@node-7 ~]# . openrc; heat list
(OK)
[root@node-7 ~]# service rabbitmq-server start
Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
3 attempts left to start RabbitMQ Server before consider start failed.
SUCCESS
rabbitmq-server.
[root@node-7 ~]# rabbitmqctl list_queues
(OK)
Reboot the node, and check:
[root@node-7 ~]# chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
(All OS services are running)
[root@node-7 ~]# service rabbitmq-server stop
[root@node-7 ~]# . openrc; heat list
(Hangs)
[root@node-1 ~]# service rabbitmq-server start
Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
3 attempts left to start RabbitMQ Server before consider start failed.
(Hangs)
If the node is rebooted, RabbitMQ starts, but:
[root@node-1 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]},
{running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]
...done.
[root@node-1 ~]# rabbitmqctl list_queues
Listing queues ...
=ERROR REPORT==== 6-Mar-2014::14:50:04 ===
Discarding message {'$gen_call',{<0.17752.11>,#Ref<0.0.1.206018>},{info,[name,messages]}} from <0.17752.11> to <0.1694.0> in an old incarnation (1) of this node (2)
(Hangs)
[root@node-1 ~]# rabbitmqctl list_consumers
Listing consumers ...
=ERROR REPORT==== 6-Mar-2014::14:55:37 ===
Discarding message {'$gen_call',{<0.2839.13>,#Ref<0.0.2.95633>},consumers} from <0.2839.13> to <0.1694.0> in an old incarnation (1) of this node (2)
(Hangs)
Reboot the node, and check:
[root@node-7 ~]# chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
(All OS services are stopped)
{"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
Reproduce:
Deploy CentOS HA, nova-network FlatDHCP, tagged interfaces, DEBUG=TRUE: 3 controllers, 1 compute
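To tell a hang apart from a slow or failing command when reproducing this, the checks in this report can be wrapped in a time limit. The following is only a minimal sketch: the helper name check_hang and the 60-second limit in the usage comments are assumptions for illustration, not part of the original report. It relies on the coreutils timeout command, which exits with status 124 when the limit is reached.

```shell
#!/bin/bash
# Hypothetical helper (not from the original report): run a command with a
# time limit and print OK, HANGS, or FAILED(<exit code>).
check_hang() {
    local limit="$1"; shift
    local rc=0
    # coreutils `timeout` exits with 124 if the command is still running
    # when the limit (in seconds) expires.
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "HANGS"
    elif [ "$rc" -eq 0 ]; then
        echo "OK"
    else
        echo "FAILED($rc)"
    fi
}

# Example usage on a controller (commands taken from this report; the
# 60-second limit is an arbitrary assumption):
#   . openrc; check_hang 60 heat list
#   check_hang 60 rabbitmqctl list_queues
```

A command that never returns is then reported as HANGS instead of blocking the shell, which makes the pre-patched and patched runs easier to compare.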
Console actions and results:
*Pre-patched behavior*
{"build_id": "2014-02-26_13-39-45", "mirantis": "yes", "build_number": "211", "nailgun_sha": "ea08cef3e06a72f47cfaa8cd8fe6d034e2cf722e", "ostf_sha": "8e6681b6d06c7cb20a84c1cc740d5f2492fb9d85", "fuelmain_sha": "baa8bb07393698f1186cb67bb65f1b93907c59bd", "astute_sha": "10cccc87f2ee35510e43c8fa19d2bf916ca1fced", "release": "4.1", "fuellib_sha": "0a2e5bdc01c1e3bb285acb7b39125101e950ac72"}
CentOS HA, nova-network FlatDHCP, tagged interfaces: 3 controllers, 1 compute
[root@node-7 ~]# service rabbitmq-server stop
[root@node-7 ~]# . openrc; heat list
(OK)
[root@node-7 ~]# service rabbitmq-server start
Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
3 attempts left to start RabbitMQ Server before consider start failed.
SUCCESS
rabbitmq-server.
[root@node-7 ~]# rabbitmqctl list_queues
(OK)
Reboot the node, and check:
[root@node-7 ~]# chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
(All OS services are running)
*Patched behavior*
{"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
CentOS HA, nova-network FlatDHCP, tagged interfaces: 3 controllers, 1 compute
[root@node-7 ~]# service rabbitmq-server stop
[root@node-7 ~]# . openrc; heat list
(Hangs)
[root@node-1 ~]# service rabbitmq-server start
Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
3 attempts left to start RabbitMQ Server before consider start failed.
(Hangs)
If the node is rebooted, RabbitMQ starts, but:
[root@node-1 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]},
{running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]
...done.
[root@node-1 ~]# rabbitmqctl list_queues
Listing queues ...
=ERROR REPORT==== 6-Mar-2014::14:50:04 ===
Discarding message {'$gen_call',{<0.17752.11>,#Ref<0.0.1.206018>},{info,[name,messages]}} from <0.17752.11> to <0.1694.0> in an old incarnation (1) of this node (2)
(Hangs)
[root@node-1 ~]# rabbitmqctl list_consumers
Listing consumers ...
=ERROR REPORT==== 6-Mar-2014::14:55:37 ===
Discarding message {'$gen_call',{<0.2839.13>,#Ref<0.0.2.95633>},consumers} from <0.2839.13> to <0.1694.0> in an old incarnation (1) of this node (2)
(Hangs)
Reboot the node, and check:
[root@node-7 ~]# chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
(All OS services are stopped)