Activity log for bug #1288831

Date Who What changed Old value New value Message
2014-03-06 15:55:23 Bogdan Dobrelya bug added bug
2014-03-06 16:01:47 Bogdan Dobrelya description
  Old value:
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Reproduce:
    Deploy Centos HA, nova-network FLATdhcp, tagged interfaces, DEBUG=TRUE: 3 controllers, 1 compute
    Issue:
    - RabbitMQ is broken and won't start back, after reboot it starts but remains unoperational.
    - None of the Openstack services start after controller node reboot
    - 'heat list' hangs every the time after RabbitMQ was stopped for the 1st time.
    Console actions and results:
    *Pre-patched behavior*
    {"build_id": "2014-02-26_13-39-45", "mirantis": "yes", "build_number": "211", "nailgun_sha": "ea08cef3e06a72f47cfaa8cd8fe6d034e2cf722e", "ostf_sha": "8e6681b6d06c7cb20a84c1cc740d5f2492fb9d85", "fuelmain_sha": "baa8bb07393698f1186cb67bb65f1b93907c59bd", "astute_sha": "10cccc87f2ee35510e43c8fa19d2bf916ca1fced", "release": "4.1", "fuellib_sha": "0a2e5bdc01c1e3bb285acb7b39125101e950ac72"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (OK)
    [root@node-7 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    SUCCESS rabbitmq-server.
    [root@node-7 ~]# rabbitmqctl list_queues
    (OK)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are running)
    *Patched behavior*
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (Hangs)
    [root@node-1 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    (Hangs)
    If reboot the node, rabbitMQ starts, but:
    [root@node-1 ~]# rabbitmqctl cluster_status
    Cluster status of node 'rabbit@node-1' ...
    [{nodes,[{disc,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]},
     {running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]
    ...done.
    [root@node-1 ~]# rabbitmqctl list_queues
    Listing queues ...
    =ERROR REPORT==== 6-Mar-2014::14:50:04 ===
    Discarding message {'$gen_call',{<0.17752.11>,#Ref<0.0.1.206018>},{info,[name,messages]}} from <0.17752.11> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    [root@node-1 ~]# rabbitmqctl list_consumers
    Listing consumers ...
    =ERROR REPORT==== 6-Mar-2014::14:55:37 ===
    Discarding message {'$gen_call',{<0.2839.13>,#Ref<0.0.2.95633>},consumers} from <0.2839.13> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are stopped)
  New value:
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Reproduce:
    Deploy Centos HA, nova-network FLATdhcp, tagged interfaces, DEBUG=TRUE: 3 controllers, 1 compute
    Issue:
    - Once stopped, RabbitMQ became broken and won't start back, after reboot it starts but remains unoperational.
    - None of the Openstack services start after controller node reboot
    - 'heat list' hangs every the time after RabbitMQ was stopped for the 1st time.
    Console actions and results:
    *Pre-patched behavior*
    {"build_id": "2014-02-26_13-39-45", "mirantis": "yes", "build_number": "211", "nailgun_sha": "ea08cef3e06a72f47cfaa8cd8fe6d034e2cf722e", "ostf_sha": "8e6681b6d06c7cb20a84c1cc740d5f2492fb9d85", "fuelmain_sha": "baa8bb07393698f1186cb67bb65f1b93907c59bd", "astute_sha": "10cccc87f2ee35510e43c8fa19d2bf916ca1fced", "release": "4.1", "fuellib_sha": "0a2e5bdc01c1e3bb285acb7b39125101e950ac72"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (OK)
    [root@node-7 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    SUCCESS rabbitmq-server.
    [root@node-7 ~]# rabbitmqctl list_queues
    (OK)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are running)
    *Patched behavior*
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (Hangs)
    [root@node-1 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    (Hangs)
    If reboot the node, rabbitMQ starts, but:
    [root@node-1 ~]# rabbitmqctl cluster_status
    Cluster status of node 'rabbit@node-1' ...
    [{nodes,[{disc,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]},
     {running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]
    ...done.
    [root@node-1 ~]# rabbitmqctl list_queues
    Listing queues ...
    =ERROR REPORT==== 6-Mar-2014::14:50:04 ===
    Discarding message {'$gen_call',{<0.17752.11>,#Ref<0.0.1.206018>},{info,[name,messages]}} from <0.17752.11> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    [root@node-1 ~]# rabbitmqctl list_consumers
    Listing consumers ...
    =ERROR REPORT==== 6-Mar-2014::14:55:37 ===
    Discarding message {'$gen_call',{<0.2839.13>,#Ref<0.0.2.95633>},consumers} from <0.2839.13> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are stopped)
2014-03-06 18:32:06 Vladimir Kuklin fuel: status New Incomplete
2014-03-06 19:33:29 Bogdan Dobrelya attachment added logs snapshot (reproduced for node-2) https://bugs.launchpad.net/fuel/+bug/1288831/+attachment/4010224/+files/fuel-snapshot-2014-03-06_19-01-32.tgz
2014-03-06 21:55:46 Ryan Moe fuel: status Incomplete Confirmed
2014-03-06 22:09:29 Ryan Moe summary RabbitMQ HA regression RabbitMQ cluster locks up when a member is removed
2014-03-06 22:09:40 Ryan Moe fuel: milestone 4.1 4.1.1
2014-03-06 22:09:43 Ryan Moe fuel: importance Critical High
2014-03-07 01:48:20 Dmitry Borodaenko fuel: status Confirmed Triaged
2014-03-07 09:51:27 Bogdan Dobrelya description
  Old value:
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Reproduce:
    Deploy Centos HA, nova-network FLATdhcp, tagged interfaces, DEBUG=TRUE: 3 controllers, 1 compute
    Issue:
    - Once stopped, RabbitMQ became broken and won't start back, after reboot it starts but remains unoperational.
    - None of the Openstack services start after controller node reboot
    - 'heat list' hangs every the time after RabbitMQ was stopped for the 1st time.
    Console actions and results:
    *Pre-patched behavior*
    {"build_id": "2014-02-26_13-39-45", "mirantis": "yes", "build_number": "211", "nailgun_sha": "ea08cef3e06a72f47cfaa8cd8fe6d034e2cf722e", "ostf_sha": "8e6681b6d06c7cb20a84c1cc740d5f2492fb9d85", "fuelmain_sha": "baa8bb07393698f1186cb67bb65f1b93907c59bd", "astute_sha": "10cccc87f2ee35510e43c8fa19d2bf916ca1fced", "release": "4.1", "fuellib_sha": "0a2e5bdc01c1e3bb285acb7b39125101e950ac72"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (OK)
    [root@node-7 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    SUCCESS rabbitmq-server.
    [root@node-7 ~]# rabbitmqctl list_queues
    (OK)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are running)
    *Patched behavior*
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (Hangs)
    [root@node-1 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    (Hangs)
    If reboot the node, rabbitMQ starts, but:
    [root@node-1 ~]# rabbitmqctl cluster_status
    Cluster status of node 'rabbit@node-1' ...
    [{nodes,[{disc,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]},
     {running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]
    ...done.
    [root@node-1 ~]# rabbitmqctl list_queues
    Listing queues ...
    =ERROR REPORT==== 6-Mar-2014::14:50:04 ===
    Discarding message {'$gen_call',{<0.17752.11>,#Ref<0.0.1.206018>},{info,[name,messages]}} from <0.17752.11> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    [root@node-1 ~]# rabbitmqctl list_consumers
    Listing consumers ...
    =ERROR REPORT==== 6-Mar-2014::14:55:37 ===
    Discarding message {'$gen_call',{<0.2839.13>,#Ref<0.0.2.95633>},consumers} from <0.2839.13> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are stopped)
  New value:
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Reproduce:
    * Deploy Centos HA, nova-network FLATdhcp, tagged interfaces, DEBUG=TRUE: 3 controllers, 1 compute
    * log on to the 1st controller node and issue the commands (see below):
      service rabbitmq-server stop
      sleep 30; . openrc; heat list
      service rabbitmq-server start
      rabbitmqctl list_queues
      heat list
    * If there is no issues for rabbitmq startup, list_queues and heat list results (see below, normal results were marked as (OK)):
      - repeat the same steps for other controllers, one by one.
    * Otherwise, in case there were any issues for given controller node (see below, issues were marked as (Hangs)):
      - reboot the given node and check OS services and rabbitmq:
        chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
        rabbitmqctl list_queues
        . openrc; heat list
      - check the results: All Openstack services will be stopped and RabbitMQ will not be able to show its queues. And that is the subject of the issue...
    Issue:
    - Once stopped, RabbitMQ became broken and won't start back, after reboot it starts but remains unoperational.
    - None of the Openstack services start after controller node reboot
    - 'heat list' hangs every the time after RabbitMQ was stopped for the 1st time.
    Console actions and results:
    *Pre-patched behavior*
    {"build_id": "2014-02-26_13-39-45", "mirantis": "yes", "build_number": "211", "nailgun_sha": "ea08cef3e06a72f47cfaa8cd8fe6d034e2cf722e", "ostf_sha": "8e6681b6d06c7cb20a84c1cc740d5f2492fb9d85", "fuelmain_sha": "baa8bb07393698f1186cb67bb65f1b93907c59bd", "astute_sha": "10cccc87f2ee35510e43c8fa19d2bf916ca1fced", "release": "4.1", "fuellib_sha": "0a2e5bdc01c1e3bb285acb7b39125101e950ac72"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (OK)
    [root@node-7 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    SUCCESS rabbitmq-server.
    [root@node-7 ~]# rabbitmqctl list_queues
    (OK)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are running)
    *Patched behavior*
    {"build_id": "2014-03-04_12-31-13", "mirantis": "yes", "build_number": "112", "nailgun_sha": "d98b61e073d32c45c98099a11ff263a68b7ba205", "ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa", "fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b", "astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a", "release": "4.1", "fuellib_sha": "15a55ccff0f59929b32d087679d19e896bde8e0d"}
    Centos HA, nova-network FLATdhcp, tagged interfaces: 3 controllers, 1 compute
    [root@node-7 ~]# service rabbitmq-server stop
    [root@node-7 ~]# . openrc; heat list
    (Hangs)
    [root@node-1 ~]# service rabbitmq-server start
    Starting rabbitmq-server: RabbitMQ is going to make 3 attempts to find master node and start.
    3 attempts left to start RabbitMQ Server before consider start failed.
    (Hangs)
    If reboot the node, rabbitMQ starts, but:
    [root@node-1 ~]# rabbitmqctl cluster_status
    Cluster status of node 'rabbit@node-1' ...
    [{nodes,[{disc,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]},
     {running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']}]
    ...done.
    [root@node-1 ~]# rabbitmqctl list_queues
    Listing queues ...
    =ERROR REPORT==== 6-Mar-2014::14:50:04 ===
    Discarding message {'$gen_call',{<0.17752.11>,#Ref<0.0.1.206018>},{info,[name,messages]}} from <0.17752.11> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    [root@node-1 ~]# rabbitmqctl list_consumers
    Listing consumers ...
    =ERROR REPORT==== 6-Mar-2014::14:55:37 ===
    Discarding message {'$gen_call',{<0.2839.13>,#Ref<0.0.2.95633>},consumers} from <0.2839.13> to <0.1694.0> in an old incarnation (1) of this node (2)
    (Hangs)
    Reboot the node, and check:
    [root@node-7 ~]chkconfig | grep openstack | awk '{print $1}' | xargs -n1 -I{} service {} status
    (All OS services are stopped)
2014-03-24 13:31:38 Vladimir Kuklin fuel: milestone 4.1.1 5.0
2014-03-24 13:31:46 Vladimir Kuklin tags library backports-4.1.1 library
2014-04-08 20:53:36 Andrew Woodward tags backports-4.1.1 library backports-4.1.1 ha library
2014-04-18 07:57:47 Bogdan Dobrelya fuel: status Triaged Fix Committed
2014-04-18 18:01:03 Dmitry Borodaenko fuel: status Fix Committed In Progress
2014-04-18 18:01:06 Dmitry Borodaenko fuel: milestone 5.0 4.1.1
2014-05-08 12:12:38 Mike Scherbakov tags backports-4.1.1 ha library backports-4.1.1 ha library release-notes
2014-05-08 17:06:52 Dmitry Borodaenko fuel: status In Progress Fix Committed
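
Note on the reproduction steps: the description marks each console command as (OK) or (Hangs) by hand. A minimal sketch of how that check could be scripted is given here. The helper name `run_step`, the "(Failed, exit N)" label, and any time limit passed to it are illustrative assumptions, not part of Fuel or of the original report. The sketch relies on GNU coreutils `timeout`, which exits with status 124 when the limit expires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Fuel): run one reproduction step under a
# time limit and label the outcome the way the bug description does.
# Usage: run_step SECONDS COMMAND [ARGS...]
run_step() {
    local limit="$1"
    shift
    local rc=0
    # timeout(1) kills the command once the limit expires and returns 124.
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "(Hangs) $*"
    elif [ "$rc" -eq 0 ]; then
        echo "(OK) $*"
    else
        # Any other non-zero status means the step ended on its own but failed.
        echo "(Failed, exit $rc) $*"
    fi
}
```

On a controller node this could be called as `run_step 60 service rabbitmq-server start`, `run_step 60 rabbitmqctl list_queues`, and, after sourcing openrc, `run_step 60 heat list`. The 60-second limit is an arbitrary assumption and should be tuned to the environment.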