Fuel for OpenStack

Activity log for bug #1459173

Date	Who	What changed	Old value	New value	Message
2015-05-27 09:41:39	Bogdan Dobrelya	bug			added bug
2015-05-27 09:41:46	Bogdan Dobrelya	fuel: status	New	Confirmed
2015-05-27 09:41:55	Bogdan Dobrelya	fuel: importance	Undecided	High
2015-05-27 09:42:00	Bogdan Dobrelya	fuel: assignee		Bogdan Dobrelya (bogdando)
2015-05-27 09:42:03	Bogdan Dobrelya	fuel: milestone		6.1
2015-05-27 09:42:11	Bogdan Dobrelya	nominated for series		fuel/5.1.x
2015-05-27 09:42:11	Bogdan Dobrelya	bug task added		fuel/5.1.x
2015-05-27 09:42:11	Bogdan Dobrelya	nominated for series		fuel/6.0.x
2015-05-27 09:42:11	Bogdan Dobrelya	bug task added		fuel/6.0.x
2015-05-27 09:42:16	Bogdan Dobrelya	fuel/5.1.x: status	New	Confirmed
2015-05-27 09:42:19	Bogdan Dobrelya	fuel/6.0.x: status	New	Confirmed
2015-05-27 09:42:21	Bogdan Dobrelya	fuel/5.1.x: importance	Undecided	High
2015-05-27 09:42:22	Bogdan Dobrelya	fuel/6.0.x: importance	Undecided	High
2015-05-27 09:42:30	Bogdan Dobrelya	fuel/5.1.x: assignee		Fuel Library Team (fuel-library)
2015-05-27 09:42:39	Bogdan Dobrelya	bug			added subscriber Vladimir Kuklin
2015-05-27 09:42:45	Bogdan Dobrelya	fuel/5.1.x: milestone		6.0.2
2015-05-27 09:42:51	Bogdan Dobrelya	fuel/5.1.x: milestone	6.0.2	5.1.2
2015-05-27 09:42:59	Bogdan Dobrelya	fuel/6.0.x: milestone		6.0.2
2015-05-27 09:43:06	Bogdan Dobrelya	fuel/6.0.x: assignee		Fuel Library Team (fuel-library)
2015-05-27 09:44:35	Bogdan Dobrelya	summary	RabbitMQ may hang on the cluster node removal	RabbitMQ cluster node removal operation may hang for ever
2015-05-27 09:45:21	Bogdan Dobrelya	description	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what this commands may does not work as expected: # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can't re-join the cluster after faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we expected what disconnecting node should help to kick it from the cluster): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can't re-join the cluster after faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.
2015-05-27 09:45:49	Bogdan Dobrelya	description	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we expected what disconnecting node should help to kick it from the cluster): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can't re-join the cluster after faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we're expecting that disconnecting a node should help to kick it from the cluster): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can't re-join the cluster after faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.
2015-05-27 09:46:34	Bogdan Dobrelya	description	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we're expecting that disconnecting a node should help to kick it from the cluster): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can't re-join the cluster after faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we're expecting that disconnecting a node should help to kick it from the cluster): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can re-join the cluster on faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.
2015-05-27 10:21:26	Bogdan Dobrelya	description	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we're expecting that disconnecting a node should help to kick it from the cluster): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can re-join the cluster on faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we're expecting that disconnecting a node should help to kick it from the cluster, but the disconnect sometimes may fail and return false): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can re-join the cluster on faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.
2015-05-27 10:56:26	Bogdan Dobrelya	description	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we're expecting that disconnecting a node should help to kick it from the cluster, but the disconnect sometimes may fail and return false): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can re-join the cluster on faiover because they can't be forgotten and join_cluster reports they are already clustered. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.	This bug is not easy to reproduce. I managed to reproduce it only after ~300 consequent node failovers. The repro steps can be found here: https://bugs.launchpad.net/fuel/+bug/1458830 The issue is what the following commands may does not work as expected (we're expecting that disconnecting a node should help to kick it from the cluster, but the disconnect sometimes may fail and return false): # rabbitmqctl eval "disconnect_node(list_to_atom(\"rabbit@node-1\"))."; time rabbitmqctl forget_cluster_node rabbit@node-1 and hangs for ever ending up in the situation when none of rabbitmq nodes can re-join the cluster on faiover because they can't be forgotten and join_cluster reports they are already clustered. Note, that for the given scenario, the AMQP cluster retains completely down as nodes cannot join mnesia master and the latter one is running in broken state - rabbitmqctl list_channels hangs as well. Perhaps, only solution is to detect in monitor if list_channels hangs and restart the affected nodes. This will introduce full cluster downtime until new mnesia-master elected but at least will ensure the cluster reassembled. ISO info: build_id: 2015-05-20_08-41-33 build_number: '441' but manifests was synced with current master.
2015-05-27 13:36:51	Bogdan Dobrelya	summary	RabbitMQ cluster node removal operation may hang for ever	RabbitMQ cluster node removal operation may hang for ever as rabbitmqctl may hang
2015-05-27 14:15:28	OpenStack Infra	fuel: status	Confirmed	In Progress
2015-05-27 17:37:37	OpenStack Infra	fuel: status	In Progress	Fix Committed
2015-07-13 10:13:57	Bogdan Dobrelya	fuel/5.1.x: assignee	Fuel Library Team (fuel-library)	MOS Sustaining (mos-sustaining)
2015-07-13 10:14:11	Bogdan Dobrelya	fuel/6.0.x: assignee	Fuel Library Team (fuel-library)	MOS Sustaining (mos-sustaining)
2015-07-13 10:14:26	Bogdan Dobrelya	fuel/5.1.x: status	Confirmed	Triaged
2015-07-13 10:14:28	Bogdan Dobrelya	fuel/6.0.x: status	Confirmed	Triaged
2015-09-22 14:54:07	Bogdan Dobrelya	tags		ha rabbitmq
2015-09-26 11:07:27	Vitaly Sedelnik	fuel/6.0.x: milestone	6.0.2	6.0.1
2015-10-26 12:55:51	Vitaly Sedelnik	fuel/5.1.x: assignee	MOS Maintenance (mos-maintenance)	Denis Meltsaykin (dmeltsaykin)
2015-10-26 12:55:57	Vitaly Sedelnik	fuel/6.0.x: assignee	MOS Maintenance (mos-maintenance)	Denis Meltsaykin (dmeltsaykin)
2015-10-26 12:55:59	Vitaly Sedelnik	fuel/5.1.x: milestone	5.1.1-updates	5.1.1-mu-2
2015-10-26 12:56:02	Vitaly Sedelnik	fuel/6.0.x: milestone	6.0-updates	6.0-mu-7
2015-10-26 13:44:23	Denis Meltsaykin	fuel/5.1.x: status	Triaged	Won't Fix
2015-10-26 13:44:26	Denis Meltsaykin	fuel/6.0.x: status	Triaged	Won't Fix
2015-10-26 15:07:45	Vitaly Sedelnik	fuel/5.1.x: milestone	5.1.1-mu-2	5.1.1-updates
2015-10-26 15:07:48	Vitaly Sedelnik	fuel/6.0.x: milestone	6.0-mu-7	6.0-updates
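
Note on the workaround proposed in the description: the report suggests detecting in the monitor whether rabbitmqctl list_channels hangs and restarting the affected nodes. The log does not show the fix that was actually committed, so the following is only a minimal sketch of that idea, not the Fuel implementation. It assumes GNU coreutils timeout is available; the 30-second deadline, the CTL_TIMEOUT and RABBITMQCTL variables, and both function names are hypothetical and chosen for illustration.

```shell
#!/bin/sh
# Sketch: treat a rabbitmqctl call that exceeds a deadline as a hang.
# Assumption: GNU coreutils `timeout` is installed. The deadline value
# and all names below are illustrative, not taken from the bug report.

CTL_TIMEOUT="${CTL_TIMEOUT:-30}"

# Run a command under a deadline, discarding its output.
# Returns 0 on success, 2 if the deadline expired, 1 on any other failure.
run_with_deadline() {
    rc=0
    # Send TERM at the deadline, then KILL 5 seconds later if still alive.
    timeout -k 5 "$CTL_TIMEOUT" "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) return 0 ;;
        124|137) return 2 ;;  # 124: timed out; 137: had to be KILLed
        *) return 1 ;;
    esac
}

# Hypothetical monitor check: print "ok", "hung", or "failed" depending
# on how `rabbitmqctl list_channels` behaves within the deadline.
check_list_channels() {
    rc=0
    run_with_deadline "${RABBITMQCTL:-rabbitmqctl}" list_channels || rc=$?
    case "$rc" in
        0) echo "ok" ;;
        2) echo "hung" ;;
        *) echo "failed" ;;
    esac
}
```

A monitor action could call check_list_channels and restart the local RabbitMQ node only on "hung". As the description notes, restarting this way may cause full cluster downtime until a new mnesia master is elected. The same wrapper could also guard forget_cluster_node so that a hang there is reported instead of blocking forever.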