2014-11-27 10:18:24 |
Bogdan Dobrelya |
bug |
|
|
added bug |
2014-11-27 10:18:29 |
Bogdan Dobrelya |
fuel: importance |
Undecided |
High |
|
2014-11-27 10:18:33 |
Bogdan Dobrelya |
fuel: status |
New |
Triaged |
|
2014-11-27 10:18:37 |
Bogdan Dobrelya |
fuel: milestone |
|
6.0 |
|
2014-11-27 10:18:44 |
Bogdan Dobrelya |
nominated for series |
|
fuel/5.1.x |
|
2014-11-27 10:18:44 |
Bogdan Dobrelya |
bug task added |
|
fuel/5.1.x |
|
2014-11-27 10:18:44 |
Bogdan Dobrelya |
nominated for series |
|
fuel/6.0.x |
|
2014-11-27 10:18:44 |
Bogdan Dobrelya |
bug task added |
|
fuel/6.0.x |
|
2014-11-27 10:18:50 |
Bogdan Dobrelya |
fuel/5.1.x: status |
New |
Triaged |
|
2014-11-27 10:18:54 |
Bogdan Dobrelya |
fuel/5.1.x: importance |
Undecided |
High |
|
2014-11-27 10:18:59 |
Bogdan Dobrelya |
fuel/5.1.x: milestone |
|
5.1.1 |
|
2014-11-27 10:19:15 |
Bogdan Dobrelya |
fuel/5.1.x: milestone |
5.1.1 |
5.1.2 |
|
2014-11-27 10:41:00 |
Bogdan Dobrelya |
description |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master is running, but in reality rabbitmqctl list_users returns an error for a while and the cluster is not ready.
In order to fix it, we must additionally check that rabbitmqctl list_users does not return an error, and only then report the Master of the multistate clone as running. |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master is running, but in reality rabbitmqctl list_users returns an error for a while and the cluster is not ready.
In order to fix it, we must additionally check that rabbitmqctl list_users does not return an error on any cluster node, and only then report the Master of the multistate clone as running. |
|
2014-11-27 10:42:07 |
Bogdan Dobrelya |
fuel/5.1.x: assignee |
|
Fuel Library Team (fuel-library) |
|
2014-11-27 10:42:26 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
|
Fuel Library Team (fuel-library) |
|
2014-11-27 10:43:49 |
Bogdan Dobrelya |
summary |
Rabbitmq OCF script requires additional criterias to be met for Master=running status |
Rabbitmq OCF script requires additional criterias to be met for Master/Slave statuses |
|
2014-11-27 10:44:44 |
Bogdan Dobrelya |
description |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master is running, but in reality rabbitmqctl list_users returns an error for a while and the cluster is not ready.
In order to fix it, we must additionally check that rabbitmqctl list_users does not return an error on any cluster node, and only then report the Master of the multistate clone as running. |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master is running, but in reality rabbitmqctl list_users returns an error for a while and the cluster is not ready.
In order to fix it, we must additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise it should report the Stopped state. |
|
2014-11-27 10:45:49 |
Bogdan Dobrelya |
description |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master is running, but in reality rabbitmqctl list_users returns an error for a while and the cluster is not ready.
In order to fix it, we must additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise it should report the Stopped state. |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master/Slave readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master and all Slaves are running, but in reality rabbitmqctl list_users returns an error on some slave node, so the cluster is not ready and requires reassembling.
In order to fix it, we must additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise it should report the Stopped state. |
|
2014-11-27 11:09:08 |
Bogdan Dobrelya |
fuel/6.0.x: assignee |
Fuel Library Team (fuel-library) |
Bogdan Dobrelya (bogdando) |
|
2014-11-27 11:19:36 |
Vladimir Kuklin |
fuel/6.0.x: assignee |
Bogdan Dobrelya (bogdando) |
Vladimir Kuklin (vkuklin) |
|
2014-11-27 12:36:14 |
Vladimir Kuklin |
nominated for series |
|
fuel/6.1.x |
|
2014-11-27 12:36:14 |
Vladimir Kuklin |
bug task added |
|
fuel/6.1.x |
|
2014-11-27 12:36:20 |
Vladimir Kuklin |
fuel/6.1.x: status |
New |
Triaged |
|
2014-11-27 12:36:22 |
Vladimir Kuklin |
fuel/6.1.x: importance |
Undecided |
High |
|
2014-11-27 12:36:28 |
Vladimir Kuklin |
fuel/6.1.x: assignee |
|
Fuel Library Team (fuel-library) |
|
2014-11-27 12:36:37 |
Vladimir Kuklin |
fuel/6.0.x: assignee |
Vladimir Kuklin (vkuklin) |
Fuel Library Team (fuel-library) |
|
2014-11-27 12:36:41 |
Vladimir Kuklin |
fuel/6.1.x: milestone |
|
6.1 |
|
2014-11-27 12:36:44 |
Vladimir Kuklin |
fuel/6.0.x: milestone |
6.0 |
6.0.1 |
|
2014-12-01 11:32:17 |
Bogdan Dobrelya |
fuel/6.1.x: assignee |
Fuel Library Team (fuel-library) |
Vladimir Kuklin (vkuklin) |
|
2014-12-03 16:22:03 |
Vladimir Kuklin |
summary |
Rabbitmq OCF script requires additional criterias to be met for Master/Slave statuses |
Rabbitmq OCF script requires additional criteria to be met for Master/Slave statuses |
|
2014-12-03 16:59:55 |
Alexander Kurenyshev |
attachment added |
|
fuel-snapshot-2014-12-03_14-19-58.tgz https://bugs.launchpad.net/fuel/+bug/1396946/+attachment/4273612/+files/fuel-snapshot-2014-12-03_14-19-58.tgz |
|
2014-12-08 16:46:57 |
Bogdan Dobrelya |
fuel/6.1.x: assignee |
Vladimir Kuklin (vkuklin) |
Bogdan Dobrelya (bogdando) |
|
2014-12-11 08:34:44 |
Bogdan Dobrelya |
fuel/6.1.x: status |
Triaged |
In Progress |
|
2014-12-11 08:36:13 |
Bogdan Dobrelya |
description |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master/Slave readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master and all Slaves are running, but in reality rabbitmqctl list_users returns an error on some slave node, so the cluster is not ready and requires reassembling.
In order to fix it, we must additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise it should report the Stopped state. |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master/Slave readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master and all Slaves are running, but in reality rabbitmqctl list_users returns an error on some slave node, so the cluster is not ready and requires reassembling.
In order to fix it, we should:
- additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise it should report the Stopped state.
- wrap rabbitmqctl commands in timeout with the -KILL signal
- use disconnect_node prior to issuing forget_cluster_node |
|
2014-12-16 14:18:33 |
Bogdan Dobrelya |
description |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master/Slave readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master and all Slaves are running, but in reality rabbitmqctl list_users returns an error on some slave node, so the cluster is not ready and requires reassembling.
In order to fix it, we should:
- additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise it should report the Stopped state.
- wrap rabbitmqctl commands in timeout with the -KILL signal
- use disconnect_node prior to issuing forget_cluster_node |
The build of http://jenkins-product.srt.mirantis.net:8080/view/6.0/job/6.0.ubuntu.promo_bvt/71/ shows that the OCF script is missing a criterion for Master/Slave readiness. According to the logs, there is a situation where OCF reported to Pacemaker that the Master and all Slaves are running, but in reality rabbitmqctl list_users returns an error on some slave node, so the cluster is not ready and requires reassembling (that happened because the start_app and join_cluster commands failed and hung).
The other intermittent issues with rabbitmq clustering are:
* The forget_cluster_node command could take a long time (even longer than the time given to the post-stop notify event) if the rabbit node is under heavy load.
* It is also possible that all rabbitmq resources persist as slaves and no master is ever elected (see the duplicate bug/1401956).
* Sometimes join_cluster could take quite a long time. If the time limit is exceeded, the node will enter a join-wait-reset loop forever.
In order to fix it, we should:
- additionally check that 'rabbitmqctl list_users' does not return an error on the given node, and only then report the Master or Slave of the multistate clone as running. Otherwise it should report the Stopped state.
- wrap rabbitmqctl commands in timeout with the -KILL signal
- use disconnect_node prior to issuing forget_cluster_node
- thoroughly re-examine the OCF script logic and fix it (see the commit message for a related patch below) |
|
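The first three fix items in the description above could look roughly like the following shell sketch. It is only an illustration and is not taken from the actual OCF script or the related patch: the function names, the RABBITMQCTL and CMD_TIMEOUT variables, and the 30-second default are hypothetical. It also assumes that disconnect_node refers to the Erlang built-in invoked through 'rabbitmqctl eval', and that the coreutils 'timeout' command is available. The numeric return codes are the standard OCF ones (0 success, 7 not running, 8 running master).

```shell
#!/bin/sh
# Hypothetical sketch; names and defaults are illustrative assumptions,
# not the real Fuel OCF script.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"   # assumption: binary is on PATH
CMD_TIMEOUT="${CMD_TIMEOUT:-30}"            # assumption: seconds per command

# Standard OCF return codes.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8

# Run a rabbitmqctl subcommand; send SIGKILL if it exceeds CMD_TIMEOUT.
ctl_with_timeout() {
    # $RABBITMQCTL is intentionally unquoted so it may carry extra arguments.
    timeout -s KILL "$CMD_TIMEOUT" $RABBITMQCTL "$@"
}

# Report Master or Slave as running only if list_users succeeds on this node.
# $1 is the role currently believed to be held: "master" or "slave".
monitor_node() {
    if ctl_with_timeout list_users >/dev/null 2>&1; then
        if [ "$1" = "master" ]; then
            return $OCF_RUNNING_MASTER
        fi
        return $OCF_SUCCESS
    fi
    # list_users failed or was killed: treat the node as stopped.
    return $OCF_NOT_RUNNING
}

# Disconnect a peer at the Erlang level, then remove it from the cluster.
# $1 is the peer node name, e.g. rabbit@node-2 (hypothetical).
forget_node() {
    # Best effort: a failed disconnect does not block forget_cluster_node.
    ctl_with_timeout eval "disconnect_node(list_to_atom(\"$1\"))." >/dev/null 2>&1
    ctl_with_timeout forget_cluster_node "$1"
}
```

A monitor action would call monitor_node with the role it currently believes the node holds and pass the return code straight back to Pacemaker. Using SIGKILL rather than the default SIGTERM reflects the description's concern that a hung rabbitmqctl may not exit on its own.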
2014-12-17 13:07:28 |
Matthew Mosesohn |
fuel/6.0.x: assignee |
Fuel Library Team (fuel-library) |
Bogdan Dobrelya (bogdando) |
|
2014-12-17 13:07:33 |
Matthew Mosesohn |
fuel/5.1.x: assignee |
Fuel Library Team (fuel-library) |
Bogdan Dobrelya (bogdando) |
|
2015-01-27 14:30:01 |
Bogdan Dobrelya |
fuel/6.1.x: status |
In Progress |
Fix Committed |
|
2015-01-27 14:58:08 |
Bogdan Dobrelya |
fuel/5.1.x: status |
Triaged |
In Progress |
|
2015-01-27 14:58:12 |
Bogdan Dobrelya |
fuel/6.0.x: status |
Triaged |
In Progress |
|
2015-02-06 15:26:45 |
Bogdan Dobrelya |
fuel/6.0.x: status |
In Progress |
Fix Committed |
|
2015-02-06 15:26:49 |
Bogdan Dobrelya |
fuel/5.1.x: status |
In Progress |
Fix Committed |
|