Activity log for bug #1446190

Date Who What changed Old value New value Message
2015-04-20 11:50:19 Bogdan Dobrelya bug added bug
2015-04-20 11:51:52 Bogdan Dobrelya description This issue was discovered at the scale lab, when rabbit nodes were running under load. The issue is that stop_server_process() ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic. This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. This issue was discovered at the scale lab, when rabbit nodes were running under load. The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic. This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact.
2015-04-20 11:52:02 Bogdan Dobrelya nominated for series fuel/6.0.x
2015-04-20 11:52:02 Bogdan Dobrelya bug task added fuel/6.0.x
2015-04-20 11:52:02 Bogdan Dobrelya nominated for series fuel/5.1.x
2015-04-20 11:52:02 Bogdan Dobrelya bug task added fuel/5.1.x
2015-04-20 11:52:09 Bogdan Dobrelya fuel: milestone 6.1
2015-04-20 11:52:15 Bogdan Dobrelya fuel: importance Undecided Critical
2015-04-20 11:52:19 Bogdan Dobrelya fuel: assignee Bogdan Dobrelya (bogdando)
2015-04-20 11:52:23 Bogdan Dobrelya fuel: status New In Progress
2015-04-20 11:52:28 Bogdan Dobrelya fuel/5.1.x: milestone 5.1.2
2015-04-20 11:52:33 Bogdan Dobrelya fuel/6.0.x: milestone 6.0.1
2015-04-20 11:52:36 Bogdan Dobrelya fuel/6.0.x: assignee Bogdan Dobrelya (bogdando)
2015-04-20 11:52:39 Bogdan Dobrelya fuel/6.0.x: importance Undecided Critical
2015-04-20 11:52:41 Bogdan Dobrelya fuel/5.1.x: importance Undecided Critical
2015-04-20 11:52:44 Bogdan Dobrelya fuel/5.1.x: assignee Bogdan Dobrelya (bogdando)
2015-04-20 11:52:55 Bogdan Dobrelya fuel/5.1.x: status New Triaged
2015-04-20 11:52:58 Bogdan Dobrelya fuel/6.0.x: status New Triaged
2015-04-20 11:53:58 Bogdan Dobrelya description This issue was discovered at the scale lab, when rabbit nodes were running under load. The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic. This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. This issue was discovered at the scale lab, when rabbit nodes were running under load. The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic. Here is an example log: http://paste.openstack.org/show/H89Uo8ZdPlMUstlp1Tb5/ This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact.
2015-04-20 12:00:09 Bogdan Dobrelya summary RabbitMQ OCF may hang on the stop action as it ignores the stop command exit code RabbitMQ OCF may hang on the stop/start actions as it ignores the stop/wait commands exit code
2015-04-20 12:07:35 Bogdan Dobrelya description This issue was discovered at the scale lab, when rabbit nodes were running under load. The issue is that stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" actions logic. Here is an example log: http://paste.openstack.org/show/H89Uo8ZdPlMUstlp1Tb5/ This issue appears only when stop action exceeds the given 60 sec timeout. That is a usual case under load, hence is critical by its impact. This issue was discovered at the scale lab, when rabbit nodes were running under load. The issues are: 1) stop_server_process() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L596-L597 ignores the exit code of the "rabbitmqctl stop" command and verifies the old rc value left from the latest pidfile check, which is wrong and leads to broken "stop" action logic. 2) try_to_start_rmq_app() https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L740-L744 ignores the exit code of the "rabbitmqctl wait" command and may hang until the resource agent's operation timeout is exceeded, which breaks the "start" action logic. Here are example logs: broken stop: http://paste.openstack.org/show/H89Uo8ZdPlMUstlp1Tb5/ broken start: http://paste.openstack.org/show/nHFoeSn21kne22vtBHZS/ These issues appear only when the specified timeout for the stop or wait command is exceeded. That is a usual case under load, hence the critical impact.
2015-04-20 15:58:45 Bogdan Dobrelya fuel/5.1.x: assignee Bogdan Dobrelya (bogdando) Fuel Library Team (fuel-library)
2015-04-20 15:58:55 Bogdan Dobrelya fuel/6.0.x: assignee Bogdan Dobrelya (bogdando) Fuel Library Team (fuel-library)
2015-04-20 16:31:57 OpenStack Infra fuel: status In Progress Fix Committed
2015-04-21 08:24:12 Dina Belova tags scale
2015-05-04 10:07:03 Bogdan Dobrelya fuel/5.1.x: status Triaged Fix Committed
2015-05-04 10:07:07 Bogdan Dobrelya fuel/6.0.x: status Triaged Fix Committed
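
The stale-rc problem named in the bug description can be illustrated with a minimal shell sketch. This is not the fuel-library OCF script: fake_pidfile_check, fake_stop_cmd, buggy_stop and fixed_stop are hypothetical stand-ins, and the exit code 124 only imitates a stop command that ran past its timeout. The sketch shows how testing an rc left over from an earlier check hides the failure of the stop command, and how testing the stop command's own exit code exposes it.

```shell
#!/bin/sh
# Illustrative sketch only; all function names below are hypothetical
# stand-ins, not code from the rabbitmq OCF resource agent.

OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Stand-in for the pidfile check: pretend it succeeds.
fake_pidfile_check() { return 0; }

# Stand-in for "rabbitmqctl stop" run under a timeout: pretend it
# fails because the timeout was exceeded.
fake_stop_cmd() { return 124; }

# Buggy pattern: rc is set by the pidfile check, the stop command's
# exit code is discarded, and the stale rc is tested afterwards.
buggy_stop() {
    rc=0
    fake_pidfile_check || rc=$?
    fake_stop_cmd || true            # exit code ignored
    if [ "$rc" -eq 0 ]; then         # still the pidfile check's rc
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}

# Corrected pattern: the decision is based on the stop command itself.
fixed_stop() {
    if fake_stop_cmd; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}
```

With these stand-ins, buggy_stop reports success even though the stop command failed, while fixed_stop reports a generic error that the caller can act on.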