Activity log for bug #1317488

Date Who What changed Old value New value Message
2014-05-08 11:16:23 Bogdan Dobrelya bug added bug
2014-05-08 11:17:07 Bogdan Dobrelya fuel: assignee Fuel Hardening Team (fuel-hardening)
2014-05-08 16:56:46 Andrew Woodward marked as duplicate 1289200
2014-05-16 07:05:03 Bogdan Dobrelya removed duplicate marker 1289200
2014-05-16 07:05:08 Bogdan Dobrelya fuel: milestone 5.1 5.0
2014-05-16 07:05:12 Bogdan Dobrelya fuel: assignee Fuel Hardening Team (fuel-hardening) Bogdan Dobrelya (bogdando)
2014-05-16 07:33:24 OpenStack Infra fuel: status Confirmed In Progress
2014-05-16 07:49:27 Bogdan Dobrelya description
  Old value:
    Summary:
    Fuel should provide TCP KA (keepalives) for rabbitmq sessions in HA mode. These TCP KA should be visible at the app layer as well as at the network stack layer.
    related Oslo.messaging issue: https://bugs.launchpad.net/oslo.messaging/+bug/856764
    related fuel-dev ML: https://lists.launchpad.net/fuel-dev/msg01024.html
    Issues we have in Fuel:
    1) In 5.0 we upgraded rabbit to 3.x and moved its connection management out of the HAproxy scope for most of the OpenStack services (those that have synced rabbit_hosts support from Oslo.messaging). (This was also backported for 4.1.1.) Hence, we still have to provide TCP KA for rabbitmq sessions in order to make the Fuel HA architecture more reliable.
    2) In any case, HAproxy provides TCP KA only at the network layer; see the docs: "It is important to understand that keep-alive packets are neither emitted nor received at the application level. It is only the network stacks which sees them. For this reason, even if one side of the proxy already uses keep-alives to maintain its connection alive, those keep-alive packets will not be forwarded to the other side of the proxy."
    3) We have it configured the wrong way; see the HAproxy docs: "Using option "tcpka" enables the emission of TCP keep-alive probes on both the client and server sides of a connection. Note that this is meaningful only in "defaults" or "listen" sections. If this option is used in a frontend, only the client side will get keep-alives, and if this option is used in a backend, only the server side will get keep-alives. For this reason, it is strongly recommended to explicitly use "option clitcpka" and "option srvtcpka" when the configuration is split between frontends and backends."
    Suggested solution:
    Apply all patches from #856764 for Nova in MOS packages and test the RabbitMQ connections thoroughly. If it looks OK, sync the patches for other MOS packages.
    Perhaps this issue should be fixed in 5.1, but backporting should be considered critical for the 4.1.1 release (due to the increasing number of existing tickets in zendesk) and High for 5.1. I hope the 5.0 backport is not needed, since the option to roll an upgrade 5.0 -> 5.1 would exist.
  New value:
    Symptoms:
    1)
    * A random nova-compute is marked as "XXX" for a while from time to time.
    * The compute service itself works properly. The logs record status update reports being sent to the conductor, but actually nothing is sent.
    * "netstat" shows that all connections to/from rabbit are "ESTABLISHED".
    * rabbitmqctl shows that the "compute.node-x" queue is synced to all slaves.
    2)
    * Computes' queues grow after some time has passed since the last compute service restart.
    Axe style solution: /etc/init.d/openstack-nova-compute restart
    Summary:
    1) Fuel should provide TCP KA (keepalives) for rabbitmq sessions in HA mode. These TCP KA should be visible at the app layer as well as at the network stack layer.
    related Oslo.messaging issue: https://bugs.launchpad.net/oslo.messaging/+bug/856764
    related fuel-dev ML: https://lists.launchpad.net/fuel-dev/msg01024.html
    2) Instances on compute nodes should be consistent with their state in the nova db in order to prevent uncontrolled growth of computes' queues. The reaping logic update done in Icehouse should be synced as well (running_deleted_instance_action = reap; it was log).
    related zendesk issues: #1663, #1743
    Perhaps this issue should be fixed in 5.0, but backporting should be considered critical for the 3.2.1, 4.1, 4.1.1 releases (due to the increasing number of related tickets in zendesk).
2014-05-19 06:55:33 Bogdan Dobrelya fuel: milestone 5.0 4.1.1
2014-05-19 12:37:35 Bogdan Dobrelya nominated for series fuel/4.1.x
2014-05-19 12:37:35 Bogdan Dobrelya bug task added fuel/4.1.x
2014-05-19 12:38:00 Bogdan Dobrelya fuel/4.1.x: status New In Progress
2014-05-19 12:38:04 Bogdan Dobrelya fuel/4.1.x: importance Undecided High
2014-05-19 12:38:08 Bogdan Dobrelya fuel/4.1.x: assignee Bogdan Dobrelya (bogdando)
2014-05-19 12:38:12 Bogdan Dobrelya fuel/4.1.x: milestone 4.1.1
2014-05-19 12:47:33 Bogdan Dobrelya fuel: milestone 4.1.1 5.0
2014-05-19 12:47:39 Bogdan Dobrelya fuel: status In Progress Invalid
2014-06-02 14:59:36 Vladimir Kuklin fuel/4.1.x: status In Progress Fix Committed
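
Note on the description change above: it asks for TCP keepalives that the messaging client itself enables, rather than ones a proxy adds on its own connections. As a rough illustration only, the Python sketch below shows how an application can switch on keepalive probes for a socket it owns. It is not the patch set from #856764; the function name `enable_tcp_keepalive` and the timing values are assumptions chosen for the example, and the `TCP_KEEP*` options are platform dependent, so they are applied only where the `socket` module exposes them.

```python
import socket


def enable_tcp_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive probes on a socket (hypothetical helper).

    idle, interval and count are illustrative values, not tuned settings.
    """
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine-grained timers are not available on every platform,
    # so set each one only if the constant exists.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


# Usage: configure a fresh socket before connecting it to the broker.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With probes enabled on the client socket, a peer that disappears silently should eventually cause the connection to be reported as broken instead of remaining "ESTABLISHED" indefinitely, which is the symptom the description lists.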